20110828 rcu



1. R(ead) C(opy) U(pdate)
   [email_address]
2. Agenda
   - What is RCU? Why?
   - RCU Primitives
   - RCU List Operations
   - Sleepable RCU
   - User Level RCU
   - Q&A
3. What is RCU?
   - Read-copy-update
   - An alternative to rwlock
   - Allows low-overhead, wait-free reads
   - Updates can be expensive: old copies must be maintained while readers are still using them
4. Why RCU?
   - Without a lock, this code is broken: compiler optimization and CPU out-of-order execution can reorder the stores, so a reader may see gp set before the fields are initialized

     struct foo {
         int a;
         int b;
         int c;
     };
     struct foo *gp = NULL;

     /* . . . */

     p = kmalloc(sizeof(*p), GFP_KERNEL);
     p->a = 1;
     p->b = 2;
     p->c = 3;
     gp = p;
5. Why RCU?
   - Mutex: no concurrent readers
   - spin_lock: ditto
   - rwlock: allows concurrent readers. The right choice?
6. Why RCU?
   - rwlock is expensive
   - Even read_lock has more overhead than spin_lock
   - If write_lock is not really rare, rwlock contention is much worse than spin_lock contention
7. RCU Basics
   - Splits each update into removal and reclamation phases
   - Removal is performed immediately; reclamation is deferred until all readers active during the removal phase have completed
   - Takes advantage of the fact that writes to single aligned pointers are atomic on modern CPUs
8. RCU Terminology
   - Read-side critical section: code delimited by rcu_read_lock() and rcu_read_unlock(); MUST NOT sleep
   - Quiescent state: any code not within an RCU read-side critical section
   - Grace period: any time period during which each thread resides in at least one quiescent state
9. RCU Terminology
   - More on grace periods: after a full grace period, all pre-existing RCU read-side critical sections have completed
10. RCU Update Sequence
   - Remove pointers to a data structure, so that subsequent readers cannot gain a reference to it
   - Wait for all previous readers to complete their RCU read-side critical sections (i.e., for a grace period to pass)
   - At this point no reader can hold a reference to the data structure, so it may safely be reclaimed (e.g., in another thread)
11. When Does a Grace Period Pass?
   - RCU readers are not permitted to block, switch to user-mode execution, or enter the idle loop
   - As soon as a CPU is seen passing through any of these three states, we know that it has exited any previous RCU read-side critical sections
   - So if we remove an item from a linked list, then wait until every CPU has context-switched, executed in user mode, or run the idle loop, we can safely free that item
12. Core RCU APIs
   - rcu_read_lock()
   - rcu_read_unlock()
   - synchronize_rcu() / call_rcu()
   - rcu_assign_pointer()
   - rcu_dereference()
13. Wait for Readers
   - synchronize_rcu(): blocks until all ongoing RCU read-side critical sections have completed
   - call_rcu(): registers a callback function and argument to be invoked after all ongoing RCU read-side critical sections have completed
14. Assign & Retrieve
   - rcu_assign_pointer(): assigns a new value to an RCU-protected pointer
   - rcu_dereference(): fetches an RCU-protected pointer, which is safe to dereference until the enclosing rcu_read_unlock()
15. RCU List Insert
   - list_add_rcu()
   - list_add_tail_rcu()
   - list_replace_rcu()
   - Updaters must be serialized by some lock
16. Sample Code

     struct foo {
         struct list_head list;
         int a;
         int b;
         int c;
     };
     LIST_HEAD(head);

     /* . . . */

     p = kmalloc(sizeof(*p), GFP_KERNEL);
     p->a = 1;
     p->b = 2;
     p->c = 3;
     spin_lock(&list_lock);
     list_add_rcu(&p->list, &head);
     spin_unlock(&list_lock);
17. RCU List Traversal
   - list_for_each_entry_rcu()
   - rcu_read_lock() and rcu_read_unlock() must be called, but they never spin or block
   - Allows list_add_rcu() to execute concurrently
18. RCU List Removal
   - list_del_rcu() removes an element from a list; must be protected by some lock
   - But when can the element be freed?
   - synchronize_rcu() blocks until all read-side critical sections that began before the call have completed
   - A call_rcu() callback runs after all read-side critical sections that began before the call have completed
19. Sample Code

     spin_lock(&mylock);
     p = search(head, key);
     if (p == NULL)
         spin_unlock(&mylock);
     else {
         list_del_rcu(&p->list);
         spin_unlock(&mylock);
         synchronize_rcu();
         kfree(p);
     }
20. Sleepable RCU
   - Why?
     - Realtime kernels that require spinlock critical sections to be preemptible also require that RCU read-side critical sections be preemptible
21. SRCU Implementation Strategy
   - Prevent any given task sleeping in an RCU read-side critical section from holding up an unbounded number of RCU callbacks, by:
     - refusing to provide asynchronous grace-period interfaces, such as Classic RCU's call_rcu() API
     - isolating grace-period detection within each subsystem using SRCU
22. SRCU Grace Periods
   - Grace periods are detected by counting in per-CPU counters
     - Readers manipulate CPU-local counters
     - Two sets of per-CPU counters are kept, read-copy-update style
23. SRCU Data Structure

     struct srcu_struct {
         int completed;
         struct srcu_struct_array __percpu *per_cpu_ref;
         struct mutex mutex;
     };

     struct srcu_struct_array {
         int c[2];
     };
24. Wait for Grace Period
   - synchronize_srcu():
     - Flip the completed counter, so new readers use the other set of per-CPU counters
     - Wait for the old set's counts to drain to zero
25. SRCU APIs

     int init_srcu_struct(struct srcu_struct *sp);
     void cleanup_srcu_struct(struct srcu_struct *sp);
     int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
     void srcu_read_unlock(struct srcu_struct *sp, int idx);
     void synchronize_srcu(struct srcu_struct *sp);
     void synchronize_srcu_expedited(struct srcu_struct *sp);
     long srcu_batches_completed(struct srcu_struct *sp);
26. Userspace RCU
   - Available at http://lttng.org/urcu
   - git clone git://git.lttng.org/userspace-rcu.git
   - Debian: aptitude install liburcu-dev
   - Examples
27. Q & A