Userspace RCU Library: What Linear Multiprocessor Scalability Means for Your Application

RCU is well known at the kernel level as a way to synchronize shared data structures in read-often, update-rarely scenarios.

The development of an RCU library at the userspace application level has been driven mainly by the need for efficient synchronization of userspace tracing control data structures.

IBM kindly agreed to allow distribution of RCU-related code in an LGPL library, which makes it available for everyone to use. This can have a large impact on the design of highly scalable applications that cache frequent requests, such as domain name servers, proxies and web servers.

This presentation discusses the class of applications that could benefit from using the userspace RCU library.

The userspace RCU library is available under the LGPL license at http://www.lttng.org/urcu

  1. Userspace RCU Library: What Linear Multiprocessor Scalability Means for Your Application
     Linux Plumbers Conference 2009
     Mathieu Desnoyers, École Polytechnique de Montréal
  2. Mathieu Desnoyers
     ● Author/maintainer of:
       – LTTV (Linux Trace Toolkit Viewer)
         ● 2003-...
       – LTTng (Linux Trace Toolkit Next Generation)
         ● 2005-...
       – Immediate Values
         ● 2007-...
       – Tracepoints
         ● 2008-...
       – Userspace RCU Library
         ● 2009-...
  3. Contributions by
     ● Paul E. McKenney
       – IBM Linux Technology Center
     ● Alan Stern
       – Rowland Institute, Harvard University
     ● Jonathan Walpole
       – Computer Science Department, Portland State University
     ● Michel Dagenais
       – Computer and Software Engineering Dpt., École Polytechnique de Montréal
  4. Summary
     ● RCU Overview
     ● Kernel vs Userspace RCU
     ● Userspace RCU Library
     ● Benchmarks
     ● RCU-Friendly Applications
  5. Linux Kernel RCU Usage
  6. RCU Overview
     ● Relativistic programming
       – Updates seen in different orders by CPUs
       – Tolerates conflicts
     ● Linear scalability
     ● Wait-free read-side
     ● Efficient updates
       – Only a single pointer exchange needs exclusive access
  7. Schematic of RCU Update and Read-Side C.S.
  8. RCU Linked-List Deletion
  9. Kernel vs Userspace RCU
     ● Quiescent state
       – Kernel threads
         ● Wait for kernel pre-existing RCU read-side C.S. to complete
       – User threads
         ● Wait for process pre-existing RCU read-side C.S. to complete
  10. Userspace RCU Library
     ● QSBR
       – liburcu-qsbr.so
     ● Generic RCU
       – liburcu-mb.so
     ● Signal-based RCU
       – liburcu.so
     ● call_rcu()
       – liburcu-defer.so
  11. QSBR
     ● Detection of quiescent state:
       – Each reader thread calls rcu_quiescent_state() periodically.
     ● Requires application modification
     ● Read-side with very low overhead
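
The QSBR idea on this slide can be illustrated with a small single-threaded model. This is a minimal sketch, not liburcu's implementation: every `toy_` name, the `MAX_READERS` limit, and the counter layout are assumptions made for illustration. Each reader publishes the grace-period number it last observed, and the updater considers a grace period finished once no online reader lags behind it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_READERS 4   /* hypothetical fixed limit for this sketch */

/* Hypothetical toy state; liburcu's real data layout differs. */
static atomic_ulong toy_global_gp = 1;            /* current grace-period number */
static atomic_ulong toy_reader_gp[MAX_READERS];   /* 0 = slot unused or offline */

/* Reader side: announce that no RCU-protected reference is currently held. */
void toy_quiescent_state(int reader)
{
    atomic_store(&toy_reader_gp[reader], atomic_load(&toy_global_gp));
}

/* Update side: open a new grace period and return its number. */
unsigned long toy_start_grace_period(void)
{
    return atomic_fetch_add(&toy_global_gp, 1) + 1;
}

/* True once every online reader has reported a quiescent state since gp began. */
bool toy_grace_period_done(unsigned long gp)
{
    for (int i = 0; i < MAX_READERS; i++) {
        unsigned long seen = atomic_load(&toy_reader_gp[i]);
        if (seen != 0 && seen < gp)
            return false;   /* this reader may still hold an old reference */
    }
    return true;
}
```

In this model the read-side cost is one store per call to `toy_quiescent_state()`, which is why the slide describes the QSBR read-side as very low overhead; the price is that the application itself must place those calls.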
  12. Generic RCU
     ● Detection of quiescent state:
       – rcu_read_lock()/rcu_read_unlock() mark the beginning/end of the critical sections
       – Counts nesting level
     ● Suitable for library use
     ● Higher read-side overhead than QSBR due to added memory barriers
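
The nesting-level counting mentioned on this slide might look roughly like the following. It is a simplified sketch under stated assumptions: `struct toy_reader` and the `toy_` functions are hypothetical, and the sequentially consistent atomic operations stand in for the explicit memory barriers the slide refers to. The real library also tracks a grace-period phase so that an updater cannot be starved by back-to-back critical sections; that part is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread reader state, not liburcu's actual structure. */
struct toy_reader {
    atomic_uint nesting;   /* 0 means outside any read-side critical section */
};

/* Enter a (possibly nested) read-side critical section. */
void toy_read_lock(struct toy_reader *r)
{
    /* Sequentially consistent read-modify-write doubles as a full barrier. */
    atomic_fetch_add(&r->nesting, 1);
}

/* Leave the innermost read-side critical section. */
void toy_read_unlock(struct toy_reader *r)
{
    atomic_fetch_sub(&r->nesting, 1);
}

/* Update side: a reader observed at nesting level 0 is in a quiescent state. */
bool toy_reader_is_quiescent(struct toy_reader *r)
{
    return atomic_load(&r->nesting) == 0;
}
```

Because quiescent states are inferred from the lock/unlock pairs, a library can use this flavor without asking the host application to add periodic calls, at the cost of a barrier on each read-side entry and exit.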
  13. Signal-based RCU
     ● Same quiescent state detection as Generic RCU
     ● Suitable for library use, but reserves a signal
     ● Read-side close to QSBR performance
       – Remove memory barriers from rcu_read_lock()/rcu_read_unlock().
       – Replaced by memory barriers in signal handler, executed at each update-side memory barrier.
  14. call_rcu()
     ● Eliminates the need to call synchronize_rcu() after each removal
     ● Queues RCU callbacks for deferred batched execution
     ● Wait-free unless per-thread queue is full
     ● “Worker thread” executes callbacks periodically
     ● Energy-efficient, uses sys_futex()
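
The deferred, batched execution described on this slide can be sketched as a bounded callback queue. This is a single-threaded illustration, not the library's code: the `toy_` names, the queue length, and the return conventions are all assumptions. Enqueuing never blocks unless the queue is full, and the whole batch runs only after the caller has established that a grace period elapsed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 8   /* hypothetical per-thread queue capacity */

struct toy_deferred {
    void (*fn)(void *);   /* reclamation callback, e.g. free() */
    void *arg;            /* object to reclaim */
};

static struct toy_deferred toy_queue[TOY_QUEUE_LEN];
static size_t toy_queued;

/* Queue a callback. Returns 0 on success, -1 if the queue is full, in which
 * case a caller would have to wait for a grace period and flush first. */
int toy_defer(void (*fn)(void *), void *arg)
{
    if (toy_queued == TOY_QUEUE_LEN)
        return -1;
    toy_queue[toy_queued].fn = fn;
    toy_queue[toy_queued].arg = arg;
    toy_queued++;
    return 0;
}

/* Run the whole batch. Must only be called after a grace period has elapsed.
 * Returns the number of callbacks executed. */
size_t toy_run_deferred(void)
{
    size_t n = toy_queued;
    for (size_t i = 0; i < n; i++)
        toy_queue[i].fn(toy_queue[i].arg);
    toy_queued = 0;
    return n;
}

/* Example callback used to observe when deferred work actually runs. */
void toy_count_call(void *arg)
{
    (*(int *)arg)++;
}
```

Batching amortizes one grace period over many removals, which is the reason updaters no longer need a synchronize_rcu() call after each one.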
  15. Example: RCU Read-Side

         struct mystruct *rcudata = &somedata;

         /* register thread with rcu_register_thread()/rcu_unregister_thread() */

         void fct(void)
         {
             struct mystruct *ptr;

             rcu_read_lock();
             ptr = rcu_dereference(rcudata);
             /* use ptr */
             rcu_read_unlock();
         }

  16. Example: exchange pointer

         struct mystruct *rcudata = &somedata;

         void replace_data(struct mystruct data)
         {
             struct mystruct *new, *old;

             new = malloc(sizeof(*new));
             memcpy(new, &data, sizeof(*new));
             old = rcu_xchg_pointer(&rcudata, new);
             call_rcu(free, old);
         }

  17. Example: compare-and-exchange pointer

         struct mystruct *rcudata = &somedata;

         /* register thread with rcu_register_thread()/rcu_unregister_thread() */

         void modify_data(int increment_a, int increment_b)
         {
             struct mystruct *new, *old;

             new = malloc(sizeof(*new));
             rcu_read_lock(); /* Ensure pointer is not re-used */
             do {
                 old = rcu_dereference(rcudata);
                 memcpy(new, old, sizeof(*new));
                 new->field_a += increment_a;
                 new->field_b += increment_b;
             } while (rcu_cmpxchg_pointer(&rcudata, old, new) != old);
             rcu_read_unlock();
             call_rcu(free, old);
         }
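
The copy, modify, compare-and-exchange loop from this slide can also be tried without the library. The sketch below swaps in C11 atomics for rcu_dereference() and rcu_cmpxchg_pointer(); the `toy_` names and the two-field `struct mystruct` are assumptions for illustration. Reclamation is deliberately left to the caller: the replaced version is returned rather than freed, because freeing it safely requires waiting for a grace period first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct mystruct {
    int field_a;
    int field_b;
};

static struct mystruct toy_initial = { 0, 0 };
static _Atomic(struct mystruct *) toy_shared = &toy_initial;

/* Read the currently published version. */
struct mystruct *toy_current(void)
{
    return atomic_load(&toy_shared);
}

/* Publish a modified copy. Returns the replaced version (to be reclaimed only
 * after a grace period), or NULL if allocation failed. */
struct mystruct *toy_modify(int increment_a, int increment_b)
{
    struct mystruct *new = malloc(sizeof(*new));
    struct mystruct *old;

    if (!new)
        return NULL;
    old = atomic_load(&toy_shared);
    do {
        /* Rebuild the copy from whichever version is current. */
        memcpy(new, old, sizeof(*new));
        new->field_a += increment_a;
        new->field_b += increment_b;
        /* On failure, old is refreshed with the current pointer. */
    } while (!atomic_compare_exchange_weak(&toy_shared, &old, new));
    return old;
}
```

In the slide's version, holding rcu_read_lock() across the loop keeps the old pointer from being reclaimed and reused while the comparison is pending; this sketch sidesteps that concern only because nothing is freed inside `toy_modify()`.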
  18. Benchmarks
     ● Read-side Scalability
     ● Read-side C.S. length impact
     ● Update Overhead
  19. Read-Side Scalability
     64-core POWER5+
  20. Read-Side C.S. Length Impact
     64-core POWER5+, logarithmic scale (x, y)
  21. Update Overhead
     64-core POWER5+, logarithmic scale (x, y)
  22. RCU-Friendly Applications
     ● Multithreaded applications with read-often shared data
       – Cache
         ● Name servers
         ● Proxy
         ● Web servers with static pages
       – Configuration
         ● Low synchronization overhead
         ● Dynamically modified without restart
  23. RCU-Friendly Applications
     ● Libraries supporting multithreaded applications
       – Tracing library, e.g. lib UST (LTTng port for userspace tracing)
         ● http://git.dorsal.polymtl.ca/?p=ust.git
  24. RCU-Friendly Applications
     ● Libraries supporting multithreaded applications (cont.)
       – Typing/data structure support
         ● Typing system
           – Creation of a class is a rare event
           – Reading class structure happens at object creation/destruction (_very_ often)
           – Applies to gobject
             ● Used by: gtk/gdk/glib/gstreamer...
         ● Efficient hash tables
         ● Glib “quarks”
  25. RCU-Friendly Applications
     ● Routing tables in userspace
     ● Userspace network stacks
     ● Userspace signal-handling
       – Signal-safe read-side
       – Could implement an inter-thread signal multiplexer
     ● Your own?
  26. Info / Download / Contact
     ● Mathieu Desnoyers
       – Computer and Software Engineering Dpt., École Polytechnique de Montréal
     ● Web site:
       – http://www.lttng.org/urcu
     ● Git tree
       – git://lttng.org/userspace-rcu.git
     ● Email
       – mathieu.desnoyers@polymtl.ca