Glusterd Thread Synchronization
using user space RCU
Atin Mukherjee
SSE-Red Hat
Gluster Maintainer
IRC : atinm on freenode
Twitter: @mukherjee_atin
Agenda
● Introduction to GlusterD
● Big lock in thread synchronization in GlusterD
● Issues with Big Lock approach
● Different locking primitives
● What is RCU
● Advantage of RCU over read-write lock
● RCU mechanisms – Insertion, Deletion, Reader
● URCU flavors
● URCU APIs
● URCU use cases
● Q&A
What is GlusterD
● Manages the cluster configuration for Gluster
● Responsible for
– Peer membership management
– Elastic volume management
– Configuration consistency
– Distributed command execution (orchestration)
– Service management (manages GlusterFS
daemons)
Thread synchronization in GlusterD
● GlusterD was initially designed as single-threaded
● Moved from single-threaded to multi-threaded to support use cases
such as snapshots
● Big lock
– A coarse grained lock
– Only one transaction can work inside big lock
– Protects all the shared data structures
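A minimal sketch of the big-lock idea, assuming a plain pthread mutex. The names `big_lock`, `peer_count`, `volume_count` and both functions are hypothetical illustrations, not GlusterD code. The point is that two unrelated updates still serialize on the same lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state; not GlusterD's real data structures. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int peer_count = 0;
static int volume_count = 0;

/* Every transaction takes the same lock, whatever data it touches. */
int add_peer(void)
{
    int rc = pthread_mutex_lock(&big_lock);
    assert(rc == 0);
    int n = ++peer_count;
    pthread_mutex_unlock(&big_lock);
    return n;
}

/* Touches only volume data, yet still contends with add_peer(). */
int add_volume(void)
{
    int rc = pthread_mutex_lock(&big_lock);
    assert(rc == 0);
    int n = ++volume_count;
    pthread_mutex_unlock(&big_lock);
    return n;
}
```

A fine-grained design would give `peer_count` and `volume_count` separate locks, at the cost of having to reason about lock ordering.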
Issues with Big Lock
● Threads contend for even unrelated data
● Can end up in a deadlock
– RPC request's callback also needs big lock
● Should the big lock be released in the middle of a transaction to
avoid this deadlock? We do, but...
● Here comes the problem: there is a small window of time
in which the shared data structures can be updated,
leading to inconsistencies
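The window described above can be sketched as a single-threaded simulation. This is an illustration under assumptions, not GlusterD code: `volume_version`, `concurrent_update()` and `transaction_saw_stale_state()` are hypothetical names, and the update that would normally come from another thread is passed in as a function pointer so the behavior is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int volume_version = 0;   /* hypothetical shared state */

/* Stands in for work that another thread does while the lock is dropped. */
void concurrent_update(void)
{
    pthread_mutex_lock(&big_lock);
    volume_version++;
    pthread_mutex_unlock(&big_lock);
}

/* Returns true if the shared state changed while the lock was released. */
bool transaction_saw_stale_state(void (*during_window)(void))
{
    pthread_mutex_lock(&big_lock);
    int seen = volume_version;        /* the transaction decides based on this */
    pthread_mutex_unlock(&big_lock);  /* dropped so an RPC callback can run */

    if (during_window != NULL)
        during_window();              /* the window of inconsistency */

    pthread_mutex_lock(&big_lock);
    bool stale = (seen != volume_version);
    pthread_mutex_unlock(&big_lock);
    return stale;
}
```

Calling `transaction_saw_stale_state(NULL)` models a transaction that nobody interferes with; passing `concurrent_update` models an update that lands inside the window.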
Different locking primitives
● Fine grained locks
– Mutex
– Read-write lock
– Spin lock
– Seq lock
– Read-Copy-Update (RCU)
What is RCU
● Synchronization mechanism
● Not new, added to Linux Kernel in 2002
● Allows reads to occur concurrently with updates
● Maintains multiple versions of objects for read
coherency
● Almost zero overhead in read-side critical
sections
Advantages of RCU over read-write lock
● Concurrent readers & writers – writer writes, readers read
● Wait free reads
– RCU readers have no wait overhead. They can never be blocked by writers
● Existence guarantee
– RCU guarantees that RCU-protected data accessed in a reader's critical section will remain
in existence until the end of that critical section
● Deadlock immunity
– RCU readers always run in deterministic time as they never block. This means
that they can never become part of a deadlock.
● No writer starvation
– As RCU readers don't block, writers can never starve.
RCU mechanism
● RCU is made up of three fundamental mechanisms
– Publish-Subscribe Mechanism (for insertion)
– Wait For Pre-Existing RCU Readers to Complete (for
deletion)
– Maintain Multiple Versions of Recently Updated Objects
(for readers)
Publish-Subscribe model
● rcu_assign_pointer () for publication
Without RCU (a plain store, which the compiler or CPU may reorder before the field initialization):
    struct foo {
        int a;
        int b;
        int c;
    };
    struct foo *gp = NULL;

    /* . . . */

    p = malloc (...);
    p->a = 1;
    p->b = 2;
    p->c = 3;
    gp = p;
With RCU (same struct foo and gp; the pointer is published only after the fields are initialized):
    p = malloc (...);
    p->a = 1;
    p->b = 2;
    p->c = 3;
    rcu_assign_pointer(gp, p);
● rcu_dereference () for subscription
Without RCU (a plain load):
    p = gp;
    if (p != NULL) {
        do_something_with(p->a, p->b, p->c);
    }
With RCU (the load is ordered and sits inside a read-side critical section):
    rcu_read_lock();
    p = rcu_dereference(gp);
    if (p != NULL) {
        do_something_with(p->a, p->b, p->c);
    }
    rcu_read_unlock();
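The ordering that the publish-subscribe pair provides can be approximated in portable C11. This is a sketch under assumptions, not the kernel or liburcu implementation: `rcu_assign_pointer()` is modelled as a release store and `rcu_dereference()` as an acquire load (a conservative stand-in for the dependency-ordered load real RCU uses). `publish_foo()` and `read_foo_sum()` are hypothetical helpers, and reclamation of a previously published object is deliberately ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
    int a;
    int b;
    int c;
};

/* Shared pointer, playing the role of gp in the slides. */
static _Atomic(struct foo *) gp = NULL;

/* Publisher: initialise every field, then publish with release ordering
 * so a reader that sees the pointer also sees the initialised fields. */
void publish_foo(int a, int b, int c)
{
    struct foo *p = malloc(sizeof(*p));
    assert(p != NULL);
    p->a = a;
    p->b = b;
    p->c = c;
    atomic_store_explicit(&gp, p, memory_order_release);
}

/* Subscriber: returns a + b + c, or -1 if nothing has been published. */
int read_foo_sum(void)
{
    struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);
    if (p == NULL)
        return -1;
    return p->a + p->b + p->c;
}
```

What this sketch leaves out is exactly what the rest of RCU adds: `rcu_read_lock()`/`rcu_read_unlock()` to delimit readers, and a grace period before an old object is freed.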
Publish-Subscribe Model (ii)
● rcu_assign_pointer () & rcu_dereference () are
embedded in special RCU variants of Linux's
list-manipulation API
● rcu_assign_pointer () → list_add_rcu ()
● rcu_dereference () → list_for_each_entry_rcu ()
Wait For Pre-Existing RCU Readers to
Complete
● Approach used for deletion
● Synchronous – synchronize_rcu ()
● Asynchronous – call_rcu ()
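Synchronous update:
    q = malloc(...);
    *q = *p;
    q->b = 2;
    q->c = 3;
    list_replace_rcu(&p->list, &q->list);
    synchronize_rcu();   /* wait for pre-existing readers */
    free(p);
Asynchronous update:
    q = malloc(...);
    *q = *p;
    q->b = 2;
    q->c = 3;
    list_replace_rcu(&p->list, &q->list);
    /* call_rcu() takes a struct rcu_head, assumed here to be embedded in the object as p->rcu */
    call_rcu(&p->rcu, cbk); /* cbk will free p */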
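How a callback such as cbk gets from the embedded head back to the object it must free can be sketched with a toy deferred-free queue. Everything here is an assumption for illustration: `struct toy_rcu_head`, `toy_call_rcu()` and `toy_grace_period_end()` are hypothetical and single-threaded, and they only mimic the shape of `call_rcu()`. A real implementation runs the callbacks only after all pre-existing readers have finished.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct rcu_head; NOT the kernel or liburcu type. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

struct foo {
    int a, b, c;
    struct toy_rcu_head rcu;   /* embedded so the callback can find foo */
};

static struct toy_rcu_head *pending = NULL;
static int freed_count = 0;

/* Queue a callback instead of blocking the updater. */
void toy_call_rcu(struct toy_rcu_head *head,
                  void (*func)(struct toy_rcu_head *head))
{
    assert(head != NULL && func != NULL);
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Callback: recover the enclosing struct foo from the head and free it. */
void free_foo_cb(struct toy_rcu_head *head)
{
    struct foo *p = (struct foo *)((char *)head - offsetof(struct foo, rcu));
    free(p);
    freed_count++;
}

/* Pretend a grace period has ended: run every queued callback. */
void toy_grace_period_end(void)
{
    while (pending != NULL) {
        struct toy_rcu_head *head = pending;
        pending = head->next;   /* read next before the callback frees head */
        head->func(head);
    }
}
```

The updater returns immediately after `toy_call_rcu()`; the memory stays valid until `toy_grace_period_end()` runs, which is the asynchronous counterpart of calling `synchronize_rcu()` and then `free()`.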
Maintain multiple versions of objects
● Used for the existence guarantee
    p = search(head, key);
    list_del_rcu(&p->list);
    synchronize_rcu();
    free (p);
URCU flavors
● QSBR (quiescent-state-based RCU)
– each thread must periodically invoke rcu_quiescent_state()
– Thread (un)registration required
● Memory-barrier-based RCU
– Preemptible RCU implementation
– Introduces a memory barrier in the read-side critical section, hence high read-side
overhead
● “Bullet-proof” RCU (RCU-BP)
– Similar to memory-barrier-based RCU, but thread (un)registration is handled
automatically
– Its primitives carry extra overhead, but applications can use it without worrying about
thread creation/destruction
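The QSBR flavor listed above can be pictured with a toy, single-threaded model. This is not liburcu's implementation or API: the `toy_` names, the fixed-size reader table and the counter scheme are all assumptions made for illustration. It shows only why registration and periodic quiescent-state announcements are required: the updater cannot finish a grace period until every registered reader has checked in.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_READERS 4

/* Toy model only; not how liburcu stores its state. */
static unsigned long global_gp = 1;               /* current grace-period number */
static unsigned long reader_gp[TOY_MAX_READERS];  /* last number each reader saw */
static int nreaders = 0;

/* Registration: a reader claims a slot and starts out quiescent. */
int toy_register_reader(void)
{
    assert(nreaders < TOY_MAX_READERS);
    reader_gp[nreaders] = global_gp;
    return nreaders++;
}

/* The reader announces that it holds no references to protected data. */
void toy_quiescent_state(int id)
{
    reader_gp[id] = global_gp;
}

/* The updater starts a new grace period. */
void toy_start_grace_period(void)
{
    global_gp++;
}

/* The grace period is over once every reader has seen the new number. */
bool toy_grace_period_done(void)
{
    for (int i = 0; i < nreaders; i++)
        if (reader_gp[i] != global_gp)
            return false;
    return true;
}
```

In this model the read side does no work at all between announcements, which is where QSBR's low read-side overhead comes from; the price is that a reader which never announces a quiescent state stalls every grace period.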
URCU flavors (ii)
● Signal-based RCU
– Removes the read-side memory barriers
– Can be used by library functions
– Requires that the user application give up a POSIX signal, which
synchronize_rcu() uses in place of the read-side memory
barriers
– Requires explicit thread registration
● Signal-based RCU using an out-of-tree sys_membarrier() system call
– sys_membarrier() system call instead of POSIX signal
URCU APIs
● Atomic-operation and utility APIs
– caa_: Concurrent Architecture Abstraction.
– cmm_: Concurrent Memory Model.
– uatomic_: URCU Atomic Operation.
– https://lwn.net/Articles/573435/
● The URCU APIs
– https://lwn.net/Articles/573439/
● RCU-Protected Lists
– https://lwn.net/Articles/573441
When is URCU useful
References
● https://lwn.net/Articles/262464/
● https://lwn.net/Articles/263130/
● https://lwn.net/Articles/573424/
● http://www.efficios.com/pub/lpc2011/Presentation-lpc2011-desnoyers-urcu.pdf
● http://www.rdrop.com/~paulmck/RCU/RCU.IISc-Bangalore.2013.06.03a.pdf
● http://urcu.so/
Q&A

Glusterd_thread_synchronization_using_urcu_lca2016

  • 2. Glusterd Thread Synchronization using user space RCU
    Atin Mukherjee
    SSE-Red Hat
    Gluster Maintainer
    IRC: atinm on freenode
    Twitter: @mukherjee_atin
  • 3. Agenda
    ● Introduction to GlusterD
    ● Big lock in thread synchronization in GlusterD
    ● Issues with Big Lock approach
    ● Different locking primitives
    ● What is RCU
    ● Advantage of RCU over read-write lock
    ● RCU mechanisms – Insertion, Deletion, Reader
    ● URCU flavors
    ● URCU APIs
    ● URCU use cases
    ● Q&A
  • 4. What is GlusterD
    ● Manages the cluster configuration for Gluster
    ● Responsible for
      – Peer membership management
      – Elastic volume management
      – Configuration consistency
      – Distributed command execution (orchestration)
      – Service management (manages GlusterFS daemons)
  • 5. Thread synchronization in GlusterD
    ● GlusterD was initially designed as single threaded
    ● Single threaded → multi threaded to satisfy use cases like snapshot
    ● Big lock
      – A coarse-grained lock
      – Only one transaction can work inside the big lock at a time
      – Protects all the shared data structures
  • 6. Issues with Big Lock
    ● Threads contend even for unrelated data
    ● Can end up in a deadlock
      – An RPC request's callback also needs the big lock
    ● Shall we release the big lock in the middle of a transaction to get rid of the above deadlock? Yes, we do, but...
    ● Here comes the problem: a small window of time in which the shared data structures can be updated, leading to inconsistencies
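The window described on this slide can be sketched in a few lines of C. This is only an illustration, not GlusterD code: `big_lock`, `volume_version`, `send_rpc_and_wait()` and the other names are hypothetical, and the competing update runs inline rather than on a second thread so that the outcome is deterministic.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: one coarse lock guarding all shared state. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int volume_version = 1;        /* shared data protected by big_lock */

/* Another transaction updating shared state; it takes the big lock itself. */
static void concurrent_update(void)
{
    pthread_mutex_lock(&big_lock);
    volume_version++;
    pthread_mutex_unlock(&big_lock);
}

/* Simulated RPC round trip. The callback path needs big_lock, so the
 * caller must not hold it here. For a deterministic demo the competing
 * update runs inline instead of on a second thread. */
static void send_rpc_and_wait(void)
{
    concurrent_update();
}

/* Returns 1 if the value cached before the RPC is stale afterwards. */
int transaction_sees_stale_data(void)
{
    int cached, stale;

    pthread_mutex_lock(&big_lock);
    cached = volume_version;          /* decision is based on this value */
    pthread_mutex_unlock(&big_lock);  /* released to avoid the deadlock */

    send_rpc_and_wait();              /* window: shared data may change */

    pthread_mutex_lock(&big_lock);
    stale = (cached != volume_version);
    pthread_mutex_unlock(&big_lock);
    return stale;
}

/* Usage sketch: the transaction observes that its cached value went stale. */
void big_lock_window_demo(void)
{
    assert(transaction_sees_stale_data() == 1);
}
```

Holding the lock across the RPC would close the window but reintroduce the deadlock, which is the trade-off the slide points at.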
  • 7. Different locking primitives
    ● Fine-grained locks
      – Mutex
      – Read-write lock
      – Spin lock
      – Seq lock
      – Read-Copy-Update (RCU)
  • 8. What is RCU
    ● Synchronization mechanism
    ● Not new, added to the Linux kernel in 2002
    ● Allows reads to occur concurrently with updates
    ● Maintains multiple versions of objects for read coherency
    ● Almost zero overhead in the read-side critical section
  • 9. Advantages of RCU over read-write lock
    ● Concurrent readers & writers – writer writes, readers read
    ● Wait-free reads
      – RCU readers have no wait overhead. They can never be blocked by writers
    ● Existence guarantee
      – RCU guarantees that RCU-protected data in a reader's critical section will remain in existence until the end of the critical section
    ● Deadlock immunity
      – RCU readers always run in deterministic time as they never block. This means that they can never become part of a deadlock
    ● No writer starvation
      – As RCU readers don't block, writers can never starve
  • 10. RCU mechanism
    ● RCU is made up of three fundamental mechanisms
      – Publish-Subscribe Mechanism (for insertion)
      – Wait For Pre-Existing RCU Readers to Complete (for deletion)
      – Maintain Multiple Versions of Recently Updated Objects (for readers)
  • 11. Publish-Subscribe model
    ● rcu_assign_pointer() for publication

      Plain assignment (nothing forces the field stores to complete before the pointer store):

      struct foo {
          int a;
          int b;
          int c;
      };
      struct foo *gp = NULL;

      /* . . . */

      p = malloc(...);
      p->a = 1;
      p->b = 2;
      p->c = 3;
      gp = p;

      With rcu_assign_pointer() (the struct is fully initialised before gp becomes visible):

      struct foo {
          int a;
          int b;
          int c;
      };
      struct foo *gp = NULL;

      /* . . . */

      p = malloc(...);
      p->a = 1;
      p->b = 2;
      p->c = 3;
      rcu_assign_pointer(gp, p);

    ● rcu_dereference() for subscription

      Plain read:

      p = gp;
      if (p != NULL) {
          do_something_with(p->a, p->b, p->c);
      }

      With rcu_dereference() inside a read-side critical section:

      rcu_read_lock();
      p = rcu_dereference(gp);
      if (p != NULL) {
          do_something_with(p->a, p->b, p->c);
      }
      rcu_read_unlock();
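The ordering that publication and subscription rely on can be approximated with plain C11 atomics. The sketch below is an analogy and not the liburcu implementation: a release store plays the role of rcu_assign_pointer(), and an acquire load is a conservative stand-in for the dependency ordering of rcu_dereference(). The helper names `publish_foo()` and `read_foo_sum()` are hypothetical, and reclamation of a previously published object is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct foo {
    int a;
    int b;
    int c;
};

/* Shared pointer that readers subscribe to; NULL until published. */
static _Atomic(struct foo *) gp = NULL;

/* Publication: initialise every field first, then store the pointer with
 * release ordering so the field writes cannot be reordered after it. */
int publish_foo(int a, int b, int c)
{
    struct foo *p = malloc(sizeof(*p));
    if (p == NULL)
        return -1;
    p->a = a;
    p->b = b;
    p->c = c;
    atomic_store_explicit(&gp, p, memory_order_release);
    return 0;
}

/* Subscription: load the pointer once with acquire ordering and use only
 * the local copy. Returns a + b + c, or -1 if nothing is published yet. */
int read_foo_sum(void)
{
    struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);
    if (p == NULL)
        return -1;
    return p->a + p->b + p->c;
}

/* Usage sketch: nothing is visible before publication, all fields after. */
void publish_subscribe_demo(void)
{
    assert(read_foo_sum() == -1);
    assert(publish_foo(1, 2, 3) == 0);
    assert(read_foo_sum() == 6);
}
```

A reader that sees the non-NULL pointer is guaranteed to see the initialised fields as well, which is the property the plain `gp = p` version does not provide.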
  • 12. Publish-Subscribe Model (ii)
    ● rcu_assign_pointer() & rcu_dereference() are embedded in special RCU variants of Linux's list-manipulation API
    ● rcu_assign_pointer() → list_add_rcu()
    ● rcu_dereference() → list_for_each_entry_rcu()
  • 13. Wait For Pre-Existing RCU Readers to Complete
    ● Approach used for deletion
    ● Synchronous – synchronize_rcu()

      q = malloc(...);
      *q = *p;
      q->b = 2;
      q->c = 3;
      list_replace_rcu(&p->list, &q->list);
      synchronize_rcu();
      free(p);

    ● Asynchronous – call_rcu()

      q = malloc(...);
      *q = *p;
      q->b = 2;
      q->c = 3;
      list_replace_rcu(&p->list, &q->list);
      call_rcu(&p->rcu, cbk);  /* cbk will free p; assumes struct foo embeds a struct rcu_head named rcu */
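The idea behind deferring the free can be illustrated with a toy model. This is not how liburcu tracks readers: a single global reader counter stands in for the real bookkeeping, only one updater is assumed, and every `toy_*` name is hypothetical. The toy is also more conservative than real RCU, since it waits for all readers rather than only those that started before the update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DEFERRED 16

static atomic_int active_readers = 0;  /* toy stand-in for reader tracking */
static void *deferred[MAX_DEFERRED];   /* unlinked objects awaiting free */
static size_t n_deferred = 0;          /* single updater assumed */

void toy_read_lock(void)   { atomic_fetch_add(&active_readers, 1); }
void toy_read_unlock(void) { atomic_fetch_sub(&active_readers, 1); }

/* Asynchronous style (compare call_rcu()): the caller has already unlinked
 * obj, so new readers cannot find it; queue it instead of freeing it now. */
int toy_defer_free(void *obj)
{
    if (n_deferred == MAX_DEFERRED)
        return -1;
    deferred[n_deferred++] = obj;
    return 0;
}

/* Frees queued objects only when no reader is inside a critical section.
 * Returns how many objects were freed. */
size_t toy_reclaim(void)
{
    size_t freed = 0;

    if (atomic_load(&active_readers) != 0)
        return 0;       /* a pre-existing reader may still hold a pointer */
    while (n_deferred > 0) {
        free(deferred[--n_deferred]);
        freed++;
    }
    return freed;
}

/* Usage sketch: the old object outlives the reader that can still see it. */
void reclaim_demo(void)
{
    int *old = malloc(sizeof(*old));
    assert(old != NULL);
    *old = 42;

    toy_read_lock();                   /* reader starts before the update */
    assert(toy_defer_free(old) == 0);  /* updater unlinks, then defers */
    assert(toy_reclaim() == 0);        /* reader still active: nothing freed */
    assert(*old == 42);                /* reader's pointer is still valid */
    toy_read_unlock();
    assert(toy_reclaim() == 1);        /* no readers left: old is freed */
}
```

The synchronous form on the slide corresponds to looping until reclamation becomes possible; the asynchronous form lets the updater continue and reclaim later.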
  • 14. Maintain multiple versions of objects
    ● Used for existence guarantee

      p = search(head, key);
      list_del_rcu(&p->list);
      synchronize_rcu();
      free(p);
  • 15. URCU flavors
    ● QSBR (quiescent-state-based RCU)
      – Each thread must periodically invoke rcu_quiescent_state()
      – Thread (un)registration required
    ● Memory-barrier-based RCU
      – Preemptible RCU implementation
      – Introduces memory barriers in the read-side critical section, hence high read-side overhead
    ● “Bullet-proof” RCU (RCU-BP)
      – Similar to memory-barrier-based RCU, but thread (un)registration is taken care of automatically
      – The primitives carry more overhead, but applications can use it without worrying about thread creation/destruction
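The QSBR contract, in which each registered thread periodically announces a quiescent state, can be modeled with per-thread counters. The sketch below is a toy and not the liburcu QSBR implementation: the fixed thread table and every `toy_*` name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 4

/* One counter per thread slot, bumped at each quiescent state. */
static atomic_ulong qs_count[MAX_THREADS];
static atomic_bool  registered[MAX_THREADS];

void toy_register_thread(int id)
{
    assert(id >= 0 && id < MAX_THREADS);
    atomic_store(&registered[id], true);
}

void toy_unregister_thread(int id)
{
    assert(id >= 0 && id < MAX_THREADS);
    atomic_store(&registered[id], false);
}

/* Called by thread id when it holds no references to shared data. */
void toy_quiescent_state(int id)
{
    assert(id >= 0 && id < MAX_THREADS);
    atomic_fetch_add(&qs_count[id], 1);
}

/* Updater: record where every thread's counter stands right now. */
void toy_snapshot(unsigned long snap[MAX_THREADS])
{
    for (int i = 0; i < MAX_THREADS; i++)
        snap[i] = atomic_load(&qs_count[i]);
}

/* True once every registered thread has passed through a quiescent state
 * since the snapshot was taken; unregistered threads are ignored. */
bool toy_grace_period_elapsed(const unsigned long snap[MAX_THREADS])
{
    for (int i = 0; i < MAX_THREADS; i++) {
        if (atomic_load(&registered[i]) &&
            atomic_load(&qs_count[i]) == snap[i])
            return false;
    }
    return true;
}

/* Usage sketch: the grace period ends only after both threads check in. */
void qsbr_demo(void)
{
    unsigned long snap[MAX_THREADS];

    toy_register_thread(0);
    toy_register_thread(1);
    toy_snapshot(snap);
    assert(!toy_grace_period_elapsed(snap));
    toy_quiescent_state(0);
    assert(!toy_grace_period_elapsed(snap));  /* thread 1 has not checked in */
    toy_quiescent_state(1);
    assert(toy_grace_period_elapsed(snap));
}
```

The read side does no work at all in this model, which is why QSBR has the lowest read-side cost; the price is that a thread which never reports a quiescent state stalls every grace period.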
  • 16. URCU flavors (ii)
    ● Signal-based RCU
      – Removes the memory barriers from the read side
      – Can be used by library functions
      – Requires that the user application give up a POSIX signal, which synchronize_rcu() uses in place of the read-side memory barriers
      – Requires explicit thread registration
    ● Signal-based RCU using an out-of-tree sys_membarrier() system call
      – Uses the sys_membarrier() system call instead of a POSIX signal
  • 17. URCU APIs
    ● Atomic-operation and utility APIs
      – caa_: Concurrent Architecture Abstraction
      – cmm_: Concurrent Memory Model
      – uatomic_: URCU Atomic Operation
      – https://lwn.net/Articles/573435/
    ● The URCU APIs
      – https://lwn.net/Articles/573439/
    ● RCU-Protected Lists
      – https://lwn.net/Articles/573441
  • 18. When is URCU useful
  • 19. References
    ● https://lwn.net/Articles/262464/
    ● https://lwn.net/Articles/263130/
    ● https://lwn.net/Articles/573424/
    ● http://www.efficios.com/pub/lpc2011/Presentation-lpc2011-desnoyers-urcu.pdf
    ● http://www.rdrop.com/~paulmck/RCU/RCU.IISc-Bangalore.2013.06.03a.pdf
    ● http://urcu.so/