Cacheonix: Architecture for Strict Data Consistency
in Distributed Systems with Failures
Slava Imeshev
simeshev@cacheonix.org
July 29, 2015
Agenda

•  Strict data consistency
•  Lessons learned
•  Q&A
Introductions

Slava Imeshev:
•  Management style: my team is my family
•  For fun: sci-fi, hard rock, hiking, camping
•  Hobbies: software development, ham radio
•  E-mail: simeshev@cacheonix.org
Cacheonix
https://github.com/cacheonix/cacheonix-core

Cacheonix is an open source distributed Java
cache:
–  Strict data consistency
–  Horizontal scalability
–  Fault-tolerance
–  Concurrency
–  Distributed state sharing
–  Coherent front cache
–  Distributed locks
–  Compute grid with data affinity
–  Load balancing
Strict Data Consistency

•  A guarantee that once an update to a key
has happened, all members of the cluster
see the new value
•  Knowing where the key's value is at all times
Cacheonix Cluster
Cacheonix Architecture
Architecture for Strict Data
Consistency

These key components working together…
•  Replicated state machine
•  Cluster management protocol
•  Reliable totally-ordered multicast protocol
•  State transfer on join
•  P2P protocol with re-transmits
… make it possible to know EXACTLY where the data
is in the cluster.
Replicated State Machine

•  Maintains a consistent replicated configuration
of the cluster by executing cluster, cache and
partition configuration events:
–  On all members of the cluster
–  In the same total order
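The idea on this slide can be illustrated with a minimal sketch. It is not Cacheonix code: the class RsmSketch, the ConfigEvent interface and the join/leave helpers are hypothetical names chosen for illustration. It only shows that replicas which execute the same events in the same order end up with the same configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical names throughout; this is not the Cacheonix API.
final class RsmSketch {

    /** A configuration event, for example a node joining or leaving. */
    interface ConfigEvent {
        void applyTo(TreeSet<String> members);
    }

    static ConfigEvent join(String node) {
        return members -> members.add(node);
    }

    static ConfigEvent leave(String node) {
        return members -> members.remove(node);
    }

    /** One cluster member holding its own copy of the replicated configuration. */
    static final class Replica {
        private final TreeSet<String> members = new TreeSet<>();

        /** Applies the events strictly in the order given. */
        void execute(List<ConfigEvent> orderedEvents) {
            for (ConfigEvent event : orderedEvents) {
                event.applyTo(members);
            }
        }

        List<String> view() {
            return new ArrayList<>(members);
        }
    }

    public static void main(String[] args) {
        // The same totally ordered log is fed to every replica.
        List<ConfigEvent> log = Arrays.asList(join("A"), join("B"), join("C"), leave("B"));
        Replica r1 = new Replica();
        Replica r2 = new Replica();
        r1.execute(log);
        r2.execute(log);
        System.out.println(r1.view().equals(r2.view()));
    }
}
```

Order matters: executing leave("B") before join("B") would leave B in the view, which is why every member has to see the events in the same total order.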
Cluster Management Protocol

•  Detects nodes joining and failing
•  Maintains replicated cluster view
•  Feeds cluster events in total order to the
reliable totally ordered multicast protocol
Reliable Totally Ordered
Multicast

•  Carries cache member events (leave/join)
•  Carries partition configuration messages
•  Executes commands of the replicated bucket
ownership assignment table, which is part of the
replicated state machine
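The slides do not say how the total order is produced. One common way to get it, used here purely as an assumption for illustration, is a single sequencer that stamps every message with a global sequence number, while receivers hold back anything that arrives ahead of a gap. All names below (TotalOrderSketch, Sequencer, Receiver) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sequencer-based total order; an assumed technique, not taken from Cacheonix.
final class TotalOrderSketch {

    /** A message stamped with a global sequence number. */
    static final class Stamped {
        final long seq;
        final String payload;

        Stamped(long seq, String payload) {
            this.seq = seq;
            this.payload = payload;
        }
    }

    /** Assumed single sequencer that stamps every outgoing message. */
    static final class Sequencer {
        private long next = 0;

        Stamped stamp(String payload) {
            return new Stamped(next++, payload);
        }
    }

    /** Receiver that delivers strictly in sequence order and buffers early arrivals. */
    static final class Receiver {
        private final Map<Long, Stamped> pending = new HashMap<>();
        private final List<String> delivered = new ArrayList<>();
        private long expected = 0;

        void receive(Stamped message) {
            if (message.seq < expected) {
                return; // duplicate of an already delivered message
            }
            pending.put(message.seq, message);
            Stamped ready;
            while ((ready = pending.remove(expected)) != null) {
                delivered.add(ready.payload);
                expected++;
            }
        }

        List<String> delivered() {
            return delivered;
        }
    }

    public static void main(String[] args) {
        Sequencer sequencer = new Sequencer();
        Stamped m0 = sequencer.stamp("join A");
        Stamped m1 = sequencer.stamp("join B");
        Stamped m2 = sequencer.stamp("assign bucket 7 to B");

        Receiver r1 = new Receiver();
        Receiver r2 = new Receiver();
        // The network may reorder messages; delivery order must not change.
        r1.receive(m0); r1.receive(m1); r1.receive(m2);
        r2.receive(m2); r2.receive(m0); r2.receive(m1);
        System.out.println(r1.delivered().equals(r2.delivered()));
    }
}
```

A real protocol also needs retransmission of lost messages and a way to replace a failed sequencer; both are left out of this sketch.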
State Transfer on Join

•  When a node joins a cluster, it receives a
replicated state machine from its join
coordinator
•  The total order of events, including join / leave,
guarantees that events are executed in this
order on all members of the cluster:
•  At t0 there is no new node
•  At t1 there is a new member that is fully aware of
the cluster topology and data bucket locations and
is ready to operate
•  At t2 the replicated state machine begins to execute
the repartitioning protocol to move data to the new
member of the cluster
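The t0 / t1 / t2 timeline can be sketched as follows. This is an illustration under assumptions, not Cacheonix code: JoinSketch, ClusterState, Kind and the sequence-number check are hypothetical, and the "state transfer" is reduced to copying an in-memory object.

```java
import java.util.TreeSet;

// Hypothetical names; the real state machine carries far more than a member set.
final class JoinSketch {

    enum Kind { JOIN, LEAVE, BEGIN_REPARTITION }

    /** The replicated state: members, a repartitioning flag, and the last applied event. */
    static final class ClusterState {
        final TreeSet<String> members = new TreeSet<>();
        boolean repartitioning = false;
        long lastApplied = -1;

        /** Applies one event; rejects anything that is not next in the total order. */
        void apply(long seq, Kind kind, String node) {
            if (seq != lastApplied + 1) {
                throw new IllegalStateException("event out of total order: " + seq);
            }
            switch (kind) {
                case JOIN: members.add(node); break;
                case LEAVE: members.remove(node); break;
                case BEGIN_REPARTITION: repartitioning = true; break;
            }
            lastApplied = seq;
        }

        /** Stands in for the state transfer from the join coordinator. */
        ClusterState copy() {
            ClusterState c = new ClusterState();
            c.members.addAll(members);
            c.repartitioning = repartitioning;
            c.lastApplied = lastApplied;
            return c;
        }

        boolean sameAs(ClusterState other) {
            return members.equals(other.members)
                    && repartitioning == other.repartitioning
                    && lastApplied == other.lastApplied;
        }
    }

    public static void main(String[] args) {
        ClusterState coordinator = new ClusterState();
        coordinator.apply(0, Kind.JOIN, "A");
        coordinator.apply(1, Kind.JOIN, "B");               // t0: no new node yet
        coordinator.apply(2, Kind.JOIN, "C");               // t1: the join of C is ordered...
        ClusterState joiner = coordinator.copy();           // ...and C receives the state
        coordinator.apply(3, Kind.BEGIN_REPARTITION, "C");  // t2: executed on old members
        joiner.apply(3, Kind.BEGIN_REPARTITION, "C");       // t2: and on the new member
        System.out.println(joiner.sameAs(coordinator));
    }
}
```

Because the joiner starts from a copy taken at a known position in the total order, it can execute every later event exactly as the existing members do.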
P2P Protocol With Retransmits

•  Carries data access and modification messages
in the cluster (get, put, execute, etc.)
•  Automatically resends messages if a partition is
undergoing reconfiguration (move, replicate,
restore, etc.)
•  Ensures that reads and writes to a key are served
only by the guaranteed owner of the key.
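A minimal sketch of the resend behavior, assuming a bucket that refuses requests while it is being reconfigured. RetrySketch, Bucket, Reply and putWithRetransmit are hypothetical names, and the `pause` callback stands in for waiting until the cluster finishes moving the bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; not the Cacheonix wire protocol.
final class RetrySketch {

    enum Status { OPERATIONAL, RECONFIGURING }

    enum Reply { OK, RETRY }

    /** A data bucket with a single owner at any moment. */
    static final class Bucket {
        Status status = Status.OPERATIONAL;
        String owner;
        final Map<String, String> data = new HashMap<>();
        final List<String> servedBy = new ArrayList<>();

        Bucket(String owner) {
            this.owner = owner;
        }

        /** Serves the write only when the bucket is operational. */
        Reply put(String key, String value) {
            if (status == Status.RECONFIGURING) {
                return Reply.RETRY;
            }
            data.put(key, value);
            servedBy.add(owner);
            return Reply.OK;
        }
    }

    /** Resends the request until it is served; returns the number of attempts used. */
    static int putWithRetransmit(Bucket bucket, String key, String value,
                                 int maxAttempts, Runnable pause) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (bucket.put(key, value) == Reply.OK) {
                return attempt;
            }
            pause.run(); // in a real system: back off, then look up the owner again
        }
        throw new IllegalStateException(
                "bucket still reconfiguring after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        Bucket bucket = new Bucket("A");
        bucket.status = Status.RECONFIGURING; // the bucket is moving from A to B
        int attempts = putWithRetransmit(bucket, "k", "v", 3, () -> {
            bucket.owner = "B";               // the move completes between attempts
            bucket.status = Status.OPERATIONAL;
        });
        System.out.println(attempts + " attempts, served by " + bucket.servedBy);
    }
}
```

The write is never applied while ownership is in flux, so only the owner that holds the bucket after the move serves it.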
Member Failure Example 

1.  A member fails. Then, on all nodes, synchronously:
2.  The cluster management protocol executes the command
Node Left of the state machine ClusterView
3.  ClusterView executes the Remove Node command of the
state machine BucketOwnershipAssingmentTable
4.  BucketOwnershipAssingmentTable executes the
repartitioning algorithm
5.  The repartitioning algorithm marks buckets as
reconfiguring and sends P2P messages to move
buckets around
6.  Once a move finishes, a reliable multicast message
Move Complete is sent
7.  BucketOwnershipAssingmentTable marks the buckets as
operational
8.  All members of the cluster are in the same state.
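The steps above can be sketched in a few lines. This is an illustration only: FailureSketch, OwnershipTable and the round-robin reassignment are assumptions, not the Cacheonix repartitioning algorithm, and the sketch tracks bucket ownership only, not the data itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical names and a deliberately naive, deterministic reassignment rule.
final class FailureSketch {

    enum BucketState { OPERATIONAL, RECONFIGURING }

    /** Stands in for the bucket ownership assignment table. */
    static final class OwnershipTable {
        final String[] owner;
        final BucketState[] state;

        OwnershipTable(int buckets, List<String> nodes) {
            owner = new String[buckets];
            state = new BucketState[buckets];
            for (int i = 0; i < buckets; i++) {
                owner[i] = nodes.get(i % nodes.size());
                state[i] = BucketState.OPERATIONAL;
            }
        }

        /** Steps 3-5: reassign the failed node's buckets and mark them reconfiguring.
         *  Assumes at least one survivor. Returns the affected bucket numbers. */
        List<Integer> removeNode(String failed, List<String> survivors) {
            List<Integer> moved = new ArrayList<>();
            for (int i = 0; i < owner.length; i++) {
                if (owner[i].equals(failed)) {
                    owner[i] = survivors.get(moved.size() % survivors.size());
                    state[i] = BucketState.RECONFIGURING;
                    moved.add(i);
                }
            }
            return moved;
        }

        /** Steps 6-7: a Move Complete message marks the bucket operational again. */
        void moveComplete(int bucket) {
            state[bucket] = BucketState.OPERATIONAL;
        }
    }

    /** Stands in for the ClusterView state machine on one member. */
    static final class ClusterView {
        final TreeSet<String> nodes = new TreeSet<>();
        final OwnershipTable table;

        ClusterView(List<String> initial, int buckets) {
            nodes.addAll(initial);
            table = new OwnershipTable(buckets, new ArrayList<>(nodes));
        }

        /** Step 2: Node Left; forwards Remove Node to the ownership table. */
        List<Integer> nodeLeft(String failed) {
            nodes.remove(failed);
            return table.removeNode(failed, new ArrayList<>(nodes));
        }
    }

    public static void main(String[] args) {
        ClusterView onA = new ClusterView(Arrays.asList("A", "B", "C"), 6);
        ClusterView onC = new ClusterView(Arrays.asList("A", "B", "C"), 6);
        List<Integer> moved = onA.nodeLeft("B"); // the same command runs on every survivor
        onC.nodeLeft("B");
        for (int bucket : moved) {
            onA.table.moveComplete(bucket);
            onC.table.moveComplete(bucket);
        }
        // Step 8: both survivors hold the same table.
        System.out.println(Arrays.equals(onA.table.owner, onC.table.owner));
    }
}
```

The reassignment rule is deterministic and depends only on replicated state, so every survivor computes the same new owners without any extra coordination.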
Lessons Learned

•  Tackle hard problems first:
–  Hard problems define the architecture
–  Hard problems drive the schedule
–  Start with handling failure modes
•  Make unknowns known, do research
Cacheonix Roadmap

•  Fully-replicated cache
•  Weighted partitioning
•  Read/write affinity
•  Cluster-optimized serialization
•  Version-based clustering
•  Off-heap storage
Q&A

Ask me anything!

Slava Imeshev

simeshev@cacheonix.org
