SlideShare a Scribd company logo
Data Consistency in
Distributed Systems
with Akka DData
by Dmitry Martyanov
January, 2017
Preamble
• Shared Mutable State
Preamble
• Shared Mutable State
• Locks
Preamble
• Shared Mutable State
• Locks
• ACID
Preamble
• Shared Mutable State
• Locks
• ACID
• Scalability, Elasticity, Remote Deployment and so on
Preamble
• Shared Mutable State
• Locks
• ACID
• Scalability, Elasticity, Remote Deployment and so on
• Eventual Consistency
Preamble
• Shared Mutable State
• Locks
• ACID
• Scalability, Elasticity, Remote Deployment and so on
• Eventual Consistency
• Affinity (sticky sessions) and Read-my-Writes
Preamble
• Shared Mutable State
• Locks
• ACID
• Scalability, Elasticity, Remote Deployment and so on
• Eventual Consistency
• Affinity (sticky sessions) and Read-my-Writes
• Data Management Solutions
Preamble
• Shared Mutable State
• Locks
• ACID
• Scalability, Elasticity, Remote Deployment and so on
• Eventual Consistency
• Affinity (sticky sessions) and Read-my-Writes
• Data Management Solutions
• Event Sourcing
Problem Definition
• cAP is not about consistency
• Even with Immediate consistency Concurrent
Access is essentially clients problem
• In XDC fault tolerant environment assumptions are
very expensive
CRDT
CRDT
Conflict-free replicated data types
• Every operation is a transformation from previous
state to new one
• Data type + rules
CRDT
refresh math
• Commutativity: a + b = b + a
• Associativity: (a + b) + c = a + (b + c)
• Idempotence: f(x) = f(f(x))
CmRDT
commutative
Conditions:
+ Commutativity
+ Associativity
+ ???
- Idempotence x1
x2
x3
f(x1)
f(x2)
f(x3)
g(x2)
g(x2)
g(x2)
CmRDT
commutative
Conditions:
+ Commutativity
+ Associativity
+ Exactly once delivery
- Idempotence x1
x2
x3
f(x1)
f(x2)
f(x3)
g(x2)
g(x2)
g(x2)
CvRDT
convergent
Conditions:
+ Commutativity
+ Associativity
+ Idempotence
- Exactly once delivery
x1
x2
x3
f(x1)
merge
merge
g(x2)
merge
merge
CRDT
examples
• G-Counter (m)
• PN-Counter
• LWW-Register
• MV-Register
• G-Set
• 2P-Set
• OR-Map
• LWW-Set
• OR-Set
• …
CRDT:
Causality tracking
Causality
• Vector Clock
(https://en.wikipedia.org/wiki/Vector_clock)
• Version Vector
(https://en.wikipedia.org/wiki/Version_vector)
• Dotted Version Vector
(https://github.com/ricardobcl/Dotted-Version-
Vectors)
Causality
DVV
Causality vs CRDT
• CRDT to achieve convergence
• Causality to achieve correct overriding
CRDT
implementations
Riak KV
(http://docs.basho.com/riak/kv/2.2.0/developi
ng/data-types/)
Akka Distributed Data
(http://doc.akka.io/docs/akka/2.4.16/scala/dis
tributed-data.html)
Akka Distributed Data
Akka Cluster
• Actors =>
• Akka Remote =>
• Akka Cluster
• Gossip protocol
• Failure detector
• Singleton
• Leader
• Shard Coordinator
1
3
2
5
4
6
7
Akka Distributed Data
• akka.cluster.ddata.Replicator
• akka.cluster.ddata.DistributedData
• akka.cluster.ddata.ReplicatedData
ddata.Replicator
Subscribe[A <: ReplicatedData](key: Key[A],
subscriber: ActorRef)
Changed[A <: ReplicatedData](key: Key[A])(data: A)
Get[A <: ReplicatedData](key: Key[A],
consistency: ReadConsistency,
request: Option[Any] = None)
Update[A <: ReplicatedData](key: Key[A],
writeConsistency: WriteConsistency,
request: Option[Any])(
val modify: Option[A] => A)
Delete[A <: ReplicatedData](key: Key[A],
consistency: WriteConsistency)
ddata.Replicator
class ReplicatorDemo extends Actor with ActorLogging {
val replicator = DistributedData(context.system).replicator
implicit val node = Cluster(context.system)
val DataKey = ORSetKey[String]("key")
replicator ! Subscribe(DataKey, self)
override def receive: Receive = {
case AddString(str) =>
replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ + str)
case RemoveString(str) =>
replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ - str)
case c @ Changed(DataKey) =>
val data = c.get(DataKey)
println(data.elements.mkString("; "))
}
}
ddata.ReplicatedData
trait ReplicatedData {
type T <: ReplicatedData
def merge(that: T): T
}
ddata.ReplicatedData
case class TwoPhaseSet(
adds: GSet[String] = GSet.empty,
removals: GSet[String] = GSet.empty)
extends ReplicatedData {
type T = TwoPhaseSet
def add(element: String): TwoPhaseSet =
copy(adds = adds.add(element))
def remove(element: String): TwoPhaseSet =
copy(removals = removals.add(element))
def elements: Set[String] = adds.elements diff removals.elements
override def merge(that: TwoPhaseSet): TwoPhaseSet =
copy(
adds = this.adds.merge(that.adds),
removals = this.removals.merge(that.removals))
}
Summary
Thanks!
Q & A
dmitry.a.martyanov@gmail.com

More Related Content

What's hot

DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax Academy
 
Cassandra Distributions and Variants
Cassandra Distributions and VariantsCassandra Distributions and Variants
Cassandra Distributions and Variants
Anant Corporation
 
Building a Data Plane with K8ssandra, Apache Cassandra on Kubernetes
Building a Data Plane with K8ssandra, Apache Cassandra on KubernetesBuilding a Data Plane with K8ssandra, Apache Cassandra on Kubernetes
Building a Data Plane with K8ssandra, Apache Cassandra on Kubernetes
Christopher Bradford
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Taking Your Database Global with Kubernetes
Taking Your Database Global with KubernetesTaking Your Database Global with Kubernetes
Taking Your Database Global with Kubernetes
Christopher Bradford
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
Acunu
 
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & ConsistencyNoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
ScyllaDB
 
Migrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and HowMigrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and How
Anant Corporation
 
Apache Cassandra Lunch #78: Deploy Cassandra using DSE Operator to Kubernetes
Apache Cassandra Lunch #78: Deploy Cassandra using DSE Operator to KubernetesApache Cassandra Lunch #78: Deploy Cassandra using DSE Operator to Kubernetes
Apache Cassandra Lunch #78: Deploy Cassandra using DSE Operator to Kubernetes
Anant Corporation
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
DataStax
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
DataStax
 
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
DataStax
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
DataStax Academy
 
Cassandra vs Databases
Cassandra vs Databases Cassandra vs Databases
Cassandra vs Databases
Anant Corporation
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
DataStax
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
DataStax Academy
 
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in China
DataStax Academy
 
Webinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraWebinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache Cassandra
DataStax
 
Using Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsUsing Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series Workloads
Jeff Jirsa
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
DataStax Academy
 

What's hot (20)

DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
Cassandra Distributions and Variants
Cassandra Distributions and VariantsCassandra Distributions and Variants
Cassandra Distributions and Variants
 
Building a Data Plane with K8ssandra, Apache Cassandra on Kubernetes
Building a Data Plane with K8ssandra, Apache Cassandra on KubernetesBuilding a Data Plane with K8ssandra, Apache Cassandra on Kubernetes
Building a Data Plane with K8ssandra, Apache Cassandra on Kubernetes
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Taking Your Database Global with Kubernetes
Taking Your Database Global with KubernetesTaking Your Database Global with Kubernetes
Taking Your Database Global with Kubernetes
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
 
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & ConsistencyNoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
 
Migrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and HowMigrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and How
 
Apache Cassandra Lunch #78: Deploy Cassandra using DSE Operator to Kubernetes
Apache Cassandra Lunch #78: Deploy Cassandra using DSE Operator to KubernetesApache Cassandra Lunch #78: Deploy Cassandra using DSE Operator to Kubernetes
Apache Cassandra Lunch #78: Deploy Cassandra using DSE Operator to Kubernetes
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
 
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Cassandra vs Databases
Cassandra vs Databases Cassandra vs Databases
Cassandra vs Databases
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in China
 
Webinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraWebinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache Cassandra
 
Using Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsUsing Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series Workloads
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 

Viewers also liked

Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistency
seldo
 
Consistency in Distributed Systems
Consistency in Distributed SystemsConsistency in Distributed Systems
Consistency in Distributed Systems
DATAVERSITY
 
Storage Consistency for ECE536
Storage Consistency for ECE536Storage Consistency for ECE536
Storage Consistency for ECE536
Husain Al Yusuf
 
Implementing a Distributed Hash Table with Scala and Akka
Implementing a Distributed Hash Table with Scala and AkkaImplementing a Distributed Hash Table with Scala and Akka
Implementing a Distributed Hash Table with Scala and Akka
Tristan Penman
 
Consistency in Distributed Systems
Consistency in Distributed SystemsConsistency in Distributed Systems
Consistency in Distributed Systems
Shane Johnson
 
Brendan O'Bra Scala By the Schuykill
Brendan O'Bra Scala By the SchuykillBrendan O'Bra Scala By the Schuykill
Brendan O'Bra Scala By the Schuykill
Brendan O'Bra
 
Slides - Intro to Akka.Cluster
Slides - Intro to Akka.ClusterSlides - Intro to Akka.Cluster
Slides - Intro to Akka.Cluster
petabridge
 
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Todd Fritz
 
Security in distributed systems
Security in distributed systems Security in distributed systems
Security in distributed systems
Haitham Ahmed
 
Akka persistence == event sourcing in 30 minutes
Akka persistence == event sourcing in 30 minutesAkka persistence == event sourcing in 30 minutes
Akka persistence == event sourcing in 30 minutes
Konrad Malawski
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systems
sumitjain2013
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Shirshanka Das
 
Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
hadooparchbook
 
Akka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldAkka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming World
Konrad Malawski
 
Building a Big Data Team
Building a Big Data TeamBuilding a Big Data Team
Building a Big Data Team
elephantscale
 

Viewers also liked (15)

Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistency
 
Consistency in Distributed Systems
Consistency in Distributed SystemsConsistency in Distributed Systems
Consistency in Distributed Systems
 
Storage Consistency for ECE536
Storage Consistency for ECE536Storage Consistency for ECE536
Storage Consistency for ECE536
 
Implementing a Distributed Hash Table with Scala and Akka
Implementing a Distributed Hash Table with Scala and AkkaImplementing a Distributed Hash Table with Scala and Akka
Implementing a Distributed Hash Table with Scala and Akka
 
Consistency in Distributed Systems
Consistency in Distributed SystemsConsistency in Distributed Systems
Consistency in Distributed Systems
 
Brendan O'Bra Scala By the Schuykill
Brendan O'Bra Scala By the SchuykillBrendan O'Bra Scala By the Schuykill
Brendan O'Bra Scala By the Schuykill
 
Slides - Intro to Akka.Cluster
Slides - Intro to Akka.ClusterSlides - Intro to Akka.Cluster
Slides - Intro to Akka.Cluster
 
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
 
Security in distributed systems
Security in distributed systems Security in distributed systems
Security in distributed systems
 
Akka persistence == event sourcing in 30 minutes
Akka persistence == event sourcing in 30 minutesAkka persistence == event sourcing in 30 minutes
Akka persistence == event sourcing in 30 minutes
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systems
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
 
Akka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldAkka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming World
 
Building a Big Data Team
Building a Big Data TeamBuilding a Big Data Team
Building a Big Data Team
 

Similar to Data Consistency in Distributed Systems with Akka Distributed Data

Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
StreamNative
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Ran Ziv
 
Scalable Storage for Massive Volume Data Systems
Scalable Storage for Massive Volume Data SystemsScalable Storage for Massive Volume Data Systems
Scalable Storage for Massive Volume Data Systems
Lars Nielsen
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Flink Forward
 
MariaDB on MS Azure - 2
MariaDB on MS Azure - 2MariaDB on MS Azure - 2
MariaDB on MS Azure - 2
MariaDB plc
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaS
Gleicon Moraes
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Helena Edelson
 
Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,
Swiss Data Forum Swiss Data Forum
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
Guido Schmutz
 
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
buildacloud
 
Norway VMUG Tour - The Architecture Behind Policy-Driven Data Protection - A ...
Norway VMUG Tour - The Architecture Behind Policy-Driven Data Protection - A ...Norway VMUG Tour - The Architecture Behind Policy-Driven Data Protection - A ...
Norway VMUG Tour - The Architecture Behind Policy-Driven Data Protection - A ...
Chris Wahl
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent Databases
ScyllaDB
 
Designing microservices
Designing microservicesDesigning microservices
Designing microservices
Masashi Narumoto
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Databricks
 
Challenging Web-Scale Graph Analytics with Apache Spark
Challenging Web-Scale Graph Analytics with Apache SparkChallenging Web-Scale Graph Analytics with Apache Spark
Challenging Web-Scale Graph Analytics with Apache Spark
Databricks
 
AWS re:Invent 2016: From Monolithic to Microservices: Evolving Architecture P...
AWS re:Invent 2016: From Monolithic to Microservices: Evolving Architecture P...AWS re:Invent 2016: From Monolithic to Microservices: Evolving Architecture P...
AWS re:Invent 2016: From Monolithic to Microservices: Evolving Architecture P...
Amazon Web Services
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQL
Tugdual Grall
 
Bangalore Meetup - Enable realtime machine learning with streaming data
Bangalore Meetup - Enable realtime machine learning with streaming dataBangalore Meetup - Enable realtime machine learning with streaming data
Bangalore Meetup - Enable realtime machine learning with streaming data
Christina Lin
 
Cloud and Grid Integration OW2 Conference Nov10
Cloud and Grid Integration OW2 Conference Nov10Cloud and Grid Integration OW2 Conference Nov10
Cloud and Grid Integration OW2 Conference Nov10
OW2
 
Loadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro servicesLoadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro services
Chiradeep Vittal
 

Similar to Data Consistency in Distributed Systems with Akka Distributed Data (20)

Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive Clusters
 
Scalable Storage for Massive Volume Data Systems
Scalable Storage for Massive Volume Data SystemsScalable Storage for Massive Volume Data Systems
Scalable Storage for Massive Volume Data Systems
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
 
MariaDB on MS Azure - 2
MariaDB on MS Azure - 2MariaDB on MS Azure - 2
MariaDB on MS Azure - 2
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaS
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
 
Norway VMUG Tour - The Architecture Behind Policy-Driven Data Protection - A ...
Norway VMUG Tour - The Architecture Behind Policy-Driven Data Protection - A ...Norway VMUG Tour - The Architecture Behind Policy-Driven Data Protection - A ...
Norway VMUG Tour - The Architecture Behind Policy-Driven Data Protection - A ...
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent Databases
 
Designing microservices
Designing microservicesDesigning microservices
Designing microservices
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
 
Challenging Web-Scale Graph Analytics with Apache Spark
Challenging Web-Scale Graph Analytics with Apache SparkChallenging Web-Scale Graph Analytics with Apache Spark
Challenging Web-Scale Graph Analytics with Apache Spark
 
AWS re:Invent 2016: From Monolithic to Microservices: Evolving Architecture P...
AWS re:Invent 2016: From Monolithic to Microservices: Evolving Architecture P...AWS re:Invent 2016: From Monolithic to Microservices: Evolving Architecture P...
AWS re:Invent 2016: From Monolithic to Microservices: Evolving Architecture P...
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQL
 
Bangalore Meetup - Enable realtime machine learning with streaming data
Bangalore Meetup - Enable realtime machine learning with streaming dataBangalore Meetup - Enable realtime machine learning with streaming data
Bangalore Meetup - Enable realtime machine learning with streaming data
 
Cloud and Grid Integration OW2 Conference Nov10
Cloud and Grid Integration OW2 Conference Nov10Cloud and Grid Integration OW2 Conference Nov10
Cloud and Grid Integration OW2 Conference Nov10
 
Loadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro servicesLoadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro services
 

Recently uploaded

Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
jpupo2018
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 

Recently uploaded (20)

Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 

Data Consistency in Distributed Systems with Akka Distributed Data

Editor's Notes

  1. Motivation. We all know that the root of most of problems in software is shared mutable state, it impacts on accidental complexity and as the result on scalability, correctness and cost of maintenance. Out of the Tar Pit is a very good paper about it. Recommended.
  2. As this problem appeared because of hardware architecture at the beginning of the era of programming, the most straightforward and durable solution was introduced. This solution is to use locks. There is variety of different locks, they are validated, they work pretty well, on every interview you would be asked about locks and so on.
  3. And the main philosophy in terms data persistence for system with locks is ACID. Atomicity Consistency Isolation Durability, which implies transactional data modification. It is what most of relational databases provide for you.
  4. But for last years there was a significant growth of internet traffic and new requirements have been appeared like scalability, elasticity. This gave birth to the need for distributed systems. And locks in distributed systems doesn’t work so well, they are slow, transactions fail often, overlaps happen - they limit your scalability.
  5. And usually at this moment someone in your company who is very smart and top talented like of course everyone in this room, suggests to use eventual consistency instead. And a lot of teams stop after this change, because it works fine. If your cluster replication happens faster than updates occur everything is fine, I like to call it good weather. But you need to keep in mind that your shared state become distributed state. And this fact come back when you start actually utilise your requirements. In highload environment we see that cluster works even worse than it was before, instead of transaction fails we have data losses and so on. We see some degree of divergence in our data, we are losing trust to our system. To solve it we begin to introduce some ad-hoc solutions which make your system even more complex
  6. Affinity is a tricky thing, in some case it helps you to increase predictivity, but you need to remember as much as it helps you as less your system really distributed
  7. Data management with coordinators has a lack of fault tolerance, limitation in elasticity
  8. Event sourcing is the 3rd solution to deal with distributed state, actually insert only strategy works fine in terms of shared state, but in my practise it works really bad in terms of business compatibility. The gap between you and business will grow significantly if you go with ES. By the way, has anybody already submitted any business application working in live based on event sourcing ? Are you proud of it ?
  9. So problem definition: Eventual Consistency is not about consistency under high load, when we talk about CAP theorem and choose AP we drop consistency at all. It doesn’t mean that we shift from immediate to eventually, we are dropping consistency at all. Even with Immediate consistency concurrency is clients problem In XDC fault tolerant environment assumptions are very expensive
  10. For persons who work in Finance, Banking or Insurance probably it is now new that there is like hype for last years about blockchain as technology. I am personally quite sceptical about it and I would like to share with you why. Blockchain is a technology that guarantees exactly once delivery in distributed environment without any assumption, which is very important. Because it is difficult to guarantee this property in some corner cases, like node initialisation, node showdown and so on, connection lost. But blockchain does it. So blockchain is an ingredient for this formula, if you have other 2 and you need exactly once delivery blockchain, it will help you.
  11. G-Counter - grow only counter like vector clock (counter per actor and merge all counters) PN-Counter - increment/decrement counter is comprised from 2 G-Counter LWW- Register - last write win register, based on timestamp MV-Register(LWWMAp) - multi value register G-Set - grow only set 2P-Set - once add once remove Set (2 G-Set) OR-Set - Observed-Removed Set - add/remove set with causality
  12. Dotted version allow to avoid false positives.
  13. Akka Distributed Data. So there is short disclaimer here, I am not an Akka Cluster advocate, please don’t ask me questions about it :)
  14. Akka Cluster. So for those who are new, akka is a framework that wraps an actor model, it implies the management of some lightweight logical units(actors) which can communicate to each other within message passing(Communicating Sequential Processes by Tony Hoar). All of this allows to achieve highly concurrent environment with minimum of side effects(Sounds like silver bullet - I know :) ). The next step was to create a capability to utilise more than 1 JVM for your actor system, it was introduction of akka remoting. And the current state is Akka Cluster which encapsulate all best practices related to issues existing with akka remote. Gossip protocol - to have all the nodes known about others Failure Detector - to detect whether the node is unreachable or it is just network interruption Singleton - for different single point management entities Leader - to include / exclude nodes from cluster Sharding Coordinator - to coordinate creation of stateful actors within each node of the cluster. This is exactly what we are going to talk about, because the idea is that it is very expensive and slow to communicate with remote node for CRUD operations, we would like to have a local copy with is somehow synchronized between all nodes. Here is a moment when data consistency comes in.
  15. So there is 3 main components for providing an access to DIstributed Data Replicator is an actor providing Access API to distributed KV store DistributedData is an Akka systen extension to create Replicators ReplicatedData is an abstract trait required to be mixed into your data structures to make the convergeable.
  16. Here I show you briefly what are the methods: Subscribe & Change - allows you to subscribe an actor of all the changes within the Key, subscribers receive Change message from replicator when the key was changed within replication. Get & Update & Delete - classical CRUD operations with the key, there is Consistency mode which allow you to leverage how many nodes will be reached. Replicator will reply to any of this message, this is exactly your consistency model.
  17. ReplicatedData is a basic trait for your distributed data types, it defines merge function, which extends your data to convergent CRDT.