SlideShare a Scribd company logo
1 of 22
Download to read offline
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Aggregation = Replication
Carlos Baquero
Universidade do Minho & INESC TEC
Dagstuhl Seminar 19442, October 2019
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Dagstuhl Seminar 19442
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Internet of things (and users)
Context / System model
Constraints
Geo-distribution: Availability Zones, Edge, Things & Users
Asynchrony, Independent failure, Crashes (and possibly recovery)
Aspirations
Scalability in numbers and distances
Partition tolerance, local availability and autonomy
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Toolbox
Replication and Aggregation
Replication for high availability and low latency
Shared state among replicas. Local uncoordinated updates.
Integration of received remote updates. Distributed logs of
operations. Causal consistency. Conflict free replicated data types.
Distributed data aggregation and summarization
Local sources of data: storage space, load, temperature, . . . . Source
location and time (points or streams). Global aggregates: counts,
maximum, average, top-k, CDF. Global status and prediction.
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Example Quiz: Temperature control
Can you find: (1) replica state, (2) user input, (3) data aggregate?
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Example Quiz: Temperature control
Can you find: (1) replica state, (2) user input, (3) data aggregate?
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Replication
LightKone blog post: Aggregation is not Replication
“. . . in replication there is an abstraction of a single replicated
state that can be updated in the multiple locations where a
replica is present.”
(Creative Commons: https://en.wikipedia.org/wiki/Multiway switching)
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Aggregation
LightKone blog post: Aggregation is not Replication
“. . . data to be aggregated is often not directly controlled by
users, it usually results from an external physical process or
the result of complex system evolutions.”
(Public Domain: https://en.wikipedia.org/wiki/Global warming)
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Replication
Conflict Free Replicated Data Types
Operation Based (reliable causal delivery - exactly once)
Classic: Operations are translated into effects, broadcast effects
Delivery log is sequential and causal consistent
Pure: Operations are broadcast as is, adapted on delivery
Delivery log holds causal partial order
State Based (commutative, associative, idempotent)
Classic: Operations mutate local state, ships full state
δ State: Operations are translated into state deltas, ships deltas
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
CRDTs
Operation Based
Only some (sequential) data types have truly commutative operations
Commutative: f (g(x)) = g(f (x))
inc(dec(x)) = dec(inc(x))
add(a, add(b, x)) = add(b, add(a, x))
Operations can be shipped as is
Non commutative: f (g(x)) = g(f (x))
add(v, rmv(v, x)) = rmv(v, add(v, x))
Classic: Translate to embed rmv → add or add → rmv
Pure: Ship as is. Use order info in delivery
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
CRDTs
State Based
State evolution defined by a semi-lattice. Join and partial order
State - operations induce state mutations
X m(X)
E.g. adda over {b, e} mutates it into {a, b, e}
δ State - single out the mutation
m(X) ≡ X mδ
(X)
E.g. addδ
a over {b, e} derives {a}
Do all commutative types lead to trivial state translations?
No, and the cause is not ordering but derives from idempotency
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
CRDTs
δ state based grow-only set
GSet E = P(E)
⊥ = {}
insertδ
i (e, s) = {e}
elements(s) = s
s s = s ∪ s
Sets are “naturally” idempotent
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
CRDTs
δ state based grow-only counter
GCounter = I → N
⊥ = {}
incδ
i (m) = {i → m(i) + 1}
value(m) =
j∈I
m(j)
m m = {j → max(m(j), m (j)) | j ∈ dom m ∪ dom m }
Counter state must be made idempotent
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Replication
Recap
Support for high availability leads to analysis of data type operations
and their possible adaptations for adequate dissemination under the
chosen network system model
Adaptation for dissemination will also occur in Aggregation
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Aggregation
Aggregation can be simply defined as:
“the ability to summarize information”
(Renesse, Birman, Vogels. Astrolabe. ACM TOCS, 2013)
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Aggregation
Problem Definition
An aggregation function f takes a multiset of elements from a
domain I and produces an output of a domain O.
f : NI
→ O
Resulting that:
Order is not relevant
Elements can occur multiple times
E.g. multiset M = {10, 32, 10, 7}, aggregation func: max(M) = 32
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Aggregation
Typical functions
sum
count
max or min
average
mode
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Aggregation
Decomposable aggregation functions
Self-decomposable function f
For some merge operator ⊕ and all non-empty multisets X and Y:
f (X Y ) = f (X) ⊕ f (Y )
E.g.
count({x}) = 1
count({X Y }) = count(X) + count(Y )
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Aggregation
Decomposable aggregation functions
Some functions are not self-decomposable but can be transformed
Decomposable function f
For some function g and a self-decomposable aggregation function h,
it can be expressed as:
f = g ◦ h
E.g.
average(X) = g(h(X))
h({x}) = (x, 1)
h(X Y ) = h(X) + h(Y )
g((s, c)) = s/c
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Aggregation
Duplicates / Idempotency
Duplicate insensitive
Result only depends on support set of the multiset
E.g.
min({1, 3, 1, 2, 4, 5, 4, 5}) = min({1, 3, 2, 4, 5}) = 1
Duplicate sensitive
Multiplicity is relevant to result
E.g.
8 = count({1, 3, 1, 2, 4, 5, 4, 5}) = count({1, 3, 2, 4, 5}) = 5
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Aggregation
Adding idempotency
Functions like count and sum can be made duplicate insensitive by a
stochastic transformations
E.g.
Extrema propagation derives a vector of k exponential random
variables that can be aggregated by max and used to approximate the
aggregated sum. Thus relaxing (exactly-once) network assumptions.
Aggregate sums by Extrema Propagation
Node i holds value vi
Initially Vi = rexp(k, vi )
On gossip message Vm from anther node do Vi = max(Vi , Vm)
Aggregate sum estimation value is k−1
Vi
Aggregation =
Replication
Carlos Baquero
Universidade do
Minho & INESC
TEC
Further reference
Search for blog post “Aggregation is not Replication”
Nuno Pregui¸ca, Carlos Baquero, Marc Shapiro:
Conflict-Free Replicated Data Types CRDTs.
Encyclopedia of Big Data Technologies 2019
Paulo Jesus, Carlos Baquero, Paulo S´ergio Almeida:
A Survey of Distributed Data Aggregation Algorithms.
IEEE Communications Surveys and Tutorials 17(1): 381-404
(2015)

More Related Content

What's hot

Gradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboostGradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboostJaroslaw Szymczak
 
Provenance Annotation and Analysis to Support Process Re-Computation
Provenance Annotation and Analysis to Support Process Re-ComputationProvenance Annotation and Analysis to Support Process Re-Computation
Provenance Annotation and Analysis to Support Process Re-ComputationPaolo Missier
 
Quasi-Optimal Recombination Operator
Quasi-Optimal Recombination OperatorQuasi-Optimal Recombination Operator
Quasi-Optimal Recombination Operatorjfrchicanog
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine LearningFabian Pedregosa
 
第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料Takayuki Osogami
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLMLconf
 
Dask glm-scipy2017-final
Dask glm-scipy2017-finalDask glm-scipy2017-final
Dask glm-scipy2017-finalHussain Sultan
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimationData Con LA
 
Linear Regression (Machine Learning)
Linear Regression (Machine Learning)Linear Regression (Machine Learning)
Linear Regression (Machine Learning)Omkar Rane
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)Sri Prasanna
 
AP Calculus Slides February 14, 2008
AP Calculus Slides February 14, 2008AP Calculus Slides February 14, 2008
AP Calculus Slides February 14, 2008Darren Kuropatwa
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineSoma Boubou
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit
 
Mapreduce: Theory and implementation
Mapreduce: Theory and implementationMapreduce: Theory and implementation
Mapreduce: Theory and implementationSri Prasanna
 
rit seminars-privacy assured outsourcing of image reconstruction services in ...
rit seminars-privacy assured outsourcing of image reconstruction services in ...rit seminars-privacy assured outsourcing of image reconstruction services in ...
rit seminars-privacy assured outsourcing of image reconstruction services in ...thahirakabeer
 
Traffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equationsTraffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equationsGuillaume Costeseque
 

What's hot (16)

Gradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboostGradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboost
 
Provenance Annotation and Analysis to Support Process Re-Computation
Provenance Annotation and Analysis to Support Process Re-ComputationProvenance Annotation and Analysis to Support Process Re-Computation
Provenance Annotation and Analysis to Support Process Re-Computation
 
Quasi-Optimal Recombination Operator
Quasi-Optimal Recombination OperatorQuasi-Optimal Recombination Operator
Quasi-Optimal Recombination Operator
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine Learning
 
第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
 
Dask glm-scipy2017-final
Dask glm-scipy2017-finalDask glm-scipy2017-final
Dask glm-scipy2017-final
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
 
Linear Regression (Machine Learning)
Linear Regression (Machine Learning)Linear Regression (Machine Learning)
Linear Regression (Machine Learning)
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)
 
AP Calculus Slides February 14, 2008
AP Calculus Slides February 14, 2008AP Calculus Slides February 14, 2008
AP Calculus Slides February 14, 2008
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
 
Mapreduce: Theory and implementation
Mapreduce: Theory and implementationMapreduce: Theory and implementation
Mapreduce: Theory and implementation
 
rit seminars-privacy assured outsourcing of image reconstruction services in ...
rit seminars-privacy assured outsourcing of image reconstruction services in ...rit seminars-privacy assured outsourcing of image reconstruction services in ...
rit seminars-privacy assured outsourcing of image reconstruction services in ...
 
Traffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equationsTraffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equations
 

Similar to Aggregation vs Replication in Distributed Systems

Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Inria Tech Talk - La classification de données complexes avec MASSICCC
Inria Tech Talk - La classification de données complexes avec MASSICCCInria Tech Talk - La classification de données complexes avec MASSICCC
Inria Tech Talk - La classification de données complexes avec MASSICCCStéphanie Roger
 
A Future for R: Parallel and Distributed Processing in R for Everyone
A Future for R: Parallel and Distributed Processing in R for EveryoneA Future for R: Parallel and Distributed Processing in R for Everyone
A Future for R: Parallel and Distributed Processing in R for Everyoneinside-BigData.com
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine ParallelismSri Prasanna
 
Building data fusion surrogate models for spacecraft aerodynamic problems wit...
Building data fusion surrogate models for spacecraft aerodynamic problems wit...Building data fusion surrogate models for spacecraft aerodynamic problems wit...
Building data fusion surrogate models for spacecraft aerodynamic problems wit...Shinwoo Jang
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzerbutest
 
Compiler Construction for DLX Processor
Compiler Construction for DLX Processor Compiler Construction for DLX Processor
Compiler Construction for DLX Processor Soham Kulkarni
 
RedisConf18 - CRDTs and Redis - From sequential to concurrent executions
RedisConf18 - CRDTs and Redis - From sequential to concurrent executionsRedisConf18 - CRDTs and Redis - From sequential to concurrent executions
RedisConf18 - CRDTs and Redis - From sequential to concurrent executionsRedis Labs
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowDaniel S. Katz
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)Pierre Schaus
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorizationmidi
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersUniversity of Huddersfield
 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientFabian Pedregosa
 
On the Expressiveness of Attribute-based Communication
On the Expressiveness of Attribute-based CommunicationOn the Expressiveness of Attribute-based Communication
On the Expressiveness of Attribute-based CommunicationYehia ABD ALRahman
 
Compiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual MachinesCompiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual MachinesEelco Visser
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonChun-Ming Chang
 

Similar to Aggregation vs Replication in Distributed Systems (20)

Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Inria Tech Talk - La classification de données complexes avec MASSICCC
Inria Tech Talk - La classification de données complexes avec MASSICCCInria Tech Talk - La classification de données complexes avec MASSICCC
Inria Tech Talk - La classification de données complexes avec MASSICCC
 
A Future for R: Parallel and Distributed Processing in R for Everyone
A Future for R: Parallel and Distributed Processing in R for EveryoneA Future for R: Parallel and Distributed Processing in R for Everyone
A Future for R: Parallel and Distributed Processing in R for Everyone
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
Building data fusion surrogate models for spacecraft aerodynamic problems wit...
Building data fusion surrogate models for spacecraft aerodynamic problems wit...Building data fusion surrogate models for spacecraft aerodynamic problems wit...
Building data fusion surrogate models for spacecraft aerodynamic problems wit...
 
slides.07.pptx
slides.07.pptxslides.07.pptx
slides.07.pptx
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
 
Compiler Construction for DLX Processor
Compiler Construction for DLX Processor Compiler Construction for DLX Processor
Compiler Construction for DLX Processor
 
RedisConf18 - CRDTs and Redis - From sequential to concurrent executions
RedisConf18 - CRDTs and Redis - From sequential to concurrent executionsRedisConf18 - CRDTs and Redis - From sequential to concurrent executions
RedisConf18 - CRDTs and Redis - From sequential to concurrent executions
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradient
 
On the Expressiveness of Attribute-based Communication
On the Expressiveness of Attribute-based CommunicationOn the Expressiveness of Attribute-based Communication
On the Expressiveness of Attribute-based Communication
 
Compiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual MachinesCompiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual Machines
 
Matlab integration
Matlab integrationMatlab integration
Matlab integration
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in Python
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Aggregation vs Replication in Distributed Systems

  • 1. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Dagstuhl Seminar 19442, October 2019
  • 2. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Dagstuhl Seminar 19442
  • 3. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Internet of things (and users) Context / System model Constraints Geo-distribution: Availability Zones, Edge, Things & Users Asynchrony, Independent failure, Crashes (and possibly recovery) Aspirations Scalability in numbers and distances Partition tolerance, local availability and autonomy
  • 4. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Toolbox Replication and Aggregation Replication for high availability and low latency Shared state among replicas. Local uncoordinated updates. Integration of received remote updates. Distributed logs of operations. Causal consistency. Conflict free replicated data types. Distributed data aggregation and summarization Local sources of data: storage space, load, temperature, . . . . Source location and time (points or streams). Global aggregates: counts, maximum, average, top-k, CDF. Global status and prediction.
  • 5. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Example Quiz: Temperature control Can you find: (1) replica state, (2) user input, (3) data aggregate?
  • 6. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Example Quiz: Temperature control Can you find: (1) replica state, (2) user input, (3) data aggregate?
  • 7. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Replication LightKone blog post: Aggregation is not Replication “. . . in replication there is an abstraction of a single replicated state that can be updated in the multiple locations where a replica is present.” (Creative Commons: https://en.wikipedia.org/wiki/Multiway switching)
  • 8. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Aggregation LightKone blog post: Aggregation is not Replication “. . . data to be aggregated is often not directly controlled by users, it usually results from an external physical process or the result of complex system evolutions.” (Public Domain: https://en.wikipedia.org/wiki/Global warming)
  • 9. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Replication Conflict Free Replicated Data Types Operation Based (reliable causal delivery - exactly once) Classic: Operations are translated into effects, broadcast effects Delivery log is sequential and causal consistent Pure: Operations are broadcast as is, adapted on delivery Delivery log holds causal partial order State Based (commutative, associative, idempotent) Classic: Operations mutate local state, ships full state δ State: Operations are translated into state deltas, ships deltas
  • 10. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC CRDTs Operation Based Only some (sequential) data types have truly commutative operations Commutative: f (g(x)) = g(f (x)) inc(dec(x)) = dec(inc(x)) add(a, add(b, x)) = add(b, add(a, x)) Operations can be shipped as is Non commutative: f (g(x)) = g(f (x)) add(v, rmv(v, x)) = rmv(v, add(v, x)) Classic: Translate to embed rmv → add or add → rmv Pure: Ship as is. Use order info in delivery
  • 11. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC CRDTs State Based State evolution defined by a semi-lattice. Join and partial order State - operations induce state mutations X m(X) E.g. adda over {b, e} mutates it into {a, b, e} δ State - single out the mutation m(X) ≡ X mδ (X) E.g. addδ a over {b, e} derives {a} Do all commutative types lead to trivial state translations? No, and the cause is not ordering but derives from idempotency
  • 12. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC CRDTs δ state based grow-only set GSet E = P(E) ⊥ = {} insertδ i (e, s) = {e} elements(s) = s s s = s ∪ s Sets are “naturally” idempotent
  • 13. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC CRDTs δ state based grow-only counter GCounter = I → N ⊥ = {} incδ i (m) = {i → m(i) + 1} value(m) = j∈I m(j) m m = {j → max(m(j), m (j)) | j ∈ dom m ∪ dom m } Counter state must be made idempotent
  • 14. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Replication Recap Support for high availability leads to analysis of data type operations and their possible adaptations for adequate dissemination under the chosen network system model Adaptation for dissemination will also occur in Aggregation
  • 15. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Aggregation Aggregation can be simply defined as: “the ability to summarize information” (Renesse, Birman, Vogels. Astrolabe. ACM TOCS, 2013)
  • 16. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Aggregation Problem Definition An aggregation function f takes a multiset of elements from a domain I and produces an output of a domain O. f : NI → O Resulting that: Order is not relevant Elements can occur multiple times E.g. multiset M = {10, 32, 10, 7}, aggregation func: max(M) = 32
  • 17. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Aggregation Typical functions sum count max or min average mode
  • 18. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Aggregation Decomposable aggregation functions Self-decomposable function f For some merge operator ⊕ and all non-empty multisets X and Y: f (X Y ) = f (X) ⊕ f (Y ) E.g. count({x}) = 1 count({X Y }) = count(X) + count(Y )
  • 19. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Aggregation Decomposable aggregation functions Some functions are not self-decomposable but can be transformed Decomposable function f For some function g and a self-decomposable aggregation function h, it can be expressed as: f = g ◦ h E.g. average(X) = g(h(X)) h({x}) = (x, 1) h(X Y ) = h(X) + h(Y ) g((s, c)) = s/c
  • 20. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Aggregation Duplicates / Idempotency Duplicate insensitive Result only depends on support set of the multiset E.g. min({1, 3, 1, 2, 4, 5, 4, 5}) = min({1, 3, 2, 4, 5}) = 1 Duplicate sensitive Multiplicity is relevant to result E.g. 8 = count({1, 3, 1, 2, 4, 5, 4, 5}) = count({1, 3, 2, 4, 5}) = 5
  • 21. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Aggregation Adding idempotency Functions like count and sum can be made duplicate insensitive by a stochastic transformations E.g. Extrema propagation derives a vector of k exponential random variables that can be aggregated by max and used to approximate the aggregated sum. Thus relaxing (exactly-once) network assumptions. Aggregate sums by Extrema Propagation Node i holds value vi Initially Vi = rexp(k, vi ) On gossip message Vm from anther node do Vi = max(Vi , Vm) Aggregate sum estimation value is k−1 Vi
  • 22. Aggregation = Replication Carlos Baquero Universidade do Minho & INESC TEC Further reference Search for blog post “Aggregation is not Replication” Nuno Pregui¸ca, Carlos Baquero, Marc Shapiro: Conflict-Free Replicated Data Types CRDTs. Encyclopedia of Big Data Technologies 2019 Paulo Jesus, Carlos Baquero, Paulo S´ergio Almeida: A Survey of Distributed Data Aggregation Algorithms. IEEE Communications Surveys and Tutorials 17(1): 381-404 (2015)