Blazes: coordination analysis for distributed programs
Peter Alvaro, Neil Conway, Joseph M. Hellerstein (UC Berkeley)
David Maier (Portland State)
Distributed systems are hard
Asynchrony Partial Failure
Asynchrony isn’t that hard
Amelioration:
Logical timestamps
Deterministic interleaving
Partial failure isn’t that hard
Amelioration:
Replication
Replay
Asynchrony * partial failure
is hard²
Logical timestamps
Deterministic interleaving
Replication
Replay
Asynchrony * partial failure
is hard²
Replication
Replay
Today:
Consistency criteria for fault-tolerant distributed systems
Blazes: analysis and enforcement
This talk is all setup
Frame of mind:
1.  Dataflow: a model of distributed computation
2.  Anomalies: what can go wrong?
3.  Remediation strategies
    1.  Component properties
    2.  Delivery mechanisms
Framework:
Blazes – coordination analysis and synthesis
Little boxes: the dataflow model
Generalization of distributed services
Components interact via asynchronous calls (streams)
Components
Input interfaces
Output interface
Streams
Nondeterministic order
Example: a join operator
Inputs: R, S
Output: T
Example: a key/value store
Inputs: put, get
Output: response
Example: a pub/sub service
Inputs: publish, subscribe
Output: deliver
Logical dataflow
“Software architecture”
Data source
client
Service X filter cache
c
a
b
Dataflow is compositional
Data source
client
Service X filter aggregator
Dataflow is compositional
Components are recursively defined
Dataflow exhibits self-similarity
c
q r
Buffer
Buffer
group
/count
Dataflow exhibits self-similarity
[Figure labels: DB, HDFS, Hadoop, Index, Combine, Static, HTTP, App1, App2, Buy, Content, User requests, App1 answers, App2 answers]
Physical dataflow
Data source
client
Service X filter aggregator
c
a
b
Physical dataflow
Data source
Service X filter
aggregator
client
“System architecture”
What could go wrong?
Cross-run nondeterminism
Data source
client
Service X filter aggregator
c
a
b
Run 1
Nondeterministic replays
Cross-run nondeterminism
Data source
client
Service X filter aggregator
c
a
b
Nondeterministic replays
Run 2
Cross-instance nondeterminism
Data source
Service X
client
Transient replica disagreement
Divergence
Data source
Service X
client
Permanent replica disagreement
Hazards
Data source
client
Service X filter aggregator
c
a
b
Order → Contents?
Preventing the anomalies
1.  Understand component semantics
(And disallow certain compositions)
Component properties
•  Convergence
– Component replicas receiving the same messages reach the same state
– Rules out divergence
Insert
Read
Convergent data structure (e.g., Set CRDT)
Convergence
Insert
Read
Convergent data structure (e.g., Set CRDT)
Commutativity: tolerant to reordering
Associativity: tolerant to batching
Idempotence: tolerant to retry/duplication
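To make the three properties concrete, here is a minimal Ruby sketch of a grow-only set. The `GrowOnlySet` class and its `insert` and `merge` methods are hypothetical names chosen for illustration, not something from the talk. Set union is commutative, associative, and idempotent, so replicas that receive the same inserts reordered, batched, or retried should end in the same state.

```ruby
require 'set'

# Hypothetical grow-only set; class and method names are illustrative only.
class GrowOnlySet
  attr_reader :items

  def initialize
    @items = Set.new
  end

  # Adding one element: order and repeats do not change the final state.
  def insert(value)
    @items.add(value)
    self
  end

  # Adding a batch is a set union, so batch boundaries do not matter either.
  def merge(batch)
    @items.merge(batch)
    self
  end
end

a = GrowOnlySet.new
[1, 2, 3].each { |v| a.insert(v) }          # one delivery order

b = GrowOnlySet.new
[3, 1, 2, 3, 1].each { |v| b.insert(v) }    # reordered, with retries

c = GrowOnlySet.new
c.merge([2, 3]).merge([1])                  # delivered in batches

puts a.items == b.items && b.items == c.items   # prints "true"
```

All three replicas hold {1, 2, 3}; nothing here says anything yet about what a reader sees before all messages have arrived.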
Convergence isn’t compositional
Data source
client
Convergent
(identical input contents ⇒ identical state)
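One way to see why convergence alone does not compose: the final state of a convergent set ignores delivery order, but what a downstream reader observes along the way does not. The sketch below assumes a hypothetical `run` helper and a made-up `[kind, value]` message format. It replays two interleavings of the same inserts with a read at different points.

```ruby
require 'set'

# Replays one interleaving of inserts and reads against a fresh set.
# Returns [final_state, read_outputs]. The message format is hypothetical.
def run(messages)
  state = Set.new
  reads = []
  messages.each do |kind, value|
    case kind
    when :insert then state.add(value)
    when :read   then reads << state.to_a.sort
    end
  end
  [state, reads]
end

run1 = [[:insert, 1], [:insert, 2], [:read, nil]]
run2 = [[:insert, 1], [:read, nil], [:insert, 2]]

state1, reads1 = run(run1)
state2, reads2 = run(run2)

puts state1 == state2   # prints "true": the replicas converge
puts reads1 == reads2   # prints "false": the read outputs differ
```

The state is convergent, yet a component consuming the read stream receives different contents in the two runs.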
Component properties
•  Convergence
– Component replicas receiving the same messages reach the same state
– Rules out divergence
•  Confluence
– Output streams have deterministic contents
– Rules out all stream anomalies
Confluent ⇒ convergent
Confluence
output set = f(input set)
Confluence is compositional
output set = (f ∘ g)(input set)
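A small sketch of the claim, assuming two hypothetical stages (`keep_even` and `times_ten`, names invented here): each stage is a function from an input set to an output set, so their composition is too, and every delivery order of the same messages should produce the same output set.

```ruby
require 'set'

# Two hypothetical confluent stages. Each maps an input set to an output set
# and never looks at arrival order.
def keep_even(inputs)
  inputs.select(&:even?).to_set
end

def times_ten(inputs)
  inputs.map { |x| x * 10 }.to_set
end

# The composition f ∘ g, with g = keep_even and f = times_ten.
def pipeline(inputs)
  times_ten(keep_even(inputs))
end

# Feed every delivery order of the same four messages through the pipeline.
outputs = [1, 2, 3, 4].permutation.map { |order| pipeline(order) }

puts outputs.all? { |out| out == outputs.first }   # prints "true"
puts outputs.first.to_a.sort.inspect               # prints "[20, 40]"
```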
  	
  	
  
Preventing the anomalies
1.  Understand component semantics
(And disallow certain compositions)
2.  Constrain message delivery orders
1.  Ordering
Ordering – global coordination
Deterministic outputs
Order-sensitive
Ordering – global coordination
Data source
client
The first principle of successful scalability
is to batter the consistency mechanisms down to a minimum.
– James Hamilton
Preventing the anomalies
1.  Understand component semantics
(And disallow certain compositions)
2.  Constrain message delivery orders
1.  Ordering
2.  Barriers and sealing
Barriers – local coordination
Deterministic outputs
Data source
client
Order-sensitive
Sealing – continuous barriers
Do partitions of (infinite) input streams “end”?
Can components produce deterministic results given “complete” input partitions?
Sealing: partition barriers for infinite streams
Sealing – continuous barriers
Finite partitions of infinite inputs are common
…in distributed systems
–  Sessions
–  Transactions
–  Epochs / views
…and applications
–  Auctions
–  Chats
–  Shopping carts
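A rough sketch of sealing, using shopping carts as the partitions. `SealedCounter`, `replay`, and the message format are hypothetical, and the sketch assumes the seal for a partition is delivered after all of that partition's items. The component holds back its output for a partition until the seal arrives, so interleaving across partitions (and item order within one) should not change what it emits.

```ruby
# Hypothetical order-sensitive component: it counts the items in each
# partition (say, a shopping cart) and emits a count only once that partition
# is sealed.
class SealedCounter
  attr_reader :outputs

  def initialize
    @buffers = Hash.new { |hash, key| hash[key] = [] }
    @outputs = {}
  end

  def item(key, value)
    raise ArgumentError, "partition #{key} is already sealed" if @outputs.key?(key)

    @buffers[key] << value
  end

  # Assumption: the seal for `key` is delivered after all of that key's items.
  def seal(key)
    @outputs[key] = @buffers.delete(key).to_a.size
  end
end

# Replays one delivery order and returns the emitted per-partition counts.
def replay(messages)
  counter = SealedCounter.new
  messages.each { |kind, *args| counter.public_send(kind, *args) }
  counter.outputs
end

run1 = [[:item, 'cart1', 'book'], [:item, 'cart2', 'pen'],
        [:item, 'cart1', 'lamp'], [:seal, 'cart1'], [:seal, 'cart2']]
run2 = [[:item, 'cart2', 'pen'], [:seal, 'cart2'],
        [:item, 'cart1', 'lamp'], [:item, 'cart1', 'book'], [:seal, 'cart1']]

puts replay(run1) == replay(run2)   # prints "true"
```

Only the producer of a partition and its consumer need to agree on the seal, which is why this is coordination local to a partition rather than a global order.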
Blazes: consistency analysis + coordination selection
Blazes:
Mode 1: Grey boxes
Grey boxes
Example: pub/sub
x = publish
y = subscribe
z = deliver
Deterministic but unordered
Severity  Label    Confluent  Stateless
1         CR       X          X
2         CW       X
3         OR_gate             X
4         OW_gate
x->z : CW
y->z : CWT
Grey boxes
Example: key/value store
x = put; y = get;
z = response
Deterministic but unordered
Severity  Label    Confluent  Stateless
1         CR       X          X
2         CW       X
3         OR_gate             X
4         OW_gate
x->z : OW_key
y->z : ORT
Label propagation – confluent composition
CW
CR
CR
CR
CR
Deterministic outputs
CW
Label propagation – unsafe composition
OW
CR
CR
CR
CR
Tainted outputs
Interposition point
Label propagation – sealing
OW_key
CR
CR
CR
CR
Seal(key=x)
Seal(key=x)
Deterministic outputs
OW_key
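The three label-propagation cases can be approximated in a few lines. This is a deliberately simplified sketch and not Blazes' actual inference rules: it assumes a path is a list of hypothetical `[label, gate]` pairs, reports the most severe label using the severity ranks from the table, and treats outputs as deterministic only if every order-sensitive component (OR or OW) is gated on a key that the input stream seals.

```ruby
# Severity ranks follow the table on the slides; the rest is a simplification.
SEVERITY = { 'CR' => 1, 'CW' => 2, 'OR' => 3, 'OW' => 4 }.freeze

# `path` is a list of [label, gate] pairs; gate is nil for confluent labels.
# `sealed_keys` lists the keys on which the input stream is sealed.
def analyze(path, sealed_keys = [])
  worst = path.map { |label, _gate| label }.max_by { |label| SEVERITY.fetch(label) }
  unsealed = path.select do |label, gate|
    SEVERITY.fetch(label) >= 3 && !sealed_keys.include?(gate)
  end
  { label: worst, deterministic: unsealed.empty? }
end

confluent = [['CW', nil], ['CR', nil], ['CR', nil], ['CR', nil], ['CR', nil]]
unsafe    = [['OW', :key], ['CR', nil], ['CR', nil], ['CR', nil], ['CR', nil]]

puts analyze(confluent)[:deterministic]        # prints "true"
puts analyze(unsafe)[:deterministic]           # prints "false": tainted outputs
puts analyze(unsafe, [:key])[:deterministic]   # prints "true": sealed on key
```

In the unsealed case, the component carrying the OW label is where ordering or a barrier would have to be interposed.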
  
Blazes:
Mode 2: White boxes
White boxes
module KVS
  state do
    interface input, :put, [:key, :val]
    interface input, :get, [:ident, :key]
    interface output, :response, [:response_id, :key, :val]
    table :log, [:key, :val]
  end
  bloom do
    # Store each put at the next timestep.
    log <+ put
    # Delete the stored entry for any key that is being overwritten.
    log <- (put * log).rights(:key => :key)
    response <= (log * get).pairs(:key => :key) do |s,l|
      [l.ident, s.key, s.val]
    end
  end
end
Negation (→ order sensitive)
Partitioned by :key
put → response: OW_key
get → response: OR_key
White boxes
module PubSub
  state do
    interface input, :publish, [:key, :val]
    interface input, :subscribe, [:ident, :key]
    interface output, :response, [:response_id, :key, :val]
    table :log, [:key, :val]
    table :sub_log, [:ident, :key]
  end
  bloom do
    log <= publish
    sub_log <= subscribe
    response <= (log * sub_log).pairs(:key => :key) do |s,l|
      [l.ident, s.key, s.val]
    end
  end
end
publish → response: CW
subscribe → response: CR
The Blazes frame of mind:
•  Asynchronous dataflow model
•  Focus on consistency of data in motion
– Component semantics
– Delivery mechanisms and costs
•  Automatic, minimal coordination
Queries?
