
Blazes: coordination analysis for distributed programs

Slides from ICDE'14

  1. Blazes: coordination analysis for distributed programs. Peter Alvaro, Neil Conway, Joseph M. Hellerstein (UC Berkeley); David Maier (Portland State)
  2. Distributed systems are hard: asynchrony, partial failure.
  3. Asynchrony isn’t that hard. Amelioration: logical timestamps, deterministic interleaving.
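Slide 3 names logical timestamps as one way to ameliorate asynchrony. Below is a minimal sketch that assumes a Lamport-style clock. The class and method names are hypothetical and do not come from the talk. Each process ticks on local events and jumps past any timestamp it receives, so a receive is always stamped later than the matching send:

```ruby
# Minimal Lamport logical clock; LamportClock is a hypothetical name for illustration.
class LamportClock
  attr_reader :time

  def initialize
    @time = 0
  end

  # Local event or message send: advance the clock and return the new timestamp.
  def tick
    @time += 1
  end

  # Message receipt: move past the sender's timestamp, then count the receive event.
  def receive(remote_time)
    @time = [@time, remote_time].max + 1
  end
end

a = LamportClock.new
b = LamportClock.new
sent = a.tick            # a stamps an outgoing message with 1
2.times { b.tick }       # b performs two unrelated local events (time 2)
received = b.receive(sent)
puts "send=#{sent} receive=#{received}"   # prints "send=1 receive=3"
```

If ties between equal timestamps are broken by process id, these clocks give a deterministic total order of events. That is one way to obtain the deterministic interleaving the slide mentions.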
  4. Partial failure isn’t that hard. Amelioration: replication, replay.
  5. Asynchrony × partial failure is hard². Logical timestamps, deterministic interleaving, replication, replay.
  6. Asynchrony × partial failure is hard². Replication, replay. Today: consistency criteria for fault-tolerant distributed systems; Blazes: analysis and enforcement.
  7. This talk is all setup. Frame of mind: (1) dataflow, a model of distributed computation; (2) anomalies: what can go wrong?; (3) remediation strategies: (a) component properties, (b) delivery mechanisms. Framework: Blazes, coordination analysis and synthesis.
  8. Little boxes: the dataflow model. A generalization of distributed services; components interact via asynchronous calls (streams).
  9. Components: input interfaces, output interface.
  10. Streams: nondeterministic order.
  11. Example: a join operator (inputs R and S, output T).
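The join operator on slide 11 can be sketched as a streaming symmetric hash join. The sketch assumes each message is tagged with its input (`:r` or `:s`) and a join key. `SymmetricHashJoin` and `run` are hypothetical names. Every arriving tuple is stored and then probed against everything already seen on the other input. As a result, the set of T tuples is the same for every arrival order, and the check over all permutations confirms this:

```ruby
require 'set'

# Streaming symmetric hash join with inputs R and S and output T (hypothetical sketch).
class SymmetricHashJoin
  def initialize
    @r = Hash.new { |h, k| h[k] = [] }   # join key => R values seen so far
    @s = Hash.new { |h, k| h[k] = [] }   # join key => S values seen so far
  end

  # Consume one message and return the T tuples [key, r_value, s_value] it produces.
  def push(side, key, value)
    if side == :r
      @r[key] << value
      @s[key].map { |sv| [key, value, sv] }
    else
      @s[key] << value
      @r[key].map { |rv| [key, rv, value] }
    end
  end
end

# Deliver messages in the given order and collect everything emitted on T.
def run(messages)
  join = SymmetricHashJoin.new
  messages.flat_map { |msg| join.push(*msg) }.to_set
end

msgs = [[:r, 1, "a"], [:s, 1, "x"], [:r, 2, "b"], [:s, 1, "y"], [:s, 2, "z"]]
outputs = msgs.permutation.map { |order| run(order) }
puts outputs.all? { |t| t == outputs.first }   # true: T's contents ignore arrival order
```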
  12. Example: a key/value store (inputs put and get, output response).
  13. Example: a pub/sub service (inputs publish and subscribe, output deliver).
  14. Logical dataflow (“software architecture”): data source, client, Service X, filter, cache; streams a, b, c.
  15. Dataflow is compositional: data source, client, Service X, filter, aggregator.
  16. Dataflow is compositional: components are recursively defined.
  17. Dataflow exhibits self-similarity (figure labels: c, q, r; Buffer, Buffer; group/count).
  18. Dataflow exhibits self-similarity (figure: DB, HDFS, Hadoop, Index, Combine, Static HTTP, App1, App2, Buy, Content; user requests, App1 answers, App2 answers).
  19. Physical dataflow.
  20. Physical dataflow: data source, client, Service X, filter, aggregator; streams a, b, c.
  21. Physical dataflow (“system architecture”): data source, Service X, filter, aggregator, client.
  22. What could go wrong?
  23. Cross-run nondeterminism: nondeterministic replays. Run 1 (data source, client, Service X, filter, aggregator; streams a, b, c).
  25. Cross-run nondeterminism: nondeterministic replays. Run 2.
  27. Cross-instance nondeterminism: transient replica disagreement (data source, Service X, client).
  29. Divergence: permanent replica disagreement (data source, Service X, client).
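Any order-sensitive component can reproduce the cross-run nondeterminism and divergence slides. The sketch below assumes a hypothetical `OverwriteKVS` in which each put replaces the previous value for its key. Two replicas that receive the same puts in different orders end in different states, which is divergence. Whether a get observes a put depends on delivery order, which is cross-run nondeterminism:

```ruby
# Order-sensitive key/value store: a put overwrites the previous value (hypothetical sketch).
class OverwriteKVS
  def initialize
    @data = {}
  end

  def put(key, val)
    @data[key] = val
  end

  def get(key)
    @data[key]
  end

  def state
    @data.dup
  end
end

# Divergence: same message contents, different delivery orders, different final state.
messages = [["k", 1], ["k", 2]]
replica_a = OverwriteKVS.new
replica_b = OverwriteKVS.new
messages.each { |k, v| replica_a.put(k, v) }
messages.reverse_each { |k, v| replica_b.put(k, v) }
puts replica_a.state == replica_b.state   # false: the replicas disagree permanently

# Cross-run nondeterminism: a get racing with a put sees different values in different runs.
def responses(order)
  kvs = OverwriteKVS.new
  order.each_with_object([]) do |(op, key, val), out|
    if op == :put
      kvs.put(key, val)
    else
      out << kvs.get(key)
    end
  end
end

run1 = responses([[:put, "k", 1], [:get, "k", nil]])
run2 = responses([[:get, "k", nil], [:put, "k", 1]])
puts run1 == run2                         # false: run 1 returns [1], run 2 returns [nil]
```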
  33. Hazards: order → contents? (data source, client, Service X, filter, aggregator; streams a, b, c)
  34. Preventing the anomalies: (1) understand component semantics (and disallow certain compositions).
  35. Component properties. Convergence: component replicas receiving the same messages reach the same state; rules out divergence.
  36. Convergence: a convergent data structure (e.g., a Set CRDT) supporting Insert and Read. Commutativity, associativity, and idempotence make it tolerant to reordering, batching, and retry/duplication.
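The Set CRDT on slide 36 can be sketched as a grow-only set (G-Set) whose replicas merge by set union. The `GSet` class is a hypothetical illustration. Union is commutative, associative, and idempotent, so reordered, batched, and duplicated deliveries all leave the replicas in the same state:

```ruby
require 'set'

# Grow-only set (G-Set) CRDT replica: state is a set, merge is set union.
class GSet
  attr_reader :elems

  def initialize(elems = Set.new)
    @elems = elems
  end

  # Insert one element; returns self so calls can be chained.
  def insert(x)
    @elems = @elems | [x]
    self
  end

  # Merge another replica's state (e.g., a batched delivery).
  def merge(other)
    GSet.new(@elems | other.elems)
  end

  def read
    @elems.dup
  end
end

msgs = %w[a b c]

in_order = GSet.new
msgs.each { |m| in_order.insert(m) }

reordered_with_retry = GSet.new
(msgs.reverse + ["a"]).each { |m| reordered_with_retry.insert(m) }   # reordering + duplicate

batched = GSet.new.insert("a").merge(GSet.new(Set["b", "c"]))          # one message, then a batch

reads = [in_order, reordered_with_retry, batched].map(&:read)
puts reads.all? { |r| r == reads.first }   # true: every replica reads {"a", "b", "c"}
```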
  40. Convergence isn’t compositional: data source, client; a convergent component (identical input contents ⇒ identical state).
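Here is a minimal sketch of why convergence does not compose. Plain Ruby `Set`s stand in for a convergent data structure. Two replicas receive the same inserts in different orders, and each serves a client read right after its first delivery. The final states agree, but the read responses differ, and those responses are the input to any downstream component. `replica_run` is a hypothetical helper:

```ruby
require 'set'

# Apply inserts to one replica in its own delivery order; a client read is
# served right after the first insert, racing with the rest.
def replica_run(order)
  state = Set.new
  reads = []
  order.each_with_index do |x, i|
    state << x
    reads << state.dup if i.zero?
  end
  [state, reads]
end

state_a, reads_a = replica_run(%w[x y])
state_b, reads_b = replica_run(%w[y x])

puts state_a == state_b   # true: identical input contents, identical state
puts reads_a == reads_b   # false: the read responses sent downstream differ
```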
  43. Component properties. Convergence: component replicas receiving the same messages reach the same state; rules out divergence. Confluence: output streams have deterministic contents; rules out all stream anomalies. Confluent ⇒ convergent.
  44. Confluence: output set = f(input set).
  50. Confluence is compositional: output set = (f ∘ g)(input set).
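The following is a small check of the "confluence is compositional" slide. It assumes components are modeled as streaming operators with a `push` method; all names are hypothetical. `Distinct` (g) is stateful, but the set it emits depends only on the set of inputs. `AtLeastThree` (f) emits once when its third message arrives, so its output depends only on how many messages arrive. The sketch feeds g's output into f and tries every delivery order. Each order gives the same output set:

```ruby
require 'set'

# g: forwards each value the first time it arrives. Stateful, but the set of
# values it emits depends only on the set of values it receives.
class Distinct
  def initialize
    @seen = Set.new
  end

  def push(x)
    @seen.add?(x) ? [x] : []
  end
end

# f: emits :at_least_three once, when its third message arrives. Its output
# depends only on how many messages arrive, never on their order.
class AtLeastThree
  def initialize
    @count = 0
  end

  def push(_x)
    @count += 1
    @count == 3 ? [:at_least_three] : []
  end
end

# Run f . g: every message g emits is handed to f immediately.
def run_pipeline(order)
  g = Distinct.new
  f = AtLeastThree.new
  order.flat_map { |x| g.push(x).flat_map { |y| f.push(y) } }.to_set
end

inputs = [1, 2, 2, 3]
results = inputs.permutation.map { |order| run_pipeline(order) }
puts results.all? { |r| r == Set[:at_least_three] }   # true for all 24 delivery orders
```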
  52. Preventing the anomalies: (1) understand component semantics (and disallow certain compositions); (2) constrain message delivery orders: ordering.
  53. Ordering: global coordination. Deterministic outputs from an order-sensitive component.
  54. Ordering: global coordination (data source, client). “The first principle of successful scalability is to batter the consistency mechanisms down to a minimum.” (James Hamilton)
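A sequencer is one common way to impose the global ordering these slides describe. The sketch below uses hypothetical `Sequencer` and `OrderedReplica` classes. It assumes a single sequencer that never fails, which is exactly the kind of coordination mechanism Hamilton urges keeping to a minimum. Every message gets a global sequence number. Each replica holds back a message until all its predecessors have been applied, so replicas of an order-sensitive store agree regardless of network delivery order:

```ruby
# A single sequencer assigns consecutive global sequence numbers.
class Sequencer
  def initialize
    @next_seq = 0
  end

  def stamp(msg)
    seq = @next_seq
    @next_seq += 1
    [seq, msg]
  end
end

# A replica that applies messages strictly in sequence order, holding back
# any message that arrives before its predecessors.
class OrderedReplica
  attr_reader :kvs

  def initialize
    @expected = 0
    @holdback = {}
    @kvs = {}
  end

  def receive(seq, msg)
    @holdback[seq] = msg
    while @holdback.key?(@expected)
      key, val = @holdback.delete(@expected)
      @kvs[key] = val            # order-sensitive overwrite, applied in the global order
      @expected += 1
    end
  end
end

sequencer = Sequencer.new
stamped = [["k", 1], ["k", 2], ["j", 7]].map { |m| sequencer.stamp(m) }

a = OrderedReplica.new
b = OrderedReplica.new
stamped.each { |seq, msg| a.receive(seq, msg) }
stamped.reverse_each { |seq, msg| b.receive(seq, msg) }   # the network reverses delivery order
puts a.kvs == b.kvs   # true: both replicas end with k = 2, j = 7
```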
  55. Preventing the anomalies: (1) understand component semantics (and disallow certain compositions); (2) constrain message delivery orders: ordering; barriers and sealing.
  56. Barriers: local coordination. Deterministic outputs from an order-sensitive component (data source, client).
  57. Barriers: local coordination (data source, client).
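A barrier can be sketched as follows. The sketch assumes each producer's channel is FIFO and ends with an end-of-stream marker. The `BarrierCount` class and the producer names are hypothetical. An order-sensitive aggregate (here, a count) emits nothing until every producer has finished, so every interleaving of the two channels yields the same single output:

```ruby
# Count behind a barrier: emits the final count once, after every producer's
# end-of-stream marker has arrived (hypothetical sketch).
class BarrierCount
  def initialize(producers)
    @pending = producers.dup
    @count = 0
    @emitted = false
  end

  # kind is :data or :eos; returns the messages emitted downstream.
  def receive(kind, producer)
    case kind
    when :data then @count += 1
    when :eos  then @pending.delete(producer)
    end
    return [] if @emitted || !@pending.empty?
    @emitted = true
    [@count]
  end
end

# All interleavings of two FIFO channels (each channel's own order is preserved).
def interleavings(a, b)
  return [b] if a.empty?
  return [a] if b.empty?
  interleavings(a.drop(1), b).map { |rest| [a.first] + rest } +
    interleavings(a, b.drop(1)).map { |rest| [b.first] + rest }
end

p1 = [[:data, :p1], [:data, :p1], [:eos, :p1]]
p2 = [[:data, :p2], [:eos, :p2]]

outputs = interleavings(p1, p2).map do |order|
  agg = BarrierCount.new([:p1, :p2])
  order.flat_map { |kind, producer| agg.receive(kind, producer) }
end
puts outputs.uniq.inspect   # [[3]]: one count, identical in every interleaving
```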
  58. Sealing: continuous barriers. Do partitions of (infinite) input streams “end”? Can components produce deterministic results given “complete” input partitions? Sealing: partition barriers for infinite streams.
  59. Sealing: continuous barriers. Finite partitions of infinite inputs are common in distributed systems (sessions, transactions, epochs/views) and in applications (auctions, chats, shopping carts).
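Sealing applies the barrier idea per partition. This sketch of the auction example assumes a hypothetical `SealedAuctions` service. It receives bids partitioned by auction id, plus a `:seal` punctuation promising that no more bids for that auction will follow. Each auction's winner is announced as soon as its partition is sealed, without waiting for the rest of the stream to end:

```ruby
# Auction service with per-partition seals (hypothetical sketch).
class SealedAuctions
  def initialize
    @best = {}      # auction id => highest bid so far
    @sealed = {}    # auction id => true once sealed
  end

  # kind is :bid or :seal; returns announcements [auction, winning_bid].
  def receive(kind, auction, amount = nil)
    case kind
    when :bid
      raise ArgumentError, "bid after seal for #{auction}" if @sealed[auction]
      @best[auction] = [@best[auction], amount].compact.max
      []
    when :seal
      @sealed[auction] = true
      [[auction, @best[auction]]]
    else
      raise ArgumentError, "unknown message kind #{kind}"
    end
  end
end

stream = [[:bid, "A", 10], [:bid, "B", 3], [:bid, "A", 12], [:seal, "A"],
          [:bid, "B", 8], [:seal, "B"]]

auctions = SealedAuctions.new
announcements = stream.flat_map { |msg| auctions.receive(*msg) }
p announcements   # [["A", 12], ["B", 8]]; "A" is announced before "B" is sealed
```

Within a sealed partition, bid order does not affect the announced winner. Only the seal, which marks the partition complete, has to come after the partition's bids.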
  60. Blazes: consistency analysis + coordination selection.
  61. Blazes mode 1: grey boxes.
  62. Grey boxes. Example: pub/sub, with x = publish, y = subscribe, z = deliver (legend: deterministic but unordered). Path labels, by severity: 1. CR (confluent, stateless); 2. CW (confluent); 3. OR_gate (stateless); 4. OW_gate (neither). Annotations: x → z: CW; y → z: CWT.
  63. Grey boxes. Example: key/value store, with x = put, y = get, z = response (legend: deterministic but unordered). Path labels, by severity: 1. CR (confluent, stateless); 2. CW (confluent); 3. OR_gate (stateless); 4. OW_gate (neither). Annotations: x → z: OW_key; y → z: ORT.
  64. Label propagation, confluent composition: a CW component feeding CR components yields deterministic outputs; the composite label is CW.
  67. Label propagation, unsafe composition: an OW component feeding CR components yields tainted outputs, marking an interposition point for coordination.
  70. Label propagation, sealing: an OW_key component feeding CR components, with its input sealed on the key (Seal(key=x)), yields deterministic outputs; the composite label is OW_key.
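For a linear chain of components, the label propagation slides can be approximated in code. The sketch below is a simplification and not the Blazes implementation. It assumes each component carries one label (CR < CW < OR < OW in severity) plus an optional gate key. It also assumes that a seal on the input key survives along the chain and that, once a stream is tainted, everything downstream stays tainted. `propagate` and its result fields are hypothetical names:

```ruby
# Simplified label propagation along a linear chain (hypothetical sketch).
SEVERITY = { CR: 1, CW: 2, OR: 3, OW: 4 }.freeze
CONFLUENT = %i[CR CW].freeze

# chain: list of [label, gate] pairs, where gate is the key an order-sensitive
# component is partitioned on (nil = no partitioning).
# seal_key: the key on which the chain's input is sealed (nil = unsealed).
def propagate(chain, seal_key: nil)
  tainted = false
  interposition = nil
  chain.each_with_index do |(label, gate), i|
    next if CONFLUENT.include?(label)         # confluent: contents do not depend on order
    next if gate && gate == seal_key          # sealed on the gate key: complete partitions
    tainted = true                            # order-sensitive and unprotected
    interposition ||= i                       # first place coordination would be needed
  end
  composite = chain.map(&:first).max_by { |label| SEVERITY[label] }
  { composite: composite, deterministic: !tainted, interposition: interposition }
end

four_readers = [[:CR, nil]] * 4
p propagate([[:CW, nil]] + four_readers)
p propagate([[:OW, nil]] + four_readers)
p propagate([[:OW, :key]] + four_readers, seal_key: :key)
```

The three calls mirror the confluent, unsafe, and sealing slides. The first chain is deterministic with composite label CW. The second is tainted, with its interposition point at the OW component (index 0). The third is deterministic with composite label OW once its input is sealed on the key.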
  73. Blazes mode 2: white boxes.
  74. White boxes: a key/value store in Bloom.

          module KVS
            state do
              interface input, :put, [:key, :val]
              interface input, :get, [:ident, :key]
              interface output, :response, [:response_id, :key, :val]
              table :log, [:key, :val]
            end

            bloom do
              log <+ put
              log <- (put * log).rights(:key => :key)
              response <= (log * get).pairs(:key => :key) do |s, l|
                [l.ident, s.key, s.val]
              end
            end
          end

      Annotations: negation in the log <- deletion rule (→ order-sensitive); partitioned by :key. Path labels: put → response: OW_key; get → response: OR_key.
  78. White boxes: a pub/sub service in Bloom.

          module PubSub
            state do
              interface input, :publish, [:key, :val]
              interface input, :subscribe, [:ident, :key]
              interface output, :response, [:response_id, :key, :val]
              table :log, [:key, :val]
              table :sub_log, [:ident, :key]
            end

            bloom do
              log <= publish
              sub_log <= subscribe
              response <= (log * sub_log).pairs(:key => :key) do |s, l|
                [l.ident, s.key, s.val]
              end
            end
          end

      Path labels: publish → response: CW; subscribe → response: CR.
  80. The Blazes frame of mind: an asynchronous dataflow model; a focus on the consistency of data in motion (component semantics; delivery mechanisms and their costs); automatic, minimal coordination.
  81. Queries?
