
Kite: efficient and available release consistency for the datacenter

Slides presented at PPoPP'20

  1. 1. Kite: Efficient and Available Release Consistency for the Datacenter. Vasilis Gavrielatos, Antonios Katsarakis, Vijay Nagarajan, Boris Grot, Arpit Joshi* (University of Edinburgh, *Intel)
  2. 2. Key-Value Stores. Replicated KVS characteristics: ● Read-Write-RMW API ● Highly Available
  3. 3. Key-Value Stores. Replicated KVS characteristics: ● Read-Write-RMW API ● Highly Available. Availability ≅ Nonblocking
  4. 4. Key-Value Stores. Replicated KVS characteristics: ● Read-Write-RMW API ● Highly Available ○ Replicated for fault tolerance
  5. 5. Key-Value Stores. Replicated KVS characteristics: ● Read-Write-RMW API ● Highly Available ○ Replicated for fault tolerance. Replication ⇒ Performance vs Consistency
  6. 6. Consistency vs Performance. [Figure: Weak Consistency vs Strong Consistency; axes labelled Performance and Programmability]
  7. 7. Consistency vs Performance. [Figure: Weak Consistency vs Strong Consistency; axes labelled Performance and Programmability; "???" added]
  8. 8. Existing Solution: Multiple Consistency Levels (MCL). MCL Replicated KVS with Weak Write and Strong Read. Systems shown: Amazon DB, App Engine, PNUTS, Manhattan, Pileus
  9. 9. Existing Solution: Multiple Consistency Levels (MCL). MCL Replicated KVS with Weak Write and Strong Read. Systems shown: Amazon DB, App Engine, PNUTS, Manhattan, Pileus. What about programming patterns?
  10. 10. The problem. [Diagram: Alice, MCL Replicated KVS]
  11. 11. The problem. Alice (MCL Replicated KVS): void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); /* Player created */ Write(player_created = true); }
  12. 12. The problem. Alice (MCL Replicated KVS): void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); /* Player created */ Write(player_created = true); } The last write is labelled "the flag". Alice: "If you can read the flag, then you must be able to see the player!"
  13. 13. The problem. Reordered field writes (MCL Replicated KVS): void CreatePlayer() { Write(name = "Leo"); Write(surname = "Messi"); Write(age = 32); /* Player created */ Write(player_created = true); } Alice: "Fine by me!"
  14. 14. The problem. Flag written before a field (MCL Replicated KVS): void CreatePlayer() { Write(name = "Leo"); Write(surname = "Messi"); /* Player created */ Write(player_created = true); Write(age = 32); } Alice: "No way!"
  15. 15. The problem. Flag written before a field (MCL Replicated KVS): void CreatePlayer() { Write(name = "Leo"); Write(surname = "Messi"); /* Player created */ Write(player_created = true); Write(age = 32); } Alice: "No way!" There is no way to capture this with MCLs!
  16. 16. MCL solution. Alice (MCL Replicated KVS): void CreatePlayer() { Strong_Write(age = 32); Strong_Write(name = "Leo"); Strong_Write(surname = "Messi"); /* Player created */ Strong_Write(player_created = true); }
  17. 17. MCL solution. Alice (MCL Replicated KVS): void CreatePlayer() { Strong_Write(age = 32); Strong_Write(name = "Leo"); Strong_Write(surname = "Messi"); /* Player created */ Strong_Write(player_created = true); } Alice: "Seems like overkill..." Missed performance opportunity
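
The ordering concern in the CreatePlayer slides above can be checked by brute force. The sketch below assumes, purely for illustration, that weak writes may become visible to a reader in any order, and it lists the visibility orders in which `player_created` shows up before some field of the player. The names `violating_orders` and `flag_is_release` are hypothetical and belong to no MCL store or to Kite.

```python
from itertools import permutations

FIELDS = ("age", "name", "surname")
FLAG = "player_created"
WRITES = FIELDS + (FLAG,)  # program order: all fields first, then the flag


def violating_orders(writes, flag_is_release):
    """Visibility orders in which the flag becomes visible before some field.

    Assumption: every field write precedes the flag in program order, so a
    flag with release-like semantics may only become visible last.
    """
    bad = []
    for order in permutations(writes):
        flag_pos = order.index(FLAG)
        if flag_is_release and flag_pos != len(order) - 1:
            continue  # this order is not allowed when the flag is a release
        if any(order.index(f) > flag_pos for f in FIELDS):
            bad.append(order)
    return bad
```

Under these assumptions, 18 of the 24 possible visibility orders break Alice's invariant when the flag is a plain weak write, and none remain once the flag may only become visible after the earlier writes.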
  18. 18. Shared Memory World. Sweet spot in the performance-vs-consistency trade-off?
  19. 19. Shared Memory World. Sweet spot in the performance-vs-consistency trade-off? DRF-SC!
  20. 20. Programming paradigm: DRF-SC. [Diagram: Alice, Multiprocessor]
  21. 21. Programming paradigm: DRF-SC. [Diagram: Alice, DRF-SC Programming Paradigm, Multiprocessor]
  22. 22. Programming paradigm: DRF-SC. [Diagram: Alice, DRF-SC Programming Paradigm, Multiprocessor] Annotated synchronization ⇒ SC
  23. 23. Under the hood: Release Consistency. [Diagram: Alice, DRF-SC Programming Paradigm, Release Consistency (DRF-compliant memory model), Multiprocessor]
  24. 24. Under the hood: Release Consistency. Alice (Multiprocessor): void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); /* Player created */ Release(player_created = true); }
  25. 25. Under the hood: Release Consistency. Alice (Multiprocessor): void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); /* Player created */ Release(player_created = true); } Invariant: Writes appear to complete before the Release
  26. 26. Under the hood: Release Consistency. Alice and Bob (Multiprocessor). Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  27. 27. Under the hood: Release Consistency. Alice and Bob (Multiprocessor). Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); } Invariant: Reads appear to complete after the Acquire
  28. 28. RC Semantics. Table (RC API | Ordering): Read & Write | none; Acquire | acquire ⇒ all; Release | all ⇒ release
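
The two ordering rules in the RC Semantics table can be written down as a small predicate. This is a minimal sketch that encodes only the rows shown on the slide (acquire ⇒ all, all ⇒ release, nothing for plain reads and writes); it ignores everything else a full memory model specifies, such as same-key ordering. The function names are hypothetical.

```python
from itertools import combinations


def must_stay_ordered(first, second):
    """True if `first` (earlier in program order) may not be reordered after `second`."""
    if first == "acquire":   # acquire => all: later operations stay after the acquire
        return True
    if second == "release":  # all => release: earlier operations stay before the release
        return True
    return False             # plain reads and writes: no ordering


def ordering_constraints(program):
    """Index pairs (i, j) with i < j that the two table rules keep in order."""
    return [
        (i, j)
        for i, j in combinations(range(len(program)), 2)
        if must_stay_ordered(program[i], program[j])
    ]
```

For CreatePlayer (`write, write, write, release`) the only constraints are that each write stays before the release; for ReadPlayer (`acquire, read, read, read`) each read stays after the acquire. The plain operations remain free to reorder among themselves.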
  29. 29. Can we do the same for KVSes? [Diagram: Alice, DRF-SC Programming Paradigm, Release Consistency (DRF-compliant memory model), Replicated KVS]
  30. 30. Kite: a Replicated KVS with ➢ Release Consistency ➢ High Availability
  31. 31. Our approach to building Kite. Steps: (1) API mappings
  32. 32. Our approach to building Kite. Steps: (1) API mappings (2) RC Semantics
  33. 33. Kite: API - Mappings. Table (API | Protocol): Reads | -; Writes | -; Acquire | -; Releases | -; Read-Modify-Writes (RMWs) | -
  34. 34. Kite: API - Mappings. Table (API | Protocol): Reads | -; Writes | -
  35. 35. Kite: API - Mappings. Table (API | Protocol | Overhead | Consistency | Availability): Reads | - | Zero | Eventual Consistency | High; Writes | - | 1 Broadcast | Eventual Consistency | High
  36. 36. Kite: API - Mappings. Table (API | Protocol | Overhead | Consistency | Availability): Reads | Eventual Store* | Zero | Eventual Consistency | High; Writes | Eventual Store* | 1 Broadcast | Eventual Consistency | High. * Sebastian Burckhardt. 2014. Principles of Eventual Consistency
  37. 37. Kite: API - Mappings. Table (API | Protocol | Overhead | Consistency | Availability): Reads | Eventual Store | Zero | Eventual Consistency | High; Writes | Eventual Store | 1 Broadcast | Eventual Consistency | High; Acquire | - | Local | Linearizability | High; Releases | - | 1 Broadcast | Linearizability | High
  38. 38. Kite: API - Mappings. Table (API | Protocol | Overhead | Consistency | Availability): Reads | Eventual Store | Zero | Eventual Consistency | High; Writes | Eventual Store | 1 Broadcast | Eventual Consistency | High; Acquire | ABD* | 1 Broadcast* | Linearizability | High; Releases | ABD* | 2 Broadcasts | Linearizability | High. * N. A. Lynch and A. A. Shvartsman. 1997. Robust emulation of shared memory using dynamic quorum-acknowledged broadcasts
  39. 39. Kite: API - Mappings. Table (API | Protocol | Overhead | Consistency | Availability): Reads | Eventual Store | Zero | Eventual Consistency | High; Writes | Eventual Store | 1 Broadcast | Eventual Consistency | High; Acquire | ABD | 1 Broadcast* | Linearizability | High; Releases | ABD | 2 Broadcasts | Linearizability | High; Read-Modify-Writes (RMWs) | - | 1 Broadcast | Consensus | High
  40. 40. Kite: API - Mappings. Table (API | Protocol | Overhead | Consistency | Availability): Reads | Eventual Store | Zero | Eventual Consistency | High; Writes | Eventual Store | 1 Broadcast | Eventual Consistency | High; Acquire | ABD | 1 Broadcast* | Linearizability | High; Releases | ABD | 2 Broadcasts | Linearizability | High; Read-Modify-Writes (RMWs) | Paxos* | 3 Broadcasts | Consensus | High. * Leslie Lamport. 1998. The part-time parliament.
  41. 41. Kite: API - Mappings. Table (API | Overhead | Protocol): Reads | Zero | Eventual Store; Writes | 1 Broadcast | Eventual Store; Acquire | 1 Broadcast* | ABD; Releases | 2 Broadcasts | ABD; Read-Modify-Writes (RMWs) | 3 Broadcasts | Paxos. Row groups: Common Case (Reads, Writes); Synchronization (Acquire, Releases); Heavy Synchronization (RMWs)
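
The mapping table above can be read as a per-operation cost model. The sketch below simply transcribes the protocol and broadcast counts from the slide into a dictionary and sums them for a sequence of operations. The names `API_MAPPING` and `broadcast_rounds` are hypothetical, and the asterisk on the Acquire overhead, whose footnote does not survive in this transcript, is treated as exactly one broadcast.

```python
# Operation -> (protocol, broadcast rounds), transcribed from the slide's table.
API_MAPPING = {
    "read": ("Eventual Store", 0),
    "write": ("Eventual Store", 1),
    "acquire": ("ABD", 1),   # slide says "1 Broadcast*"; footnote not available
    "release": ("ABD", 2),
    "rmw": ("Paxos", 3),
}


def broadcast_rounds(ops):
    """Total broadcast rounds for a sequence of operations under this model."""
    return sum(API_MAPPING[op][1] for op in ops)


create_player = ["write", "write", "write", "release"]
read_player = ["acquire", "read", "read", "read"]
```

In this model CreatePlayer costs five broadcast rounds (three writes plus one release) and ReadPlayer costs one (the acquire), since plain reads are local.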
  42. 42. Our approach to building Kite. Steps: (1) API mappings (2) RC Semantics
  43. 43. Kite: Fast-path/Slow-path. RC Semantics table (RC API | Ordering): Read & Write | none; Acquire | acquire ⇒ all; Release | all ⇒ release
  44. 44. Kite: Fast-path/Slow-path. Fast-Path: common operation. RC Semantics table (RC API | Ordering): Read & Write | none; Acquire | acquire ⇒ all; Release | all ⇒ release
  45. 45. Kite: Fast-path/Slow-path. Fast-Path: common operation. Slow-Path: when slow. RC Semantics table (RC API | Ordering): Read & Write | none; Acquire | acquire ⇒ all; Release | all ⇒ release
  46. 46. Fast-path. Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); } Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  47. 47. Fast-path. Message from Alice: (age = 32, name = "Leo", ...). Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  48. 48. Fast-path. Messages back to Alice: ←(ack) ←(ack) ←(ack) ←(ack). Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  49. 49. Fast-path. Message from Alice: (Release (player_created = true)). Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  50. 50. Fast-path. Message from Alice: (Release (player_created = true)). Before a release, gather all acks for prior writes. Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  51. 51. Fast-path. Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  52. 52. Fast-path. Message from Bob: (Acquire (player_created)). Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  53. 53. Fast-path. Messages to Bob: (true)→ (true)→ (true)→ (true)→. Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  54. 54. Fast-path. Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); } Local Reads
  55. 55. Fast-path. Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); } Local Reads. What if we cannot gather all acks before a release?
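
The fast-path rule from the slides above ("before a release, gather all acks for prior writes") can be sketched as a small bookkeeping structure on the writer's side. This is an illustrative model only: the class `AckTracker` and its methods are hypothetical names, and real message passing is replaced by direct method calls.

```python
class AckTracker:
    """Tracks which replicas still owe an ack for each outstanding write."""

    def __init__(self, replica_ids):
        self.replica_ids = set(replica_ids)
        self.waiting = {}   # write id -> replicas that have not acked yet
        self.next_id = 0

    def write(self):
        """Register a broadcast write and return its id."""
        wid = self.next_id
        self.next_id += 1
        self.waiting[wid] = set(self.replica_ids)
        return wid

    def on_ack(self, wid, replica_id):
        """Record an ack; forget the write once every replica has acked."""
        pending = self.waiting.get(wid)
        if pending is None:
            return
        pending.discard(replica_id)
        if not pending:
            del self.waiting[wid]

    def can_release(self):
        """Fast path: a release may proceed only when no acks are missing."""
        return not self.waiting

    def missing(self):
        """Replicas that have not acked some prior write."""
        return set().union(*self.waiting.values())
```

If `can_release()` stays false, `missing()` names the replicas that a release would otherwise be stuck waiting for, which is the situation the question on the last fast-path slide raises.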
  56. 56. Fast-path ⇒ Slow-path. [Diagram: Alice, Bob]
  57. 57. Fast-path ⇒ Slow-path. Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  58. 58. Fast-path ⇒ Slow-path. Message from Alice: (age = 32, name = "Leo", ...). Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  59. 59. Fast-path ⇒ Slow-path. Messages back to Alice: ←(ack) ←(ack) ←(ack). Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  60. 60. Fast-path ⇒ Slow-path. Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  61. 61. Fast-path ⇒ Slow-path. Message from Alice: (Node-5 is delinquent!). Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  62. 62. Fast-path ⇒ Slow-path. Messages back to Alice: ←(ack) ←(ack) ←(ack). Node labels: "5 = delinquent". Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  63. 63. Fast-path ⇒ Slow-path. Message from Alice: (Release (player_created = true)). Node labels: "5 = delinquent". Alice: void CreatePlayer() { Write(age = 32); Write(name = "Leo"); Write(surname = "Messi"); Release(player_created = true); }
  64. 64. Fast-path ⇒ Slow-path. Message from Bob: (Acquire (player_created)). Node labels: "5 = delinquent". Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  65. 65. Fast-path ⇒ Slow-path. Messages to Bob: (true, delinquent)→ (true, delinquent)→ (true, delinquent)→. Node labels: "5 = delinquent". Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  66. 66. Slow-path. Message from Bob: (Reset Delinquency). Bob: "I am delinquent!" Node labels: "5 = delinquent". Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  67. 67. Slow-path ⇒ Fast-path. Message from Bob: (Read age, name, ...). Bob: "I am delinquent!" Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  68. 68. Slow-path ⇒ Fast-path. Messages to Bob: (32, "Leo", ...)→ (32, "Leo", ...)→ (32, "Leo", ...)→. Bob: "I am delinquent!" Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  69. 69. Fast-path. Bob: "I am not delinquent anymore!" Bob: void ReadPlayer() { Acquire(player_created); Read(age); Read(name); Read(surname); }
  70. 70. Kite: Fast-path/Slow-path Recap. Fast path / Slow path mechanism. Before a Release: gather all acks ➢ On timing out, broadcast the delinquent machines. On an Acquire. Slow-path read / write.
  71. 71. Kite: Fast-path/Slow-path Recap. Fast path / Slow path mechanism. Before a Release: gather all acks ➢ On timing out, broadcast the delinquent machines. On an Acquire: discover delinquency ➢ Slow-path if delinquent. Slow-path read / write.
  72. 72. Kite: Fast-path/Slow-path Recap. Fast path / Slow path mechanism. Before a Release: gather all acks ➢ On timing out, broadcast the delinquent machines. On an Acquire: discover delinquency ➢ Slow-path if delinquent. Slow-path read / write: ➢ Add broadcast round ➢ Restore key to fast-path
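
The recap above can be exercised end to end in a toy, single-process simulation. This is a sketch under loud assumptions, not Kite's implementation: replicas are plain objects, a single global version counter stands in for real timestamps, an unreachable node models a missing ack, and a per-node `epoch` with a per-key `key_epoch` is one hypothetical way to decide which keys must take the slow path. All class and method names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    node_id: int
    reachable: bool = True
    store: dict = field(default_factory=dict)      # key -> (version, value)
    delinquent: set = field(default_factory=set)   # nodes reported to have missed writes
    epoch: int = 0                                 # bumped when this node learns it is delinquent
    key_epoch: dict = field(default_factory=dict)  # key -> epoch in which the local copy was validated


class Cluster:
    def __init__(self, n):
        self.replicas = [Replica(i) for i in range(n)]
        self.quorum = n // 2 + 1
        self.version = 0  # assumption: one global counter instead of real timestamps

    def _live(self):
        live = [r for r in self.replicas if r.reachable]
        assert len(live) >= self.quorum, "no quorum reachable"
        return live

    def write(self, key, value, missed):
        """Relaxed write: one broadcast; record nodes that did not ack in `missed`."""
        self.version += 1
        for r in self.replicas:
            if r.reachable:
                r.store[key] = (self.version, value)
                r.key_epoch[key] = r.epoch
            else:
                missed.add(r.node_id)

    def release(self, key, value, missed):
        """Before the release, either all acks are in or the laggards are published."""
        if missed:  # stands in for the ack-gathering timeout
            for r in self._live():
                r.delinquent |= missed
            missed.clear()
        self.write(key, value, set())

    def _quorum_read(self, me, key):
        copies = [r.store[key] for r in self._live() if key in r.store]
        version, value = max(copies, key=lambda c: c[0], default=(0, None))
        me.store[key] = (version, value)
        me.key_epoch[key] = me.epoch  # the key is restored to the fast path
        return value

    def acquire(self, reader, key):
        """Quorum read that also tells the reader whether it is delinquent."""
        me = self.replicas[reader]
        live = self._live()
        if any(reader in r.delinquent for r in live):
            me.epoch += 1  # every locally stored key is now suspect
            for r in live:
                r.delinquent.discard(reader)  # "reset delinquency"
        return self._quorum_read(me, key)

    def read(self, reader, key):
        """Local read on the fast path, quorum read on the slow path."""
        me = self.replicas[reader]
        if me.key_epoch.get(key) == me.epoch:
            return me.store[key][1], "fast"
        return self._quorum_read(me, key), "slow"
```

In this model, a node that missed Alice's writes still holds a stale `age`, but its next acquire reveals the delinquency, so its first read of each key goes to a quorum and later reads of that key are local again.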
  73. 73. Experimental Setup. Kite's Implementation: ● RDMA-enabled ● Multi-threaded ● Asynchronous API. Infrastructure: ● Servers: 5 x (Intel Xeon E5-2630v4) with 64 GB memory ● Network: 5 x 56 Gbit/s InfiniBand NICs, 1 x 12-port InfiniBand switch. Baseline: ● In-house ZAB implementation ○ RDMA-enabled, multi-threaded. Workloads: 1. Microbenchmarks 2. Lock-free data structures
  74. 74. Microbenchmarks [figure]
  75. 75. Microbenchmarks [figure]
  76. 76. Microbenchmarks [figure]
  77. 77. Microbenchmarks [figure; label: Kite]
  78. 78. Microbenchmarks [figure; panel: 5% Sync]
  79. 79. Microbenchmarks [figure; panels: 5% Sync, 20% Sync & 5% RMW]
  80. 80. Microbenchmarks [figure; panels: 5% Sync, 20% Sync & 5% RMW]
  81. 81. Lock-free Data Structures [figure; axis: operations per second, normalized to ZAB]
  82. 82. Conclusion. Kite: a replicated Key-Value Store with ● High availability & ● Release Consistency. Components: ● API mappings: Eventual Store, ABD & Paxos ● RC barrier semantics: Fast / Slow path ○ paper contains proof. Implementation features: ● Heavily multi-threaded ● RDMA-enabled ● Asynchronous API ● https://github.com/icsa-caps/Kite
  83. 83. Conclusion. Kite: a replicated Key-Value Store with ● High availability & ● Release Consistency. Components: ● API mappings: Eventual Store, ABD & Paxos ● RC barrier semantics: Fast / Slow path ○ paper contains proof. Implementation features: ● Heavily multi-threaded ● RDMA-enabled ● Asynchronous API ● https://github.com/icsa-caps/Kite. Thank you! Questions?
  84. 84. Back-up slides
  87. 87. Running Code on Kite
  88. 88. Write-only Throughput
  89. 89. Write-only Throughput with All-Aboard
