Datanet Tech Details (Talk CF 8/11/16)

Datanet tech details plus deep dive into Datanet's Distributed Garbage Collection

Published in: Software

  1. Datanet: Distributed CRDT Data Synchronization
  2. Ubiquitous Write Through Caches • Anyone can cache any data in the entire system • Write locally, replicate async • Conflicts are the norm
  3. Resolving Conflicts: CRDTs • Conflict-free Replicated Data Types • Non-consensus-based distributed algorithms, for various data types, that automatically resolve conflicts resulting from concurrent writes
  4. CRDT Simple Example • (Writer)A & (Writer)B start w/ X=5 • A increments X by 10 • B increments X by 20 • Both increments are performed (exactly-once, in arbitrary order) at both A & B • Consistent-Result: 35
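
The counter example above can be checked with a few lines of Python. This is only a sketch of why the order of increments does not matter; the helper name `apply_increments` is made up for illustration and is not part of Datanet.

```python
from itertools import permutations

def apply_increments(start, increments):
    """Apply each increment exactly once, in the given order."""
    value = start
    for delta in increments:
        value += delta
    return value

# A increments X by 10, B increments X by 20, both start from X=5.
increments = [10, 20]

# Try every delivery order; addition commutes, so every order agrees.
results = {apply_increments(5, order) for order in permutations(increments)}
```

Every delivery order produces the same value, which is the property the slide calls a consistent result.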
  5. Datanet CRDTs • Datanet supports ONLY CRDTs • CRDT Building Blocks: 1) Key-Value 2) Key-Sequence
  6. Centralized Architecture • Mesh of DataCenter-Clusters • Writers (Agents) connect to single DC-Cluster • Enables GarbageCollection for Sequences
  7. Mesh of DataCenters [diagram; labels: DC1, DC2, DC3, AgentA, AgentB, AgentC, AgentD]
  8. Mesh of Co-Locations [diagram; seven nodes labeled CoLo]
  9. Datanet Basics • Document-Store • Supports JSON – Numbers, Strings, Dictionaries, & Sequences • Data disseminated via Deltas
  10. Algorithms • CRDT algorithms utilize versioning to achieve uncoordinated consistency • Versioning systems: – Agents – Documents – Fields – Deltas – GC-steps
  11. Agent UUID • Agent initialization: – Central assigns Agents a unique AgentUUID • AgentUUIDs used in: – DocumentUUIDs – FieldUUIDs – DeltaUUIDs
  12. Document UUID • Document creation: – Agents create a DocumentUUID • Contents: – AgentUUID – LocalCounter – CreationTimestamp
  13. Field UUID • Field creation: – Agents create a FieldUUID • Contents: – AgentUUID – LocalCounter – CreationTimestamp
  14. Delta UUID • Delta generation: – Agents create a DeltaUUID • Contents: – AgentUUID – LocalPerDocumentVersion
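
The UUID layouts listed above could be modeled roughly as follows. This is a sketch under assumptions: the class names, the `short()` rendering as `AgentUUID|LocalCounter` (borrowed from the notation used in the later conflict slides), and the hypothetical `Agent` helper are illustrative, not Datanet's actual types.

```python
from dataclasses import dataclass
import itertools

@dataclass(frozen=True)
class FieldUUID:
    # Contents as listed on the Field UUID slide.
    agent_uuid: int
    local_counter: int
    creation_timestamp: int

    def short(self) -> str:
        # Assumed rendering: "AgentUUID|LocalCounter".
        return f"{self.agent_uuid}|{self.local_counter}"

class Agent:
    """Hypothetical agent that mints FieldUUIDs from a local counter."""

    def __init__(self, agent_uuid: int, first_counter: int = 1):
        self.agent_uuid = agent_uuid
        self._counter = itertools.count(first_counter)

    def new_field_uuid(self, now: int) -> FieldUUID:
        return FieldUUID(self.agent_uuid, next(self._counter), now)

agent = Agent(10001, first_counter=33)
first = agent.new_field_uuid(now=1469940200)
second = agent.new_field_uuid(now=1469940200)
```

Because the AgentUUID is assigned centrally and the counter is local, two agents can mint identifiers without coordinating, and one agent never reuses an identifier even within the same timestamp.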
  15. Sequence Metadata • Insert element into a sequence: – Field has positional Metadata to LeftHandNeighbor • Contents: – LeftHandNeighbor’s FieldUUID – CreationTimestamp* – Reordered-Ordering*
  16. CRDT Algorithms • Straightforward: Document Create/Remove, Key-Value • Complex: Key-Sequence
  17. Document Create & Remove • Document overwrites resolved via LWW on CreationTimestamp • DocumentRemove references a DocumentUUID (Observed-Remove) • CRDT Register
  18. Create Conflict • Agent 10001: Create DocumentX, DUUID: 10001|23, TS: 1469940000 • Agent 20002: Create DocumentX, DUUID: 20002|111, TS: 1469940099 • Result: DocumentX, DUUID: 20002|111, TS: 1469940099
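
The last-writer-wins (LWW) rule in the Create Conflict slide can be sketched as a small merge function. The `Create` record and the tie-break on DUUID for equal timestamps are assumptions added for illustration; the slides only state that the later CreationTimestamp wins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Create:
    duuid: str  # "AgentUUID|LocalCounter", as written on the slide
    ts: int     # CreationTimestamp

def lww_merge(a: Create, b: Create) -> Create:
    """Keep the write with the later timestamp.

    Equal timestamps fall back to comparing DUUIDs (an assumed tie-break)
    so that every replica picks the same winner.
    """
    return max(a, b, key=lambda w: (w.ts, w.duuid))

from_10001 = Create(duuid="10001|23", ts=1469940000)
from_20002 = Create(duuid="20002|111", ts=1469940099)

winner = lww_merge(from_10001, from_20002)
```

The merge is symmetric, so both agents converge on the same document regardless of which create they see first.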
  19. Key-Value SET • KV SET on field creation and overwrite • Conflicting KV SETs are resolved using LWW on CreationTimestamp • Same algorithm for nested fields • CRDT Register
  20. KV-Set Conflict • Agent 10001: SET X.logins = 7, FUUID: 10001|33, TS: 1469940200 • Agent 20002: SET X.logins = 1, FUUID: 20002|122, TS: 1469940111 • Result: Field X.logins = 7, FUUID: 10001|33, TS: 1469940200
  21. Key-Value INCREMENT • Delta specifies incremented field and a positive or negative increment value • Increment FieldUUID must match local value (Observed-Increment) • CRDT PN-Counter
  22. KV Increment Conflict • INCREMENT X.logins, ByValue: 11, FUUID: 10001|33 • SET X.logins = 100, FUUID: 10001|55, TS: 1469940250 • INCREMENT X.logins, ByValue: 44, FUUID: 10001|55 • Result: Field X.logins, FUUID: 10001|55, Value: 144 • Agents shown: 20002, 10001
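
A minimal sketch of the Observed-Increment rule follows, assuming an increment is simply dropped when its FieldUUID no longer matches the local field. The `Field` class, the function names, and the starting value of 7 are illustrative assumptions; only the FUUIDs, the increments, and the final value 144 come from the slide.

```python
class Field:
    def __init__(self, value, fuuid):
        self.value = value
        self.fuuid = fuuid

def apply_set(field, value, fuuid):
    # A SET overwrites the value and gives the field a new FieldUUID.
    # (LWW comparison against the previous SET is omitted here.)
    field.value = value
    field.fuuid = fuuid

def apply_increment(field, by_value, fuuid):
    # Observed-Increment: only apply if the increment targets the
    # FieldUUID this replica currently holds.
    if fuuid != field.fuuid:
        return False
    field.value += by_value
    return True

logins = Field(value=7, fuuid="10001|33")  # starting value is assumed

apply_set(logins, 100, "10001|55")
stale_applied = apply_increment(logins, 11, "10001|33")  # targets old FUUID
fresh_applied = apply_increment(logins, 44, "10001|55")  # targets new FUUID
```

The increment of 11 refers to a field instance that has since been overwritten, so only the increment of 44 lands, giving 100 + 44 = 144.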
  23. Key-Value DELETE • Delta specifies deleted field • Since fields have unique FieldUUIDs, field can be permanently removed (no tombstone) • Subsequent creation of a field with same name will have a different FieldUUID • CRDT OR-Sets
  24. Causal Consistency • Related to Vector Clocks • Read and write operations that are causally related are seen by every node of the distributed system in the same order
  25. Causal Consistency Example • A creates DocumentX • B modifies DocumentX, sets field credits=200 • C modifies DocumentX, increments field credits by 22 • Causal Order is A then B then C; any other ordering results in undefined behavior
  26. Dependency Matrix • Deltas contain a dependency matrix, consisting of tuples: [AgentUUID -> PerDocumentAgentVersion] • Delta’s dependency matrix does not match locally applied state -> Queue Delta • Subsequent Deltas replay queued Deltas
  27. Dependency Fail • Agent 10001: SET X.credits = 200, DUUID: 10001|78 • Agent 20002: INCREMENT X.credits 22, DUUID: 20002|155 • Dependencies: 10001|78 • CENTRAL, Agent 30003: Apply BLUE Then GREEN
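
The queue-and-replay behaviour described in the Dependency Matrix slide might look like the sketch below. The `Replica` class, the dictionary layout of a Delta, and the "applied version is at least the required version" readiness test are assumptions made for illustration; the slides only say that a Delta whose dependencies do not match local state is queued and replayed later.

```python
class Replica:
    def __init__(self):
        self.applied = {}  # AgentUUID -> highest per-document version applied
        self.queue = []    # Deltas waiting on missing dependencies
        self.log = []      # operations in the order they were applied

    def _ready(self, delta):
        # Assumed rule: every dependency must already be applied locally.
        return all(self.applied.get(agent, 0) >= version
                   for agent, version in delta["deps"].items())

    def receive(self, delta):
        self.queue.append(delta)
        progressed = True
        while progressed:  # keep replaying until nothing else unblocks
            progressed = False
            for queued in list(self.queue):
                if self._ready(queued):
                    self.queue.remove(queued)
                    self.applied[queued["agent"]] = queued["version"]
                    self.log.append(queued["op"])
                    progressed = True

set_credits = {"agent": 10001, "version": 78, "deps": {},
               "op": "SET X.credits = 200"}
inc_credits = {"agent": 20002, "version": 155, "deps": {10001: 78},
               "op": "INCREMENT X.credits 22"}

replica = Replica()
replica.receive(inc_credits)        # arrives first, but depends on 10001|78
queued_after_first = len(replica.queue)
replica.receive(set_credits)        # unblocks the queued increment
```

Even though the increment arrives first, it is held back until the SET it depends on has been applied, so the replica sees the causal order.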
  28. Sequence Insert • Insert an element by specifying its LeftHandNeighbor • Sequenced elements contain FieldUUIDs plus PositionalMetadata • Internal data-structure is a tree
  29. Internal & External Sequences • Internal (tree): 1 -> 2 -> {B -> W, A -> {X, Y}, 3 -> Z} • External (flattened): 1, 2, B, W, A, X, Y, 3, Z
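
The relationship between the internal tree and the external sequence on the slide above is a depth-first, parent-before-children walk. The sketch below reproduces the slide's tree as a plain dictionary; the `flatten` helper and the dictionary encoding are illustrative assumptions, not Datanet's internal representation.

```python
def flatten(node, children):
    """Pre-order walk: emit the node, then each child subtree in order."""
    out = [node]
    for child in children.get(node, []):
        out.extend(flatten(child, children))
    return out

# Parent -> ordered children, copied from the slide's tree diagram.
children = {
    "1": ["2"],
    "2": ["B", "A", "3"],
    "B": ["W"],
    "A": ["X", "Y"],
    "3": ["Z"],
}

external = flatten("1", children)
```

Each element is followed immediately by everything that was inserted to its right, which yields the external order shown on the slide.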
  30. Sequence Conflicts • Sequence conflicts happen when different agents concurrently insert an element w/ the same LeftHandNeighbor • Conflicting elements resolved via LWW-sorting*
  31. Sequence Conflict • Starting sequence: [V: 1, LHN: (none), TS: 50], [V: 2, LHN: 1, TS: 50], [V: 3, LHN: 2, TS: 50] • Concurrent inserts from Agent 20002 and Agent 10001: [INSERT VALUE: A, LHN: 2, TS: 100] and [INSERT VALUE: B, LHN: 2, TS: 111] • Resulting sequence: [V: 1, LHN: (none), TS: 50], [V: 2, LHN: 1, TS: 50], [V: B, LHN: 2, TS: 111], [V: A, LHN: 2, TS: 100], [V: 3, LHN: 2, TS: 50]
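
One way to read "LWW-sorting" from the Sequence Conflict slide is that elements sharing a LeftHandNeighbor are ordered newest-first. The sketch below makes that assumption explicit; `order_sequence` and the tuple layout are hypothetical, and the tie-break on value for equal timestamps is an added assumption.

```python
from collections import defaultdict

def order_sequence(elements):
    """elements: iterable of (value, lhn, ts); lhn is None for the head.

    Siblings that share an LHN are sorted newest-first (assumed meaning of
    LWW-sorting), then the tree is walked parent-before-children.
    """
    children = defaultdict(list)
    for value, lhn, ts in elements:
        children[lhn].append((ts, value))

    out = []

    def visit(parent):
        for ts, value in sorted(children[parent], reverse=True):
            out.append(value)
            visit(value)

    visit(None)
    return out

base = [("1", None, 50), ("2", "1", 50), ("3", "2", 50)]
insert_a = ("A", "2", 100)
insert_b = ("B", "2", 111)

a_then_b = order_sequence(base + [insert_a, insert_b])
b_then_a = order_sequence(base + [insert_b, insert_a])
```

Both arrival orders give the same sequence, with B (TS 111) ahead of A (TS 100) and both ahead of the older element 3.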
  32. Sequence Deletes • Deleted sequence elements are tombstoned • Subsequent Deltas referencing tombstoned LHNs will be correctly positioned • Tombstones -> Garbage Collection
  33. Sequence Delete • Agent 10001: DELETE: 2 • Sequence before: [V: 1, LHN: (none)], [V: 2, LHN: 1], [V: 3, LHN: 2] • Sequence after: [V: 1, LHN: (none)], [V: 2, LHN: 1], [V: 3, LHN: 2] (V: 2 is still listed, consistent with tombstoning)
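
The reason for tombstoning can be illustrated with a small sketch: the deleted element stays in the structure so that a later insert naming it as LeftHandNeighbor still has an anchor. The `Sequence` class, its method names, and the inserted element `C` with TS 60 are hypothetical; newest-first ordering among siblings is the same assumption used for LWW-sorting.

```python
from collections import defaultdict

class Sequence:
    def __init__(self):
        self.elements = {}  # value -> {"lhn", "ts", "tombstone"}

    def insert(self, value, lhn, ts):
        self.elements[value] = {"lhn": lhn, "ts": ts, "tombstone": False}

    def delete(self, value):
        # Mark only; the element keeps its position for later inserts.
        self.elements[value]["tombstone"] = True

    def visible(self):
        children = defaultdict(list)
        for value, meta in self.elements.items():
            children[meta["lhn"]].append((meta["ts"], value))
        out = []

        def visit(parent):
            for ts, value in sorted(children[parent], reverse=True):
                if not self.elements[value]["tombstone"]:
                    out.append(value)
                visit(value)  # still descend through tombstones

        visit(None)
        return out

seq = Sequence()
seq.insert("1", None, 50)
seq.insert("2", "1", 50)
seq.insert("3", "2", 50)

seq.delete("2")
after_delete = seq.visible()

seq.insert("C", "2", 60)  # a late Delta that still references element 2
after_late_insert = seq.visible()
```

The late insert lands between 1 and 3 even though its LeftHandNeighbor is no longer visible, which is what the tombstone preserves until garbage collection.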
  34. Distributed Garbage Collection • DataCenters gossip and elect PrimaryDC • PrimaryDC drives GarbageCollection • GC is performed via GC-Deltas • Every GC step -> PerDocument GC-Version
  35. Apply GC Delta • Both DC-Clusters & Agents apply GC-Deltas • Apply steps: – Permanently remove tombstones – Reorder LeftHandNeighbors to maintain correct positioning
  36. GC Step • Pre GC (GCV: 5): [V: 1, LHN: (none)], [V: 2, LHN: 1], [V: 3, LHN: 2] • GC-Delta (GCV: 6): Remove: 2, Reorder: [3 LHN: 1] • Post GC (GCV: 6): [V: 1, LHN: (none)], [V: 3, LHN: 1]
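
The GC step above can be sketched as a pure function over a map from element to LeftHandNeighbor. The dictionary layout, the function name `apply_gc_delta`, and the strict "GCV must advance by exactly one" check are assumptions for illustration.

```python
def apply_gc_delta(doc, gc_delta):
    """Return a new document with tombstones removed and LHNs reordered."""
    if gc_delta["gcv"] != doc["gcv"] + 1:
        # Assumed rule: GC steps are applied strictly in version order.
        raise ValueError("GC-Delta does not follow the document's GCV")
    lhn = dict(doc["lhn"])
    for value in gc_delta["remove"]:
        del lhn[value]                 # permanently drop the tombstone
    lhn.update(gc_delta["reorder"])    # re-point neighbors of removed items
    return {"gcv": gc_delta["gcv"], "lhn": lhn}

pre_gc = {"gcv": 5, "lhn": {"1": None, "2": "1", "3": "2"}}
gc_delta = {"gcv": 6, "remove": ["2"], "reorder": {"3": "1"}}

post_gc = apply_gc_delta(pre_gc, gc_delta)
```

After the step, element 3 points at element 1, so its position survives the removal of element 2.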
  37. GC Race Conditions • Agent creates Delta w/ GCV(X) & PrimaryDC creates GC-Delta w/ GCV(X+1) at same time • PrimaryDC recognizes Agent’s Delta is behind CurrentGCV -> issues ReorderDelta • Subscribers receiving OOO-GCV Deltas will queue them and apply them together w/ ReorderDelta
  38. Bi-Directional GC • GC-steps are versioned and can be applied both forwards and backwards • Creating a ReorderDelta requires – rewinding GCV – applying Delta – forwarding GCV
  39. GC Rewind • Pre GC (GCV: 5): [V: 1, LHN: (none)], [V: 2, LHN: 1], [V: 3, LHN: 2] • GC-Delta (GCV: 6): Remove: [2 LHN: 1], Reorder: [3 LHN: 1], Undo: [3 LHN: 2] • Post GC (GCV: 6): [V: 1, LHN: (none)], [V: 3, LHN: 1]
  40. Temporary Connectivity Issues • Agent temporarily loses connectivity • While offline Agent can still create Deltas • On reconnect, Agent FullDocumentSyncs Documents externally modified while offline • FullDocumentSync includes GC-bridge
  41. Full Document Sync • Central Data (GCV: 7): [V: 1, LHN: (none)], [V: X, LHN: 1] • Bridge, GCV: 6: Remove: [2 LHN: 1], Reorder: [3 LHN: 1], Undo: [3 LHN: 2] • Bridge, GCV: 7: Remove: [3 LHN: 1], Reorder: [X LHN: 1], Undo: [X LHN: 3] • Agent 10001 (GCV: 5): [V: 1, LHN: (none)], [V: 2, LHN: 1], [V: C, LHN: 2], [V: 3, LHN: 2] • Delta (GCV: 5): INSERT V: C, LHN: 2 • *INSERT [X LHN: 3] @GCV: 6 • *INSERT [C LHN: 2] (while offline) • *DELETE [3 LHN: 1] @GCV: 6
  42. Full Document Sync Math • Rewind FullDocumentSync Data to Agent GCV via GC-Bridge • Apply Local Deltas to rewound document • Do not forward GCV until ReorderDeltas for LocalDeltas arrive (GC-WAIT)
  43. Result: Data - Bridge + LocalDelta • Data (GCV: 7): [V: 1, LHN: (none)], [V: X, LHN: 1] • minus Bridge, GCV: 6: Remove: [2 LHN: 1], Reorder: [3 LHN: 1], Undo: [3 LHN: 2]; GCV: 7: Remove: [3 LHN: 1], Reorder: [X LHN: 1], Undo: [X LHN: 3] • plus Delta (GCV: 5): INSERT V: C, LHN: 2
  44. Agent: Result & GC-WAIT • Result (GCV: 5): [V: 1, LHN: (none)], [V: 2, LHN: 1], [V: C, LHN: 2], [V: 3, LHN: 2], [V: X, LHN: 3] • Delta (GCV: 5): INSERT V: C, LHN: 2 • Agent 10001 & GC-WAIT
  45. Agent GC Wait • Agent being behind in GCV is OK (tons of PositionalMetadata) • Central sends ReorderDeltas for AgentDeltas created while offline • End GC-WAIT: use GC-Bridge to forward GCV
  46. END GC-WAIT: Result+Bridge+Reorder • Result (GCV: 5): [V: 1, LHN: (none)], [V: 2, LHN: 1], [V: C, LHN: 2], [V: 3, LHN: 2], [V: X, LHN: 3] • plus Bridge, GCV: 6: Remove: [2 LHN: 1], Reorder: [3 LHN: 1], Undo: [3 LHN: 2]; GCV: 7: Remove: [3 LHN: 1], Reorder: [X LHN: 1], Undo: [X LHN: 3] • plus Reorder: [C LHN: 1], Undo: [C LHN: 2] • equals (GCV: 7): [V: 1, LHN: (none)], [V: C, LHN: 1], [V: X, LHN: 1]
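
The full-document-sync arithmetic on the preceding slides (rewind the central data through the GC-Bridge, add the local Delta, then forward again and apply the ReorderDelta) can be traced with the sketch below. The dictionary encoding (element -> LeftHandNeighbor), the `forward` and `rewind` helpers, and the decision to apply the ReorderDelta after forwarding are assumptions for illustration; tombstone flags and sibling ordering are deliberately left out.

```python
def forward(doc, step):
    """Apply one GC step: drop removed elements, then re-point LHNs."""
    lhn = dict(doc["lhn"])
    for value in step["remove"]:
        del lhn[value]
    lhn.update(step["reorder"])
    return {"gcv": step["gcv"], "lhn": lhn}

def rewind(doc, step):
    """Undo one GC step: restore old LHNs and re-add removed elements."""
    lhn = dict(doc["lhn"])
    lhn.update(step["undo"])
    lhn.update(step["remove"])  # value -> LHN it had when it was removed
    return {"gcv": step["gcv"] - 1, "lhn": lhn}

# GC-Bridge from the slides, one entry per GC version.
bridge = [
    {"gcv": 6, "remove": {"2": "1"}, "reorder": {"3": "1"}, "undo": {"3": "2"}},
    {"gcv": 7, "remove": {"3": "1"}, "reorder": {"X": "1"}, "undo": {"X": "3"}},
]

central = {"gcv": 7, "lhn": {"1": None, "X": "1"}}

# Data - Bridge: rewind central data to the Agent's GCV (5).
rewound = central
for step in reversed(bridge):
    rewound = rewind(rewound, step)

# + LocalDelta: the insert the Agent made while offline.
result = {"gcv": rewound["gcv"], "lhn": {**rewound["lhn"], "C": "2"}}

# End of GC-WAIT: forward through the bridge, then apply the ReorderDelta.
final = result
for step in bridge:
    final = forward(final, step)
final = {"gcv": final["gcv"], "lhn": {**final["lhn"], "C": "1"}}
```

The intermediate `result` matches the "Result & GC-WAIT" slide, and `final` matches the state shown once GC-WAIT ends.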
  47. Replication as Routing • Deltas are sent system-wide to all Subscribers w/ no checks (think targeted broadcast) • Minimal latency in replication • Made possible via CRDT algorithms • Agent to agent replication is p2p
  48. End of Interesting Stuff
  49. Datanet Infrastructure
  50. Datanet Access Privileges • Data stored in documents • Documents specify a channel • Users R/W privileges on channels
  51. Caching Replication • Documents contain data • Agents cache Documents • Datanet ensures all modifications (Datanet-wide) to Documents cached on an Agent are replicated to the Agent
  52. Pub/Sub Replication • Users stationed on Agents • Documents specify channel • Users subscribe to channel • Datanet ensures all modifications to Documents in channels User is subscribed to are replicated to all Agents where User is stationed
  53. Cache Replication Flow [diagram; labels: Datanet, Agent 1 with DocX, Agent 2 with DocX] • UserA modifies DocumentX on Agent1 • Modification travels to Agent2 (also caching DocumentX)
  54. PubSub Replication [diagram; labels: Datanet, Channel 1, UserA on Agent 1, Agent 2, Agent 3, UserB on Agent 10, Agent 11] • UserA modifies Document w/ Channel=1 on Agent1 • Modification travels to all Agents (2&3) stationing UserA • UserB (subscribed to Channel1) also receives the updates
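
The Pub/Sub routing rule in the slide above amounts to a lookup from channel to subscribed users to the Agents where those users are stationed. The sketch below uses plain dictionaries; `target_agents` and both table names are hypothetical, and excluding the originating Agent from the result is an assumption.

```python
def target_agents(channel, origin_agent, subscriptions, stations):
    """Agents that should receive a modification made on `channel`."""
    targets = set()
    for user, channels in subscriptions.items():
        if channel in channels:
            targets |= stations.get(user, set())
    targets.discard(origin_agent)  # the writer already has the change
    return targets

# Layout copied from the PubSub Replication slide.
subscriptions = {"UserA": {1}, "UserB": {1}}       # user -> channels
stations = {"UserA": {1, 2, 3}, "UserB": {10, 11}}  # user -> agents

receivers = target_agents(channel=1, origin_agent=1,
                          subscriptions=subscriptions, stations=stations)
```

A write by UserA on Agent 1 reaches UserA's other Agents and every Agent stationing UserB, since both users subscribe to Channel 1.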
  55. Agent • Agent resides on App-server, in browser, or in a mobile app [diagram; labels: Agent, DB, HTTPS]
  56. Client -> Agent • Client libraries call into Agent to read/write data • Agent has embedded DB (LMDB), communicates to Central via HTTPS [diagram; labels: Agent, DB, C++, HTTPS, Client, Lua]
  57. Client -> Agent -> DC • Agent forwards Delta to DataCenterCluster [diagram; labels: Agent, DB, C++, HTTPS, Client, Lua, ClusterNode, DB, HTTPS]
  58. DC Cluster DB • DataCenterClusters store data in pluggable DistributedDB [diagram; labels: Cluster Node (x3), DB (x4), HTTPS (x3)]
  59. Client -> Agent -> DC -> Subscriber • DataCenterCluster sends Delta to Subscriber [diagram; labels: Agent, DB, C++, Client, Lua, ClusterNode, DB, Subscriber, DB, HTTPS (x3)]
  60. Client -> Agent -> DC -> Subscribers • Single Delta -> many subscribers/cachers [diagram; labels: Agent, DB, Client, ClusterNode, DB, Subscriber (x2), Cacher, each with DB]
  61. Client -> Agent -> DCs -> Subscribers • Delta is geo-replicated between DataCenterClusters [diagram; labels: Agent, DB, Client, ClusterNode (x2), Subscriber (x2), each with DB]
  62. EXTRA
  63. Viral Commutative Replication • Agent offline robustness • Cluster-node failure robustness • Datacenter failure robustness • Virality of replication • Low global replication latency
  64. API & Data-model • API: simple JSON • Reads/Queries • Advanced Data-structures
  65. API: simple JSON • Datanet’s API is JSON; it’s dead simple • Single key isolation is provided • Additional client libraries fairly easy to add • Browser (JS), Node.js, & OpenResty (Lua) currently available; iOS & Android coming soon
  66. Reads/Queries in Datanet • Datanet concerns itself only w/ data modifications • All writes go to pluggable databases (currently MongoDB, Redis, Memory, SQLite, LMDB, ngx.shared.DICT, LocalStorage are supported)
  67. Advanced Data Structures • Datanet provides additional data structures that require server-side coordination to achieve correctness • Currently ORDERED-LISTs, CAPPED-LISTs, & LARGE-LISTs (experimental) are supported
