Cross Region Data Replication Design Considerations
Itai Friendinger, itai@forter.com
Different requirements (high availability, data residency) and high-level designs for AWS cross-region data replication (S3 vs. DynamoDB vs. Kinesis vs. Couchbase vs. Cassandra). This talk focuses on requirements, data consistency and write conflicts (with a CRDT example). It is a "theoretical" talk in the sense that no Forter-specific design is presented, and it should guide architects who want to design their service with "cross-region" in mind.

Reversim 2017 cross region data replication design considerations

  1. Cross Region Data Replication Design Considerations
     Itai Friendinger, itai@forter.com
  2. "Our financial institutions remain strong, and the American economy will be open for business as well."
  3. Decision as a Service Example
     Flow: TX, then Fraud Decision (100ms), then TX Decision
     if isFraud(tx.address, tx.payment) {
       return DECLINE;
     } else {
       return APPROVE;
     }
  4. Decision as a Service Example
     ● Change Account Address and Change Account Payment events go to the Event Processor (1000ms), which applies partial updates to the Unified People Store
     ● TX goes to Fraud Decision (100ms), which reads the Unified People Store and returns the TX Decision
  5. Design "It'll be fine"
     ● No Cross Region Replication
     Diagram labels: TX, Fraud Decision, TX Decision, raw event, Event Processor, People Store
  6. Design "Leave it to me"
     ● Cron Sync every 3 hours
     ● Replication != Reconciliation
     ● Replication != Backup
     Diagram labels: two copies of TX, Fraud Decision, Event Processor and People Store, each fed by raw events; Cron Sync between them
  7. Design "Yalla, fine"
     ● Read-Only RDS Replica
     ● Proxying data into a single Data Center
     ● Requires quarterly failover drills
     ● Cannot stand a real disaster for long
     Diagram labels: Event Forwarder, RDS Replication, TX Forwarding, raw event, Fraud Decision, Event Processor, People Store
  8. Design "Two for the price of one"
     ● CloudEndure DRaaS
     ● Point In Time Recovery
     ● Requires quarterly failover drills
     ● For existing apps (Enterprises)
     Diagram labels: Block Device Replication between People Stores; TX, Fraud Decision, TX Decision, raw event, Event Processor
  9. Design "Just you wait"
     ● Google Cloud Spanner Is Here; Geo Distributed Transactions Are Coming
     ● For green-field apps (Startups)
     Diagram labels: Transactions between People Stores; each side has TX, Fraud Decision, Event Processor, raw event
  10. Design "Trust me"
     ● Out-Of-The-Box, Real-Time, Bi-Directional, Data-Center Aware Replication
     ● Write Conflict resolution
     Diagram labels: 2Way Replication between People Stores; each side has TX, Fraud Decision, Event Processor, raw event
  11. Design "Her sister"
     ● Replication of Raw Events
     ● State Divergence
     Diagram labels: 2way Replication; each side has TX, Fraud Decision, Event Processor, People Store, raw event
  12. Read Consistency Guarantees
     Loosely based on Consistency Explained Through Baseball by Doug Terry
     ● Strong ⇒ 2:2
       ○ See all previous writes
     ● Read own Writes
       ○ See all writes performed by reader
     ● Monotonic ⇒ 2:1
       ○ See all writes from the beginning until N seconds ago
     ● Eventual ⇒ 1:2
       ○ See the writes in a different order (some still missing)

     time   partial update   state
     15m    Hapoel = 1       1:0
     32m    Maccabi = 1      1:1
     89m    Hapoel = 2       2:1
     91m    Maccabi = 2      2:2
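The score table above can be replayed to see how each guarantee produces the score listed next to it. This is only an illustrative sketch: the function names and the `lag` and `delivered` parameters are made up for this example and are not part of the talk.

```python
# Write log from the slide: (minute, team, new score). Each write is a partial update.
WRITES = [(15, "Hapoel", 1), (32, "Maccabi", 1), (89, "Hapoel", 2), (91, "Maccabi", 2)]

def replay(writes):
    """Apply partial updates in the given order and return 'Hapoel:Maccabi'."""
    state = {"Hapoel": 0, "Maccabi": 0}
    for _minute, team, score in writes:
        state[team] = score
    return f"{state['Hapoel']}:{state['Maccabi']}"

def strong_read(writes):
    """Strong: the reader sees all previous writes."""
    return replay(writes)

def lagging_read(writes, now, lag):
    """The slide's 'Monotonic' case: every write up to `lag` minutes ago, in order."""
    return replay([w for w in writes if w[0] <= now - lag])

def eventual_read(writes, delivered):
    """Eventual: only some writes have arrived, possibly in a different order."""
    return replay([writes[i] for i in delivered])

print(strong_read(WRITES))                  # all four writes
print(lagging_read(WRITES, now=92, lag=2))  # misses the write at minute 91
print(eventual_read(WRITES, [3, 0]))        # only Maccabi=2 and Hapoel=1 arrived
```

The eventual read shows a score (1:2) that never existed during the game, which is the point of the slide.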
  13. Hello Couchbase
     ● read-mutate-write of entire state
     ● Client reaches cluster's primary node
     ● Conflict Prevention: CAS
     ● Optimizations: subdocument API
     ● Read guarantee: Strong
     Diagram labels: nodes us-west-2b, us-west-2c; Event Processor (read/mutate/write); TX Decision (read)
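A read-mutate-write loop guarded by compare-and-swap can be sketched as follows. This is a toy in-memory store written for illustration; `InMemoryStore`, `CasMismatch` and `read_mutate_write` are hypothetical names and do not reflect the Couchbase SDK.

```python
class CasMismatch(Exception):
    """Raised when the document changed since it was read."""

class InMemoryStore:
    """Toy key-value store: each key maps to (cas, document)."""
    def __init__(self):
        self._docs = {}

    def get(self, key):
        return self._docs.get(key, (0, {}))

    def replace(self, key, document, cas):
        current_cas, _ = self._docs.get(key, (0, {}))
        if cas != current_cas:
            raise CasMismatch(key)  # someone else wrote in between
        self._docs[key] = (current_cas + 1, dict(document))

def read_mutate_write(store, key, mutate, max_retries=5):
    """Read the whole document, mutate a copy, write it back if unchanged."""
    for _ in range(max_retries):
        cas, doc = store.get(key)
        updated = mutate(dict(doc))
        try:
            store.replace(key, updated, cas)
            return updated
        except CasMismatch:
            continue  # re-read and retry with the fresh document
    raise RuntimeError("too many CAS conflicts")

store = InMemoryStore()
read_mutate_write(store, "person1", lambda d: {**d, "address": "a1"})
read_mutate_write(store, "person1", lambda d: {**d, "payment": "p1"})
print(store.get("person1"))
```

A writer holding a stale CAS value is rejected instead of silently overwriting the other update, which is what "conflict prevention" means here.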
  14. Hello Couchbase
     ● XDCR replicates entire state between clusters
     ● Optimizations: dedup by key, metadata first
     Diagram labels: Strong, Monotonic, XDCR; nodes us-west-2b, us-west-2c, us-east-1b, us-east-1c; Event Processor (read/mutate/write); TX Decision (read)
  15. Couchbase Last Write Wins
     ● Conflict Resolution: LWW erases the losing side
     ● Remember: NTP, no "sudo date"
     ● Document Version = Conflict Resolution (48-bit timestamp) + Conflict Prevention (16-bit CAS)
     ● Design "Trust me"
     Diagram labels: read-own-writes, Monotonic, XDCR; nodes us-west-2b, us-west-2c, us-east-1b, us-east-1c; Event Processor (read/mutate/write) on both sides; TX Decision (read)
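Why last-write-wins "erases the losing side" can be shown with a whole-document merge. This is a minimal sketch, not Couchbase's implementation: the `ts` and `cluster` fields and the tie-break on cluster name are assumptions made for the example.

```python
def lww_merge(a, b):
    """Keep the version with the higher timestamp; cluster name breaks ties."""
    return max(a, b, key=lambda version: (version["ts"], version["cluster"]))

# Both clusters start from the same document and update different fields.
west = {"ts": 100, "cluster": "us-west-2",
        "doc": {"address": "new address", "payment": "old payment"}}
east = {"ts": 101, "cluster": "us-east-1",
        "doc": {"address": "old address", "payment": "new payment"}}

winner = lww_merge(west, east)
print(winner["doc"])  # the address change made in us-west-2 is gone
```

Because the whole document is the unit of replication, the later write replaces the earlier one even though the two updates touched different fields. The outcome also depends on the clocks agreeing, hence the NTP reminder.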
  16. Hello Cassandra
     ● Client reaches closest node, blocks until LOCAL_QUORUM
     ● No Conflict Prevention ⇒ Use partial updates or inserts
     ● Read guarantee: Strong (?)
     Diagram labels: nodes us-west-2a, us-west-2b, us-west-2c, us-east-1a, us-east-1b, us-east-1c; Event Processor (partial update); TX Decision (read)
  17. Cassandra Last Write Wins per Column
     ● Two clients update payment and address of the same person with exactly the same client timestamps
     ● (?) update payment wins, (?) update address wins
     ● Design "Trust me"
     Diagram labels: nodes us-west-2a, us-west-2b, us-west-2c, us-east-1a, us-east-1b, us-east-1c; Event Processor (partial update) on both sides; TX Decision (read)
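The per-column case can be sketched by merging rows one column at a time. This is an illustration only: rows are modelled as `column -> (timestamp, value)`, and breaking timestamp ties by comparing values is an assumption chosen so that both replicas pick the same winner, not a statement about Cassandra internals.

```python
def merge_rows(a, b):
    """Merge two replicas of a row column by column, keeping the larger cell."""
    merged = {}
    for column in a.keys() | b.keys():
        cells = [cell for cell in (a.get(column), b.get(column)) if cell is not None]
        merged[column] = max(cells)  # compares timestamp first, then value
    return merged

# Two clients write both columns with exactly the same client timestamp.
client1 = {"payment": (100, "visa"), "address": (100, "1 Z St")}
client2 = {"payment": (100, "amex"), "address": (100, "9 A St")}

row = merge_rows(client1, client2)
print(row)
```

The merged row takes the payment from one client and the address from the other, a combination that neither client ever wrote.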
  18. Cassandra Multi Value per Column
     ● Update different columns of the same person
     ● Conflict resolution in TX Decision (on read)
     ● (?) update payment1, address1; (?) update payment2, address2
     ● Design "Trust me"
     Diagram labels: nodes us-west-2a, us-west-2b, us-west-2c, us-east-1a, us-east-1b, us-east-1c; Event Processor (partial update) on both sides; TX Decision (read)
  19. Kafka
     ● Conflict resolution in Event Processor
     ● Will both regions converge into the same state?
     ● Design "His brother"
     Diagram labels: Event Source (insert), Kafka us-west-2, Kafka us-east-1, mirror(s) us-west, mirror(s) us-east, inserts, Event Processor, S3 versioned us-west-2, S3 versioned us-east-1, TX Decision (read)
  20. Converging events into state
     ● Duplicate events
       ○ Idempotent: compare-and-set(x, 2, 5)
       ○ De-duplication: 2 +3 +3 = 5
       ○ Rollback
     ● Unordered events
       ○ Commutative: 2+3 = 3+2
       ○ Reordering window (requires state)
     ● Bulk/Parallel event processing
       ○ Associative: (2+3)+4 = 2+(3+4)
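The three properties on this slide can be checked on a small counter. The sketch below uses hypothetical helper names (`compare_and_set`, `apply_once`) and a plain integer as the state.

```python
from functools import reduce
from itertools import permutations

def compare_and_set(state, expected, new):
    """Idempotent: applying the same compare-and-set twice changes nothing more."""
    return new if state == expected else state

def apply_once(state, seen, event_id, delta):
    """De-duplication: skip an increment whose event id was already applied."""
    if event_id in seen:
        return state
    seen.add(event_id)
    return state + delta

# Duplicate events: "+3" delivered twice still yields 5, not 8.
seen = set()
x = 2
x = apply_once(x, seen, "e1", 3)
x = apply_once(x, seen, "e1", 3)

# Unordered events: addition is commutative, so any delivery order converges.
orders = {reduce(lambda s, d: s + d, order, 0) for order in permutations([2, 3, 4])}

# Bulk/parallel processing: addition is associative, so batches can be pre-summed.
batched = sum([2, 3]) + sum([4])
print(x, orders, batched)
```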
  21. Kafka Streams API - zooming in
     ● Design "His brother"
     Diagram labels: Event Source (insert), Kafka us-west-2, Kafka us-east-1, mirror(s) us-west, mirror(s) us-east, inserts, Event Processor, S3 versioned us-west-2, S3 versioned us-east-1, TX Decision (read)
  22. Kafka Streams API
     ● Design "Trust me"
     Diagram labels: Event Source (insert), Kafka MirrorMaker (?), Kafka Stream API, Kafka S3 Connector, S3, kstream1, kstream2, ktable
     builder.stream("kstream1", "kstream2")
       .filter(predicate)
       .transform(processor)
       .to("ktable")
  23. Kafka Processor API and Local Store
     ● Design "Trust me"
     Diagram labels: Event Source (insert), Kafka MirrorMaker (?), Kafka Stream API, Kafka S3 Connector, S3, kstream1, kstream2, ktable
     Map process(Map event) {
       Map state = kvStore.get(event.key);
       state.putAll(event); // not commutative (order matters)
       kvStore.put(event.key, state);
       return state;
     }
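The order dependence of the `putAll` step can be reproduced outside Kafka. The sketch below is plain Python rather than the Kafka Processor API; the event fields `id`, `ts` and `fields` are assumptions, and `process_window` illustrates the "reordering window" idea from the earlier slide by sorting buffered events before applying them.

```python
def process(state, event):
    """Same idea as state.putAll(event): later calls overwrite earlier fields."""
    merged = dict(state)
    merged.update(event["fields"])
    return merged

def process_window(state, events):
    """Buffer a window of events and apply them in event-time order."""
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state = process(state, event)
    return state

e1 = {"id": "e1", "ts": 1, "fields": {"address": "old address"}}
e2 = {"id": "e2", "ts": 2, "fields": {"address": "new address"}}

in_order = process(process({}, e1), e2)
out_of_order = process(process({}, e2), e1)
print(in_order, out_of_order)            # the two regions would disagree
print(process_window({}, [e2, e1]))      # arrival order no longer matters
```

The window only helps for events that arrive within it; anything later still needs a merge function that tolerates reordering, which is where the CRDT slides pick up.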
  24. CRDT Graph Model
     Conflict-free Replicated Data Type: Idempotent, Commutative, Associative
     ● Insert Only Graph
     ● Address / Payment / Person Objects
  25. G-Set: Growing Set CRDT
     Conflict-free Replicated Data Type: Idempotent, Commutative, Associative
     Diagram: vertices A, B

     us-west-2 event   us-east-1 state
     {A,B}             {A,B}
  26. G-Set: Growing Set CRDT
     Conflict resolution method: merge sets
     Diagram: vertices A, B, C

     us-west-2 event   us-east-1 state
     {A,B}             {A,B}
     {A,C}             {A,B,C}
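A G-Set merge is just set union, which is why it is idempotent, commutative and associative. A minimal sketch using the values from the slide (the function name `gset_merge` is made up for this example):

```python
def gset_merge(state, event):
    """Grow-only set: merging never removes anything."""
    return state | event

state = frozenset({"A", "B"})
state = gset_merge(state, frozenset({"A", "B"}))  # replayed event, no change
state = gset_merge(state, frozenset({"A", "C"}))  # new element C is added
print(sorted(state))
```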
  27. 2P-Set: Two Phase Set CRDT
     Comprised of two G-Sets (added and tombstone)
     Diagram: vertices A, B

     us-west-2 event          us-east-1 state
     add: {A,B} rmv: {A}      add: {A,B} rmv: {A}
  28. 2P-Set: Two Phase Set CRDT
     Always grows. Garbage Collection algorithms exist.
     Diagram: vertices A, B, C

     us-west-2 event          us-east-1 state
     add: {A,B} rmv: {A}      add: {A,B} rmv: {A}
     add: {A,C} rmv: {B,D}    add: {A,B,C} rmv: {A,B,D}
  29. 2P-Set: Two Phase Set CRDT
     Always grows. Garbage Collection algorithms exist.
     Diagram: vertices A, B, C, D

     us-west-2 event          us-east-1 state
     add: {A,B} rmv: {A}      add: {A,B} rmv: {A}
     add: {A,C} rmv: {B,D}    add: {A,B,C} rmv: {A,B,D}
     add: {D}                 add: {A,B,C,D} rmv: {A,B,D}
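The 2P-Set states in the tables above can be reproduced with two grow-only sets. This is a minimal sketch; the class name `TwoPSet` and its method names are chosen for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoPSet:
    added: frozenset = frozenset()
    removed: frozenset = frozenset()  # tombstones, never shrink

    def merge(self, other):
        """Union both G-Sets; the result only ever grows."""
        return TwoPSet(self.added | other.added, self.removed | other.removed)

    def value(self):
        """Visible elements: added and not tombstoned."""
        return self.added - self.removed

state = TwoPSet(frozenset("AB"), frozenset("A"))
state = state.merge(TwoPSet(frozenset("AC"), frozenset("BD")))
state = state.merge(TwoPSet(frozenset("D")))
print(sorted(state.added), sorted(state.removed), sorted(state.value()))
```

Note that adding D after its tombstone arrived does not bring it back: in a 2P-Set a removed element can never be re-added.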
  30. 2P2P-Graph CRDT
     2P-Set for vertices, 2P-Set for edges. Resolution method: remove wins.
     Diagram: vertices A, B, C

     us-west-2 event: add_v: {A,B,C} rmv_v: {} add_e: {AB,AC,BC} rmv_e: {}
     us-east-1 state: add_v: {A,B,C} rmv_v: {} add_e: {AB,AC,BC} rmv_e: {}
  31. 2P2P-Graph CRDT (continued)
     us-west-2 event: add_v: {} rmv_v: {A} add_e: {} rmv_e: {}
  32. 2P2P-Graph CRDT (continued)
     us-east-1 state: add_v: {A,B,C} rmv_v: {A} add_e: {AB,AC,BC} rmv_e: {AB,AC}
  33. 2P2P-Graph CRDT (continued)
     Diagram: vertices A, B, C, D with edge AD
     us-west-2 event: add_v: {D} rmv_v: {} add_e: {AD} rmv_e: {}
  34. 2P2P-Graph CRDT (continued)
     us-east-1 state: add_v: {A,B,C,D} rmv_v: {A} add_e: {AB,AC,BC,AD} rmv_e: {AB,AC,AD}
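The "remove wins" rule can be sketched by tombstoning every edge that touches a removed vertex at merge time, which reproduces the states shown on these slides. The class name `Graph2P2P` is made up, and encoding an edge as a two-letter string of single-letter vertex names is an assumption of this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Graph2P2P:
    add_v: frozenset = frozenset()
    rmv_v: frozenset = frozenset()
    add_e: frozenset = frozenset()
    rmv_e: frozenset = frozenset()

    def merge(self, other):
        add_v = self.add_v | other.add_v
        rmv_v = self.rmv_v | other.rmv_v
        add_e = self.add_e | other.add_e
        # Remove wins: an edge with a removed endpoint is tombstoned as well.
        dangling = frozenset(e for e in add_e if set(e) & rmv_v)
        return Graph2P2P(add_v, rmv_v, add_e, self.rmv_e | other.rmv_e | dangling)

    def vertices(self):
        return self.add_v - self.rmv_v

    def edges(self):
        return self.add_e - self.rmv_e

state = Graph2P2P(add_v=frozenset("ABC"), add_e=frozenset(["AB", "AC", "BC"]))
state = state.merge(Graph2P2P(rmv_v=frozenset("A")))
after_remove = state
state = state.merge(Graph2P2P(add_v=frozenset("D"), add_e=frozenset(["AD"])))
print(sorted(state.vertices()), sorted(state.edges()))
```

The late edge AD is dropped because its endpoint A was already removed, so no replica ends up with a link to a missing vertex.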
  35. Sometimes the state won't converge easily
     ● Missing events (broken links)
       ○ integrity checks
       ○ repair
     ● Rerunning bulk events after downtime
       ○ Clocks: Event vs. Ingestion vs. Processor vs. Logical
       ○ Enrichment: IP address reputation changes daily
  36. Background Reconciliator
     ● Reconciliation: Compare hash (Merkle) trees
     ● Compensation: Merge CRDT states
     Diagram labels: client1 (read) us-east-1b, S3 versioned us-east-1, client2 (read) us-west-2a, S3 versioned us-west-2, Background Reconciliator
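A background reconciliator along these lines can be sketched with a two-level hash tree: compare the roots, descend only into buckets whose hashes differ, and merge the CRDT states found there. Everything here is an assumption for illustration: the stores are plain dicts of G-Sets, and `hash_tree`, `bucket_of` and `reconcile` are hypothetical names.

```python
import hashlib

def sha(text):
    return hashlib.sha256(text.encode()).hexdigest()

def bucket_of(key, buckets):
    return int(sha(key), 16) % buckets

def hash_tree(store, buckets=4):
    """Return (root hash, per-bucket hashes) for a dict of key -> frozenset."""
    leaves = [[] for _ in range(buckets)]
    for key in sorted(store):
        leaves[bucket_of(key, buckets)].append(sha(repr((key, sorted(store[key])))))
    bucket_hashes = [sha("".join(bucket)) for bucket in leaves]
    return sha("".join(bucket_hashes)), bucket_hashes

def reconcile(west, east, buckets=4):
    """Merge diverged keys in place; return how many keys needed repair."""
    west_root, west_buckets = hash_tree(west, buckets)
    east_root, east_buckets = hash_tree(east, buckets)
    if west_root == east_root:
        return 0  # identical state, nothing to compare further
    repaired = 0
    for b in range(buckets):
        if west_buckets[b] == east_buckets[b]:
            continue  # skip buckets that already agree
        for key in west.keys() | east.keys():
            if bucket_of(key, buckets) != b:
                continue
            merged = west.get(key, frozenset()) | east.get(key, frozenset())
            if west.get(key) != merged or east.get(key) != merged:
                repaired += 1
            west[key] = east[key] = merged  # compensation: G-Set merge
    return repaired

west = {"person1": frozenset({"A", "B"}), "person2": frozenset({"X"})}
east = {"person1": frozenset({"A", "C"}), "person2": frozenset({"X"})}
print(reconcile(west, east), sorted(west["person1"]))
```

Because the merge is a CRDT merge, the reconciliator can run at any time and more than once without making the states worse.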
  37. Takeaways
     ● Define business need for cross region: Availability, Latency, Residency, Analytics
     ● Know your NoSQL: Couchbase != Cassandra != Kafka
     ● Ask about CRDTs: LWW-Register, MV-Register, 2P-Sets, 2P2P-Graphs
     ● Use Reconciliation
     ● Dedicated Fiber and Atomic clocks ARE COMING
  38. "The Internet was designed to be an academic medium. It was not designed to handle this level of transactions"
     Fred Matteson @ schwab.com 1999
  39. Advanced Topics
     ● The real world is more like a slaughterhouse than a pharmacy
     ● Multi Data Center Topologies
       ○ Star (SPOF, simple)
       ○ Ring (TLV ←→ Eilat ←→ Jerusalem ←→ TLV)
       ○ Mesh (resilient, complex)
     ● Data Residency
       ○ Separate PII from data
       ○ Peek at other data centers ad-hoc