Brian Hess & Cliff Gilmore
DataStax Advanced Replication
Why Advanced Replication
• Standard Cassandra replication has its limits
• Lots of disconnected “edge” nodes/data centers/clusters
• Replicating to central “mother ship” for aggregating
• Inconsistent connectivity
• All data centers are read-write – no read-only DCs
© 2016 DataStax, All Rights Reserved.
What is Advanced Replication
• Advanced Replication supports:
• Many edge clusters replicating to a central hub
• Consistent or sporadic connectivity – “store and forward”
• Prioritized streams for limited bandwidth situations
• One-way replication
• Active queries at the edge, as well as replicating to the hub
• Search/Analytics supported at edge and hub clusters
Company Confidential
Example: Retail Sales
Each Store
• Can Query Local Sales: “What did Brian buy here today?”
• Analytics of Local Sales: “What was the hottest product here this week?”
Central Hub
• Can Query Global Sales: “What did Brian buy today across all stores?”
• Analytics Over All Data: “What was Brian’s average purchase per store this week?”
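The four questions on this slide can be sketched in a few lines of Python. This is only an illustration over in-memory rows, not DSE code: the row layout, store names, and helper functions are all hypothetical, and the "today" and "this week" time filters are left out for brevity. The point is that each store answers questions over its own rows, while the hub answers the same kinds of questions over the union of every store's rows.

```python
from collections import Counter, defaultdict

# Hypothetical sales rows: (store, customer, product, amount)
store_1 = [
    ("store-1", "brian", "coffee", 4.0),
    ("store-1", "ana", "coffee", 4.0),
    ("store-1", "brian", "bagel", 2.0),
]
store_2 = [
    ("store-2", "brian", "tea", 3.0),
    ("store-2", "brian", "scone", 3.0),
]
# The hub holds everything the stores have replicated to it.
hub = store_1 + store_2


def purchases(rows, customer):
    """Query: what did this customer buy (in whichever rows we can see)?"""
    return [product for _, c, product, _ in rows if c == customer]


def hottest_product(rows):
    """Analytics: the most frequently sold product in these rows."""
    return Counter(product for _, _, product, _ in rows).most_common(1)[0][0]


def average_purchase_per_store(rows, customer):
    """Analytics: the customer's mean purchase amount, grouped by store."""
    amounts = defaultdict(list)
    for store, c, _, amount in rows:
        if c == customer:
            amounts[store].append(amount)
    return {store: sum(a) / len(a) for store, a in amounts.items()}


local_query = purchases(store_1, "brian")             # at the store
local_analytics = hottest_product(store_1)            # at the store
global_query = purchases(hub, "brian")                # at the hub
global_analytics = average_purchase_per_store(hub, "brian")  # at the hub
```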
Key Verticals
Advanced Replication Key Terminology
• Edge – DSE Cluster that is the source of change events
• Hub – DSE Cluster that receives change events
• Replication Log – A table on the edge cluster that stores changes
• Channel – Defined replication configuration between an edge and hub table
• Collection Agent – Captures change events to the replication log table
• Replication Agent – Reads replication log and writes to the hub
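As a rough way to see how these terms relate, the sketch below models a channel and a collection agent in plain Python. It is not the DSE API: the `Channel` fields, the table names, and the rule that only tables covered by an enabled channel are captured are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Replication configuration between one edge table and one hub table."""
    edge_table: str       # hypothetical names, e.g. "store.sales"
    hub_table: str
    enabled: bool = True


def collect(replication_log, channels, table, row):
    """Collection agent: record a change event if a channel covers the table.

    Returns True when the change was written to the replication log.
    """
    for channel in channels:
        if channel.enabled and channel.edge_table == table:
            replication_log.append((channel, row))
            return True
    return False


channels = [Channel("store.sales", "hq.sales")]
replication_log = []  # stand-in for the replication log table on the edge

captured = collect(replication_log, channels, "store.sales", {"id": 1})
ignored = collect(replication_log, channels, "store.scratch", {"id": 2})
```

Here the write to `store.sales` lands in the log, bound for `hq.sales`, while the write to `store.scratch` stays local because no channel is defined for it.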
Architecture – Edge View
[Diagram: Client → Edge cluster (Table, Collection Agent, Replication Log, Replication Agent) → Hub Cluster (Table)]
Architecture – Edge View
• Client → Edge Table: normal CQL operation
• Collection Agent: CQL trigger captures the mutation
• Replication Log: maintained in a C* table for fault tolerance
• Replication Agent: pulls from the Replication Log in priority/time order
• Replication Agent → Hub Cluster Table: replicates to the hub via the normal CQL driver
• High-priority mutations are opportunistically sent to the hub asynchronously
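The store-and-forward flow on this slide can be approximated with a small Python simulation. It is a sketch under stated assumptions, not how DSE implements it: the `EdgeReplicator` class, the convention that a lower priority number is sent first, and the `budget` parameter standing in for limited bandwidth are all hypothetical.

```python
import heapq
import itertools


class EdgeReplicator:
    """Toy model of the edge side: capture mutations, forward when possible."""

    def __init__(self):
        self._log = []                 # heap of (priority, seq, table, row)
        self._seq = itertools.count()  # capture order, used as a tie-breaker

    def capture(self, table, row, priority):
        """Collection step: append the mutation to the replication log."""
        heapq.heappush(self._log, (priority, next(self._seq), table, row))

    def pending(self):
        return len(self._log)

    def drain(self, send, budget):
        """Replication step: forward up to `budget` entries in priority/time order.

        An entry is removed from the log only after `send` succeeds, so
        nothing is lost while the hub is unreachable.
        """
        sent = 0
        while self._log and sent < budget:
            _, _, table, row = self._log[0]
            try:
                send(table, row)
            except ConnectionError:
                break                  # hub unreachable: keep the entry, retry later
            heapq.heappop(self._log)
            sent += 1
        return sent


hub_rows = []


def offline(table, row):
    raise ConnectionError("hub unreachable")


def online(table, row):
    hub_rows.append((table, row["id"]))


edge = EdgeReplicator()
edge.capture("sales", {"id": 1}, priority=2)
edge.capture("alerts", {"id": 2}, priority=1)
edge.capture("sales", {"id": 3}, priority=2)

sent_offline = edge.drain(offline, budget=10)  # link down: nothing leaves the log
sent_online = edge.drain(online, budget=2)     # link up, but bandwidth-limited
```

With the link down, all three mutations stay in the log. Once it comes back with room for two sends, the higher-priority `alerts` row goes first, followed by the oldest `sales` row, and one entry remains pending.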
Points of Nuance
• Does it handle TTLs?
  • The edge cluster will NOT capture the TTL of the base record
  • The hub table can have a default TTL that is different from the edge table’s
• Can I repair from edge to hub?
  • Because these are separate clusters, there is no repair mechanism
  • The replication mechanism ensures writes make it to the hub eventually
• This looks like Hints!
  • More robust than Hinted Handoff
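The TTL point can be made concrete with a toy model in Python. This is an assumption-laden sketch rather than DSE behavior verified in code: the `Table` class, its `default_ttl` argument, and the `replicate` helper are hypothetical. It shows only the consequence stated above: the replicated write carries the row's values, so the TTL on the hub comes from the hub table's own default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredRow:
    values: dict
    ttl_seconds: Optional[int]  # None means the row never expires


class Table:
    """Toy table with an optional table-level default TTL."""

    def __init__(self, default_ttl=None):
        self.default_ttl = default_ttl
        self.rows = {}

    def write(self, key, values, ttl=None):
        # A per-write TTL wins; otherwise the table default applies.
        effective = ttl if ttl is not None else self.default_ttl
        self.rows[key] = StoredRow(dict(values), effective)


def replicate(edge, hub, key):
    """Forward the row's values only; the edge row's TTL is not carried over."""
    hub.write(key, edge.rows[key].values)


edge_sales = Table()                             # no default TTL at the edge
hub_sales = Table(default_ttl=30 * 24 * 3600)    # hub keeps rows for 30 days

edge_sales.write("sale-1", {"amount": 12.5}, ttl=3600)  # expires in 1 hour at the edge
replicate(edge_sales, hub_sales, "sale-1")

edge_ttl = edge_sales.rows["sale-1"].ttl_seconds
hub_ttl = hub_sales.rows["sale-1"].ttl_seconds
```

The same sale expires after an hour at the edge but lives for 30 days on the hub, which is one way to keep edge storage small while retaining history centrally.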
Topology
[Diagram labels: West, East, Store #1 through Store #11]
Questions?
Brian Hess – brian.hess@datastax.com
Cliff Gilmore – cgilmore@datastax.com

DataStax | DataStax Enterprise Advanced Replication (Brian Hess & Cliff Gilmore) | Cassandra Summit 2016


Editor's Notes

  • #5 This slide represents an example of Retail Point-of-Sale Transactions.
  • #6 Oil and Gas; Industrial IoT; Retail; Banking, Finance; Telecommunications; Transportation; mobile deployments or deployments with poor connectivity (oil rigs, mining, cruise ships, planes, etc.)
  • #10 Hint-like mechanism, no repair capability, TTL