
Scaling Beyond a Billion Transactions Per Day with Sub-Second Responses

SpringOne Platform 2019
Session Title: Scaling Beyond a Billion Transactions Per Day with Sub-Second Responses
Speakers: Andrey Zolotov, Lead Software Engineer, Mastercard; Gideon Low, Principal Data Transformation Architect, Pivotal

Scaling Beyond a Billion Transactions Per Day with Sub-Second Responses

  1. scaling beyond a billion transactions per day with strict SLAs. Andrey Zolotov, Lead Software Engineer, Mastercard; Gideon Low, Principal Data Transformation Architect, Pivotal. October 9, 2019, Austin Convention Center.
  2. Unless otherwise indicated, these slides are © 2013-2019 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/. Agenda: Decision Management Platform (DMP), our challenges, scalability design patterns, growing data volume, maximizing transaction throughput, optimizing latency.
  3. Decision Management Platform: DMP is a plugin-based transaction processing engine; it powers over 20 Mastercard products, sustains throughput over 60,000 TPS with an average response time under 100 ms, and runs distributed processing on commodity servers.
  4. Decision Management Platform: aggregates transaction history at many levels, consumes a variety of data to feed models, evaluates risk for every transaction, blocks and/or alerts on fraudulent transactions, and protects issuers and acquirers from widespread attacks.
  5. Dual challenge: low latency (50-millisecond responses) and high throughput (over 60,000 transactions per second).
  6. Data challenges: a large data set (30+ billion time-aware aggregates), fast data access (sub-millisecond distributed reads), fast atomic delta updates (3-millisecond distributed compute), and many real-time data operations per transaction (hundreds of reads and writes).
  7. Data solution – Geode / GemFire: in-memory, scalable, distributed.
  8. Distributed data challenges: data access scalability through co-location, balanced data distribution, concurrent data operations, latency, and consistency is still an issue.
  9. Geode data partitioning – layered view (diagram): the layers are the physical HW pool, the Geode server pool, the PR bucket pool, and the app “groups”.
  10. Custom partitioning by logical “GroupId”. How are data entries mapped to “buckets”? First, a hashCode is extracted from the key: for default partitioning, Geode simply uses key.hashCode(); for custom partitioning, it uses key.<customGroupId>.hashCode(), implemented via the PartitionResolver interface or configured via StringPrefixPartitionResolver (if the key is a String with the groupId prepended). Then the modulus operator is applied to the hashCode using the fixed bucket count: <hashCode> % <bucketCount> = bucketId.
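      A minimal sketch of the custom partitioning described above, assuming a String key of the form "<groupId>|<entryId>"; the class and field names are illustrative, not code from the talk:

          import org.apache.geode.cache.EntryOperation;
          import org.apache.geode.cache.PartitionResolver;

          // Routes every key that shares the same groupId prefix to the same bucket.
          public class GroupIdPartitionResolver implements PartitionResolver<String, Object> {
              @Override
              public Object getRoutingObject(EntryOperation<String, Object> opDetails) {
                  String key = opDetails.getKey();
                  // Only the groupId's hashCode feeds <hashCode> % <bucketCount>.
                  return key.substring(0, key.indexOf('|'));
              }

              @Override
              public String getName() {
                  return getClass().getName();
              }

              @Override
              public void close() {
                  // Nothing to release.
              }
          }

      For plain "<groupId>|<rest>" String keys, Geode's built-in StringPrefixPartitionResolver mentioned above provides the same routing without custom code.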
  11. Custom partitioning metadata – buckets-to-servers mapping: the allocation of buckets to servers is dynamic, and the partitioning metadata Map<bucketId, serverId> is shared across all servers and clients. This metadata enables optimized routing of data-operation requests. For large clusters with computational workloads, choosing the right custom grouping and the right bucket count has a major impact. Example partitioning metadata: bucket 1 → Server1, bucket 2 → Server1, bucket 3 → Server2, bucket 4 → Server3.
  12. Data access scalability – challenge: a getAll of 100 keys results in up to 100 requests in parallel, and this does not scale, since the number of requests grows as the cluster grows. Example: getAll({{user:1, txn:1}, {user:1, txn:2}, {user:1, txn:3}, {user:1, txn:4}, {user:1, txn:5}, {user:1, txn:6}});
  13. Data access scalability – solution: co-locate regions that share the same keys, and co-locate related data within regions using a Partition Resolver, so that all data entries for a single account land in the same bucket. The same getAll({{user:1, txn:1}, {user:1, txn:2}, {user:1, txn:3}, {user:1, txn:4}, {user:1, txn:5}, {user:1, txn:6}}) is then served by a single bucket (a configuration sketch follows).
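      A hedged sketch of what that co-location can look like with the Geode API, reusing the GroupIdPartitionResolver sketched under slide 10; the region names, region shortcut, and bucket count (1031, matching slide 16) are illustrative assumptions:

          import org.apache.geode.cache.Cache;
          import org.apache.geode.cache.CacheFactory;
          import org.apache.geode.cache.PartitionAttributesFactory;
          import org.apache.geode.cache.Region;
          import org.apache.geode.cache.RegionShortcut;

          public class ColocatedRegions {
              public static void main(String[] args) {
                  Cache cache = new CacheFactory().create();

                  // Accounts: partitioned by the "<accountId>|" prefix of the key.
                  Region<String, Object> accounts = cache
                      .<String, Object>createRegionFactory(RegionShortcut.PARTITION_REDUNDANT)
                      .setPartitionAttributes(new PartitionAttributesFactory<String, Object>()
                          .setPartitionResolver(new GroupIdPartitionResolver())
                          .setTotalNumBuckets(1031)
                          .create())
                      .create("Accounts");

                  // Transactions: same resolver, co-located with Accounts, so every
                  // "<accountId>|..." entry in both regions lands in the same bucket,
                  // and a getAll for one account is served by a single member.
                  Region<String, Object> transactions = cache
                      .<String, Object>createRegionFactory(RegionShortcut.PARTITION_REDUNDANT)
                      .setPartitionAttributes(new PartitionAttributesFactory<String, Object>()
                          .setPartitionResolver(new GroupIdPartitionResolver())
                          .setTotalNumBuckets(1031)
                          .setColocatedWith("Accounts")
                          .create())
                      .create("Transactions");
              }
          }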
  14. Data co-location – results: one of the most important things to implement for scalability; it allowed us to reach over 8 million entry reads per second and made it possible to grow the cluster without increasing the number of calls.
  15. Balanced key distribution: key hashCodes should have a uniform distribution; the number of buckets should be large enough that a difference of one bucket does not skew the distribution; a prime number of buckets avoids unbalanced buckets.
  16. Calculate the number of buckets for even distribution. Assumptions: max 12 nodes, 5% acceptable variance, max 500 GB primary data, max bucket size of 500 MB. buckets = max(nextprime(max(12 / 0.05, 500 GB / 500 MB)), 113) = max(nextprime(max(240, 1024)), 113) = nextprime(1024) = 1031.
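      The same sizing rule as a small Java sketch; the method and parameter names are illustrative, and BigInteger.nextProbablePrime() stands in for nextprime:

          import java.math.BigInteger;

          public final class BucketSizing {
              // buckets = max(nextprime(max(nodes / variance, dataBytes / bucketBytes)), 113)
              static int bucketCount(int maxNodes, double acceptableVariance,
                                     long maxPrimaryDataBytes, long maxBucketBytes) {
                  long byVariance = (long) Math.ceil(maxNodes / acceptableVariance); // 12 / 0.05 -> 240
                  long bySize = maxPrimaryDataBytes / maxBucketBytes;                // 500 GB / 500 MB -> 1024
                  long candidate = Math.max(byVariance, bySize);
                  int prime = BigInteger.valueOf(candidate).nextProbablePrime().intValueExact();
                  return Math.max(prime, 113);  // never go below Geode's default bucket count of 113
              }

              public static void main(String[] args) {
                  // Prints 1031 for the numbers on the slide.
                  System.out.println(bucketCount(12, 0.05, 500L << 30, 500L << 20));
              }
          }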
  17. Large data set – challenge: many terabytes of shared data are now in RAM; the data has to be distributed; too many data nodes is not performant; traditional GC pauses are far too long with large heaps.
  18. Jumbo-heap Geode Java servers. Production Geode servers with 48+ GB, even hundreds of GB, are common. Tuning basics: avoid the obvious pitfalls (swapping, System.gc(), size variance) and keep overall headroom and ample young-gen space. Geode factors that support larger heap sizes: minimize the impact of short-lived objects (keep them in stack memory via method in-lining and the thread model) and reduce object-graph complexity (byte-array storage).
  19. Jumbo-heap Geode Java servers. Without server-side custom code: client-driven reads/writes cause no server-side deserialization; byte arrays are easy on garbage collectors; byte arrays are already optimized for I/O (disk, network). With server-side custom code: objects are deserialized to drive calculations or other logic, but deserialization on demand is expensive; business-logic code is faster with POJOs, or POJOs can be kept in memory, but that uses more memory and impacts GC pauses.
  20. Jumbo-heap Geode Java servers: what is the “worst case scenario”? Variance in cached object sizes (fragmentation pressure); a high object churn rate (the most direct impact on GC workload), for example read/modify/write in a group. The size of the heap? No: what matters is the size and complexity of the Java object graph (the non-concurrent marking-phase pause grows as O(n), plus object-reference management during promotion) and server-side business-logic execution (store as POJOs or deserialize on demand; either one adds heavy additional overhead).
  22. Large data set – solutions: scale horizontally (add nodes) and scale each node vertically (add RAM). Options to tame GC pauses on large nodes: LRU overflow to disk plus a large file-system cache, GemFire's off-heap storage, or a pauseless-GC JVM with terabyte+ heaps.
  23. Large data set – solutions, compared across speed, reliability, latency consistency, complexity, and cost:
      • Large heaps, no scaling solution: LOW speed (long pauses), LOW reliability, LOWEST latency consistency, LOWEST complexity, MEDIUM cost
      • Small heaps, many JVMs: HIGH speed, LOW reliability, HIGH latency consistency, HIGH complexity, MEDIUM cost
      • Small heaps, LRU overflow: LOW speed (disk access), MEDIUM reliability, MEDIUM latency consistency, LOW complexity, LOW cost
      • Small heaps, overflow + file cache: MEDIUM speed, MEDIUM reliability, MEDIUM latency consistency, MEDIUM complexity, MEDIUM cost
      • Small heaps, GemFire off-heap: MEDIUM speed, MEDIUM reliability, MEDIUM latency consistency, MEDIUM complexity, MEDIUM cost
      • Large heaps, pauseless JVM: HIGH speed, HIGH reliability, HIGHEST latency consistency, LOW complexity, HIGH cost
  30. Large data set challenge – results: scaled each node vertically, scaled horizontally by adding nodes, and implemented a solution to tame GC pauses, enabling cluster sizes of 40 terabytes with 600 GB heaps.
  31. Large entries: take too long to transfer to clients, consume client CPU to deserialize, and make it slow to run business logic over multiple entries.
  32. Large entries – solution: distribute the business logic to the Geode cluster, pull the data needed for the transaction, parallelize processing, reduce client load, and minimize network utilization.
  33. Large entries – solution: distribute the business logic to the Geode cluster, send the compute requests to the data, use functions to run code on the data nodes, parallelize processing, reduce client load, and minimize network utilization.
  34. Geode Function Execution Service: executing business logic in a data context. Targets either an entire “Region” data set via onRegion(<region>), or a subset of application partitions via onRegion(<region>).withFilter(<groupIds ...>).
  35. Implementing the Geode Function interface. Once inside the “data aware” invocation context, access the data-specific context (the Region interface and the filter key set):
          RegionFunctionContext rfc = (RegionFunctionContext) context;
          Region<K, V> localFilterData = PartitionRegionHelper.getLocalDataForContext(rfc);
          Set<?> filterKeys = rfc.getFilter();
      Every data read operation using a filter key is then local and optimized. Implement optimizeForWrite() on the Function implementation and write operations are optimized as well: the function execution data context then targets only the primary bucket, minimizing update replication to a single network round trip.
  36. Different deltas for each key. Use function arguments? All involved nodes would receive every delta request, each server would have to find the applicable requests in the arguments, and this is extremely inefficient for the network, clients, and servers. Instead, use the filter to send the requests: implement Partition Resolver on the request and return the key, then pass the set of requests to the execution filter (see the request sketch after slide 38).
  37. Distributed function example – client (code shown on the original slide).
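      A hedged sketch of what a client-side invocation like this can look like; the function id, the DeltaRequest type (sketched under the next slide), and the region are assumptions, not the code from the screenshot:

          import java.util.List;
          import java.util.Set;
          import org.apache.geode.cache.Region;
          import org.apache.geode.cache.execute.Execution;
          import org.apache.geode.cache.execute.FunctionService;
          import org.apache.geode.cache.execute.ResultCollector;

          public class DeltaClient {
              @SuppressWarnings("unchecked")
              public List<Object> applyDeltas(Region<String, Object> aggregates, Set<DeltaRequest> requests) {
                  // The requests themselves are the filter keys, so Geode routes each one
                  // to the member hosting the primary bucket for its group.
                  Execution execution = FunctionService.onRegion(aggregates).withFilter(requests);
                  ResultCollector<?, ?> collector = execution.execute("DeltaUpdateFunction");
                  return (List<Object>) collector.getResult();
              }
          }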
  38. Distributed function example – request (code shown on the original slide).
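      A hedged sketch of the request type used as a filter key: as slide 36 describes, it implements PartitionResolver and returns a routing object for the entry it targets. The field names are illustrative, and the routing object must match whatever the region's own resolver would produce for the entry key (here, the "<groupId>|" prefix):

          import java.io.Serializable;
          import org.apache.geode.cache.EntryOperation;
          import org.apache.geode.cache.PartitionResolver;

          // One delta request per target entry, routed like the entry it updates.
          public class DeltaRequest implements PartitionResolver<DeltaRequest, Object>, Serializable {
              private final String entryKey;   // e.g. "<groupId>|<aggregateId>"
              private final long delta;

              public DeltaRequest(String entryKey, long delta) {
                  this.entryKey = entryKey;
                  this.delta = delta;
              }

              public String getEntryKey() { return entryKey; }

              public long getDelta() { return delta; }

              @Override
              public Object getRoutingObject(EntryOperation<DeltaRequest, Object> opDetails) {
                  // Same routing object as the target entry, so the request is
                  // co-located with the data it updates.
                  return entryKey.substring(0, entryKey.indexOf('|'));
              }

              @Override
              public String getName() { return getClass().getName(); }

              @Override
              public void close() { }
          }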
  39. Distributed function example – execute (code shown on the original slide).
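      A hedged sketch of the server-side function, following the pattern from slide 35; Aggregate and DeltaRequest are the illustrative types from the neighboring sketches, not the speakers' classes:

          import org.apache.geode.cache.Region;
          import org.apache.geode.cache.execute.Function;
          import org.apache.geode.cache.execute.FunctionContext;
          import org.apache.geode.cache.execute.RegionFunctionContext;
          import org.apache.geode.cache.partition.PartitionRegionHelper;

          public class DeltaUpdateFunction implements Function<Void> {
              @Override
              public void execute(FunctionContext<Void> context) {
                  RegionFunctionContext rfc = (RegionFunctionContext) context;
                  // Local view of the partitioned region, limited to data hosted on this member.
                  Region<String, Aggregate> local = PartitionRegionHelper.getLocalDataForContext(rfc);
                  for (Object filterKey : rfc.getFilter()) {
                      DeltaRequest request = (DeltaRequest) filterKey;
                      Aggregate value = local.get(request.getEntryKey());
                      if (value == null) {
                          value = new Aggregate();
                      }
                      value.add(request.getDelta());            // record the change as a delta
                      local.put(request.getEntryKey(), value);  // shipped as a delta, see slide 43
                  }
                  context.getResultSender().lastResult(Boolean.TRUE);
              }

              @Override
              public boolean optimizeForWrite() {
                  // Execute on primary bucket owners so each write replicates in one hop.
                  return true;
              }

              @Override
              public String getId() {
                  return "DeltaUpdateFunction";
              }
          }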
  40. Concurrent updates – challenge: clients update distributed data for every transaction and submit delta updates in parallel, which results in concurrent updates to the same entries.
  41. Distributed function example – update (code shown on the original slide).
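      A hedged sketch of an update step that locks the key inside the function, per slide 42; because optimizeForWrite() routes execution to the primary, a JVM-local lock per key is enough here. The lock map and the Aggregate/DeltaRequest types are illustrative assumptions:

          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.ConcurrentMap;
          import org.apache.geode.cache.Region;

          public final class LockingDeltaUpdater {
              // One lock object per key on this member (never pruned in this sketch).
              private static final ConcurrentMap<String, Object> LOCKS = new ConcurrentHashMap<>();

              static void applyDelta(Region<String, Aggregate> local, DeltaRequest request) {
                  Object lock = LOCKS.computeIfAbsent(request.getEntryKey(), k -> new Object());
                  synchronized (lock) {
                      Aggregate value = local.get(request.getEntryKey());
                      if (value == null) {
                          value = new Aggregate();
                      }
                      value.add(request.getDelta());          // mutate and mark the delta fields
                      // The put ships only the delta to the redundant copy (slide 43).
                      local.put(request.getEntryKey(), value);
                  }
              }
          }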
  42. Concurrent updates – solution: use distributed functions to process in parallel, send the requests in the filter to minimize load, lock the key inside the function, and utilize delta propagation for redundancy.
  43. Geode Delta API primer. Optimizes partial updates to large objects: it lets the developer control the on-the-wire update size via a Delta implementation; the efficiency is similar to a SQL UPDATE of only specific fields on a single primary key; the larger the full-object-to-delta ratio, the greater the efficiency gain. It optionally reduces Java memory churn via update-in-place instead of copy-on-read, but then the developer is responsible for possible contention.
  44. Delta propagation example (code shown on the original slide).
  45. Delta value example (code shown on the original slide).
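      A hedged sketch of a delta-aware value class implementing Geode's Delta interface; the Aggregate fields are the illustrative ones used in the earlier sketches, not the class shown on the slides:

          import java.io.DataInput;
          import java.io.DataOutput;
          import java.io.IOException;
          import java.io.Serializable;
          import org.apache.geode.Delta;
          import org.apache.geode.InvalidDeltaException;

          public class Aggregate implements Delta, Serializable {
              private long count;
              private long sum;
              // The dirty flag is transient: it only matters on the member applying the change.
              private transient boolean changed;

              public void add(long delta) {
                  count++;
                  sum += delta;
                  changed = true;
              }

              @Override
              public boolean hasDelta() {
                  return changed;
              }

              @Override
              public void toDelta(DataOutput out) throws IOException {
                  // A real aggregate with a long history would write only the changed
                  // counters here, instead of serializing the whole object.
                  out.writeLong(count);
                  out.writeLong(sum);
                  changed = false;
              }

              @Override
              public void fromDelta(DataInput in) throws IOException, InvalidDeltaException {
                  count = in.readLong();
                  sum = in.readLong();
              }
          }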
  46. Data-aware request distribution – results: implementing filtered requests is vital to cluster scalability (95% reduction in network traffic, 50% latency reduction for function execution, 40% CPU reduction on server nodes); delta propagation is essential for updating large values (50% reduction in network traffic).
  47. Diagram: clients update a Geode node in zone A; deltas are propagated to the Geode node in zone B via synchronous replication, giving strong consistency.
  48. CAP theorem: the CAP theorem states that we can have only two of consistency, availability, and partition tolerance (split-brain). Geode redundant partitioned regions implement CP, prioritizing data consistency and protection from network-partitioning events.
  49. CAP theorem and Geode regions: redundant partitioned regions are always CP; replicated regions can optionally sacrifice total consistency for higher update rates by allowing asynchronous updates to replicas; local regions do not distribute, but may still be defined on every server and are accessible from clients and from server-side code.
  50. Extremely concurrent deltas: single-entry update rates of thousands per second; delta-propagation latency inside the lock does not scale; strong consistency is not useful when it prevents scaling. Can we trade losing some updates on node failure for a gain in throughput?
  51. Extremely concurrent deltas – solution: detect when an entry gets “hot” by tracking TPS in each entry; after applying an update, put “hot” entries into a local region (non-replicated, non-partitioned, non-persistent); periodically copy the updated entries back to the partitioned region; remove entries from the hot region when they are no longer hot (a sketch follows).
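      A hedged sketch of that hot-entry path. Assumptions: the value tracks its own update rate (slide 51: tracking TPS in each entry) behind a hypothetical updatesPerSecond() method, the threshold is illustrative, and the periodic task that drives flush() is not shown:

          import java.util.Map;
          import org.apache.geode.cache.Cache;
          import org.apache.geode.cache.Region;
          import org.apache.geode.cache.RegionShortcut;

          public class HotEntryCache {
              private static final double HOT_THRESHOLD_TPS = 1000.0;

              private final Region<String, Aggregate> hot;          // local: non-replicated, non-persistent
              private final Region<String, Aggregate> partitioned;  // the normal redundant region

              public HotEntryCache(Cache cache, Region<String, Aggregate> partitioned) {
                  this.partitioned = partitioned;
                  this.hot = cache.<String, Aggregate>createRegionFactory(RegionShortcut.LOCAL)
                          .create("HotAggregates");
              }

              // Called inside the function, after the delta has been applied to 'value'.
              // updatesPerSecond() is a hypothetical per-entry rate tracker, not a Geode API.
              public void store(String key, Aggregate value) {
                  if (value.updatesPerSecond() > HOT_THRESHOLD_TPS) {
                      hot.put(key, value);          // stays local; replication is deferred
                  } else {
                      partitioned.put(key, value);  // normal path: synchronous delta replication
                  }
              }

              // Run periodically: copy hot entries back, demote the ones that cooled down.
              public void flush() {
                  for (Map.Entry<String, Aggregate> e : hot.entrySet()) {
                      partitioned.put(e.getKey(), e.getValue());
                      if (e.getValue().updatesPerSecond() < HOT_THRESHOLD_TPS) {
                          hot.remove(e.getKey());
                      }
                  }
              }
          }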
  52. Diagram: clients issue redundant puts to a Geode node in zone A; changes reach the Geode node in zone B via asynchronous replication, giving eventual consistency.
  53. Extremely concurrent deltas – results: delaying replication when applying many deltas to the same entry enabled scaling single-entry operations from under 1,000 updates per second to over 100,000 updates per second.
  54. Summary – scalability design patterns: scaling for large data sets, co-locating data for scalability, functions to distribute compute, low latency with delta propagation, and extreme update rates by delaying replication.
  55. Q & A. Andrey: @zdre. Gideon: glow@pivotal.io. careers.mastercard.com #springone @s1p
