Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford
Published on 2011-11-02 | 02:25 PM - 03:15 PM
In 2009 RBS set out to build a single store of trade and risk data that all applications in the bank could access simultaneously. This talk discusses a number of novel techniques that were developed as part of this work. Based on Oracle Coherence, the ODC departs from the trend set by most caching solutions by holding its data in a normalised form, making it both memory efficient and easy to change. However, it does this in a novel way that supports most arbitrary queries without the usual problems associated with distributed joins. We'll be discussing these patterns as well as others that allow linear scalability, fault tolerance and millisecond latencies.

Published in: Technology, Education
  • Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in process (only ‘active’ data is held).
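The slide note above can be illustrated with a minimal, single-process sketch. It is not Coherence code and not the talk's implementation: the `Grid` class, `partition_for`, the field names and the sample records are all hypothetical, and plain dictionaries stand in for partitioned and replicated caches. It only shows the shape of the idea: facts that share a common key land in the same partition, so they join locally, while small dimensions are looked up in a replicated copy held by the querying process.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(common_key: str) -> int:
    """Hypothetical key assignment policy: route every fact by its common key."""
    # A simple deterministic hash, so the sketch behaves the same on every run.
    return sum(common_key.encode()) % NUM_PARTITIONS


class Grid:
    """Toy stand-in for a data grid: partitioned facts, replicated dimensions."""

    def __init__(self):
        # Facts: one store per partition, grouped by the common (trade) key.
        self.partitions = [defaultdict(dict) for _ in range(NUM_PARTITIONS)]
        # Dimensions: small, so a full copy is assumed to sit next to the query.
        self.dimensions = {}

    def put_fact(self, kind, trade_id, value):
        # Transactions and MTMs for one trade always share a partition.
        self.partitions[partition_for(trade_id)][trade_id][kind] = value

    def put_dimension(self, key, value):
        self.dimensions[key] = value

    def query(self, cost_centre):
        rows = []
        # In a real grid each partition would be scanned on its own node.
        for partition in self.partitions:
            for trade_id, facts in partition.items():
                txn, mtm = facts.get("transaction"), facts.get("mtm")
                if txn is None or mtm is None:
                    continue
                if txn["cost_centre"] != cost_centre:
                    continue
                # Fact-to-fact join is local to the partition; the dimension
                # join is an in-process lookup against the replicated copy.
                rows.append((trade_id, txn, mtm, self.dimensions.get(cost_centre)))
        return rows


grid = Grid()
grid.put_dimension("CC1", {"name": "Cost centre one"})
grid.put_fact("transaction", "T1", {"cost_centre": "CC1"})
grid.put_fact("mtm", "T1", {"value": 10.5})
grid.put_fact("transaction", "T2", {"cost_centre": "CC2"})
grid.put_fact("mtm", "T2", {"value": 3.0})
rows = grid.query("CC1")
```

No step in `query` needs to move a fact from one partition to another, which is the property the note describes.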

    1. The lay of the land: The main architectural constructs in the database industry
    2. Shared Disk
    3. [Latency scale with axis marks ms, μs, ns, ps. Entries: Cross Continental Round Trip; 1MB Disk/Network; Cross Network Round Trip; 1MB Main Memory; Main Memory Ref; L2 Cache Ref; L1 Cache Ref] * L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm
    4. Taken from “OLTP Through the Looking Glass, and What We Found There”, Harizopoulos et al
    5. Regular Database: Oracle, Sybase, MySQL. In-Memory Database: TimesTen, HSQL, KDB. Shared Nothing: Vertica, Greenplum. Distributed In-Memory: Exasol, VoltDB, Hana. NoSQL: Mongo, Cassandra. Data Grid: Coherence, Terracotta
    6. Distributed Architecture; Simplify the Contract; Stick to RAM
    7. 450 processes; 2TB of RAM; Oracle Coherence; Messaging (Topic Based) as a system of record (persistence)
    8. Access Layer (Java client API, Java client API); Query Layer; Data Layer (Transactions, Mtms, Cashflows); Persistence Layer
    9. Indexing; Partitioning; Replication
    10. But your storage is limited by the memory on a node
    11. Associating data in different partitions implies moving it. Scalable storage, bandwidth and processing
    12. [Diagram: Trader, Party and Trade held together, repeated for Version 1, Version 2, Version 3 and Version 4] …and you need versioning to do MVCC
    13. [Diagram: Trade, Trader and Party objects, repeated several times]
    14. So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage
    15. Independently Versioned; Data is Singleton [Diagram: Trader, Party, Trade]
    16. [Diagram: Trader, Party, Trade]
    17. This is what using Snowflake Schemas and the Connected Replication pattern is all about!
    18. Crosscutting Keys; Common Keys
    19. Replicated: Trader, Party. Partitioned: Trade
    20. Facts => Big, common keys. Dimensions => Small, crosscutting keys
    21. Use a Key Assignment Policy (e.g. KeyAssociation in Coherence) [Diagram: Trades and MTMs share a Common Key]
    22. Replicated: Trader, Party. Partitioned: Trade
    23. Query Layer: Trader, Party, Trade. Data Layer: Transactions, Mtms, Cashflows. Fact Storage (Partitioned)
    24. Dimensions (replicate). Facts (distribute/partition): Transactions, Mtms, Cashflows. Fact Storage (Partitioned)
    25. Facts => Big => Distribute. Dimensions => Small => Replicate
    26. We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key and replicate small stuff whose keys can’t map to our partitioning key.
    27. Replicate; Distribute
    28. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
    29. Get Cost Centers; Get Ledger Books; Get Source Books; Get Transactions; Get MTMs; Get Legs; Get Cost Centers [Diagram labels: Network, Time]
    30. Get Cost Centers; Get Ledger Books; Get Source Books; Get Transactions; Get MTMs; Get Legs; Get Cost Centers [Diagram label: Network]
    31. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’. Join Dimensions in Query Layer. Partitioned: Transactions, Mtms, Cashflows
    32. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’. Join Dimensions in Query Layer. Join Facts across cluster. Partitioned: Transactions, Mtms, Cashflows
    33. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’. Join Dimensions in Query Layer (shown twice). Join Facts across cluster. Partitioned: Transactions, Mtms, Cashflows
    34. Java client API; Replicated Dimensions; Partitioned Facts. We never have to do a distributed join!
    35. So all the big stuff is held partitioned. And we can join without shipping keys around and having intermediate results
    36. [Diagram: Trader, Party, Trade]
    37. [Diagram: Trader, Party and Trade held together, repeated for Version 1, Version 2, Version 3 and Version 4]
    38. [Diagram: Trade, Trader and Party objects, repeated several times]
    39. Facts; Dimensions. This is a dimension: • It has a different key to the Facts. • And it’s BIG
    40. So we only replicate ‘Connected’ or ‘Used’ dimensions
    41. Processing Layer: Dimension Caches (Replicated). Data Layer: Transactions, Mtms, Cashflows. Fact Storage (Partitioned). As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches
    42. Save Trade. Query Layer (with connected dimension caches). Data Layer (all normalised): Trade in a Partitioned Cache, with Cache Store and Trigger; Party Alias, Source Book, Ccy
    43. Query Layer (with connected dimension caches). Data Layer (all normalised): Trade; Party Alias, Source Book, Ccy
    44. Query Layer (with connected dimension caches). Data Layer (all normalised): Trade; Party Alias, Source Book, Ccy; Party, Ledger Book
    45. ‘Connected Replication’: A simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated
    46. With ‘Connected Replication’ only 1/10th of the data needs to be replicated (on average).
    47. Java client API; Java schema; Java ‘Stored Procedures’ and ‘Triggers’
    48. Query with more than twenty join conditions: 2GB per min / 250Mb/s, 3ms latency (per client)
    49. Partitioned Storage
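The ‘Connected Replication’ slides describe recursing through foreign keys so that only dimensions reachable from stored facts are replicated. The following is a rough sketch of that recursion under stated assumptions, not the talk's code: entities are plain dictionaries, the `refs` field holding foreign keys is hypothetical, and the sample keys only loosely mirror the entity names on the slides (Party Alias, Source Book, Ccy, Party, Ledger Book).

```python
def replicate_connected(entity, all_dimensions, replicated=None):
    """Copy every dimension reachable from `entity` into `replicated`.

    `entity` and the dimension records are dicts with a hypothetical
    "refs" list of foreign keys into `all_dimensions`.
    """
    if replicated is None:
        replicated = {}
    for key in entity.get("refs", []):
        if key in replicated:
            continue  # already replicated; this also guards against cycles
        dimension = all_dimensions[key]
        replicated[key] = dimension
        # Recurse: a connected dimension may reference further dimensions.
        replicate_connected(dimension, all_dimensions, replicated)
    return replicated


# Hypothetical dimension store held in the (normalised) data layer.
all_dimensions = {
    "party_alias:PA1": {"refs": ["party:P1"]},
    "party:P1": {"refs": []},
    "source_book:SB1": {"refs": ["ledger_book:LB1"]},
    "ledger_book:LB1": {"refs": []},
    "ccy:USD": {"refs": []},
    "party:UNUSED": {"refs": []},  # referenced by no fact, so never replicated
}

# A newly saved fact and the dimensions it points at.
trade = {"refs": ["party_alias:PA1", "source_book:SB1", "ccy:USD"]}

replicated = replicate_connected(trade, all_dimensions)
```

After saving the trade, `replicated` holds the five dimensions reachable from it and leaves `party:UNUSED` in the data layer only, which is the saving the pattern is after.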
