ODC
Beyond The Data Grid: Coherence, Normalisation, Joins
and Linear Scalability

Ben Stopford : RBS
The Story…

The internet era has moved us away from traditional database architecture, now a quarter of a century old.

Industry and academia have responded with a variety of solutions that leverage distribution, use of a simpler contract and RAM storage.

We introduce ODC, a NoSQL store with a unique mechanism for efficiently managing normalised data.

We show how we adapt the concept of a Snowflake Schema to aid the application of replication and partitioning and avoid problems with distributed joins.

Finally we introduce the ‘Connected Replication’ pattern as a mechanism for making the star schema practical for in-memory architectures.

The result is a highly scalable, in-memory data store that can support both millisecond queries and high bandwidth exports over a normalised object model.
Database Architecture is Old

Most modern databases still follow a
1970s architecture (for example IBM’s
System R)
“Because RDBMSs can be beaten by more than
     an order of magnitude on the standard OLTP
     benchmark, then there is no market where they
     are competitive. As such, they should be
     considered as legacy technology more than a
     quarter of a century in age, for which a complete
     redesign and re-architecting is the appropriate
     next step.”


       Michael Stonebraker (Creator of Ingres and Postgres)
What steps have we taken to improve the performance of this original architecture?
Improving Database Performance (1)
Shared Disk Architecture

(diagram: Shared Disk)
Improving Database Performance (2)
Shared Nothing Architecture

(diagram: Shared Nothing)
Improving Database Performance (3)
In Memory Databases
Improving Database Performance (4)
Distributed In Memory (Shared Nothing)
Improving Database Performance (5)
Distributed Caching

(diagram: Distributed Cache)
These approaches are converging

(diagram: Regular Database (Oracle, Sybase, MySql); Shared Nothing (Teradata, Vertica, NoSQL…); Shared Nothing in memory (VoltDB, Hstore); Distributed Caching (Coherence, Gemfire, Gigaspaces); all converging towards the ODC)
So how can we make a data store go even faster?

•  Distributed Architecture
•  Drop ACID: Simplify the Contract
•  Drop disk
(1) Distribution for Scalability: The Shared Nothing Architecture

•  Originated in 1990 (Gamma DB) but popularised by Teradata / BigTable / NoSQL
•  Massive storage potential
•  Massive scalability of processing
•  Commodity hardware
•  Limited by cross-partition joins

(diagram: each node is an autonomous processing unit for a data subset)
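In a shared-nothing grid every key maps deterministically to exactly one partition, so each node can work on its data subset autonomously. A minimal sketch of that routing in plain Java (class and method names are illustrative, not from the ODC):

```java
import java.util.*;

// Shared-nothing sketch: each key hashes to exactly one partition, so every
// node agrees on ownership without coordinating with the others.
public class Partitioner {
    private final int partitionCount;

    public Partitioner(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // Deterministic key -> partition mapping.
    public int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Group a batch of keys by owning partition (e.g. to route a query
    // to only the nodes that hold relevant data).
    public Map<Integer, List<String>> route(Collection<String> keys) {
        Map<Integer, List<String>> byPartition = new HashMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key), p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }
}
```

The deterministic mapping is what makes scaling linear; it is also exactly why cross-partition joins are the limiting factor, since two keys rarely hash to the same node.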
(2) Simplifying the Contract

•  For many users ACID is overkill.
•  Implementing ACID in a distributed architecture has a significant effect on performance.
•  NoSQL Movement: CouchDB, MongoDB, 10gen, Basho, CouchOne, Cloudant, Cloudera, GoGrid, InfiniteGraph, Membase, Riptano, Scality…
Databases have huge operational overheads

(chart: breakdown of OLTP overheads, taken from “OLTP Through the Looking Glass, and What We Found There”, Harizopoulos et al.)
(3) Memory is 100x faster than disk

(chart: access latencies on a log scale from picoseconds to milliseconds: L1 cache ref, L2 cache ref, main memory ref, 1MB main memory, 1MB disk/network, cross-network round trip, cross-continental round trip)

* L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
Avoid all that overhead

RAM means:
•  No IO
•  Single Threaded
  ⇒ No locking / latching
•  Rapid aggregation etc
•  Query plans become less
   important
We were keen to leverage these three factors in building the ODC:

•  Distribution
•  Simplify the contract
•  Memory only
What is the ODC?

A highly distributed, in-memory, normalised data store designed for scalable data access and processing.
The Concept
Originating from Scott Marcar’s concept of a central
brain within the bank: 

“The copying of data lies at the root of many of the bank’s problems. By supplying a single real-time view that all systems can interface with we remove the need for reconciliation and promote the concept of truly shared services” - Scott Marcar (Head of Risk and Finance Technology)
This is quite a tricky problem:

•  High bandwidth access to lots of data
•  Low latency access to small amounts of data
•  Scalability to lots of users
ODC Data Grid: Highly Distributed Physical Architecture

(diagram: Oracle Coherence provides in-memory storage and lots of parallel processing)

Messaging (topic based) as a system of record (persistence)
The Layers

(diagram: an Access Layer of Java clients and an API, over a Query Layer, over a Data Layer holding Transactions, Mtms and Cashflows, over a Persistence Layer)
But unlike most caches the ODC is
            Normalised
Three Tools of Distributed Data Architecture

•  Indexing
•  Partitioning
•  Replication
For speed, replication is best
         Wherever you go the
         data will be there




          But your storage is limited by
          the memory on a node
For scalability, partitioning is best

(diagram: key ranges Aa-Ap, Fs-Fz, Xa-Yd each held on a different node)

Scalable storage, bandwidth and processing
Traditional Distributed Caching Approach

(diagram: Trade, Trader and Party objects spread across the key ranges of a distributed cache)

Big denormalised objects are spread across a distributed cache.
But we believe a data
store needs to be more
than this: it needs to be
      normalised!
So why is that?
Surely denormalisation
 is going to be faster?
Denormalisation means replicating
   parts of your object model
…and that means managing
consistency over lots of copies
… as parts of the object graph will be copied multiple times

Periphery objects that are denormalised onto core objects will be duplicated multiple times across the data grid.

(diagram: Party A duplicated onto several Trade/Trader copies)
…and all the duplication means you
  run out of space really quickly
Space issues are exacerbated further when data is versioned

(diagram: Versions 1 to 4 each duplicate the whole Trade, Trader and Party graph)
…and you need
versioning to do MVCC
And reconstituting a previous time slice becomes very difficult.

(diagram: Trade, Trader and Party versions scattered across the grid, making a consistent time slice hard to assemble)
Why Normalisation?

•  Easy to change data (no distributed locks / transactions)
•  Better use of memory
•  Facilitates versioning
•  And MVCC / bi-temporal
OK, OK, let’s normalise our data then. What does that mean?

We decompose our domain model and hold each object separately.
This means the object graph will be split across multiple machines.

(diagram: Trade, Trader and Party objects held on different nodes)
Binding them back together involves a “distributed join” => lots of network hops

(diagram: reassembling a Trade with its Trader and Party requires calls between nodes)
It’s going to be slow…
Whereas in the denormalised model the join is already done.

Hence denormalisation is FAST! (for reads)

So what we want are the advantages of a normalised store at the speed of a denormalised one!

This is what the ODC is all about!
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can’t we hold it all together?
It’s all about the keys

We can collocate data with common keys, but if they crosscut the only way to collocate is to replicate.

(diagram: crosscutting keys vs common keys)
We tackle this problem with a hybrid model:

(diagram: Trader and Party are denormalised; Trade is normalised)
We adapt the concept of a Snowflake
             Schema.
Taking the concept of Facts and
          Dimensions
Everything starts from a Core Fact
          (Trades for us)
Facts are big, dimensions are small

Facts have one key

Dimensions have many (crosscutting) keys
Looking at the data:

(chart: row counts from 0 to 150,000,000. Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows and Legs are Facts: big, with common keys. Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity and Set of Books are Dimensions: small, with crosscutting keys.)
We remember we are a grid. We
should avoid the distributed join.
… so we only want to ‘join’ data that is in the same process. Coherence’s KeyAssociation gives us this.

(diagram: Trades and MTMs collocated via a common key)
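The idea behind key association can be sketched in plain Java: route an entry by an "association key" derived from its own key, so an MTM keyed partly by its trade id lands in the same partition as that Trade. The key format and names here are illustrative assumptions, not Coherence's actual API:

```java
// Illustrative sketch of key association (not the Coherence API): both Trades
// and MTMs are routed by the trade-id portion of their key, so related facts
// always share a partition and the join between them never leaves the process.
public class KeyAssociation {
    static final int PARTITIONS = 16;

    // Hypothetical key format: an MTM key "mtm-7:trade-42" associates
    // with the plain trade key "trade-42".
    public static String associatedKey(String key) {
        int idx = key.indexOf(':');
        return idx < 0 ? key : key.substring(idx + 1);
    }

    // Partition on the associated key, not the full key, so a Trade and
    // all of its MTMs hash to the same partition.
    public static int partitionFor(String key) {
        return Math.floorMod(associatedKey(key).hashCode(), PARTITIONS);
    }
}
```

Because the association key decides placement, any number of fact types that embed the trade id collocate with the trade, which is what makes the cluster-side join local.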
So we prescribe different physical storage for Facts and Dimensions

(diagram: Trader and Party are replicated; Trade is partitioned, with key association ensuring joins are in process)
Facts are held distributed, Dimensions are replicated

(chart: the same row counts as before. Facts: big => distribute. Dimensions: small => replicate.)
- Facts are partitioned across the data layer
- Dimensions are replicated across the Query Layer

(diagram: the Query Layer holds replicated Trader and Party dimensions; the Data Layer holds partitioned fact storage: Transactions, Mtms and Cashflows)
Key Point


We use a variant on a Snowflake Schema to partition big stuff that has the same key, and replicate small stuff that has crosscutting keys.
So how does this help us to run queries without distributed joins?

    Select Transaction, MTM, ReferenceData From
    MTM, Transaction, Ref Where Cost Centre = ‘CC1’



This query involves:
•  Joins between Dimensions: to evaluate where clause
•  Joins between Facts: Transaction joins to MTM
•  Joins between all facts and dimensions needed to
   construct return result
Stage 1: Focus on the where clause:
Where Cost Centre = ‘CC1’

Stage 1: Get the right keys to query the Facts

    Select Transaction, MTM, ReferenceData From
    MTM, Transaction, Ref Where Cost Centre = ‘CC1’

    LBs[] = getLedgerBooksFor(CC1)
    SBs[] = getSourceBooksFor(LBs[])

So we have all the bottom-level dimensions needed to query facts.

(diagram: partitioned fact storage: Transactions, Mtms, Cashflows)
Stage 2: Cluster Join to get Facts

    Select Transaction, MTM, ReferenceData From
    MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Get all Transactions and MTMs (cluster-side join) for the passed Source Books.

Stage 2: Join the facts together efficiently as we know they are collocated.
Stage 3: Augment raw Facts with relevant Dimensions

    Select Transaction, MTM, ReferenceData From
    MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Populate raw facts (Transactions) with dimension data before returning to the client.

Stage 3: Bind relevant dimensions to the result.
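The stages above can be sketched in plain Java against ordinary maps. All names are hypothetical; in the real system stage 1 runs against locally replicated dimensions and stage 2 runs cluster-side against collocated facts:

```java
import java.util.*;
import java.util.stream.*;

// Hedged sketch of the query stages (all names and map shapes hypothetical).
// Dimensions are assumed to live in local, replicated maps; facts are fetched
// by the dimension keys that stage 1 resolves.
public class StarQuery {
    // Stage 1: resolve the where clause against replicated dimensions,
    // walking Cost Centre -> Ledger Books -> Source Books.
    static List<String> sourceBooksFor(String costCentre,
                                       Map<String, List<String>> ledgerByCostCentre,
                                       Map<String, List<String>> sourceByLedger) {
        return ledgerByCostCentre.getOrDefault(costCentre, List.of()).stream()
                .flatMap(lb -> sourceByLedger.getOrDefault(lb, List.of()).stream())
                .collect(Collectors.toList());
    }

    // Stage 2: pull the facts keyed by those bottom-level dimension keys
    // (in the grid this join happens in-process because facts are collocated).
    static List<String> factsFor(List<String> sourceBooks,
                                 Map<String, List<String>> tradesBySourceBook) {
        return sourceBooks.stream()
                .flatMap(sb -> tradesBySourceBook.getOrDefault(sb, List.of()).stream())
                .collect(Collectors.toList());
    }
}
```

Stage 3 would then decorate each returned fact with its dimension objects from the same replicated maps before handing the result to the client.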
Bringing it together:

(diagram: a Java client and API sit over replicated Dimensions and partitioned Facts)

We never have to do a distributed join!
Coherence Voodoo: Joining Distributed Facts across the Cluster

Related Trades and MTMs (Facts) are collocated on the same machine with Key Affinity.

(diagram: an Aggregator runs against the collocated Trades and MTMs)

Direct backing map access must be used due to threading issues in Coherence: http://www.benstopford.com/2009/11/20/how-to-perform-efficient-cross-cache-joins-in-coherence/
So we are
  normalised


And we can join
 without extra
 network hops
We get to do this…

(diagram: the object graph of Trade, Trader and Party split across the cluster)
…and this…

(diagram: Versions 1 to 4 of the Trade, Trader and Party graph)
…and this…

(diagram: previous time slices reconstituted from Trade, Trader and Party versions)
…without the problems of this…
…or this…

…all at the speed of this… well, almost!
But there is a fly in the ointment…
I lied earlier. These aren’t all Facts.
(chart: row counts from 0 to 125,000,000. Party Alias, Transaction, Cashflows and Legs are Facts. Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity and Set of Books are Dimensions.)

Parties is a dimension, but:
•  It has a different key to the Facts.
•  And it’s BIG
We can’t replicate really big stuff… we’ll run out of space
=> Big Dimensions are a problem.
Fortunately we found a simple
solution!
We noticed that whilst there are lots
of these big dimensions, we didn’t
actually use a lot of them. They are
not all “connected”.
If there are no Trades for Barclays in
the data store then a Trade Query will
never need the Barclays Counterparty
Looking at all the Dimension data, some are quite large

(chart: dimension row counts up to 5,000,000: Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books)
But Connected Dimension Data is tiny by comparison

(chart: the same dimensions counting only connected rows; all are far smaller)
So we only replicate
‘Connected’ or ‘Used’
     dimensions
As data is written to the data store we keep our ‘Connected Caches’ up to date

(diagram: the Processing Layer holds replicated Dimension Caches; the Data Layer holds partitioned Fact Storage: Transactions, Mtms, Cashflows)

As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.
Coherence Voodoo: ‘Connected Replication’

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered

(diagram: a Cache Store saves the Trade, all normalised, into the partitioned Data Layer; a Trigger then touches the Party Alias, Source Book and Ccy it references, updating the Query Layer’s connected dimension caches)
This updates the connected caches

(diagram: the Trade’s Party Alias, Source Book and Ccy are pushed into the connected dimension caches in the Query Layer)
The process recurses through the object graph

(diagram: from the Trade through Party Alias to Party, and through Source Book to Ledger Book)
‘Connected Replication’
    A simple pattern which
recurses through the foreign
 keys in the domain model,
 ensuring only ‘Connected’
  dimensions are replicated
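A minimal sketch of that recursion in plain Java (the foreign-key map and all names are illustrative; in the ODC this is driven by triggers as facts are written, not computed on demand):

```java
import java.util.*;

// Sketch of the Connected Replication pattern: when a fact is saved, recurse
// through its foreign keys so every dimension it references, directly or
// transitively, is marked for replication. Unreferenced dimensions are never
// visited, so only 'connected' data reaches the replicated layer.
public class ConnectedReplication {
    // Adjacency: object key -> keys of the dimensions it references.
    private final Map<String, List<String>> foreignKeys;

    public ConnectedReplication(Map<String, List<String>> foreignKeys) {
        this.foreignKeys = foreignKeys;
    }

    // Returns the set of connected dimensions to push to the replicated layer.
    public Set<String> connectedDimensions(String factKey) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(foreignKeys.getOrDefault(factKey, List.of()));
        while (!pending.isEmpty()) {
            String dim = pending.pop();
            if (seen.add(dim)) {  // skip dimensions already replicated
                pending.addAll(foreignKeys.getOrDefault(dim, List.of()));
            }
        }
        return seen;
    }
}
```

The `seen` set is what keeps the recursion cheap in steady state: once a dimension is already in the connected caches, saving another fact that references it triggers no further work.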
Limitations of this approach

•  Data set size. Size of connected dimensions limits
   scalability.
•  Joins are only supported between “Facts” that can
   share a partitioning key (But any dimension join can be
   supported)
Performance is very sensitive to serialisation costs: avoid them with POF

Deserialise just one field from the object stream

(diagram: an integer ID extracted directly from a binary value)
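To illustrate why single-field extraction is cheap, here is a simplified plain-Java sketch: if the serialised form has a known layout, one field can be read straight from the bytes without rehydrating the whole object. (POF itself is a self-describing format navigated by index; the fixed layout below is an assumption for illustration only.)

```java
import java.nio.ByteBuffer;

// Sketch of single-field extraction from a binary value. The layout is a
// hypothetical fixed one: [int id][long timestamp][double value].
public class FieldExtractor {
    // Read only the id; the timestamp and value bytes are never touched,
    // which is the whole point of extracting from the serialised form.
    public static int extractId(byte[] serialised) {
        return ByteBuffer.wrap(serialised).getInt(0);
    }

    // Produce the binary form for the sketch's fixed layout (20 bytes).
    public static byte[] serialise(int id, long timestamp, double value) {
        return ByteBuffer.allocate(20)
                .putInt(id)
                .putLong(timestamp)
                .putDouble(value)
                .array();
    }
}
```

For a grid doing index maintenance or filtering, skipping full deserialisation like this is the difference between touching four bytes and rebuilding an object graph per entry.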
Other cool stuff (very briefly)

Everything is Java:
•  Java client
•  Java schema
•  API
•  Java ‘Stored Procedures’ and ‘Triggers’
Messaging as a System of Record

(diagram: Access, Processing, Data and Persistence layers, with Java clients on the API)

ODC provides a realtime view over any part of the dataset, as messaging is used as the system of record.

Messaging provides a more scalable system of record than a database would.
Being event-based changes the programming model.

•  The system provides both real-time and query-based views on the data.
•  The two are linked using versioning.
•  Replication to DR, DB, fact aggregation.
API – Queries utilise a fluent interface
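The deck does not show the API itself, so the following is a purely hypothetical sketch of what a fluent query interface can look like in Java, not the ODC's actual API: predicates chain off the builder and nothing executes until the terminal call.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.*;

// Hypothetical fluent query builder (illustrative only, not the ODC API).
// Filters accumulate through chained where(..) calls; select() applies them.
public class Query<T> {
    private final List<T> source;
    private final List<Predicate<T>> filters = new ArrayList<>();

    private Query(List<T> source) {
        this.source = source;
    }

    public static <T> Query<T> from(List<T> source) {
        return new Query<>(source);
    }

    // Each where(..) returns the builder, which is what makes the API fluent.
    public Query<T> where(Predicate<T> predicate) {
        filters.add(predicate);
        return this;
    }

    // Terminal operation: combine all filters and run the query.
    public List<T> select() {
        Predicate<T> combined = filters.stream().reduce(t -> true, Predicate::and);
        return source.stream().filter(combined).collect(Collectors.toList());
    }
}
```

Usage reads close to the SQL it replaces: `Query.from(trades).where(t -> t.costCentre().equals("CC1")).select()`.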
Performance

A query with more than twenty join conditions:

•  2GB per min / 250Mb/s
•  3ms latency (per client)
Conclusion


The data warehousing, OLTP and distributed caching fields are all converging on in-memory architectures to get away from disk-induced latencies.
Conclusion

Shared nothing architectures are always
subject to the distributed join problem if
they are to retain a degree of normalisation.
Conclusion

We present a novel mechanism for avoiding the distributed join problem by using a Star Schema to define whether data should be replicated or partitioned.

(diagram: partitioned storage)
Conclusion

We make the pattern applicable to ‘real’
data models by only replicating objects
that are actually used: the Connected
Replication pattern.
The End
•  Further details online http://www.benstopford.com
   (linked from my Qcon bio)
•  A big thanks to the team in both India and the UK who
   built this thing.
•  Questions?

Coherence Implementation Patterns - Sig Nov 2011Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011
 
Data Pipelines with Apache Kafka
Data Pipelines with Apache KafkaData Pipelines with Apache Kafka
Data Pipelines with Apache Kafka
 
Test-Oriented Languages: Is it time for a new era?
Test-Oriented Languages: Is it time for a new era?Test-Oriented Languages: Is it time for a new era?
Test-Oriented Languages: Is it time for a new era?
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
Ideas for Distributing Skills Across a Continental Divide
Ideas for Distributing Skills Across a Continental DivideIdeas for Distributing Skills Across a Continental Divide
Ideas for Distributing Skills Across a Continental Divide
 
Refactoring tested code - has mocking gone wrong?
Refactoring tested code - has mocking gone wrong?Refactoring tested code - has mocking gone wrong?
Refactoring tested code - has mocking gone wrong?
 
Network latency - measurement and improvement
Network latency - measurement and improvementNetwork latency - measurement and improvement
Network latency - measurement and improvement
 
Big Data & the Enterprise
Big Data & the EnterpriseBig Data & the Enterprise
Big Data & the Enterprise
 
Big iron 2 (published)
Big iron 2 (published)Big iron 2 (published)
Big iron 2 (published)
 

Similar to Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability

Advanced databases ben stopford
Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopfordBen Stopford
 
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
Ben Stopford
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
mark madsen
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory Webcast
Inside Analysis
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Ben Stopford
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
BigDataCloud
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
lisapaglia
 
Millions quotes per second in pure java
Millions quotes per second in pure javaMillions quotes per second in pure java
Millions quotes per second in pure javaRoman Elizarov
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
shnkr_rmchndrn
 
Wolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat DresdenWolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat Dresden
InfinIT - Innovationsnetværket for it
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL Server
Michael Rys
 
Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Learning from google megastore (Part-1)
Learning from google megastore (Part-1)
Schubert Zhang
 
Caching principles-solutions
Caching principles-solutionsCaching principles-solutions
Caching principles-solutionspmanvi
 
Cache and consistency in nosql
Cache and consistency in nosqlCache and consistency in nosql
Cache and consistency in nosqlJoão Gabriel Lima
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
Richard Claassens CIPPE
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the Cloud
RightScale
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
Bernd Ocklin
 
Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
Venkatesh Narayanan
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
Felix Gessert
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ Applications
Michael Kopp
 

Similar to Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability (20)

Advanced databases ben stopford
Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopford
 
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory Webcast
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
 
Millions quotes per second in pure java
Millions quotes per second in pure javaMillions quotes per second in pure java
Millions quotes per second in pure java
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
Wolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat DresdenWolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat Dresden
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL Server
 
Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Learning from google megastore (Part-1)
Learning from google megastore (Part-1)
 
Caching principles-solutions
Caching principles-solutionsCaching principles-solutions
Caching principles-solutions
 
Cache and consistency in nosql
Cache and consistency in nosqlCache and consistency in nosql
Cache and consistency in nosql
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the Cloud
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
 
Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ Applications
 

More from Ben Stopford

10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka
Ben Stopford
 
10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices
Ben Stopford
 
The Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and ServerlessThe Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and Serverless
Ben Stopford
 
A Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices GenerationA Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices Generation
Ben Stopford
 
Building Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka StreamsBuilding Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka Streams
Ben Stopford
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
Ben Stopford
 
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Ben Stopford
 
Building Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful StreamsBuilding Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful Streams
Ben Stopford
 
Devoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful StreamsDevoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful Streams
Ben Stopford
 
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2:  Building Event-Driven Services with Apache KafkaEvent Driven Services Part 2:  Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
Ben Stopford
 
Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy
Ben Stopford
 
Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...
Ben Stopford
 
Strata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data DichotomyStrata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data Dichotomy
Ben Stopford
 
A little bit of clojure
A little bit of clojureA little bit of clojure
A little bit of clojureBen Stopford
 

More from Ben Stopford (14)

10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka
 
10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices
 
The Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and ServerlessThe Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and Serverless
 
A Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices GenerationA Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices Generation
 
Building Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka StreamsBuilding Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka Streams
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
 
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
 
Building Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful StreamsBuilding Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful Streams
 
Devoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful StreamsDevoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful Streams
 
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2:  Building Event-Driven Services with Apache KafkaEvent Driven Services Part 2:  Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
 
Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy
 
Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...
 
Strata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data DichotomyStrata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data Dichotomy
 
A little bit of clojure
A little bit of clojureA little bit of clojure
A little bit of clojure
 

Recently uploaded

By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 

Recently uploaded (20)

By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 

Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability

  • 1. ODC Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability Ben Stopford : RBS
  • 2. The Story… The internet era has moved us away from traditional database architecture, now a quarter of a century old. Industry and academia have responded with a variety of solutions that leverage distribution, use of a simpler contract and RAM storage. We introduce ODC, a NoSQL store with a unique mechanism for efficiently managing normalised data. We show how we adapt the concept of a Snowflake Schema to aid the application of replication and partitioning and avoid problems with distributed joins. Finally we introduce the ‘Connected Replication’ pattern as a mechanism for making the star schema practical for in-memory architectures. The result is a highly scalable, in-memory data store that can support both millisecond queries and high-bandwidth exports over a normalised object model.
  • 3. Database Architecture is Old Most modern databases still follow a 1970s architecture (for example IBM’s System R)
  • 4. “Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step.” Michael Stonebraker (Creator of Ingres and Postgres)
  • 5. What steps have we taken to improve the performance of this original architecture?
  • 6. Improving Database Performance (1): Shared Disk Architecture
  • 7. Improving Database Performance (2): Shared Nothing Architecture
  • 8. Improving Database Performance (3): In-Memory Databases
  • 9. Improving Database Performance (4): Distributed In-Memory (Shared Nothing)
  • 10. Improving Database Performance (5): Distributed Caching
  • 11. These approaches are converging: Regular Databases (Oracle, Sybase, MySql), Shared Nothing (Teradata, Vertica, NoSQL…), Shared Nothing in memory (VoltDB, Hstore), and Distributed Caching (Coherence, Gemfire, Gigaspaces), with ODC drawing on the latter two.
  • 12. So how can we make a data store go even faster? Distributed architecture. Drop ACID: simplify the contract. Drop disk.
  • 13. (1) Distribution for Scalability: The Shared Nothing Architecture •  Originated in 1990 (Gamma DB) but popularised by Teradata / BigTable / NoSQL •  Massive storage potential •  Massive scalability of processing •  Commodity hardware •  Each node is an autonomous processing unit for a data subset •  Limited by cross-partition joins
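The routing step behind a shared-nothing store can be sketched as a deterministic key-to-node mapping. This is a hypothetical illustration of the idea, not ODC's or Coherence's actual code; the class and method names are invented:

```java
import java.util.List;

// Minimal sketch of shared-nothing routing: each key is owned by exactly one
// node, so single-key reads and writes never cross a partition boundary.
public class ShardRouter {
    private final List<String> nodes;

    public ShardRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    // Deterministic key -> node mapping: every client computes the same owner,
    // so no central coordinator is needed for routing.
    public String ownerOf(String key) {
        int bucket = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(bucket);
    }
}
```

The cost noted on the slide follows directly: a query touching keys owned by different nodes (a cross-partition join) cannot be answered by any single autonomous unit.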
  • 14. (2) Simplifying the Contract •  For many users ACID is overkill. •  Implementing ACID in a distributed architecture has a significant effect on performance. •  NoSQL Movement: CouchDB, MongoDB, 10gen, Basho, CouchOne, Cloudant, Cloudera, GoGrid, InfiniteGraph, Membase, Riptano, Scality…
  • 15. Databases have huge operational overheads Taken from “OLTP Through the Looking Glass, and What We Found There” Harizopoulos et al
  • 16. (3) Memory is 100x faster than disk. [Chart: latency on a log scale from ms down to ps, comparing a 1MB disk/network transfer, a 1MB main-memory transfer, a cross-continental round trip, a cross-network round trip, a main-memory reference, an L1 cache reference and an L2 cache reference.] * An L1 ref is about 2 clock cycles or 0.7ns: the time it takes light to travel 20cm.
  • 17. Avoid all that overhead RAM means: •  No IO •  Single Threaded ⇒ No locking / latching •  Rapid aggregation etc •  Query plans become less important
  • 18. We were keen to leverage these three factors in building the ODC: Distribution, Simplify the contract, Memory only.
  • 19. What is the ODC? Highly distributed, in memory, normalised data store designed for scalable data access and processing.
  • 20. The Concept Originating from Scott Marcar’s concept of a central brain within the bank: “The copying of data lies at the root of many of the bank’s problems. By supplying a single real-time view that all systems can interface with we remove the need for reconciliation and promote the concept of truly shared services” - Scott Marcar (Head of Risk and Finance Technology)
  • 21. This is quite a tricky problem: high-bandwidth access to lots of data, low-latency access to small amounts of data, and scalability to lots of users.
  • 22. ODC Data Grid: highly distributed physical architecture, in-memory storage, lots of parallel processing. Built on Oracle Coherence, with topic-based messaging as the system of record (persistence).
  • 23. The Layers: Access Layer (Java client API) ⇒ Query Layer ⇒ Data Layer (Transactions, MTMs, Cashflows) ⇒ Persistence Layer
  • 24. But unlike most caches the ODC is Normalised
  • 25. Three Tools of Distributed Data Architecture Indexing Partitioning Replication
  • 26. For speed, replication is best: wherever you go, the data will be there. But your storage is limited by the memory on a node.
  • 27. For scalability, partitioning is best: scalable storage, bandwidth and processing (Keys Aa-Ap, Fs-Fz, Xa-Yd spread across nodes).
  • 28. Traditional Distributed Caching Approach: big denormalised objects (a Trade with its Trader and Party) are spread across a distributed cache by key range (Keys Aa-Ap, Fs-Fz, Xa-Yd).
  • 29. But we believe a data store needs to be more than this: it needs to be normalised!
  • 30. So why is that? Surely denormalisation is going to be faster?
  • 31. Denormalisation means replicating parts of your object model
  • 32. …and that means managing consistency over lots of copies
  • 33. … as parts of the object graph will be copied multiple times. Periphery objects (Trader, Party) that are denormalised onto core objects (Trade) will be duplicated multiple times across the data grid.
  • 34. …and all the duplication means you run out of space really quickly
  • 35. Space issues are exacerbated further when data is versioned: each version (1, 2, 3, 4…) duplicates the Trade with its Trader and Party. …and you need versioning to do MVCC.
  • 36. And reconstituting a previous time slice becomes very difficult. Trade Trader Party Party Trader Trade Party Trader Trade Party
  • 37. Why Normalisation? Easy to change data (no distributed locks / transactions). Better use of memory. Facilitates versioning, MVCC and bi-temporality.
  • 38. OK, OK, let’s normalise our data then. What does that mean?
  • 39. We decompose our domain model and hold each object separately
  • 40. This means the object graph will be split across multiple machines. Trader Party Trade Trade Trader Party
  • 41. Binding them back together involves a “distributed join” => Lots of network hops Trader Party Trade Trade Trader Party
  • 42. It’s going to be slow…
  • 43. Whereas in the denormalised model the join is already done
  • 44. Hence Denormalisation is FAST! (for reads)
  • 45. So what we want is the advantages of a normalised store at the speed of a denormalised one! This is what the ODC is all about!
  • 46. Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can’t we hold it all together?
  • 47. It’s all about the keys
  • 48. We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.
  • 49. We tackle this problem with a hybrid model: part denormalised (Trader, Party), part normalised (Trade).
  • 50. We adapt the concept of a Snowflake Schema.
  • 51. Taking the concept of Facts and Dimensions
  • 52. Everything starts from a Core Fact (Trades for us)
  • 53. Facts are Big, dimensions are small
  • 55. Dimensions have many (crosscutting) keys
  • 56. Looking at the data: Facts (Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transactions, Cashflows, Legs, Parties) => big, common keys. Dimensions (Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books) => small, crosscutting keys. (Bar chart of row counts, axis 0 to 150,000,000.)
  • 57. We remember we are a grid. We should avoid the distributed join.
  • 58. … so we only want to ‘join’ data that is in the same process. Coherence’s KeyAssociation gives us this: Trades and MTMs share a common key.
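In Coherence, a cache key can implement the KeyAssociation interface so that its associated key, rather than the key itself, drives partition placement. A plain-Java sketch of the idea follows; the nested interface is our own mimic of the Coherence contract, and all class names and data are illustrative.

```java
// Plain-Java sketch of key affinity: an MTM's cache key reports its parent
// trade id as the "associated key", so the partitioner sends the MTM to the
// same partition as its Trade and the join stays in-process.
public class KeyAffinityDemo {

    // Mimics the shape of Coherence's KeyAssociation contract.
    interface KeyAssociation {
        Object getAssociatedKey();
    }

    static final class MtmKey implements KeyAssociation {
        final String mtmId;
        final String tradeId;

        MtmKey(String mtmId, String tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        // Collocate this MTM with its parent Trade.
        public Object getAssociatedKey() {
            return tradeId;
        }
    }

    // Route on the associated key when one is declared, else on the key itself.
    static int partitionOf(Object key, int partitions) {
        Object routing = (key instanceof KeyAssociation)
                ? ((KeyAssociation) key).getAssociatedKey()
                : key;
        return Math.floorMod(routing.hashCode(), partitions);
    }

    public static void main(String[] args) {
        int p = 13;
        // The Trade "T1" and both of its MTMs land in the same partition.
        System.out.println(partitionOf("T1", p) == partitionOf(new MtmKey("M1", "T1"), p)); // prints true
    }
}
```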
  • 59. So we prescribe different physical storage for Facts and Dimensions: Dimensions (Trader, Party) replicated; Facts (Trade) partitioned. Key association ensures joins are in-process.
  • 60. Facts are held distributed, Dimensions are replicated. Facts (Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transactions, Cashflows, Legs, Parties) => big => distribute. Dimensions (Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books) => small => replicate. (Same row-count chart, axis 0 to 150,000,000.)
  • 61. - Facts are partitioned across the Data Layer (Transactions, MTMs, Cashflows: fact storage, partitioned). - Dimensions (Trader, Party) are replicated across the Query Layer.
  • 62. Key Point: we use a variant on a Snowflake Schema to partition big stuff that shares a common key, and replicate small stuff that has crosscutting keys.
  • 63. So how does this help us to run queries without distributed joins? Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ This query involves: •  Joins between Dimensions: to evaluate the where clause •  Joins between Facts: Transaction joins to MTM •  Joins between all facts and dimensions needed to construct the return result
  • 64. Stage 1: Focus on the where clause: Where Cost Centre = ‘CC1’
  • 65. Stage 1: Get the right keys to query the Facts. LBs[] = getLedgerBooksFor(CC1); SBs[] = getSourceBooksFor(LBs[]). Now we have all the bottom-level dimensions needed to query the partitioned facts (Transactions, MTMs, Cashflows).
  • 66. Stage 2: Cluster join to get Facts. Using the Source Books from Stage 1, get all Transactions and MTMs with a cluster-side join against the partitioned fact caches (Transactions, MTMs, Cashflows).
  • 67. Stage 2: Join the facts together efficiently as we know they are collocated
  • 68. Stage 3: Augment raw Facts with relevant Dimensions. Populate the raw facts (Transactions) with dimension data before returning to the client.
  • 69. Stage 3: Bind relevant dimensions to the result
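The three stages can be sketched over plain in-memory maps. Everything here is illustrative (the map names, the dimension hierarchy and the data are invented); the real system does Stage 2 as a collocated cluster-side join, not a local map lookup.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the three query stages for "Where Cost Centre = 'CC1'".
public class SnowflakeQueryDemo {
    // Replicated dimensions: available in-process on the query node.
    static final Map<String, List<String>> ledgerBooksByCostCentre =
            Map.of("CC1", List.of("LB1"));
    static final Map<String, List<String>> sourceBooksByLedgerBook =
            Map.of("LB1", List.of("SB1", "SB2"));
    // Partitioned facts, keyed so that related facts collocate.
    static final Map<String, String> transactionBySourceBook =
            Map.of("SB1", "txn-A", "SB2", "txn-B");

    static List<String> transactionsForCostCentre(String costCentre) {
        // Stage 1: walk the dimension hierarchy in-process to get fact keys.
        List<String> sourceBooks = ledgerBooksByCostCentre.get(costCentre).stream()
                .flatMap(lb -> sourceBooksByLedgerBook.get(lb).stream())
                .collect(Collectors.toList());
        // Stage 2: fetch facts by key (a collocated, cluster-side join in the real system).
        // Stage 3 would then bind dimension data onto each fact before returning.
        return sourceBooks.stream()
                .map(transactionBySourceBook::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(transactionsForCostCentre("CC1")); // prints [txn-A, txn-B]
    }
}
```

The point of the sketch is that no stage ships key sets between machines: Stage 1 runs against replicated data, Stage 2 against collocated data.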
  • 70. Bringing it together: replicated Dimensions, partitioned Facts, and a Java client API. We never have to do a distributed join!
  • 71. Coherence Voodoo: Joining Distributed Facts across the Cluster. Related Trades and MTMs (Facts) are collocated on the same machine with Key Affinity, then joined with an Aggregator. Direct backing map access must be used due to threading issues in Coherence: http://www.benstopford.com/2009/11/20/how-to-perform-efficient-cross-cache-joins-in-coherence/
  • 72. So we are normalised And we can join without extra network hops
  • 73. We get to do this… Trader Party Trade Trade Trader Party
  • 74. …and this… Trader Party Version 1 Trade Trader Party Version 2 Trade Trader Party Version 3 Trade Trader Party Version 4 Trade
  • 75. ..and this.. Trade Trader Party Party Trader Trade Party Trader Trade Party
  • 78. ..all at the speed of this… well almost!
  • 80. But there is a fly in the ointment…
  • 81. I lied earlier. These aren’t all Facts. Party Alias and Parties are dimensions: •  They have a different key to the Facts. •  And they’re BIG. (Same row-count chart, axis 0 to 125,000,000.)
  • 82. We can’t replicate really big stuff… we’ll run out of space => Big Dimensions are a problem.
  • 83. Fortunately we found a simple solution!
  • 84. We noticed that whilst there are lots of these big dimensions, we didn’t actually use a lot of them. They are not all “connected”.
  • 85. If there are no Trades for Barclays in the data store then a Trade Query will never need the Barclays Counterparty
  • 86. Looking at the All Dimension Data, some are quite large (Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books; chart axis 0 to 5,000,000).
  • 87. But Connected Dimension Data is tiny by comparison (the same dimensions barely register on the same chart, axis 0 to 5,000,000).
  • 88. So we only replicate ‘Connected’ or ‘Used’ dimensions
  • 89. As data is written to the data store we keep our ‘Connected Caches’ up to date. As new Facts are added to the Data Layer (Transactions, MTMs, Cashflows: fact storage, partitioned), the relevant Dimensions they reference are moved to the replicated dimension caches in the Processing Layer.
  • 91. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
  • 92. Saving a trade causes all its first-level references to be triggered. The Trade is stored in the partitioned Data Layer (all normalised), and triggers fire for the Party Alias, Source Book and Ccy it references, populating the connected dimension caches in the Query Layer.
  • 93. This updates the connected caches: the Party Alias, Source Book and Ccy referenced by the Trade now sit in the Query Layer’s connected dimension caches.
  • 94. The process recurses through the object graph: the Party Alias pulls in its Party, and the Source Book pulls in its Ledger Book.
  • 95. ‘Connected Replication’: a simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated.
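The recursion can be sketched in a few lines of plain Java. The foreign-key graph below is invented for illustration; in the real system the arcs come from the domain model and the replication targets are the connected dimension caches.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of Connected Replication: when a fact is saved, recurse through its
// foreign keys and replicate only the dimensions it actually references.
public class ConnectedReplicationDemo {
    // Foreign-key arcs: entity -> dimensions it references (illustrative).
    static final Map<String, List<String>> refs = Map.of(
            "Trade", List.of("Party Alias", "Source Book", "Ccy"),
            "Party Alias", List.of("Party"),
            "Source Book", List.of("Ledger Book"));

    // Walk the arcs, collecting every connected dimension exactly once.
    static Set<String> connectedDimensions(String fact) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(refs.getOrDefault(fact, List.of()));
        while (!work.isEmpty()) {
            String dim = work.pop();
            if (seen.add(dim)) {
                // First visit: follow this dimension's own references too.
                work.addAll(refs.getOrDefault(dim, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Saving a Trade replicates its first-level refs, then theirs in turn.
        System.out.println(connectedDimensions("Trade"));
    }
}
```

The `seen` set is what makes this safe on cyclic domain models: each dimension is visited, and hence replicated, at most once per save.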
  • 96. Limitations of this approach •  Data set size. Size of connected dimensions limits scalability. •  Joins are only supported between “Facts” that can share a partitioning key (But any dimension join can be supported)
  • 97. Performance is very sensitive to serialisation costs. Avoid them with POF: deserialise just one field (e.g. an integer ID) from the binary object stream rather than inflating the whole value.
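A toy illustration of the principle, using a hand-rolled layout rather than the real POF wire format: if a field sits at a known position ahead of the payload, a reader can extract just that field without deserialising the rest.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Why partial deserialisation wins: read one field from a binary value
// without touching the payload. Layout is illustrative: [int id][payload].
public class PartialDeserialiseDemo {

    static byte[] serialise(int id, String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length)
                .putInt(id)
                .put(body)
                .array();
    }

    // Read only the ID field; the (potentially large) payload is never parsed.
    static int readIdOnly(byte[] binary) {
        return ByteBuffer.wrap(binary).getInt();
    }

    public static void main(String[] args) {
        byte[] blob = serialise(42, "a large trade object…");
        System.out.println(readIdOnly(blob)); // prints 42
    }
}
```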
  • 98. Other cool stuff (very briefly)
  • 99. Everything is Java: a Java client API, a Java schema, and Java ‘Stored Procedures’ and ‘Triggers’.
  • 100. Messaging as a System of Record. The ODC provides a real-time view over any part of the dataset, as messaging is used as the system of record. Messaging provides a more scalable system of record than a database would. (Layers: Access, Processing, Data (Transactions, MTMs, Cashflows), Persistence.)
  • 101. Being event based changes the programming model. The system provides both real-time and query-based views on the data; the two are linked using versioning. Replication to DR, DB and fact aggregation.
  • 102. API – Queries utilise a fluent interface
  • 103. Performance: a query with more than twenty join conditions runs at 2GB per min / 250Mb/s, with 3ms latency (per client).
  • 104. Conclusion Data warehousing, OLTP and Distributed caching fields are all converging on in- memory architectures to get away from disk induced latencies.
  • 105. Conclusion Shared nothing architectures are always subject to the distributed join problem if they are to retain a degree of normalisation.
  • 106. Conclusion We present a novel mechanism for avoiding the distributed join problem by using a Star Schema to define whether data should be replicated or partitioned.
  • 107. Conclusion We make the pattern applicable to ‘real’ data models by only replicating objects that are actually used: the Connected Replication pattern.
  • 108. The End •  Further details online http://www.benstopford.com (linked from my Qcon bio) •  A big thanks to the team in both India and the UK who built this thing. •  Questions?

Editor's Notes

  1. This avoids distributed transactions that would be needed when updating dimensions in a denormalised model. It also reduces memory consumption which can become prohibitive in MVCC systems. This then facilitates a storage topology that eradicates cross-machine joins (joins between data sets held on different machines which require that entire key sets are shipped to perform the join operation).
  2. Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in process (only ‘active’ data is held)