Scalable Databases - From Relational Databases To Polyglot Persistence

In a world where everyone is connected and everyone's data is on the web, scaling your database is no longer a choice: it is a necessity.
In this talk we'll see how to make relational and non-relational databases scale to our needs by understanding and applying old and new patterns. We'll then look at the most common use cases and how to address them by choosing the right patterns and tools.

Published in: Technology

Transcript of "Scalable Databases - From Relational Databases To Polyglot Persistence"

  1. SCALABLE DATABASES
      From Relational Databases To Polyglot Persistence
      Sergio Bossa – sergio.bossa@gmail.com – http://twitter.com/sbtourist
      Javaday IV – Rome – 30 January 2010

  2. About Me
      ●   Software architect and engineer
          ●   Gioco Digitale (online gambling and casinos)
      ●   Open Source enthusiast
          ●   Terracotta Messaging (http://forge.terracotta.org)
          ●   Terrastore (http://code.google.com/p/terrastore)
          ●   Actorom (http://code.google.com/p/actorom)
      ●   (Micro-)Blogger
          ●   http://twitter.com/sbtourist
          ●   http://sbtourist.blogspot.com

  3. Five fallacies of data-centric systems
      Data model is static.
      Data volume is predictable.
      Data access load is predictable.
      Database topology doesn't change.
      Database never fails.

  4. Scalable databases in action
      ●   Scaling your database as a way to address the fallacies above.
          ●   Scale to handle heterogeneous data.
          ●   Scale to handle more data.
          ●   Scale to handle more load.
          ●   Scale to handle topology changes due to:
              ●   Unplanned growth.
              ●   Unpredictable failures.

  5. Scaling Relational Databases

  6. Master-Slave replication
      ●   Master - Slave replication.
          ●   One (and only one) master database.
          ●   One or more slaves.
          ●   All writes go to the master.
              ●   Replicated to slaves.
          ●   Reads are balanced among master and slaves.
      ●   Major issues:
          ●   Single point of failure.
          ●   Single point of bottleneck.
          ●   Static topology.
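
The routing rule on this slide (writes to the one master, reads balanced among master and slaves) can be sketched as follows. This is only an illustration: the class name, the node names, and the round-robin policy are assumptions, not something the talk prescribes.

```python
import itertools


class MasterSlaveRouter:
    """Hypothetical router: one master for writes, all nodes share reads."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        # Round-robin over master and slaves (assumed balancing policy).
        self._read_cycle = itertools.cycle([master] + self.slaves)

    def node_for_write(self):
        # Every write targets the single master: the bottleneck and
        # single point of failure listed under "Major issues".
        return self.master

    def node_for_read(self):
        return next(self._read_cycle)


router = MasterSlaveRouter("db-master", ["db-slave-1", "db-slave-2"])
reads = [router.node_for_read() for _ in range(4)]
```

Adding slaves spreads the read load further, but `node_for_write` always returns the same node, which is why this pattern alone does not scale writes.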
  7. Master-Master replication
      ●   Master - Master replication.
          ●   One or more masters.
          ●   Writes and reads can go to any master node.
          ●   Writes are replicated among masters.
      ●   Major issues:
          ●   Limited performance and scalability (typically due to 2PC).
          ●   Complexity.
          ●   Static topology.

  8. Vertical partitioning
      ●   Vertical partitioning.
          ●   Put tables belonging to different functional areas on different database nodes.
          ●   Scale your data and load by function.
          ●   Move joins to the application level.
      ●   Major issues:
          ●   No longer truly relational.
          ●   What if a functional area grows too much?
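
As a rough illustration of "move joins to the application level", the sketch below keeps users and orders in two separate in-memory SQLite databases that stand in for two database nodes. The table names, columns, and sample rows are invented for the example.

```python
import sqlite3

# Two independent databases stand in for two nodes, one per functional area.
users_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])

orders_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                      [(10, 1, 9.5), (11, 1, 20.0), (12, 2, 5.0)])


def orders_with_user_names():
    """Join orders to users in application code, since no SQL JOIN can span both nodes."""
    names = dict(users_db.execute("SELECT id, name FROM users"))
    rows = orders_db.execute("SELECT id, user_id, total FROM orders ORDER BY id")
    return [(order_id, names[user_id], total) for order_id, user_id, total in rows]
```

The join now costs two round trips and loses the database's referential integrity checks, which is one reading of "no longer truly relational".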
  9. Horizontal partitioning
      ●   Horizontal partitioning.
          ●   Split tables by key and put partitions (shards) on different nodes.
          ●   Scale your data and load by key.
          ●   Move joins to the application level.
          ●   Needs some kind of routing.
      ●   Major issues:
          ●   No longer truly relational.
          ●   What if your partition grows too much?
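
The "some kind of routing" that sharding needs can be as simple as hashing the key and taking it modulo the number of shards. The sketch below assumes that scheme; the class and shard names are hypothetical, and real deployments often use lookup tables or consistent hashing instead.

```python
import hashlib


class ShardRouter:
    """Hypothetical key-to-shard router using hash-modulo placement."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # A stable digest is used instead of Python's built-in hash(),
        # whose value for strings changes between interpreter runs.
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]


router = ShardRouter(["shard-0", "shard-1", "shard-2"])
target = router.shard_for("user:42")
```

With modulo placement, changing the number of shards remaps most keys, which is one concrete form of the "what if your partition grows too much?" problem.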
  10. Caching
      ●   Put a cache in front of your database.
          ●   Distribute.
          ●   Write-through for scaling reads.
          ●   Write-behind for scaling reads and writes.
      ●   Saves you a lot of pain, but ...
          ●   “Only” scales read/write load.
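
The difference between write-through and write-behind can be sketched with a plain dict standing in for the database. All class and method names here are assumptions made for the example, and the explicit `flush` call replaces the background thread or timer a real write-behind cache would use.

```python
class _Cache:
    def __init__(self, store):
        self.store = store   # dict standing in for the database
        self.cache = {}

    def get(self, key):
        # Read path is the same for both variants: serve from cache,
        # fall back to the store on a miss.
        if key not in self.cache:
            self.cache[key] = self.store[key]
        return self.cache[key]


class WriteThroughCache(_Cache):
    def put(self, key, value):
        self.cache[key] = value
        self.store[key] = value      # database is written synchronously


class WriteBehindCache(_Cache):
    def __init__(self, store):
        super().__init__(store)
        self.pending = {}

    def put(self, key, value):
        self.cache[key] = value
        self.pending[key] = value    # database write is deferred

    def flush(self):
        # Batched write; repeated updates to one key collapse into one.
        self.store.update(self.pending)
        self.pending.clear()


through_db, behind_db = {}, {}
through = WriteThroughCache(through_db)
behind = WriteBehindCache(behind_db)
through.put("a", 1)
behind.put("a", 1)
```

After these calls `through_db` already holds the value while `behind_db` stays empty until `behind.flush()` runs. That deferral is what lets write-behind absorb write load, at the price of losing pending writes if the cache fails first.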
  11. Did we solve our fallacies?
      ●   We tried, but ...
          ●   Still bound to the relational model.
          ●   Replication only covers a few use cases.
          ●   Partitioning is hard.
          ●   Caching is good, but not definitive.
          ●   ...
      ●   Can we do any better?

  12. It's Not Only SQL

  13. NOSQL Characteristics
      ●   Main traits of characterization:
          ●   Data Model.
          ●   Data Processing.
          ●   Consistency Model.
          ●   Scale Out.

  14. Data Model (1)
      ●   Column-family based.
      ●   Structure:
          ●   Key-identified rows with a sparse number of columns.
          ●   Columns grouped in families.
          ●   Multiple families for the same key.
      ●   Highlights:
          ●   Dynamically add and remove columns.
          ●   Efficiently access columns in the same group (column family).
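
One way to picture the column-family structure is a three-level map: row key, then column family, then column name. The sketch below is an in-memory illustration only; the class and method names are invented and do not correspond to any product's API.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Hypothetical in-memory model: row key -> family -> column -> value."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        # Columns are created on first write; rows can be sparse.
        self.rows[row_key][family][column] = value

    def delete(self, row_key, family, column):
        self.rows[row_key][family].pop(column, None)

    def get_family(self, row_key, family):
        # All columns of one family are fetched together.
        return dict(self.rows[row_key][family])


store = ColumnFamilyStore()
store.put("user:1", "profile", "name", "alice")
store.put("user:1", "profile", "city", "Roma")
store.put("user:1", "stats", "logins", 3)
store.delete("user:1", "profile", "city")
```

Here `store.get_family("user:1", "profile")` returns only the `name` column, and the `stats` family of the same row is left untouched.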
  15. Data Model (2)
      ●   Document based.
      ●   Structure:
          ●   Key-identified documents.
          ●   Schema-less (but optionally constrained).
              –   JSON, XML ...
      ●   Highlights:
          ●   Dynamically change inner document structure.
          ●   Efficiently access documents as a unit.
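
A minimal sketch of the document model, assuming JSON documents held in a dict keyed by document id. The `put` and `get` helpers and the sample document are made up for illustration.

```python
import json

documents = {}  # key -> JSON text, standing in for a document store


def put(key, document):
    documents[key] = json.dumps(document)


def get(key):
    # The whole document is read back as one unit.
    return json.loads(documents[key])


put("user:1", {"name": "alice"})

profile = get("user:1")
profile["address"] = {"city": "Roma"}  # new nested structure, no schema change
put("user:1", profile)
```

The second `put` stores a document with a different shape under the same key, which is what "dynamically change inner document structure" amounts to in practice.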
  16. Data Model (3)
      ●   Graph based.
      ●   Structure:
          ●   Nodes to represent your data.
          ●   Relations as meaningful links between nodes.
          ●   Properties to enrich both.
      ●   Highlights:
          ●   Rich data model.
          ●   Efficient, fast traversal of nodes and relations.
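
The graph structure (nodes, typed relations, properties on both) and a traversal over it might look like the following. This is a toy in-memory sketch with invented names and data; it is not how any particular graph database exposes traversal.

```python
from collections import deque


class Graph:
    def __init__(self):
        self.nodes = {}      # node id -> properties
        self.relations = {}  # node id -> list of (type, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.relations.setdefault(node_id, [])

    def relate(self, source, rel_type, target, **properties):
        self.relations[source].append((rel_type, target, properties))

    def traverse(self, start, rel_type):
        """Breadth-first traversal following only relations of one type."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            order.append(current)
            for kind, target, _ in self.relations[current]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    queue.append(target)
        return order


graph = Graph()
for name in ("alice", "bob", "carol", "dave"):
    graph.add_node(name, kind="person")
graph.relate("alice", "KNOWS", "bob", since=2008)
graph.relate("bob", "KNOWS", "carol", since=2009)
graph.relate("alice", "WORKS_WITH", "dave")

reachable = graph.traverse("alice", "KNOWS")
```

The traversal touches only the nodes it reaches through `KNOWS` links, so `dave` is never visited.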
  17. Data Model (4)
      ●   Key-Value based.
      ●   Structure:
          ●   Key-identified opaque values.
      ●   Highlights:
          ●   Great flexibility.
          ●   Fast reads/writes for single entries.

  18. Data Processing
      ●   Several options:
          ●   Map/Reduce.
          ●   Predicates.
          ●   Range Queries.
          ●   ...
      ●   One common principle:
          ●   Move processing toward related data.
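
"Move processing toward related data" can be illustrated with a small Map/Reduce sketch: the map step runs once per partition and only its compact result is combined centrally. The partition layout and documents below are invented, and everything runs in one process purely for demonstration.

```python
from collections import Counter
from functools import reduce

# Each entry stands in for the documents stored on one node.
partitions = {
    "node-1": [{"city": "Roma"}, {"city": "Milano"}],
    "node-2": [{"city": "Roma"}, {"city": "Torino"}, {"city": "Roma"}],
}


def map_partition(documents):
    """Would run on the node that owns the data; returns a small summary."""
    return Counter(doc["city"] for doc in documents)


def reduce_counts(left, right):
    """Merges two partial summaries."""
    return left + right


partials = [map_partition(docs) for docs in partitions.values()]
totals = reduce(reduce_counts, partials, Counter())
```

Only the per-node counters cross the network in a real deployment, not the documents themselves.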
  19. Consistency Model (1)
      ●   Strict Consistency.
          ●   All nodes ...
          ●   At every point in time ...
          ●   See a consistent view of the stored data.
              –   Per-key consistency.
              –   Multi-key consistency.

  20. Consistency Model (2)
      ●   Eventual Consistency.
          ●   Only a subset of all nodes ...
          ●   At a specific point in time ...
          ●   See a consistent view of the stored data.
              –   Other nodes will serve stale data.
              –   Other nodes will eventually get the updates.
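
The stale-then-updated behaviour of eventual consistency can be sketched with two replicas and an explicit synchronization step. The `Replica` class and its per-key integer version are simplifying assumptions: the version only counts writes made through one replica, so this sketch deliberately ignores concurrent writes to the same key on different replicas.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value):
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]

    def merge_from(self, other):
        """Adopt every entry the other replica holds at a higher version."""
        for key, (version, value) in other.data.items():
            if version > self.data.get(key, (0, None))[0]:
                self.data[key] = (version, value)


a, b = Replica(), Replica()
a.write("cart", ["book"])

stale = b.read("cart")   # b has not received the update yet
b.merge_from(a)          # background synchronization catches up
fresh = b.read("cart")
```

Between the write and the merge, a client reading from `b` sees the old state; after the merge both replicas agree.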
  21. Scale Out (1)
      ●   Master-based.
          ●   Membership managed and broadcast by masters.
          ●   Data consistency guaranteed by masters.
          ●   No SPOF with active/passive masters.
          ●   No SPOB with active/active masters or cluster-cluster replication.
          ●   Prone to partitioning failures.

  22. Scale Out (2)
      ●   Peer-to-peer.
          ●   Membership is maintained through multicast or gossip-based protocols.
          ●   Data consistency is maintained through quorum protocols.
          ●   Easier to scale.
          ●   Harder to maintain consistency.
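
A common form of quorum protocol uses N replicas, W write acknowledgements, and R read responses; when R + W > N, every read set overlaps every write set. The sketch below assumes that scheme and at least one read replica (R >= 1). The class is hypothetical, and it deliberately reads from the replicas least likely to have been written, to show the worst case.

```python
class QuorumStore:
    def __init__(self, n, w, r):
        self.replicas = [{} for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0

    def write(self, key, value):
        # Only the first w replicas acknowledge the write.
        self.version += 1
        for replica in self.replicas[: self.w]:
            replica[key] = (self.version, value)

    def read(self, key):
        # Ask the last r replicas and keep the highest-versioned answer.
        answers = [rep.get(key, (0, None)) for rep in self.replicas[-self.r:]]
        return max(answers, key=lambda answer: answer[0])[1]


strong = QuorumStore(n=3, w=2, r=2)   # R + W > N: sets must overlap
strong.write("k", "v1")

weak = QuorumStore(n=3, w=1, r=1)     # R + W <= N: overlap not guaranteed
weak.write("k", "v1")
```

`strong.read("k")` returns the written value because one of the two replicas read was also written, while `weak.read("k")` misses it. Lower R and W make operations cheaper, which is the trade-off behind "easier to scale, harder to maintain consistency".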
  23. NOSQL Use Cases
      ●   Use cases evolve along the following kinds of data:
          ●   Rich.
          ●   Runtime.
          ●   Hot Spot.
          ●   Massive.
          ●   Computational.
      ●   Do not use the same product for all cases.
      ●   Pick multiple products for different use cases.

  24. NOSQL Products - Cassandra
      ●   Cassandra (http://incubator.apache.org/cassandra)
      ●   Data Model:
          ●   Column-family based.
      ●   Data Processing:
          ●   Range queries, Predicates.
      ●   Consistency:
          ●   Eventual consistency.
      ●   Scalability:
          ●   Peer-to-peer, gossip based.

  25. NOSQL Products - Mongo DB
      ●   Mongo DB (http://www.mongodb.org)
      ●   Data Model:
          ●   Document based (JSON).
      ●   Data Processing:
          ●   Map/Reduce, SQL-like queries.
      ●   Consistency:
          ●   Per-document strict consistency.
      ●   Scalability:
          ●   Replication, partitioning (alpha).

  26. NOSQL Products - Neo4j
      ●   Neo4j (http://neo4j.org)
      ●   Data Model:
          ●   Graph based.
      ●   Data Processing:
          ●   Path traversal, Index-based search.
      ●   Consistency:
          ●   Strict consistency.
      ●   Scalability:
          ●   Replication.

  27. NOSQL Products - Riak
      ●   Riak (http://riak.basho.com)
      ●   Data Model:
          ●   Document based (JSON).
      ●   Data Processing:
          ●   Map/Reduce.
      ●   Consistency:
          ●   Eventual consistency.
      ●   Scalability:
          ●   Peer-to-peer, gossip based.

  28. NOSQL Products - Terrastore
      ●   Terrastore (http://code.google.com/p/terrastore)
      ●   Data Model:
          ●   Document based (JSON).
      ●   Data Processing:
          ●   Range queries, Predicates.
      ●   Consistency:
          ●   Per-document strict consistency.
      ●   Scalability:
          ●   Master-based.

  29. NOSQL Products - Voldemort
      ●   Voldemort (http://project-voldemort.com)
      ●   Data Model:
          ●   Key-Value.
      ●   Data Processing:
          ●   None.
      ●   Consistency:
          ●   Eventual consistency.
      ●   Scalability:
          ●   Peer-to-peer, gossip based.

  30. NOSQL Products and Use Cases

  31. Final words
      ●   A New World.
          ●   New paradigms.
          ●   New use cases.
          ●   New products.
      ●   Don't dismiss the old stuff.
          ●   Relational databases still have their place.
      ●   Embrace change.
          ●   May the NOSQL power be with you.
      ●   Let the Polyglot Persistence era begin!