Database scalability




    Jonathan Ellis
Classic RDBMS persistence

                   Index




                    Data
Disk is the new tape*
• ~8ms to seek
    – ~4ms on expensive 15k rpm disks
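
A rough back-of-the-envelope reading of these numbers, assuming every random read pays one full seek (the function name is made up for illustration):

```python
# How many random reads per second can one disk serve if each read
# costs a full seek? Seek times are the slide's rough figures.

def random_reads_per_second(seek_ms: float) -> float:
    """Upper bound on seek-bound operations per second for one spindle."""
    return 1000.0 / seek_ms

commodity = random_reads_per_second(8.0)   # ~8 ms seek
fast_15k = random_reads_per_second(4.0)    # ~4 ms seek on a 15k rpm disk

print(f"~8 ms seek: about {commodity:.0f} random reads/s")
print(f"~4 ms seek: about {fast_15k:.0f} random reads/s")
```

Sequential access avoids the seek, which is the sense in which a disk behaves like tape.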
What scaling means
[Figure: performance (vertical axis) vs. money (horizontal axis)]
Performance
• Latency

• Throughput
Two kinds of operations
• Reads

• Writes
Caching
• Memcached

• Ehcache

• etc

[Diagram: cache sitting in front of the DB]
Cache invalidation
• Implicit

• Explicit
Cache set invalidation
get_cached_cart(cart=13, offset=10,
 limit=10)

get('cart:13:10:10')

?
Set invalidation 2
prefix = get('cart_prefix:13')

get(prefix + ':10:10')

del('cart_prefix:13')



http://www.aminus.org/...
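
One way to read the prefix trick above, as a minimal sketch. The dict stands in for memcached, and load_cart_from_db, cart_prefix and invalidate_cart are hypothetical helpers, not names from the talk:

```python
import uuid

cache = {}  # stand-in for memcached; a real cache also evicts old entries


def load_cart_from_db(cart, offset, limit):
    """Hypothetical database read; returns fake rows for illustration."""
    return [f"cart{cart}-item{i}" for i in range(offset, offset + limit)]


def cart_prefix(cart):
    """Return the cart's current key prefix, creating one if it is missing."""
    key = f"cart_prefix:{cart}"
    prefix = cache.get(key)
    if prefix is None:
        prefix = f"cart:{cart}:{uuid.uuid4().hex}"
        cache[key] = prefix
    return prefix


def get_cached_cart(cart, offset, limit):
    """Look the page up under the current prefix; fill the cache on a miss."""
    key = f"{cart_prefix(cart)}:{offset}:{limit}"
    page = cache.get(key)
    if page is None:
        page = load_cart_from_db(cart, offset, limit)
        cache[key] = page
    return page


def invalidate_cart(cart):
    """Dropping the prefix orphans every cached page of this cart at once."""
    cache.pop(f"cart_prefix:{cart}", None)
```

After invalidate_cart(13), the next get_cached_cart(13, ...) builds a new prefix, so none of the old page keys are ever asked for again. The orphaned pages are never deleted explicitly; this sketch assumes the cache eventually evicts them.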
Replication
Types of replication
• Master → slave
    – Master → slave → other slaves

• Master ↔ master
    – multi-master
Types of replication 2
• Synchronous

• Asynchronous
Synchronous
• Synchronous = slow(er)

• Complexity (e.g. 2pc)

• PGCluster

• Oracle
Asynchronous master/slave
• Easiest

• Failover

• MySQL replication

• Slony, Londiste, WAL shipping

• Tungsten
Asynchronous multi-master
• Conflict resolution
    – O(N^3) or O(N^2) as you add nodes
    – http://research.microsoft.com/~gray/replicas.ps

• Bucardo

• MySQL Cluster
Achtung!
• Asynchronous replication can lose data if
   the master fails
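
A toy illustration of that failure mode, assuming the master acknowledges a write before any slave has copied it. The Master and Slave classes are invented for this sketch and do not model any particular database:

```python
class Master:
    def __init__(self):
        self.log = []            # writes already acknowledged to clients

    def write(self, value):
        self.log.append(value)   # acknowledged before any slave has it
        return len(self.log)


class Slave:
    def __init__(self):
        self.log = []

    def pull(self, master, max_entries):
        """Copy up to max_entries not-yet-replicated writes from the master."""
        start = len(self.log)
        self.log.extend(master.log[start:start + max_entries])


master, slave = Master(), Slave()
for v in ["a", "b", "c"]:
    master.write(v)
slave.pull(master, max_entries=2)   # replication lags one write behind

# The master dies here and the slave is promoted with what it has.
lost = master.log[len(slave.log):]
print("lost on failover:", lost)
```

Every write in the gap between the two logs was acknowledged to a client and is gone after failover.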
“Architecture”
• Primarily about how you cope with failure
    scenarios
Replication does not scale writes
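
The arithmetic behind this claim, as a sketch. It assumes reads are balanced evenly across replicas and that every replica must apply every write; the workload numbers are made up:

```python
def per_node_ops(reads: float, writes: float, replicas: int) -> float:
    """Operations each node handles: a share of the reads plus all writes."""
    return reads / replicas + writes


for n in (1, 2, 4, 8):
    print(n, "replicas ->", per_node_ops(reads=1000, writes=200, replicas=n))
```

The read share shrinks as replicas are added, but the write term never does, so per-node load cannot drop below the full write rate.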
Scaling writes
• Partitioning aka sharding
    – Key / horizontal
    – Vertical
    – Directed
Partitioning
Key based partitioning
• PK of “root” table controls destination
    – e.g. user id

• Retains referential integrity
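
A minimal sketch of key-based routing, assuming a fixed list of shards and a simple hash-modulo placement. The shard names and function names are hypothetical:

```python
import zlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical node names


def shard_for(user_id: int) -> str:
    """Map the root table's primary key to a shard with a stable hash."""
    return SHARDS[zlib.crc32(str(user_id).encode()) % len(SHARDS)]


def shard_for_blog(blog: dict) -> str:
    """Child rows follow their owner's key, so a user's rows share a node."""
    return shard_for(blog["user_id"])
```

Because blogs are routed by their owner's user id, joins and foreign keys between a user and that user's rows stay on one node. With modulo placement, changing len(SHARDS) remaps most keys, which is one reason growing such a setup is painful.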
Example: blogger.com

Users

Blogs

Comments
Example: blogger.com

Users

Blogs      Comments'


Comments
Vertical partitioning
• Tables on separate nodes

• Often a table that is too big to keep with
   the other tables gets too big for a single node
Growing is hard
Directed partitioning
• Central db that knows what server owns a
   key

• Makes adding machines easier

• Single point of failure
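
A sketch of the directory idea. The Directory class is a stand-in for the central database, and the "emptiest server" placement policy is an assumption of this example, not something the talk specifies:

```python
class Directory:
    """Stand-in for the central database that maps each key to its server."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.owner = {}

    def add_server(self, name):
        self.servers.append(name)       # existing keys stay where they are

    def lookup(self, key):
        if key not in self.owner:
            # Assumed placement policy: put new keys on the emptiest server.
            counts = {s: 0 for s in self.servers}
            for s in self.owner.values():
                counts[s] += 1
            self.owner[key] = min(self.servers, key=lambda s: counts[s])
        return self.owner[key]
```

Adding a machine only changes where new keys land, so nothing has to be rehashed. The cost is that every request depends on the directory being up.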
Partitioning
Partitioning with replication
What these have in common
• Ad hoc

• Error-prone

• Manpower-intensive
To summarize
• Scaling reads sucks

• Scaling writes sucks more
Distributed databases*
• Data is automatically partitioned

• Transparent to application

• Add capacity without downtime

• Failure tolerant

*Like Bigtable, not Lotus Notes
Two famous papers
• Bigtable: A distributed storage system for
    structured data, 2006

• Dynamo: Amazon's highly available key-value
    store, 2007
The world doesn't need another
  half-assed key/value store

(See also Olin Shivers' 100% and 80%
  solutions)
Two approaches
• Bigtable: “How can we build a distributed
    database on top of GFS?”

• Dynamo: “How can we build a distributed hash
    table appropriate for the data center?”
Bigtable architecture
Lookup in Bigtable
Dynamo
Eventually consistent
• Amazon:
   http://www.allthingsdistributed.com/2008/12

• eBay:
   http://queue.acm.org/detail.cfm...
Consistency in a BASE world
• If W + R > N, you are 100% consistent

• W=1, R=N

• W=N, R=1

• W=Q, R=Q where Q = floor(N / 2) + 1
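
The overlap rule can be checked mechanically. A small sketch, where N is the number of replicas, W the replicas that must acknowledge a write, and R the replicas consulted on a read (the function names are made up):

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def overlaps(w: int, r: int, n: int) -> bool:
    """True when every read set must intersect every write set."""
    return w + r > n


n = 3
for w, r in [(1, n), (n, 1), (quorum(n), quorum(n)), (1, 1)]:
    print(f"N={n} W={w} R={r} overlap={overlaps(w, r, n)}")
```

When W + R > N, any R replicas must include at least one that took part in the latest write, so a read can always see it. W=1, R=1 with N=3 gives no such guarantee.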
Cassandra
Memtable / SSTable




Disk

  Commit log
ColumnFamilies

keyA           column1   column2   column3
keyC           column1   column7   column11


Column
Byte[] Name
Byte[] Value
I64 timestamp
LSM write properties
• No reads

• No seeks

• Fast

• Atomic within ColumnFamily
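
A toy log-structured write path showing why a write needs no read and no seek. This is a generic sketch, not Cassandra's code; TinyLSM and its in-memory lists only stand in for the commit log and SSTable files:

```python
class TinyLSM:
    def __init__(self, flush_threshold=3):
        self.commit_log = []     # stand-in for an append-only file on disk
        self.memtable = {}       # in-memory, latest value per key
        self.sstables = []       # immutable sorted runs, newest last
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # sequential append, no seek
        self.memtable[key] = value             # no read of existing data
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out once, sorted, as a new immutable SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []     # flushed entries no longer need replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest first
            for k, v in table:
                if k == key:
                    return v
        return None
```

The work moves to the read side: a read may have to check the memtable and several SSTables, which fits the gap between the write and read latencies quoted in the comparison that follows.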
vs MySQL with 50GB of data
• MySQL
    – ~300ms write
    – ~350ms read

• Cassandra
    – ~0.12ms write
    – ~15ms read
• Achtung!
Classic RDBMS persistence

                   Index




                    Data
Questions
What Every Developer Should Know About Database Scalability
Replication. Partitioning. Relational databases. Bigtable. Dynamo. There is no one-size-fits-all approach to scaling your database, and the CAP theorem proved that there never will be. This talk will explain the advantages and limits of the approaches to scaling traditional relational databases, as well as the tradeoffs made by the designers of newer distributed systems like Cassandra. These slides are from Jonathan Ellis's OSCON 09 talk: http://en.oreilly.com/oscon2009/public/schedule/detail/7955
Comments
  • Related to your 'partitioning' class of approaches to scalability is horizontal scale with traditional relational databases. ParElastic (full disclosure, I'm a founder & CTO) makes a product that provides elastic horizontal scalability for MySQL. Check out www.parelastic.com.
  • Slide 12 says 'replication' but looks more like sharding to me. Replication would be storing the same smiley face on more than one disk/database. Instead, you have several databases and each smiley is going to one of those. Slide 22 is the same but the cylinders are replaced with puzzle pieces, and this time the title is 'partitioning'. Slide 30 finally seems to have it right. Slide 48 is identical to slide 2. This probably makes more sense if you were there to hear this presentation.
  • slide 17: MySQL Cluster is synchronous and fault tolerant. It has other problems, but failing on master crash is not one of them (every node is a peer).
  • Below products can be used for database scalability

    Object caching products: memcached

    Database Caching Products: CSQL, Timesten
What Every Developer Should Know About Database Scalability

  1. Database scalability Jonathan Ellis
  2. Classic RDBMS persistence Index Data
  3. Disk is the new tape* • ~8ms to seek – ~4ms on expensive 15k rpm disks
  4. What scaling means (axes: performance vs. money)
  5. Performance • Latency • Throughput
  6. Two kinds of operations • Reads • Writes
  7. Caching • Memcached • Ehcache • etc (diagram: cache, DB)
  8. Cache invalidation • Implicit • Explicit
  9. Cache set invalidation get_cached_cart(cart=13, offset=10, limit=10) get('cart:13:10:10') ?
  10. Set invalidation 2 prefix = get('cart_prefix:13') get(prefix + ':10:10') del('cart_prefix:13') http://www.aminus.org/blogs/index.php/2007/1
  11. Replication
  12. Types of replication • Master → slave – Master → slave → other slaves • Master ↔ master – multi-master
  13. Types of replication 2 • Synchronous • Asynchronous
  14. Synchronous • Synchronous = slow(er) • Complexity (e.g. 2pc) • PGCluster • Oracle
  15. Asynchronous master/slave • Easiest • Failover • MySQL replication • Slony, Londiste, WAL shipping • Tungsten
  16. Asynchronous multi-master • Conflict resolution – O(N^3) or O(N^2) as you add nodes – http://research.microsoft.com/~gray/replicas.ps • Bucardo • MySQL Cluster
  17. Achtung! • Asynchronous replication can lose data if the master fails
  18. “Architecture” • Primarily about how you cope with failure scenarios
  19. Replication does not scale writes
  20. Scaling writes • Partitioning aka sharding – Key / horizontal – Vertical – Directed
  21. Partitioning
  22. Key based partitioning • PK of “root” table controls destination – e.g. user id • Retains referential integrity
  23. Example: blogger.com Users Blogs Comments
  24. Example: blogger.com Users Blogs Comments' Comments
  25. Vertical partitioning • Tables on separate nodes • Often a table that is too big to keep with the other tables gets too big for a single node
  26. Growing is hard
  27. Directed partitioning • Central db that knows what server owns a key • Makes adding machines easier • Single point of failure
  28. Partitioning
  29. Partitioning with replication
  30. What these have in common • Ad hoc • Error-prone • Manpower-intensive
  31. To summarize • Scaling reads sucks • Scaling writes sucks more
  32. Distributed databases* • Data is automatically partitioned • Transparent to application • Add capacity without downtime • Failure tolerant *Like Bigtable, not Lotus Notes
  33. Two famous papers • Bigtable: A distributed storage system for structured data, 2006 • Dynamo: Amazon's highly available key-value store, 2007
  34. The world doesn't need another half-assed key/value store (See also Olin Shivers' 100% and 80% solutions)
  35. Two approaches • Bigtable: “How can we build a distributed database on top of GFS?” • Dynamo: “How can we build a distributed hash table appropriate for the data center?”
  36. Bigtable architecture
  37. Lookup in Bigtable
  38. Dynamo
  39. Eventually consistent • Amazon: http://www.allthingsdistributed.com/2008/12 • eBay: http://queue.acm.org/detail.cfm?id=139412
  40. Consistency in a BASE world • If W + R > N, you are 100% consistent • W=1, R=N • W=N, R=1 • W=Q, R=Q where Q = floor(N / 2) + 1
  41. Cassandra
  42. Memtable / SSTable Disk Commit log
  43. ColumnFamilies keyA column1 column2 column3 keyC column1 column7 column11 Column Byte[] Name Byte[] Value I64 timestamp
  44. LSM write properties • No reads • No seeks • Fast • Atomic within ColumnFamily
  45. vs MySQL with 50GB of data • MySQL – ~300ms write – ~350ms read • Cassandra – ~0.12ms write – ~15ms read • Achtung!
  46. Classic RDBMS persistence Index Data
  47. Questions