
What Every Developer Should Know About Database Scalability



Replication. Partitioning. Relational databases. Bigtable. Dynamo. There is no one-size-fits-all approach to scaling your database, and the CAP theorem proved that there never will be. This talk explains the advantages and limits of the approaches to scaling traditional relational databases, as well as the tradeoffs made by the designers of newer distributed systems like Cassandra. These slides are from Jonathan Ellis's OSCON 09 talk.

Published in: Technology, Business
  • Related to your 'partitioning' class of approaches to scalability is horizontal scale with traditional relational databases. ParElastic (full disclosure, I'm a founder & CTO) makes a product that provides elastic horizontal scalability for MySQL. Check out
  • Slide 12 says 'replication' but looks more like sharding to me. Replication would be storing the same smiley face on more than one disk/database. Instead, you have several databases and each smiley is going to one of those. Slide 22 is the same but the cylinders are replaced with puzzle pieces, and this time the title is 'partitioning'. Slide 30 finally seems to have it right. Slide 48 is identical to slide 2. This probably makes more sense if you were there to hear this presentation.
  • slide 17: MySQL Cluster is synchronous and fault tolerant. It has other problems, but failing on master crash is not one of them (every node is a peer).
  • Below products can be used for database scalability

    Object caching products: memcached

    Database Caching Products: CSQL, Timesten

What Every Developer Should Know About Database Scalability

  1. Database scalability Jonathan Ellis
  2. Classic RDBMS persistence Index Data
  3. Disk is the new tape* • ~8ms to seek – ~4ms on expensive 15k rpm disks
  4. What scaling means (graph: performance vs. money)
  5. Performance • Latency • Throughput
  6. Two kinds of operations • Reads • Writes
  7. Caching • Memcached • Ehcache • etc cache DB
  8. Cache invalidation • Implicit • Explicit
  9. Cache set invalidation get_cached_cart(cart=13, offset=10, limit=10) get('cart:13:10:10') ?
  10. Set invalidation 2 prefix = get('cart_prefix:13') get(prefix + ':10:10') del('cart_prefix:13')
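The prefix trick on slides 9–10 can be sketched as follows. This is an illustrative mock, not the talk's actual code: a dict stands in for memcached, and the function and key names (`get_cart_page`, `cart_prefix:...`) are assumptions.

```python
import uuid

cache = {}  # stand-in for memcached

def _prefix(cart_id):
    # Each cart gets a random prefix; all of its cached pages hang off it.
    return cache.setdefault('cart_prefix:%d' % cart_id, uuid.uuid4().hex)

def put_cart_page(cart_id, offset, limit, page):
    cache['%s:%d:%d' % (_prefix(cart_id), offset, limit)] = page

def get_cart_page(cart_id, offset, limit):
    return cache.get('%s:%d:%d' % (_prefix(cart_id), offset, limit))

def invalidate_cart(cart_id):
    # Deleting the prefix orphans every cached page for the cart in one
    # operation; the stale entries age out instead of being enumerated.
    cache.pop('cart_prefix:%d' % cart_id, None)
```

The design choice here is that you never have to know how many `(offset, limit)` pages were cached: rotating the prefix invalidates the whole set at once.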
  11. Replication
  12. Types of replication • Master → slave – Master → slave → other slaves • Master ↔ master – multi-master
  13. Types of replication 2 • Synchronous • Asynchronous
  14. Synchronous • Synchronous = slow(er) • Complexity (e.g. 2pc) • PGCluster • Oracle
  15. Asynchronous master/slave • Easiest • Failover • MySQL replication • Slony, Londiste, WAL shipping • Tungsten
  16. Asynchronous multi-master • Conflict resolution – O(N³) or O(N²) as you add nodes • Bucardo • MySQL Cluster
  17. Achtung! • Asynchronous replication can lose data if the master fails
  18. “Architecture” • Primarily about how you cope with failure scenarios
  19. Replication does not scale writes
  20. Scaling writes • Partitioning aka sharding – Key / horizontal – Vertical – Directed
  21. Partitioning
  22. Key based partitioning • PK of “root” table controls destination – e.g. user id • Retains referential integrity
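A minimal sketch of key-based (horizontal) partitioning as slide 22 describes it, with the root table's primary key picking the shard. Shard names and the modulo scheme are illustrative assumptions, not from the talk.

```python
SHARDS = ['db1', 'db2', 'db3', 'db4']  # illustrative connection names

def shard_for(user_id):
    # The user id (PK of the "root" table) determines the destination, so a
    # user's blogs and comments land on the same node, keeping referential
    # integrity within a shard. Note that naive modulo hashing remaps almost
    # every key when the shard count changes (slide 26: "Growing is hard").
    return SHARDS[user_id % len(SHARDS)]
```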
  23. Example: Users Blogs Comments
  24. Example: Users Blogs Comments' Comments
  25. Vertical partitioning • Tables on separate nodes • Often a table that is too big to keep with the other tables eventually gets too big for a single node
  26. Growing is hard
  27. Directed partitioning • Central db that knows what server owns a key • Makes adding machines easier • Single point of failure
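Directed partitioning (slide 27) can be sketched with a central directory that records which server owns each key. All names here are hypothetical; in practice the directory is a small central database, which is exactly the single point of failure the slide warns about.

```python
directory = {}                 # key -> owning server
servers = ['node1', 'node2']

def locate(key):
    # Assign unseen keys to the least-loaded server and remember the choice,
    # so lookups are a directory read rather than a hash computation.
    if key not in directory:
        counts = {s: 0 for s in servers}
        for owner in directory.values():
            counts[owner] += 1
        directory[key] = min(servers, key=lambda s: counts[s])
    return directory[key]

def add_server(name):
    # Adding capacity is easy: new keys simply start landing on the new node,
    # with no rehashing of existing assignments.
    servers.append(name)
```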
  28. Partitioning
  29. Partitioning with replication
  30. What these have in common • Ad hoc • Error-prone • Manpower-intensive
  31. To summarize • Scaling reads sucks • Scaling writes sucks more
  32. * Distributed databases • Data is automatically partitioned • Transparent to application • Add capacity without downtime • Failure tolerant *Like Bigtable, not Lotus Notes
  33. Two famous papers • Bigtable: A distributed storage system for structured data, 2006 • Dynamo: amazon's highly available key-value store, 2007
  34. The world doesn't need another half-assed key/value store (See also Olin Shivers' 100% and 80% solutions)
  35. Two approaches • Bigtable: “How can we build a distributed database on top of GFS?” • Dynamo: “How can we build a distributed hash table appropriate for the data center?”
  36. Bigtable architecture
  37. Lookup in Bigtable
  38. Dynamo
  39. Eventually consistent • Amazon: • eBay:
  40. Consistency in a BASE world • If W + R > N, you are 100% consistent • W=1, R=N • W=N, R=1 • W=Q, R=Q where Q = N / 2 + 1
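Slide 40's quorum rule in code: with N replicas, a write acknowledged by W nodes and a read from R nodes must overlap on at least one up-to-date replica whenever W + R > N.

```python
def is_consistent(w, r, n):
    # The read set and write set intersect iff W + R > N.
    return w + r > n

def quorum(n):
    # Majority quorum Q = N // 2 + 1 balances read and write cost:
    # W = R = Q always satisfies W + R > N.
    return n // 2 + 1
```

For N = 3, the three configurations on the slide are W=1/R=3, W=3/R=1, and W=Q=2/R=Q=2; all satisfy the inequality, while W=1/R=1 does not.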
  41. Cassandra
  42. Memtable / SSTable Disk Commit log
  43. ColumnFamilies • keyA → column1, column2, column3 • keyC → column1, column7, column11 • Column = { Name: byte[], Value: byte[], timestamp: i64 }
  44. LSM write properties • No reads • No seeks • Fast • Atomic within ColumnFamily
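The write path on slides 42–44 can be sketched as a toy log-structured merge store (illustrative class, not Cassandra's actual implementation): every write is appended to a commit log for durability and applied to an in-memory memtable; when the memtable fills, it is flushed to disk as a sorted, immutable SSTable. Writes never read or seek existing data, which is what makes them fast.

```python
class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # sequential, append-only
        self.memtable = {}     # in-memory writes
        self.sstables = []     # flushed, sorted, immutable snapshots

        self.limit = memtable_limit

    def write(self, key, value):
        # No reads, no seeks: append to the log, update the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def read(self, key):
        # Reads check the memtable first, then SSTables newest-to-oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

This also shows why reads are the slower side of the tradeoff on slide 45: a read may have to consult several SSTables, while a write is a single append plus a memory update.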
  45. vs MySQL with 50GB of data • MySQL – ~300ms write – ~350ms read • Cassandra – ~0.12ms write – ~15ms read • Achtung!
  46. Classic RDBMS persistence Index Data
  47. Questions