What Every Developer Should Know About Database Scalability

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

4 comments

Comments 1 - 4 of 4 previous next Post a comment

  • + nlfiedler nlfiedler 23 hours ago
    Slide 12 says 'replication' but looks more like sharding to me. Replication would be storing the same smiley face on more than one disk/database. Instead, you have several databases and each smiley is going to one of those. Slide 22 is the same but the cylinders are replaced with puzzle pieces, and this time the title is 'partitioning'. Slide 30 finally seems to have it right. Slide 48 is identical to slide 2. This probably makes more sense if you were there to hear this presentation.
  • + datacharmer Giuseppe Maxia 2 weeks ago
    slide 17: MySQL Cluster is synchronous and fault tolerant. It has other problems, but failing on master crash is not one of them (every node is a peer).
  • + praba_tuty praba_tuty 1 month ago
    Below products can be used for database scalability

    Object caching products: memcached

    Database Caching Products: CSQL, Timesten
  • + jbellis jbellis 4 months ago
    The comic is from http://xkcd.com/386/, of course.
Post a comment
Embed Video
Edit your comment Cancel

15 Favorites

What Every Developer Should Know About Database Scalability - Presentation Transcript

  1. Database scalability Jonathan Ellis
  2. Classic RDBMS persistence Index Data
  3. Disk is the new tape* • ~8ms to seek – ~4ms on expensive 15k rpm disks
  4. performance What scaling means money
  5. Performance • Latency • Throughput
  6. Two kinds of operations • Reads • Writes
  7. Caching • Memcached • Ehcache • etc cache DB
  8. Cache invalidation • Implicit • Explicit
  9. Cache set invalidation get_cached_cart(cart=13, offset=10, limit=10) get('cart:13:10:10') ?
  10. Set invalidation 2 prefix = get('cart_prefix:13') get(prefix + ':10:10') del('cart_prefix:13') http://www.aminus.org/blogs/index.php/2007/1
  11. Replication
  12. Types of replication • Master → slave – Master → slave → other slaves • Master ↔ master – multi-master
  13. Types of replication 2 • Synchronous • Asynchronous
  14. Synchronous • Synchronous = slow(er) • Complexity (e.g. 2pc) • PGCluster • Oracle
  15. Asynchronous master/slave • Easiest • Failover • MySQL replication • Slony, Londiste, WAL shipping • Tungsten
  16. Asynchronous multi-master • Conflict resolution – O(N3) or O(N2) as you add nodes – http://research.microsoft.com/~gray/replicas.ps • Bucardo • MySQL Cluster
  17. Achtung! • Asynchronous replication can lose data if the master fails
  18. “Architecture” • Primarily about how you cope with failure scenarios
  19. Replication does not scale writes
  20. Scaling writes • Partitioning aka sharding – Key / horizontal – Vertical – Directed
  21. Partitioning
  22. Key based partitioning • PK of “root” table controls destination – e.g. user id • Retains referential integrity
  23. Example: blogger.com Users Blogs Comments
  24. Example: blogger.com Users Blogs Comments' Comments
  25. Vertical partitioning • Tables on separate nodes • Often a table that is too big to keep with the other tables, gets too big for a single node
  26. Growing is hard
  27. Directed partitioning • Central db that knows what server owns a key • Makes adding machines easier • Single point of failure
  28. Partitioning
  29. Partitioning with replication
  30. What these have in common • Ad hoc • Error-prone • Manpower-intensive
  31. To summarize • Scaling reads sucks • Scaling writes sucks more
  32. * Distributed databases • Data is automatically partitioned • Transparent to application • Add capacity without downtime • Failure tolerant *Like Bigtable, not Lotus Notes
  33. Two famous papers • Bigtable: A distributed storage system for structured data, 2006 • Dynamo: amazon's highly available key- value store, 2007
  34. The world doesn't need another half-assed key/value store (See also Olin Shivers' 100% and 80% solutions)
  35. Two approaches • Bigtable: “How can we build a distributed database on top of GFS?” • Dynamo: “How can we build a distributed hash table appropriate for the data center?”
  36. Bigtable architecture
  37. Lookup in Bigtable
  38. Dynamo
  39. Eventually consistent • Amazon: http://www.allthingsdistributed.com/2008/12 • eBay: http://queue.acm.org/detail.cfm?id=139412
  40. Consistency in a BASE world • If W + R > N, you are 100% consistent • W=1, R=N • W=N, R=1 • W=Q, R=Q where Q = N / 2 + 1
  41. Cassandra
  42. Memtable / SSTable Disk Commit log
  43. ColumnFamilies keyA column1 column2 column3 keyC column1 column7 column11 Column Byte[] Name Byte[] Value I64 timestamp
  44. LSM write properties • No reads • No seeks • Fast • Atomic within ColumnFamily
  45. vs MySQL with 50GB of data • MySQL – ~300ms write – ~350ms read • Cassandra – ~0.12ms write – ~15ms read • Achtung!
  46. Classic RDBMS persistence Index Data
  47. Questions

+ jbellisjbellis, 4 months ago

custom

3380 views, 15 favs, 6 embeds more stats

Replication. Partitioning. Relational databases. Bi more

More info about this document

© All Rights Reserved

Go to text version

  • Total Views 3380
    • 2730 on SlideShare
    • 650 from embeds
  • Comments 4
  • Favorites 15
  • Downloads 131
Most viewed embeds
  • 623 views on http://spyced.blogspot.com
  • 23 views on http://saltfactory.textcube.com
  • 1 views on http://www.blogger.com
  • 1 views on http://www.iss.nl
  • 1 views on http://www.slideshare.net

more

All embeds
  • 623 views on http://spyced.blogspot.com
  • 23 views on http://saltfactory.textcube.com
  • 1 views on http://www.blogger.com
  • 1 views on http://www.iss.nl
  • 1 views on http://www.slideshare.net
  • 1 views on http://adrianamap.wikispaces.com

less

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

Cancel
File a copyright complaint
Having problems? Go to our helpdesk?

Categories