Your SlideShare is downloading. ×
0
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

What every developer should know about database scalability, PyCon 2010

9,929

Published on

0 Comments
29 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
9,929
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
266
Comments
0
Likes
29
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Database scalability Jonathan Ellis / @spyced
  • 2. operations / s = throughput money What scaling means
  • 3. “It scales, but” • It scales, but we have to keep adding people to our ops team • It scales, but when the netapp runs out of capacity we're screwed • It scales, but it's not fast
  • 4. 2000 2010
  • 5. Problem 1 Index Data
  • 6. Problem 2
  • 7. Problem 3
  • 8. Two kinds of operations • Reads • Writes
  • 9. Band-aids • (Not actually scaling your database)
  • 10. Problem 1: btrees suck • ~100 MB/s sequential writes • ~1.5 MB/s random writes – ~10ms to seek – ~5ms on expensive 15k rpm disks
  • 11. Magic bullets • SSD – ~100MB/s random writes (of 4KB) – ~$800 for 64GB • vs ~$200 for 2 TB rotational • Moore's law – Aka the 37signals approach
  • 12. Caching • Memcached • Ehcache • etc cache DB
  • 13. Partitioning a (read) cache i = hash(key) % len(servers) server = servers[i]
  • 14. The cold start problem
  • 15. Consistent hashing
  • 16. Consistent hashing • libketema for memcached – Multiple “virtual nodes” per machine for better load balancing
  • 17. Cache coherence strategies • Write-through • Explicit invalidation • Implicit invalidation
  • 18. Cache set invalidation def get_cached_cart(cart, offset): key = 'cart:%s:%s' % (cart, offset) return get(key) # cart:13:10 def invalidate_cart(cart): ?
  • 19. Set invalidation 2 def get_cached_cart(cart, offset): prefix = get_cart_prefix(cart) key = '%s:%s' % (prefix, offset) return get(key) def invalidate_cart(cart): del(get_cart_prefix(cart)) http://www.aminus.org/blogs/index.php/2007/1
  • 20. Pop quiz • Can caching reduce write load?
  • 21. Write-back caching foo.bar = 1 foo.bar = 2 foo.bar = 4 cache UPDATE foos SET bar=4 WHERE id=3 DB
  • 22. Achtung! • Commercial solutions – Terracotta • Roll your own – Memcached – ehcache – Zookeeper?
  • 23. Scaling reads
  • 24. Replication
  • 25. Types of replication • Master → slave – Master → slave → other slaves • Master ↔ master – multi-master
  • 26. Types of replication 2 • Synchronous • Asynchronous
  • 27. Synchronous master/slave • ?
  • 28. Synchronous multi-master • Synchronous = slow(er) • Complexity (e.g. 2pc) • PGCluster • Oracle
  • 29. Asynchronous master/slave • Easiest • MySQL replication • Slony, Londiste, WAL shipping • Tungsten • MongoDB • Redis
  • 30. Asynchronous multi-master • Conflict resolution – O(N3) or O(N2) as you add nodes – http://research.microsoft.com/~gray/replicas.ps • Bucardo • MySQL Cluster • Tokyo Cabinet
  • 31. Achtung! • Asynchronous replication can lose data if the master fails
  • 32. “Architecture” • Primarily about how you cope with failure scenarios
  • 33. Replication does not scale writes
  • 34. Scaling writes
  • 35. Partitioning
  • 36. Partitioning • sharding / horizontal / key • Functional / vertical
  • 37. Key based partitioning • PK of “root” table controls destination – e.g. user id • Retains referential integrity
  • 38. Example: blogger.com Users Blogs Comments
  • 39. Example: blogger.com Users Blogs Comments' Comments
  • 40. What server gets each key? • Consistent hashing • Central db that knows what server owns a key – Makes adding machines easier (than – Single point of failure & query bottleneck – MongoDB
  • 41. Vertical partitioning • Tables on separate nodes • Often a table that is too big to keep with the other tables, gets too big for a single node
  • 42. Users2 Users1 Products1 Vertical segments Both Trans3 Trans2 Trans1 Sharding
  • 43. Partitioning
  • 44. Partitioning with replication
  • 45. What these have in common • Not application-transparent • Ad hoc • Error-prone • Manpower-intensive
  • 46. To summarize • Scaling reads sucks • Scaling writes sucks more
  • 47. Case in point
  • 48. NoSQL • Distributed (partitioned, scalable) – Cassandra – HBase – Voldemort • Not distributed – Neo4j – CouchDB – Redis
  • 49. * Distributed databases • Data is automatically partitioned • Transparent to application • Add capacity without downtime • Failure tolerant *Like Bigtable, not like Lotus Notes
  • 50. Two famous papers • Bigtable: A distributed storage system for structured data, 2006 • Dynamo: amazon's highly available key- value store, 2007
  • 51. Two approaches • Bigtable: “How can we build a distributed database on top of GFS?” • Dynamo: “How can we build a distributed hash table appropriate for the data center?”
  • 52. Cassandra • Dynamo-style replication • Bigtable ColumnFamily data model • High-performance – Digg phasing out memcached • Open space, Saturday at 17:30
  • 53. Questions

×