What every developer should know about database scalability, PyCon 2010
    Presentation Transcript

    • Database scalability Jonathan Ellis / @spyced
    • What scaling means: operations / s (throughput) per money spent
    • “It scales, but” • It scales, but we have to keep adding people to our ops team • It scales, but when the NetApp runs out of capacity we're screwed • It scales, but it's not fast
    • 2000 2010
    • Problem 1 Index Data
    • Problem 2
    • Problem 3
    • Two kinds of operations • Reads • Writes
    • Band-aids • (Not actually scaling your database)
    • Problem 1: btrees suck • ~100 MB/s sequential writes • ~1.5 MB/s random writes – ~10ms to seek – ~5ms on expensive 15k rpm disks
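    The slide's figures can be sanity-checked with back-of-the-envelope arithmetic; the ~16 KB-per-random-I/O figure below is an assumption, not from the talk:

```python
# Rough arithmetic behind the slide's numbers (assumed I/O size).
seek_ms = 10         # ~10 ms average seek, per the slide
io_size_kb = 16      # assumed data written per random I/O

seeks_per_sec = 1000 / seek_ms                    # ~100 random I/Os per second
random_mb_s = seeks_per_sec * io_size_kb / 1024   # ~1.56 MB/s
print(random_mb_s)   # vs ~100 MB/s sequential: a ~64x penalty for seeking
```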
    • Magic bullets • SSD – ~100MB/s random writes (of 4KB) – ~$800 for 64GB • vs ~$200 for 2 TB rotational • Moore's law – Aka the 37signals approach
    • Caching • Memcached • Ehcache • etc cache DB
    • Partitioning a (read) cache
      i = hash(key) % len(servers)
      server = servers[i]
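    The slide's two lines, made runnable; the server names are placeholders, and md5 is substituted for Python's builtin `hash()` (which is not stable across processes):

```python
import hashlib

servers = ['cache1', 'cache2', 'cache3']   # hypothetical memcached hosts

def server_for(key):
    # Stable hash of the key, then modulo over the server list
    i = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(servers)
    return servers[i]

print(server_for('cart:13:10'))   # always the same server for this key
```

    Note that adding a fourth server changes `len(servers)` and so remaps most keys to different servers, which is why the talk moves on to consistent hashing.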
    • The cold start problem
    • Consistent hashing
    • Consistent hashing • libketama for memcached – Multiple “virtual nodes” per machine for better load balancing
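    One way the ring with virtual nodes can be sketched; md5 and 100 virtual nodes per machine are arbitrary illustrative choices, not details from the talk:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []   # sorted list of (hash, node) points on the ring
        for node in nodes:
            for v in range(vnodes):
                self.ring.append((self._hash('%s-%d' % (node, v)), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First ring point clockwise of the key's hash, wrapping around
        i = bisect.bisect(self.ring, (self._hash(key),))
        if i == len(self.ring):
            i = 0
        return self.ring[i][1]

ring = ConsistentHashRing(['cache1', 'cache2', 'cache3'])
print(ring.node_for('cart:13:10'))
```

    Adding a machine now inserts new points into the ring, so only the keys falling between those points move; everything else stays where it was.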
    • Cache coherence strategies • Write-through • Explicit invalidation • Implicit invalidation
    • Cache set invalidation
      def get_cached_cart(cart, offset):
          key = 'cart:%s:%s' % (cart, offset)
          return get(key)  # cart:13:10
      def invalidate_cart(cart):
          ?
    • Set invalidation 2
      def get_cached_cart(cart, offset):
          prefix = get_cart_prefix(cart)
          key = '%s:%s' % (prefix, offset)
          return get(key)
      def invalidate_cart(cart):
          del(get_cart_prefix(cart))
      http://www.aminus.org/blogs/index.php/2007/1
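    The trick on the slide is often called a generation (or namespace) key. A self-contained sketch, with a plain dict standing in for memcached and `cache_cart` as a helper added here for illustration:

```python
import itertools

cache = {}                 # plain dict standing in for memcached
_gen = itertools.count()   # generation counter (memcached would use incr)

def get_cart_prefix(cart):
    # The prefix embeds a per-cart generation number; bumping the
    # generation orphans every key built from the old prefix
    gen_key = 'cart_gen:%s' % cart
    if gen_key not in cache:
        cache[gen_key] = str(next(_gen))
    return 'cart:%s:%s' % (cart, cache[gen_key])

def cache_cart(cart, offset, value):
    cache['%s:%s' % (get_cart_prefix(cart), offset)] = value

def get_cached_cart(cart, offset):
    return cache.get('%s:%s' % (get_cart_prefix(cart), offset))

def invalidate_cart(cart):
    # One delete invalidates every offset of the cart at once;
    # the orphaned entries simply age out of a real cache
    cache.pop('cart_gen:%s' % cart, None)

cache_cart(13, 10, ['item-a'])
invalidate_cart(13)
print(get_cached_cart(13, 10))   # None: the whole cart set is gone
```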
    • Pop quiz • Can caching reduce write load?
    • Write-back caching
      foo.bar = 1
      foo.bar = 2
      foo.bar = 4
      [cache] → UPDATE foos SET bar=4 WHERE id=3 → [DB]
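    The coalescing idea on the slide — three writes hit only the cache, and a single UPDATE with the final value reaches the database — can be sketched as follows. `WriteBackCache` is a hypothetical class, and the SQL strings are built naively purely for illustration:

```python
class WriteBackCache:
    """Coalesces writes in memory; flushes only the final value (sketch)."""

    def __init__(self):
        self.dirty = {}   # (table, id, column) -> latest value written

    def write(self, table, row_id, column, value):
        # Overwrites any earlier value for the same cell; no DB hit
        self.dirty[(table, row_id, column)] = value

    def flush(self):
        # One UPDATE per dirty cell, carrying only the last value
        statements = ["UPDATE %s SET %s=%s WHERE id=%s"
                      % (table, column, value, row_id)
                      for (table, row_id, column), value in self.dirty.items()]
        self.dirty.clear()
        return statements

wb = WriteBackCache()
wb.write('foos', 3, 'bar', 1)
wb.write('foos', 3, 'bar', 2)
wb.write('foos', 3, 'bar', 4)
print(wb.flush())   # ['UPDATE foos SET bar=4 WHERE id=3']
```

    This answers the pop quiz: yes, caching can reduce write load — at the cost that buffered writes are lost if the cache node dies before flushing, which is what the following Achtung slide is about.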
    • Achtung! • Commercial solutions – Terracotta • Roll your own – Memcached – ehcache – Zookeeper?
    • Scaling reads
    • Replication
    • Types of replication • Master → slave – Master → slave → other slaves • Master ↔ master – multi-master
    • Types of replication 2 • Synchronous • Asynchronous
    • Synchronous master/slave • ?
    • Synchronous multi-master • Synchronous = slow(er) • Complexity (e.g. 2pc) • PGCluster • Oracle
    • Asynchronous master/slave • Easiest • MySQL replication • Slony, Londiste, WAL shipping • Tungsten • MongoDB • Redis
    • Asynchronous multi-master • Conflict resolution – O(N³) or O(N²) as you add nodes – http://research.microsoft.com/~gray/replicas.ps • Bucardo • MySQL Cluster • Tokyo Cabinet
    • Achtung! • Asynchronous replication can lose data if the master fails
    • “Architecture” • Primarily about how you cope with failure scenarios
    • Replication does not scale writes
    • Scaling writes
    • Partitioning
    • Partitioning • sharding / horizontal / key • Functional / vertical
    • Key based partitioning • PK of “root” table controls destination – e.g. user id • Retains referential integrity
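    Routing every table on the root table's primary key can be sketched as below. The shard list and the simple modulo routing are illustrative assumptions (the later "What server gets each key?" slide covers better lookup schemes):

```python
SHARDS = ['db1', 'db2', 'db3', 'db4']   # hypothetical database nodes

def shard_for_user(user_id):
    # Every table rooted at Users (Blogs, Comments, ...) routes on user_id,
    # so one user's rows land on one node and joins/FKs still work locally
    return SHARDS[user_id % len(SHARDS)]

print(shard_for_user(42))   # db3: user 42 and all their blogs/comments
```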
    • Example: blogger.com Users Blogs Comments
    • Example: blogger.com Users Blogs Comments' Comments
    • What server gets each key? • Consistent hashing • Central db that knows what server owns a key – Makes adding machines easier (than consistent hashing) – Single point of failure & query bottleneck – MongoDB
    • Vertical partitioning • Tables on separate nodes • Often a table that is too big to keep with the other tables, and eventually too big for a single node
    • [Diagram: vertical segments (Users1, Users2, Products1), sharding (Trans1, Trans2, Trans3), and both combined]
    • Partitioning
    • Partitioning with replication
    • What these have in common • Not application-transparent • Ad hoc • Error-prone • Manpower-intensive
    • To summarize • Scaling reads sucks • Scaling writes sucks more
    • Case in point
    • NoSQL • Distributed (partitioned, scalable) – Cassandra – HBase – Voldemort • Not distributed – Neo4j – CouchDB – Redis
    • * Distributed databases • Data is automatically partitioned • Transparent to application • Add capacity without downtime • Failure tolerant *Like Bigtable, not like Lotus Notes
    • Two famous papers • Bigtable: A distributed storage system for structured data, 2006 • Dynamo: Amazon's highly available key-value store, 2007
    • Two approaches • Bigtable: “How can we build a distributed database on top of GFS?” • Dynamo: “How can we build a distributed hash table appropriate for the data center?”
    • Cassandra • Dynamo-style replication • Bigtable ColumnFamily data model • High-performance – Digg phasing out memcached • Open space, Saturday at 17:30
    • Questions