2010 mongo berlin-scaling

Speaker notes
  • What is scaling? Well, hopefully it's relevant for everyone here.
  • EC2 instances go up to 64 GB of RAM. Maybe mention a 256 GB box here ($30-40k): you can buy a 256 GB box, but I can spin up ten 64 GB EC2 instances in 10 minutes.
  • Not schema-less but dynamic schema: schema design is just as important as in a relational database, or more so. Understand the write vs. read tradeoffs.
  • Compare to MySQL here.
  • The most common performance problem. Why the _id index can be ignored.
  • How much data is looked at per second/minute/hour/day? Are your indexes accessed randomly?
  • 256 GB RAM: $30-40k
  • Don’t shard pre-emptively; it's easy to add later.

2010 mongo berlin-scaling Presentation Transcript

  • Scaling with MongoDB
    • Eliot Horowitz, @eliothorowitz
    • MongoBerlin, October 4, 2010
  • Scaling
    • Storage needs only go up
    • Operations/sec only go up
    • Complexity only goes up
  • Scaling by Optimization
    • Schema Design
    • Index Design
    • Hardware Configuration
  • Horizontal Scaling
    • Vertical scaling is limited
    • Hard to scale vertically in the cloud
    • Can scale wider (more machines) further than higher (bigger machines)
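The speaker notes put numbers on this. A single 256 GB machine costs roughly $30-40k, while ten 64 GB EC2 instances can be started in about 10 minutes. Comparing only aggregate RAM, and ignoring hourly cost and per-node overhead:

```latex
10 \times 64\,\text{GB} = 640\,\text{GB} > 256\,\text{GB}
```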
  • Schema
    • Modeling the same data in different ways can change performance by orders of magnitude
    • Very often, performance problems can be solved by changing the schema
  • Embedding
    • Great for read performance
    • One seek to load entire object
    • One roundtrip to database
    • Writes can be slow if you are constantly adding to objects
    • Should you embed comments?
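The comments question can be made concrete with a toy model. The sketch below assumes an in-memory `ToyStore` (hypothetical, not a MongoDB driver) in which every `find()` counts as one round trip to the database. It then loads the same blog post with its comments embedded and with its comments referenced from a separate collection.

```python
from typing import Any


class ToyStore:
    """Hypothetical in-memory stand-in for a database; each find() is one round trip."""

    def __init__(self) -> None:
        self.collections: dict[str, list[dict[str, Any]]] = {}
        self.round_trips = 0

    def insert(self, collection: str, doc: dict[str, Any]) -> None:
        self.collections.setdefault(collection, []).append(doc)

    def find(self, collection: str, **criteria: Any) -> list[dict[str, Any]]:
        self.round_trips += 1
        return [
            doc for doc in self.collections.get(collection, [])
            if all(doc.get(k) == v for k, v in criteria.items())
        ]


def load_post_embedded(store: ToyStore, post_id: int) -> dict[str, Any]:
    # Comments live inside the post document: one query returns everything.
    return store.find("posts_embedded", _id=post_id)[0]


def load_post_referenced(store: ToyStore, post_id: int) -> dict[str, Any]:
    # Comments live in their own collection: a second query is needed.
    post = dict(store.find("posts", _id=post_id)[0])
    post["comments"] = store.find("comments", post_id=post_id)
    return post


store = ToyStore()
comments = [{"author": "anna", "text": "Nice"}, {"author": "ben", "text": "+1"}]
store.insert("posts_embedded", {"_id": 1, "title": "Scaling", "comments": comments})
store.insert("posts", {"_id": 1, "title": "Scaling"})
for c in comments:
    store.insert("comments", {"post_id": 1, **c})

store.round_trips = 0
embedded = load_post_embedded(store, 1)
embedded_trips = store.round_trips      # 1

store.round_trips = 0
referenced = load_post_referenced(store, 1)
referenced_trips = store.round_trips    # 2
```

The write side is the trade-off the slide mentions. With embedding, every new comment makes the post document grow, which can slow writes if comments arrive constantly.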
  • Indexes
    • Index common queries
    • Avoid redundant indexes: with an index on (A, B), a separate index on (A) isn't needed
    • Right-balanced indexes keep the working set small
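Checking for redundant indexes can be sketched as a prefix test on key lists. This is a simplified rule under stated assumptions: each index is represented as a hypothetical tuple of field names, and the check ignores sort direction and options such as unique or sparse. Those options can make a shorter index necessary even when a longer one shares its prefix.

```python
def redundant_indexes(indexes: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Return indexes whose fields are a strict prefix of another index's fields."""
    redundant = []
    for idx in indexes:
        for other in indexes:
            # (A,) is covered by (A, B) because queries on A can use the compound index.
            if len(other) > len(idx) and other[: len(idx)] == idx:
                redundant.append(idx)
                break
    return redundant


print(redundant_indexes([("a",), ("a", "b"), ("b",), ("c", "d")]))  # [('a',)]
```

Note that `("b",)` is not flagged. B is the second field of `(A, B)`, not a prefix, so queries on B alone cannot use that index efficiently.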
  • RAM Requirements
    • Understand your working set
    • What percentage of your data has to fit in RAM?
    • How do you figure this out?
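The speaker notes suggest two questions for sizing RAM: how much data is looked at per second/minute/hour/day, and whether your indexes are accessed randomly. The sketch below illustrates the second question with a deliberately crude model. This is an assumption for illustration, not how MongoDB actually lays out B-tree pages: integer keys are packed in order, each leaf page holds a fixed number of keys, and 1,000 accesses hit either the newest keys or random keys. All sizes are hypothetical; the point is the ratio.

```python
import math
import random


def pages_touched(keys: list[int], keys_per_page: int) -> int:
    """Distinct leaf pages holding the given keys, assuming dense keys 0..N-1 stored in order."""
    return len({k // keys_per_page for k in keys})


N = 1_000_000          # keys already in the index (hypothetical)
KEYS_PER_PAGE = 100    # leaf-page capacity (hypothetical)
ACCESSES = 1_000

# Right-balanced: recent activity hits the newest (largest) keys,
# e.g. an index on an increasing timestamp.
recent = list(range(N - ACCESSES, N))

# Random: e.g. a hash or UUID key, so accesses spread over the whole index.
rng = random.Random(42)
scattered = rng.sample(range(N), ACCESSES)

right_pages = pages_touched(recent, KEYS_PER_PAGE)     # 10 pages
random_pages = pages_touched(scattered, KEYS_PER_PAGE)  # far more pages
total_pages = math.ceil(N / KEYS_PER_PAGE)              # 10,000 pages
print(right_pages, random_pages, total_pages)
```

ObjectId values begin with a timestamp, so an index on the default `_id` behaves roughly like the right-balanced case. Only its right edge needs to stay in RAM.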
  • Hardware
    • Disk performance
    • How many drives
    • What about EC2?
    • Network performance
  • Read Scaling
    • One master at any time
    • The programmer decides whether each read goes to the master or to a slave
    • Pro: easy to set up; can scale reads very well
    • Con: reads from a slave can be stale (inconsistent with the master)
    • Writes don’t scale
    • Good for read-heavy applications
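The consistency trade-off can be seen in a small in-memory model. In this sketch, `Master`, `Slave`, and `read` are hypothetical names, not driver APIs. Replication is modeled loosely on replaying an ordered log of writes, and the slave only catches up when `sync()` runs. The application chooses the read target per call.

```python
from typing import Optional


class Master:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.oplog: list[tuple[str, str]] = []  # ordered log of writes

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.oplog.append((key, value))


class Slave:
    def __init__(self, master: Master) -> None:
        self.master = master
        self.data: dict[str, str] = {}
        self.applied = 0  # how far into the master's oplog this slave has replayed

    def sync(self) -> None:
        # Asynchronous replication: replay any writes not yet applied.
        for key, value in self.master.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.oplog)


def read(key: str, master: Master, slave: Slave, allow_slave: bool) -> Optional[str]:
    # The application, not the database, decides where each read goes.
    source = slave.data if allow_slave else master.data
    return source.get(key)


master = Master()
slave = Slave(master)
master.write("profile:42", "v1")
slave.sync()
master.write("profile:42", "v2")  # not yet replicated

fresh = read("profile:42", master, slave, allow_slave=False)  # "v2"
stale = read("profile:42", master, slave, allow_slave=True)   # "v1"
slave.sync()
caught_up = read("profile:42", master, slave, allow_slave=True)  # "v2"
```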
  • One Master, Many Slaves
    • Custom Master/Slave setup
    • Have as many slaves as you want
    • Can put them local to appservers
    • Good for 90+% read-heavy applications (Wikipedia)
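Here is a back-of-the-envelope model of why this works for read-heavy workloads but not write-heavy ones. It rests on two simplifying assumptions, ours rather than measured MongoDB behavior: every server has the same throughput, and every write is executed on the master and replayed on every slave. The throughput figures are hypothetical.

```python
def read_capacity(servers: int, ops_per_server: float, writes_per_sec: float) -> float:
    """Reads/sec a master plus slaves can serve under the simple model above."""
    spare = ops_per_server - writes_per_sec  # capacity left after applying every write
    if spare <= 0:
        return 0.0  # writes alone saturate each server; adding slaves cannot help
    return float(servers * spare)


# Hypothetical figures: 10,000 ops/sec per server, 1,000 writes/sec.
for servers in (1, 2, 5, 10):
    print(servers, read_capacity(servers, 10_000, 1_000))
```

Read capacity grows linearly with the number of servers. The write ceiling stays at `ops_per_server` no matter how many slaves are added, which is the "writes don’t scale" point above.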
  • Replica Sets
    • High Availability Cluster
    • One master at any time, up to 6 slaves
    • A slave is automatically promoted to master on failure
    • Drivers [will] support auto routing of reads to slaves if programmer allows
    • Good for applications that need high write availability but mostly read (e.g. a commenting system)
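Automatic promotion can be sketched as a simplified failover rule. This is not MongoDB's actual election protocol, which also involves votes, priorities, and heartbeats. `Member` and `elect_new_master` are hypothetical names. The rule: if the master is down and a strict majority of members is still healthy, promote the healthy member with the most recent data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    healthy: bool
    optime: int  # position of the last replicated write (higher = more up to date)
    is_master: bool = False


def elect_new_master(members: list[Member]) -> Optional[Member]:
    if any(m.is_master and m.healthy for m in members):
        return None  # master still up: nothing to do
    healthy = [m for m in members if m.healthy]
    if len(healthy) * 2 <= len(members):
        return None  # no strict majority: refuse to elect, avoiding two masters
    winner = max(healthy, key=lambda m: m.optime)
    for m in members:
        m.is_master = m is winner
    return winner


replica_set = [
    Member("a", healthy=False, optime=120, is_master=True),
    Member("b", healthy=True, optime=118),
    Member("c", healthy=True, optime=115),
]
new_master = elect_new_master(replica_set)  # promotes "b"
```

Requiring a majority means a member cut off on the minority side of a network split cannot promote itself. The cost is that a set which loses most of its members stays without a master until they return.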
  • Sharding
    • Many masters, even more slaves
    • Can scale reads and writes in two dimensions
    • Add slaves for inconsistent read scaling and redundancy
    • Add shards for write and data size scaling
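Routing a write to the right shard can be sketched with sorted key ranges. This loosely mirrors MongoDB's range-based chunks on a shard key. `RangeRouter` and its `split` method are hypothetical; in a real cluster, chunk splitting and migration are handled by the server, not by application code. Each chunk owns keys from its start (inclusive) up to the next chunk's start.

```python
import bisect


class RangeRouter:
    """Maps a shard-key value to a shard via sorted, non-overlapping chunk ranges."""

    def __init__(self, chunk_starts: list[str], chunk_shards: list[str]) -> None:
        if len(chunk_starts) != len(chunk_shards):
            raise ValueError("need exactly one shard per chunk")
        if not chunk_starts or chunk_starts[0] != "" or chunk_starts != sorted(set(chunk_starts)):
            raise ValueError("chunk starts must be unique, sorted, and begin with ''")
        self.chunk_starts = list(chunk_starts)  # "" stands for the smallest possible key
        self.chunk_shards = list(chunk_shards)

    def shard_for(self, key: str) -> str:
        # bisect_right finds the last chunk whose lower bound is <= key.
        i = bisect.bisect_right(self.chunk_starts, key) - 1
        return self.chunk_shards[i]

    def split(self, at: str, shard: str) -> None:
        # Split the chunk containing `at`; keys from `at` up to the next boundary go to `shard`.
        if at in self.chunk_starts:
            raise ValueError(f"{at!r} is already a chunk boundary")
        i = bisect.bisect_right(self.chunk_starts, at)
        self.chunk_starts.insert(i, at)
        self.chunk_shards.insert(i, shard)


router = RangeRouter(["", "g", "p"], ["shard0", "shard1", "shard2"])
before = router.shard_for("zoe")   # "shard2"
router.split("t", "shard3")        # adding a shard: move part of the key range to it
after = router.shard_for("zoe")    # "shard3"
```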
  • Common Setup
    • Typical setup is 3 shards with 3 servers per shard: 3 masters, 6 slaves
    • One massive sharded collection, a dozen non-sharded ones
    • Can add sharding later to an existing replica set with no downtime
    • Can have sharded and non-sharded collections
  • Use Cases
    • Millions of user profiles
    • User activity stream
    • Photos
    • Logging
  • Download MongoDB: http://www.mongodb.org
    • Let us know what you think: @eliothorowitz @mongodb
    • 10gen is hiring! http://www.10gen.com/jobs