Realtime Analytics with MongoDB
My talk from Mongo Boston (9/20/2010) about how we use MongoDB to scale Rails at Yottaa.


Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

CC Attribution-ShareAlike License

Realtime Analytics with MongoDB Presentation Transcript

  • 1. Scaling Rails @ Yottaa
    Jared Rosoff
    @forjared
    jrosoff@yottaa.com
    September 20th 2010
  • 2. From zero to humongous
    2
    About our application
    How we chose MongoDB
    How we use MongoDB
  • 3. About our application
    3
    We collect lots of data
    6000+ URLs
    300 samples per URL per day
    Some samples are >1 MB (Firebug)
    Missing a sample isn’t a big deal
    We visualize data in real-time
    No delay when showing data
    “On-Demand” samples
    The “check now” button
  • 4. The Yottaa Network
    4
  • 5. How we chose mongo
    5
  • 6. Requirements
    Our data set is going to grow very quickly
    Scalable by default
    We have a very small team
    Focus on application, not infrastructure
    We are a startup
    Requirements change hourly
    Operations
    We’re 100% in the cloud
    6
  • 7. Rails default architecture
    Performance Bottleneck: Too much load
    [Diagram: Data Source → Collection Server → MySQL → Reporting Server (“Just” a Rails App) → User]
  • 8. Let’s add replication!
    Performance Bottleneck: Still can’t scale writes
    [Diagram: Data Source → Collection Server → MySQL master, replicating to MySQL slaves → Reporting Server → User]
    Off the shelf! Scalable Reads!
  • 9. What about sharding?
    Development Bottleneck: Need to write custom code
    [Diagram: Data Source → Collection Server → sharding layer → several MySQL masters → sharding layer → Reporting Server → User]
    Scalable Writes!
  • 10. Key Value stores to the rescue?
    Development Bottleneck: Reporting is limited / hard
    [Diagram: Data Source → Collection Server → Cassandra or Voldemort → Reporting Server → User]
    Scalable Writes!
  • 11. Can I Hadoop my way out of this?
    Development Bottleneck: Too many systems!
    [Diagram: Data Source → Collection Server → Cassandra or Voldemort → Hadoop → MySQL master/slave → Reporting Server (“Just” a Rails App) → User]
    Scalable Writes! Flexible Reports!
  • 12. MongoDB!
    [Diagram: Data Source → Collection Server → MongoDB → Reporting Server → User]
    Scalable Writes! “Just” a Rails app. Flexible Reporting!
  • 13. [Architecture diagram: Data Source and User → Load Balancer (Nginx) → App Server (Nginx + Passenger, running Collection and Reporting) → Mongos → multiple MongoD shards]
    Sharding!
    High Concurrency
    Scale-Out
  • 14. Sharding is critical
    14
    Distribute write load across servers
    Decentralize data storage
    Scale out!
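As a sketch, turning sharding on for the daily-metrics collection would look roughly like this in the mongo shell. The database name `yottaa` and the `{ url: 1 }` shard key are assumptions for illustration; the talk doesn’t state either.

```javascript
// Hedged sketch, not from the talk: enable sharding with the
// mongo shell helpers. Database name and shard key are assumed.
sh.enableSharding("yottaa")

// Sharding on { url: 1 } sends different URLs to different shards,
// which spreads the write load and the storage across servers.
sh.shardCollection("yottaa.metrics.dailies", { url: 1 })
```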
  • 15. Before Sharding
    15
    [Diagram: App Servers all writing to a single database]
    Need higher write volume? Buy a bigger database.
    Need more storage volume? Buy a bigger database.
  • 16. After Sharding
    16
    [Diagram: App Servers writing to many database servers]
    Need higher write volume? Add more servers.
    Need more storage volume? Add more servers.
  • 17. Scale out is the new scale up
    17
    [Diagram: App Servers scaling out across many database servers]
  • 18. How we’re using MongoDB
    18
  • 19. Our Data Model
    19
    Document per URL we track
    Meta-data
    Summary Data
    Most recent measurements
    Document per URL per Day
    Detailed metrics
    Pre-aggregated data
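The two document types above might look roughly like this. The field names are illustrative stand-ins, not the actual Yottaa schema:

```javascript
// Hypothetical shapes for the two document types; names are invented.

// One document per tracked URL: metadata, summary, latest measurement.
const urlDoc = {
  url: 'www.google.com',          // meta-data
  summary: { avg_connect: 25 },   // summary data
  last_sample: {                  // most recent measurement
    connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234
  }
};

// One document per URL per day: detailed, pre-aggregated metrics.
const dailyDoc = {
  url: 'www.google.com',
  day: '9/20/2010',
  connect: {
    sum: 1234, count: 50,          // aggregate across all locations
    sfo: { sum: 600, count: 25 },  // per-location breakdown
    nyc: { sum: 634, count: 25 }
  }
};

// With pre-aggregation, an average is just sum / count.
const avgConnect = dailyDoc.connect.sum / dailyDoc.connect.count;
```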
  • 20. Thinking in rows
    20
    { url: 'www.google.com',
      location: 'SFO',
      connect: 23,
      first_byte: 123,
      last_byte: 245,
      timestamp: 1234 }
    { url: 'www.google.com',
      location: 'NYC',
      connect: 23,
      first_byte: 123,
      last_byte: 245,
      timestamp: 2345 }
  • 21. Thinking in rows
    21
    What was the average connect time for Google on Friday?
    From SFO?
    From NYC?
    Between 1AM-2AM?
  • 22. Thinking in rows
    22
    Up to 100s of samples per URL per day!
    [Diagram: rows from Day 1, Day 2, Day 3 … each averaged (AVG), then combined into the result]
    Over a 30-day query range, an “average” chart had to hit 600 rows.
  • 23. Thinking in Documents
    This document contains all data for www.google.com collected during 9/20/2010
    This tells us the average value for this metric for this url / time period
    Average value from SFO
    Average value from NYC
    23
  • 24. Storing a sample
    24
    db.metrics.dailies.update(
      { url: 'www.google.com',      // which document we're updating
        day: '9/20/2010' },
      { '$inc': {                   // atomically update the document
          'connect.sum': 1234,      // update the aggregate value
          'connect.count': 1,
          'connect.sfo.sum': 1234,  // update the location-specific value
          'connect.sfo.count': 1 } },
      { upsert: true }              // create the document if it doesn't already exist
    );
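To make the accumulation visible, the `$inc` upsert above can be sketched in memory. A plain JavaScript `Map` stands in for the collection here; this shows the pattern, not the actual driver call:

```javascript
// In-memory sketch of the upsert-with-$inc pattern.
const dailies = new Map();

function recordSample(url, day, location, connect) {
  const key = url + '|' + day;
  // upsert: create the document if it doesn't already exist
  if (!dailies.has(key)) {
    dailies.set(key, { url, day, connect: { sum: 0, count: 0 } });
  }
  const doc = dailies.get(key);
  // $inc on the aggregate value
  doc.connect.sum += connect;
  doc.connect.count += 1;
  // $inc on the location-specific value
  if (!doc.connect[location]) doc.connect[location] = { sum: 0, count: 0 };
  doc.connect[location].sum += connect;
  doc.connect[location].count += 1;
  return doc;
}

recordSample('www.google.com', '9/20/2010', 'sfo', 1234);
recordSample('www.google.com', '9/20/2010', 'sfo', 766);
recordSample('www.google.com', '9/20/2010', 'nyc', 400);

const doc = dailies.get('www.google.com|9/20/2010');
// aggregate average: (1234 + 766 + 400) / 3 = 800
const avg = doc.connect.sum / doc.connect.count;
```

In the real database, `$inc` makes each of these read-modify-write steps a single atomic operation, so concurrent collection servers never clobber each other.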
  • 25. Putting it together
    25
    1. Atomically update the daily data
    { url: 'www.google.com',
      location: 'SFO',
      connect: 23,
      first_byte: 123,
      last_byte: 245,
      timestamp: 1234 }
    2. Atomically update the weekly data
    3. Atomically update the monthly data
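The three steps above can be sketched as one function that fans a sample out to daily, weekly, and monthly buckets. The bucket-key formats and the in-memory object standing in for the three collections are invented for illustration; in production each `inc` call would be an atomic `$inc` upsert like the one on the previous slide:

```javascript
// Stand-in for the dailies/weeklies/monthlies collections.
const buckets = {};

// One bucket update == one atomic upsert in MongoDB.
function inc(collection, key, connect) {
  const k = collection + '|' + key;
  if (!buckets[k]) buckets[k] = { sum: 0, count: 0 };
  buckets[k].sum += connect;
  buckets[k].count += 1;
}

function storeSample(sample) {
  const d = new Date(sample.timestamp * 1000);
  const day = d.toISOString().slice(0, 10);   // e.g. '2010-09-21'
  const month = day.slice(0, 7);              // e.g. '2010-09'
  // hypothetical week key: year + week-of-month, purely illustrative
  const week = d.getUTCFullYear() + '-w' + Math.ceil(d.getUTCDate() / 7);

  inc('dailies', sample.url + '|' + day, sample.connect);     // 1. daily
  inc('weeklies', sample.url + '|' + week, sample.connect);   // 2. weekly
  inc('monthlies', sample.url + '|' + month, sample.connect); // 3. monthly
}

storeSample({ url: 'www.google.com', connect: 23, timestamp: 1285027200 });
```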
  • 26. Drawing connect time graph
    26
    db.metrics.dailies.find(
      { url: 'www.google.com',    // data for Google
        day: { '$gte': '9/1/2010',   // the range of dates for the chart
               '$lte': '9/20/2010' } },
      { connect: true }           // we just want connect time data
    );
    // compound index to make this query fast
    db.metrics.dailies.ensureIndex({ url: 1, day: -1 })
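Once the daily documents come back, the chart points fall out directly: one point per document, with the average reconstructed from the pre-aggregated sum and count. The sample data below is invented for illustration:

```javascript
// Invented stand-ins for documents returned by the find() above.
const dailyDocs = [
  { day: '9/18/2010', connect: { sum: 500, count: 20 } },
  { day: '9/19/2010', connect: { sum: 660, count: 22 } },
  { day: '9/20/2010', connect: { sum: 450, count: 18 } }
];

// One chart point per daily document: average = sum / count.
const points = dailyDocs.map(doc => ({
  day: doc.day,
  avg_connect: doc.connect.sum / doc.connect.count
}));
```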
  • 27. More efficient charts
    27
    1 Document per URL per Day
    [Diagram: Day 1, Day 2, Day 3 … documents, each pre-averaged (AVG), combined into the result]
    An average chart now hits 30 documents, 20x fewer. 30 days == 30 documents.
  • 28. Real Time Updates
    28
    Single query to fetch all metric data for a URL
    Fast enough that the browser can poll constantly for updated data without impacting the server
  • 29. Final thoughts
    Mongo has been a great choice
    80 GB of data and counting
    Dramatically compressed after moving from a table-oriented to a document-oriented data model
    100s of updates per second, 24x7
    Not using sharding in production yet, but planning to soon
    You are using replication, right?
    29