Realtime Analytics with MongoDB
My talk from Mongo Boston (9/20/2010) about how we use MongoDB to scale Rails at Yottaa.


Usage Rights: CC Attribution-ShareAlike License

    Presentation Transcript

    • Scaling Rails @ Yottaa
      Jared Rosoff
      @forjared
      jrosoff@yottaa.com
      September 20th 2010
    • From zero to humongous
      About our application
      How we chose MongoDB
      How we use MongoDB
    • About our application
      We collect lots of data
      6000+ URLs
      300 samples per URL per day
      Some samples are >1MB (Firebug)
      Missing a sample isn't a big deal
      We visualize data in real time
      No delay when showing data
      "On-Demand" samples
      The "check now" button
    • The Yottaa Network
    • How we chose Mongo
    • Requirements
      Our data set is going to grow very quickly
      Scalable by default
      We have a very small team
      Focus on application, not infrastructure
      We are a startup
      Requirements change hourly
      Operations
      We’re 100% in the cloud
    • Rails default architecture
      [Diagram: Data Source → Collection Server → MySQL ← Reporting Server ← User; "Just" a Rails app]
      Performance bottleneck: too much load on a single database
    • Let's add replication!
      [Diagram: Data Source → Collection Server → MySQL master, replicating to several replicas; Reporting Server ← User reads from the replicas]
      Off the shelf! Scalable reads!
      Performance bottleneck: still can't scale writes
    • What about sharding?
      [Diagram: Data Source → Collection Server → sharding layer → several MySQL masters ← sharding layer ← Reporting Server ← User]
      Scalable writes!
      Development bottleneck: need to write custom sharding code
    • Key-value stores to the rescue?
      [Diagram: Data Source → Collection Server → Cassandra or Voldemort ← Reporting Server ← User]
      Scalable writes!
      Development bottleneck: reporting is limited / hard
    • Can I Hadoop my way out of this?
      [Diagram: Data Source → Collection Server → Cassandra or Voldemort → Hadoop → MySQL (master and slave) ← Reporting Server ("Just" a Rails app) ← User]
      Scalable writes! Flexible reports!
      Development bottleneck: too many systems!
    • MongoDB!
      [Diagram: Data Source → Collection Server → MongoDB ← Reporting Server ← User]
      Scalable writes! Flexible reporting! "Just" a Rails app
    • [Deployment diagram: Data Source and User → Load Balancer → Nginx + Passenger app servers (Collection and Reporting) → Mongos → several MongoD shard servers]
      Sharding! High concurrency. Scale-out.
    • Sharding is critical
      Distribute write load across servers
      Decentralize data storage
      Scale out!
    • Before Sharding
      [Diagram: several App Servers all writing to one database]
      Need higher write volume? Buy a bigger database.
      Need more storage volume? Buy a bigger database.
    • After Sharding
      [Diagram: several App Servers writing across several shard servers]
      Need higher write volume? Add more servers.
      Need more storage volume? Add more servers.
    • Scale out is the new scale up
      [Diagram: App Servers fanning out across a horizontally scaled cluster]
    • How we’re using MongoDB
    • Our Data Model
      Document per URL we track
      Meta-data
      Summary Data
      Most recent measurements
      Document per URL per Day
      Detailed metrics
      Pre-aggregated data
    • Thinking in rows
      { url: 'www.google.com',
        location: 'SFO',
        connect: 23,
        first_byte: 123,
        last_byte: 245,
        timestamp: 1234 }
      { url: 'www.google.com',
        location: 'NYC',
        connect: 23,
        first_byte: 123,
        last_byte: 245,
        timestamp: 2345 }
    • Thinking in rows
      What was the average connect time for Google on Friday?
      From SFO?
      From NYC?
      Between 1AM and 2AM?
    • Thinking in rows
      Up to 100s of samples per URL per day!
      [Diagram: Day 1 → AVG, Day 2 → AVG, Day 3 → AVG, ... → combined Result]
      With a 30-day average query range, an "average" chart had to hit 600 rows
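      To make that cost concrete, here is a small plain-JavaScript sketch of the row-oriented approach (the sample rows and the `avgConnect` helper are hypothetical, not from the talk): every chart point means scanning all matching sample rows.

      ```javascript
      // Hypothetical raw sample rows, one per measurement (row-oriented model).
      const samples = [
        { url: 'www.google.com', location: 'SFO', connect: 23, timestamp: 1234 },
        { url: 'www.google.com', location: 'NYC', connect: 31, timestamp: 2345 },
        { url: 'www.google.com', location: 'SFO', connect: 27, timestamp: 3456 },
      ];

      // Averaging connect time means touching every matching row:
      // with ~20 samples/day over a 30-day chart, that's ~600 rows per query.
      function avgConnect(rows, url) {
        const matching = rows.filter(r => r.url === url);
        const sum = matching.reduce((acc, r) => acc + r.connect, 0);
        return sum / matching.length;
      }

      console.log(avgConnect(samples, 'www.google.com')); // 27
      ```

      The filter-then-reduce pass is what the pre-aggregated document model (next slide) avoids.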
    • Thinking in Documents
      [Diagram: one annotated daily document]
      This document contains all data for www.google.com collected during 9/20/2010
      It tells us the average value for this metric for this URL / time period,
      plus the average value from SFO and the average value from NYC
    • Storing a sample
      db.metrics.dailies.update(
        { url: 'www.google.com',       // which document we're updating
          day: '9/20/2010' },
        { '$inc': {
            'connect.sum': 1234,       // update the aggregate value
            'connect.count': 1,
            'connect.sfo.sum': 1234,   // update the location-specific value
            'connect.sfo.count': 1 } },
        { upsert: true }               // create the document if it doesn't already exist
      );
      Atomically updates the document in place
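      The effect of this `$inc` + `upsert` pattern can be sketched without a MongoDB server. The plain-object simulation below (the `recordSample` helper is hypothetical; field names mirror the slide) shows how a daily document accumulates sum/count pairs:

      ```javascript
      // Simulate the daily-document upsert: look up (or create) the document
      // for this URL/day, then increment the pre-aggregated counters.
      const dailies = {}; // stands in for the metrics.dailies collection

      function recordSample(url, day, location, connect) {
        const key = url + '|' + day;
        // upsert: create the document if it doesn't already exist
        const doc = dailies[key] ||
          (dailies[key] = { url, day, connect: { sum: 0, count: 0 } });
        // $inc on the aggregate value
        doc.connect.sum += connect;
        doc.connect.count += 1;
        // $inc on the location-specific value
        const loc = doc.connect[location] ||
          (doc.connect[location] = { sum: 0, count: 0 });
        loc.sum += connect;
        loc.count += 1;
        return doc;
      }

      recordSample('www.google.com', '9/20/2010', 'sfo', 23);
      recordSample('www.google.com', '9/20/2010', 'sfo', 27);

      // The average is recovered later as sum / count:
      const doc = dailies['www.google.com|9/20/2010'];
      console.log(doc.connect.sum / doc.connect.count); // 25
      ```

      In MongoDB the whole read-modify-write happens server-side in one atomic update, which is what makes hundreds of concurrent writers safe.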
    • Putting it together
      A sample arrives:
      { url: 'www.google.com',
        location: 'SFO',
        connect: 23,
        first_byte: 123,
        last_byte: 245,
        timestamp: 1234 }
      1. Atomically update the daily data
      2. Atomically update the weekly data
      3. Atomically update the monthly data
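      A sketch of that three-step fan-out. The `bucketKeys` helper is hypothetical (the talk only says daily, weekly, and monthly data are each updated atomically); in MongoDB each entry below would become one `$inc`/`upsert` update:

      ```javascript
      // Derive the daily / weekly / monthly bucket keys for one sample.
      function bucketKeys(tsMillis) {
        const day = new Date(tsMillis).toISOString().slice(0, 10); // '2010-09-20'
        return {
          daily: day,
          weekly: 'w' + Math.floor(tsMillis / (7 * 86400 * 1000)), // epoch-week number
          monthly: day.slice(0, 7),                                // '2010-09'
        };
      }

      // One atomic $inc/upsert update per granularity.
      function plannedUpdates(sample) {
        const keys = bucketKeys(sample.timestamp);
        return ['daily', 'weekly', 'monthly'].map(granularity => ({
          collection: 'metrics.' + granularity,
          query: { url: sample.url, bucket: keys[granularity] },
          inc: { 'connect.sum': sample.connect, 'connect.count': 1 },
        }));
      }

      const sample = { url: 'www.google.com', connect: 23,
                       timestamp: Date.UTC(2010, 8, 20) };
      console.log(plannedUpdates(sample).map(u => u.query.bucket));
      // [ '2010-09-20', 'w2124', '2010-09' ]
      ```

      Because each granularity is its own document, losing one update only blurs one bucket, which fits the "missing a sample isn't a big deal" requirement.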
    • Drawing connect time graph
      db.metrics.dailies.find(
        { url: 'www.google.com',         // data for google
          day: { '$gte': '9/1/2010',     // the range of dates for the chart
                 '$lte': '9/20/2010' } },
        { connect: true }                // we just want connect time data
      );
      // Compound index to make this query fast
      db.metrics.dailies.ensureIndex({ url: 1, day: -1 })
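      Once the daily documents come back, each chart point is just sum / count. A sketch with hypothetical pre-aggregated documents shaped like the ones this find() would return:

      ```javascript
      // Hypothetical daily documents: each carries the pre-aggregated
      // connect-time sum and count for one day.
      const dailyDocs = [
        { day: '9/18/2010', connect: { sum: 600, count: 20 } },
        { day: '9/19/2010', connect: { sum: 625, count: 25 } },
        { day: '9/20/2010', connect: { sum: 280, count: 10 } },
      ];

      // One chart point per document: no per-sample rows are ever touched.
      const points = dailyDocs.map(d => ({
        day: d.day,
        avg: d.connect.sum / d.connect.count,
      }));

      console.log(points.map(p => p.avg)); // [ 30, 25, 28 ]
      ```

      The division happens client-side at render time, so the stored documents stay simple counters.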
    • More efficient charts
      1 document per URL per day
      [Diagram: Day 1 → AVG, Day 2 → AVG, Day 3 → AVG, ... → combined Result]
      30 days == 30 documents: an average chart hits 30 documents, 20x fewer than before
    • Real Time Updates
      Single query to fetch all metric data for a URL
      Fast enough that browser can poll constantly for updated data without impacting server
    • Final thoughts
      Mongo has been a great choice
      80 GB of data and counting
      Majorly compressed after moving from a table-oriented to a document-oriented data model
      100s of updates per second, 24x7
      Not using sharding in production yet, but planning on it soon
      You are using replication, right?