Scalable Event Analytics with MongoDB & Ruby on Rails

Slides from my talk at RubyConfChina 2010.

Transcript

  • 1. Scalable Event Analytics with Ruby on Rails & MongoDB
    Ruby Conf China 2010
    Jared Rosoff (@forjared) jrosoff@yottaa.com
  • 2. Yottaa!!!! (www.yottaa.com)
  • 3. Overview
    Ruby at Scale
    What is Event Analytics?
    What are the different ways you could do it?
    How we did it
  • 4. Ruby At Scale?
    http://www.flickr.com/photos/laughingsquid
  • 5. Event Analytics
    [Diagram labels: Data Source (×4), Event (many), Event Analytics, User (×2), Query (×2), Report (×2)]
  • 6.
  • 7. Oh and we are a startup
  • 8. Our requirements:
  • 9. Rails default architecture
    [Diagram labels: Data Source, Collection Server, MySQL, Reporting Server, User]
  • 10. Rails default architecture
    [Same diagram as slide 9]
    “Just” a Rails App
  • 11. Rails default architecture
    [Same diagram as slide 9]
    Performance Bottleneck: Too much load
    “Just” a Rails App
  • 12. Let’s add replication!
    [Diagram labels: Data Source, Collection Server, Replication, MySQL Master (label appears ×4), Reporting Server, User]
  • 13. Let’s add replication!
    [Same diagram as slide 12]
    Off the shelf!
    Scalable Reads!
  • 14. Let’s add replication!
    [Same diagram as slide 12]
    Performance Bottleneck: Still can’t scale writes
    Off the shelf!
    Scalable Reads!
  • 15. What about sharding?
    [Diagram labels: Data Source, Collection Server, Sharding (×2), MySQL Master (×3), Reporting Server, User]
  • 16. What about sharding?
    [Same diagram as slide 15]
    Scalable Writes!
  • 17. What about sharding?
    [Same diagram as slide 15]
    Development Bottleneck:
    Need to write custom code
    Scalable Writes!
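The "custom code" on slide 17 is the routing logic the application has to own once MySQL is sharded by hand. A minimal sketch in plain Ruby of what that might look like, assuming a hash-modulo scheme; the shard names, the `ShardedEventStore` class, and the `site_id` key are hypothetical and not from the talk, and in-memory arrays stand in for the MySQL servers:

```ruby
require "zlib"

# Hypothetical shard names; the talk does not say how many shards were considered.
SHARD_NAMES = %w[shard_0 shard_1 shard_2].freeze

class ShardedEventStore
  def initialize(shard_names = SHARD_NAMES)
    # Each shard is represented by an in-memory array instead of a real database.
    @shards = shard_names.each_with_object({}) { |name, shards| shards[name] = [] }
  end

  # A stable hash of the key, so the same key maps to the same shard in every process.
  def shard_for(site_id)
    names = @shards.keys
    names[Zlib.crc32(site_id.to_s) % names.size]
  end

  # Writes scale out because each event touches exactly one shard.
  def record(site_id, event)
    @shards[shard_for(site_id)] << event.merge(site_id: site_id)
  end

  # Any report that spans keys has to visit every shard and merge by hand.
  def count_all
    @shards.values.sum(&:size)
  end
end
```

Even this toy version shows where the development cost comes from: every write path and every report has to know about the shard layout.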
  • 18. Key Value stores to the rescue?
    [Diagram labels: Data Source, Collection Server, MySQL Master (×2), Cassandra or Voldemort, Reporting Server, User]
  • 19. Key Value stores to the rescue?
    [Same diagram as slide 18]
    Scalable Writes!
  • 20. Key Value stores to the rescue?
    [Same diagram as slide 18]
    Development Bottleneck:
    Reporting is limited / hard
    Scalable Writes!
  • 21. Can I Hadoop my way out of this?
    [Diagram labels: Data Source, Collection Server, Cassandra or Voldemort, Hadoop, MySQL Master (×5), MySQL Slave, Reporting Server, User]
  • 22. Can I Hadoop my way out of this?
    [Same diagram as slide 21]
    Scalable Writes!
  • 23. Can I Hadoop my way out of this?
    [Same diagram as slide 21]
    Scalable Writes!
    Flexible Reports!
  • 24. Can I Hadoop my way out of this?
    [Same diagram as slide 21]
    Scalable Writes!
    Flexible Reports!
    “Just” a Rails App
  • 25. Can I Hadoop my way out of this?
    [Same diagram as slide 21]
    Development Bottleneck:
    Too many systems!
    Scalable Writes!
    Flexible Reports!
    “Just” a Rails App
  • 26. MongoDB!
    [Diagram labels: Data Source, Collection Server, MySQL Master (×2), MongoDB, Reporting Server, User]
  • 27. MongoDB!
    [Same diagram as slide 26]
    Scalable Writes!
  • 28. MongoDB!
    [Same diagram as slide 26]
    Scalable Writes!
    Flexible Reporting!
  • 29. MongoDB!
    [Same diagram as slide 26]
    Scalable Writes!
    “Just” a Rails app
    Flexible Reporting!
  • 30. [Diagram labels: Data Source, User, Load Balancer, App Server, Nginx, Passenger, Collection, Reporting, Mongos, MongoD (×3)]
  • 31. [Same diagram as slide 30]
    Sharding!
  • 32. [Same diagram as slide 30]
    Sharding!
    High Concurrency
  • 33. [Same diagram as slide 30]
    Sharding!
    High Concurrency
    Scale-Out
  • 34. MongoDB Sharding
  • 35. MongoDB Sharding
    Replica Sets let us scale storage & transaction capacity for each shard
  • 36. MongoDB Sharding
    Replica Sets let us scale storage & transaction capacity for each shard
    Mongos routes transactions to shards based on “shard key”
  • 37. MongoDB Sharding
    Replica Sets let us scale storage & transaction capacity for each shard
    Mongos routes transactions to shards based on “shard key”
    Config servers store information about which shards exist
  • 38. Inserting
    [Diagram with numbered steps 1 to 3]
    Shard key == name
    bob → Shard 2
    Insert { ‘name’ : bob }
  • 39. Querying
    [Diagram with numbered steps 1 to 3]
    Shard key == name
    bob → Shard 2
    Query { ‘name’ : bob }
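The routing on slides 36 to 39 can be sketched in plain Ruby. This is a simplified model, not how mongos is implemented: the `CHUNKS` table stands in for the metadata the config servers hold, the key ranges are invented so that bob lands on Shard 2 as on the slides, and the `Router` class is hypothetical.

```ruby
# Hypothetical chunk table: each entry maps a half-open range of
# shard-key values [min, max) to a shard. The ranges are assumptions.
CHUNKS = [
  { min: "",  max: "b", shard: "Shard 1" },
  { min: "b", max: "n", shard: "Shard 2" },
  { min: "n", max: nil, shard: "Shard 3" } # nil means no upper bound
].freeze

class Router
  def initialize(chunks)
    @chunks = chunks
    # In-memory arrays stand in for the mongod processes behind each shard.
    @shards = Hash.new { |hash, name| hash[name] = [] }
  end

  # Find the shard whose range contains the shard-key value.
  def shard_for(name)
    chunk = @chunks.find do |c|
      name >= c[:min] && (c[:max].nil? || name < c[:max])
    end
    chunk[:shard]
  end

  # An insert is sent to exactly one shard, chosen by the shard key.
  def insert(doc)
    @shards[shard_for(doc["name"])] << doc
  end

  # A query on the shard key only needs to ask that same shard.
  def find_by_name(name)
    @shards[shard_for(name)].select { |doc| doc["name"] == name }
  end
end
```

Because inserts and shard-key queries both go through `shard_for`, neither has to touch more than one shard, which is what lets writes and targeted reads scale out together.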
  • 40. Map Reduce
    [Diagram with step markers: 1 (once), 2 (four times)]
    Map-reduce( … )
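Unlike a shard-key query, a map-reduce job has no single target shard. The diagram on slide 40 appears to show one request fanned out to every shard. A rough sketch of that scatter and gather pattern in plain Ruby, with invented per-shard data and hypothetical helper names:

```ruby
# Hypothetical per-shard data: each shard holds a slice of the event types.
SHARD_DATA = {
  "Shard 1" => %w[view click view],
  "Shard 2" => %w[click],
  "Shard 3" => %w[view view]
}.freeze

# Scatter: every shard counts only the events it stores.
def count_on_shard(events)
  events.each_with_object(Hash.new(0)) { |type, counts| counts[type] += 1 }
end

# Gather: merge the partial counts by summing the values for each key.
def merge_counts(partials)
  partials.reduce({}) do |total, partial|
    total.merge(partial) { |_key, a, b| a + b }
  end
end

partials = SHARD_DATA.values.map { |events| count_on_shard(events) }
TOTALS = merge_counts(partials)
```

The merge step only works because summing is associative: partial counts from each shard can be combined in any order.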
  • 41. Working with Mongo
    MongoMapper makes it look like ActiveRecord
    Documents are more natural than rows in many cases
    Map-Reduce rocks (but needs better support in Rails)
    http://www.flickr.com/photos/elhamalawy/2526783078/
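To illustrate the point that documents can be more natural than rows, here is a small plain-Ruby comparison. The event shape, field names, and timing values are invented for illustration; the talk does not show Yottaa's actual schema, and no MongoMapper calls are used here.

```ruby
# As rows: an event's timings live in a second table keyed by event_id.
EVENT_ROWS = [{ id: 1, url: "/home", recorded_on: "2010-06-26" }].freeze
TIMING_ROWS = [
  { event_id: 1, phase: "dns",     ms: 12 },
  { event_id: 1, phase: "connect", ms: 30 }
].freeze

# As a document: the same data nested in one record.
EVENT_DOC = {
  "url" => "/home",
  "recorded_on" => "2010-06-26",
  "timings" => [
    { "phase" => "dns",     "ms" => 12 },
    { "phase" => "connect", "ms" => 30 }
  ]
}.freeze

# With rows, the timings must first be matched on the foreign key.
def total_ms_from_rows(event_id)
  TIMING_ROWS.select { |row| row[:event_id] == event_id }.sum { |row| row[:ms] }
end

# With a document, the nested array is read directly.
def total_ms_from_doc(doc)
  doc["timings"].sum { |timing| timing["ms"] }
end
```

Both functions compute the same total; the document form just avoids the join.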
  • 42.
  • 43. Ruby
    Mongo
  • 44. Map: runs over all the objects in the views table, counting how many times a page was viewed
    Reduce: adds up all the counts for a unique url / date combination
    Run: runs the map-reduce job and returns a collection containing the results
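The code these three annotations describe did not survive in the transcript. As a stand-in, here is a plain-Ruby sketch of the same three steps run over an in-memory array. It does not use the MongoDB driver; the sample views, field names, and function names are all assumptions.

```ruby
# Hypothetical documents from a "views" collection.
VIEWS = [
  { url: "/home",  date: "2010-06-26" },
  { url: "/home",  date: "2010-06-26" },
  { url: "/about", date: "2010-06-26" },
  { url: "/home",  date: "2010-06-27" }
].freeze

# Map: emit one (key, 1) pair per view, keyed by url and date.
def map_view(view)
  [{ url: view[:url], date: view[:date] }, 1]
end

# Reduce: add up all the counts emitted for one url / date key.
def reduce_counts(_key, counts)
  counts.sum
end

# Run: group the emitted pairs by key, reduce each group,
# and return the results as an array of documents.
def run_map_reduce(views)
  views.map { |view| map_view(view) }
       .group_by { |key, _count| key }
       .map do |key, pairs|
         { _id: key, value: reduce_counts(key, pairs.map(&:last)) }
       end
end
```

Calling `run_map_reduce(VIEWS)` returns one result document per url / date pair, which mirrors the "collection containing the results" described on the slide.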
  • 45. Results
    Version 1 of our analytics system took 2 weeks with 1 engineer
    We have since added a lot more complexity, but we did it incrementally
    We replaced MySQL entirely with MongoDB
    No need for joins, transactions
    Every table is now a document collection
    It’s fast!
    63ms – Average response time for sending data to server
    93ms – Average response time for displaying reports