Scalable Event Analytics with Ruby on Rails & MongoDB<br />Ruby Conf China 2010<br />Jared Rosoff (@forjared)  jrosoff@yot...
Yottaa!!!! (www.yottaa.com)<br />
Overview<br />Ruby at Scale<br />What is Event Analytics? <br />What are the different ways you could do it? <br />How we ...
Ruby At Scale?<br />http://www.flickr.com/photos/laughingsquid<br />
Event Analytics<br />Data Source<br />Event<br />User<br />Event<br />Event<br />Query<br />Event Analytics<br />Data Sour...
Oh and we are a startup<br />
Our requirements:<br />
Rails default architecture<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />
Rails default architecture<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />“Just” ...
Rails default architecture<br />	Performance Bottleneck: Too much load<br />Collection Server<br />Data Source<br />MySQL<...
Let’s add replication!<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Mas...
Let’s add replication!<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Mas...
Let’s add replication!<br />	Performance Bottleneck: Still can’t scale writes<br />MySQL<br />Master<br />Collection Serve...
What about sharding?<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<b...
What about sharding?<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<b...
What about sharding?<br />	Development Bottleneck:<br />Need to write custom code<br />Collection Server<br />Data Source<...
Key Value stores to the rescue?<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br /...
Key Value stores to the rescue?<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br /...
Key Value stores to the rescue?<br />	Development Bottleneck:<br />Reporting is limited / hard<br />Collection Server<br /...
Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Colle...
Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Colle...
Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Colle...
Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Colle...
Can I Hadoop my way out of this?<br />	Development Bottleneck:<br />Too many systems!<br />MySQL<br />Master<br />MySQL<br...
MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br ...
MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br ...
MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br ...
MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br ...
MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br /...
MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br /...
MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br /...
MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br /...
MongoDBSharding<br />
MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />
MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />Mongos routes transactio...
MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />Mongos routes transactio...
Inserting<br />3<br />2<br />Shard key == name<br />bob  Shard 2<br />1<br />insert { ‘name’ : bob }<br />Insert { ‘name’...
Querying<br />3<br />2<br />Shard key == name<br />bob  Shard 2<br />1<br />Query { ‘name’ : bob }<br />Query { ‘name’ : ...
Map Reduce<br />2<br />2<br />2<br />2<br />1<br />Map-reduce( … ) <br />Map-reduce(…)<br />
Working with Mongo<br />MongoMapper makes it look like ActiveRecord<br />Documents are more natural than rows in many case...
Ruby<br />Mongo<br />
Runs over all the objects in the views table, counting how many times a page was viewed<br />Adds up all the counts for a ...
Results<br />Version 1 of our analytics system took 2 weeks with 1 engineer <br />We have since added a lot more complexit...
Upcoming SlideShare
Loading in...5
×

Scalable Event Analytics with MongoDB & Ruby on Rails

31,454

Published on

Slides from my talk at RubyConfChina 2010.

Published in: Technology
7 Comments
112 Likes
Statistics
Notes
No Downloads
Views
Total Views
31,454
On Slideshare
0
From Embeds
0
Number of Embeds
11
Actions
Shares
0
Downloads
240
Comments
7
Likes
112
Embeds 0
No embeds

No notes for slide

Scalable Event Analytics with MongoDB & Ruby on Rails

  1. 1. Scalable Event Analytics with Ruby on Rails & MongoDB<br />Ruby Conf China 2010<br />Jared Rosoff (@forjared) jrosoff@yottaa.com<br />
  2. 2. Yottaa!!!! (www.yottaa.com)<br />
  3. 3. Overview<br />Ruby at Scale<br />What is Event Analytics? <br />What are the different ways you could do it? <br />How we did it<br />
  4. 4. Ruby At Scale?<br />http://www.flickr.com/photos/laughingsquid<br />
  5. 5. Event Analytics<br />Data Source<br />Event<br />User<br />Event<br />Event<br />Query<br />Event Analytics<br />Data Source<br />Report<br />Event<br />Event<br />Event<br />Event<br />User<br />Query<br />Data Source<br />Event<br />Event<br />Report<br />Event<br />Event<br />Event<br />Data Source<br />
  6. 6.
  7. 7. Oh and we are a startup<br />
  8. 8. Our requirements:<br />
  9. 9. Rails default architecture<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />
  10. 10. Rails default architecture<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />“Just” a Rails App<br />
  11. 11. Rails default architecture<br /> Performance Bottleneck: Too much load<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />“Just” a Rails App<br />
  12. 12. Let’s add replication!<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Master<br />
  13. 13. Let’s add replication!<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Master<br />Off the shelf!<br />Scalable Reads!<br />
  14. 14. Let’s add replication!<br /> Performance Bottleneck: Still can’t scale writes<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Master<br />Off the shelf!<br />Scalable Reads!<br />
  15. 15. What about sharding?<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />Sharding<br />
  16. 16. What about sharding?<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />Sharding<br />Scalable Writes!<br />
  17. 17. What about sharding?<br /> Development Bottleneck:<br />Need to write custom code<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />Sharding<br />Scalable Writes!<br />
  18. 18. Key Value stores to the rescue?<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />User<br />Reporting Server<br />
  19. 19. Key Value stores to the rescue?<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />User<br />Reporting Server<br />Scalable Writes!<br />
  20. 20. Key Value stores to the rescue?<br /> Development Bottleneck:<br />Reporting is limited / hard<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />User<br />Reporting Server<br />Scalable Writes!<br />
  21. 21. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  22. 22. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  23. 23. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />Flexible Reports!<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  24. 24. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />Flexible Reports!<br />“Just” a Rails App<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  25. 25. Can I Hadoop my way out of this?<br /> Development Bottleneck:<br />Too many systems!<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />Flexible Reports!<br />“Just” a Rails App<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  26. 26. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />
  27. 27. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />Scalable Writes!<br />
  28. 28. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />Scalable Writes!<br />Flexible Reporting!<br />
  29. 29. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />Scalable Writes!<br />“Just” a rails app<br />Flexible Reporting!<br />
  30. 30. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />
  31. 31. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />Sharding!<br />
  32. 32. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />Sharding!<br />High Concurrency<br />
  33. 33. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />Sharding!<br />High Concurrency<br />Scale-Out<br />
  34. 34. MongoDBSharding<br />
  35. 35. MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />
  36. 36. MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />Mongos routes transactions to shards based on “shard key”<br />
  37. 37. MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />Mongos routes transactions to shards based on “shard key”<br />Config servers store information about which shards exist<br />
  38. 38. Inserting<br />3<br />2<br />Shard key == name<br />bob  Shard 2<br />1<br />insert { ‘name’ : bob }<br />Insert { ‘name’ : bob }<br />
  39. 39. Querying<br />3<br />2<br />Shard key == name<br />bob  Shard 2<br />1<br />Query { ‘name’ : bob }<br />Query { ‘name’ : bob }<br />
  40. 40. Map Reduce<br />2<br />2<br />2<br />2<br />1<br />Map-reduce( … ) <br />Map-reduce(…)<br />
  41. 41. Working with Mongo<br />MongoMapper makes it look like ActiveRecord<br />Documents are more natural than rows in many cases<br />Map-Reduce rocks (but needs better support in rails)<br />http://www.flickr.com/photos/elhamalawy/2526783078/<br />
  42. 42.
  43. 43. Ruby<br />Mongo<br />
  44. 44. Runs over all the objects in the views table, counting how many times a page was viewed<br />Adds up all the counts for a unique url / date combination<br />Run the map reduce job and return a collection containing the results<br />
  45. 45. Results<br />Version 1 of our analytics system took 2 weeks with 1 engineer <br />We have since added a lot more complexity, but we did it incrementally<br />We replaced MySQL entirely with MongoDB<br />No need for joins, transactions <br />Every table is now a document collection<br />It’s fast! <br />63ms – Average response time for sending data to server<br />93ms – Average response time for displaying reports<br />
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×