Scalable Event Analytics with MongoDB & Ruby on Rails

37,157 views
36,682 views

Published on

Slides from my talk at RubyConfChina 2010.

Published in: Technology
7 Comments
115 Likes
Statistics
Notes
No Downloads
Views
Total views
37,157
On SlideShare
0
From Embeds
0
Number of Embeds
6,596
Actions
Shares
0
Downloads
258
Comments
7
Likes
115
Embeds 0
No embeds

No notes for slide

Scalable Event Analytics with MongoDB & Ruby on Rails

  1. Scalable Event Analytics with Ruby on Rails & MongoDB<br />Ruby Conf China 2010<br />Jared Rosoff (@forjared) jrosoff@yottaa.com<br />
  2. Yottaa!!!! (www.yottaa.com)<br />
  3. Overview<br />Ruby at Scale<br />What is Event Analytics? <br />What are the different ways you could do it? <br />How we did it<br />
  4. Ruby At Scale?<br />http://www.flickr.com/photos/laughingsquid<br />
  5. Event Analytics<br />Data Source<br />Event<br />User<br />Event<br />Event<br />Query<br />Event Analytics<br />Data Source<br />Report<br />Event<br />Event<br />Event<br />Event<br />User<br />Query<br />Data Source<br />Event<br />Event<br />Report<br />Event<br />Event<br />Event<br />Data Source<br />
  6. Oh and we are a startup<br />
  7. Our requirements:<br />
  8. Rails default architecture<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />
  9. Rails default architecture<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />“Just” a Rails App<br />
  10. Rails default architecture<br /> Performance Bottleneck: Too much load<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />“Just” a Rails App<br />
  11. Let’s add replication!<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Master<br />
  12. Let’s add replication!<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Master<br />Off the shelf!<br />Scalable Reads!<br />
  13. Let’s add replication!<br /> Performance Bottleneck: Still can’t scale writes<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Master<br />Off the shelf!<br />Scalable Reads!<br />
  14. What about sharding?<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />Sharding<br />
  15. What about sharding?<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />Sharding<br />Scalable Writes!<br />
  16. What about sharding?<br /> Development Bottleneck:<br />Need to write custom code<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />Sharding<br />Scalable Writes!<br />
  17. Key Value stores to the rescue?<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />User<br />Reporting Server<br />
  18. Key Value stores to the rescue?<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />User<br />Reporting Server<br />Scalable Writes!<br />
  19. Key Value stores to the rescue?<br /> Development Bottleneck:<br />Reporting is limited / hard<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />User<br />Reporting Server<br />Scalable Writes!<br />
  20. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  21. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  22. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />Flexible Reports!<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  23. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />Flexible Reports!<br />“Just” a Rails App<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  24. Can I Hadoop my way out of this?<br /> Development Bottleneck:<br />Too many systems!<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />Flexible Reports!<br />“Just” a Rails App<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  25. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />
  26. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />Scalable Writes!<br />
  27. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />Scalable Writes!<br />Flexible Reporting!<br />
  28. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />Scalable Writes!<br />“Just” a rails app<br />Flexible Reporting!<br />
  29. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />
  30. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />Sharding!<br />
  31. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />Sharding!<br />High Concurrency<br />
  32. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />Sharding!<br />High Concurrency<br />Scale-Out<br />
  33. MongoDBSharding<br />
  34. MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />
  35. MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />Mongos routes transactions to shards based on “shard key”<br />
  36. MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />Mongos routes transactions to shards based on “shard key”<br />Config servers store information about which shards exist<br />
  37. Inserting<br />3<br />2<br />Shard key == name<br />bob  Shard 2<br />1<br />insert { ‘name’ : bob }<br />Insert { ‘name’ : bob }<br />
  38. Querying<br />3<br />2<br />Shard key == name<br />bob  Shard 2<br />1<br />Query { ‘name’ : bob }<br />Query { ‘name’ : bob }<br />
  39. Map Reduce<br />2<br />2<br />2<br />2<br />1<br />Map-reduce( … ) <br />Map-reduce(…)<br />
  40. Working with Mongo<br />MongoMapper makes it look like ActiveRecord<br />Documents are more natural than rows in many cases<br />Map-Reduce rocks (but needs better support in rails)<br />http://www.flickr.com/photos/elhamalawy/2526783078/<br />
  41. Ruby<br />Mongo<br />
  42. Runs over all the objects in the views table, counting how many times a page was viewed<br />Adds up all the counts for a unique url / date combination<br />Run the map reduce job and return a collection containing the results<br />
  43. Results<br />Version 1 of our analytics system took 2 weeks with 1 engineer <br />We have since added a lot more complexity, but we did it incrementally<br />We replaced MySQL entirely with MongoDB<br />No need for joins, transactions <br />Every table is now a document collection<br />It’s fast! <br />63ms – Average response time for sending data to server<br />93ms – Average response time for displaying reports<br />

×