Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Scalable Event Analytics with MongoDB & Ruby on Rails

44,972 views

Published on

Slides from my talk at RubyConfChina 2010.

Published in: Technology

Scalable Event Analytics with MongoDB & Ruby on Rails

  1. Scalable Event Analytics with Ruby on Rails & MongoDB<br />Ruby Conf China 2010<br />Jared Rosoff (@forjared) jrosoff@yottaa.com<br />
  2. Yottaa!!!! (www.yottaa.com)<br />
  3. Overview<br />Ruby at Scale<br />What is Event Analytics? <br />What are the different ways you could do it? <br />How we did it<br />
  4. Ruby At Scale?<br />http://www.flickr.com/photos/laughingsquid<br />
  5. Event Analytics<br />Data Source<br />Event<br />User<br />Event<br />Event<br />Query<br />Event Analytics<br />Data Source<br />Report<br />Event<br />Event<br />Event<br />Event<br />User<br />Query<br />Data Source<br />Event<br />Event<br />Report<br />Event<br />Event<br />Event<br />Data Source<br />
  6. Oh and we are a startup<br />
  7. Our requirements:<br />
  8. Rails default architecture<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />
  9. Rails default architecture<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />“Just” a Rails App<br />
  10. Rails default architecture<br /> Performance Bottleneck: Too much load<br />Collection Server<br />Data Source<br />MySQL<br />User<br />Reporting Server<br />“Just” a Rails App<br />
  11. Let’s add replication!<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Master<br />
  12. Let’s add replication!<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Master<br />Off the shelf!<br />Scalable Reads!<br />
  13. Let’s add replication!<br /> Performance Bottleneck: Still can’t scale writes<br />MySQL<br />Master<br />Collection Server<br />Data Source<br />Replication<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Master<br />Off the shelf!<br />Scalable Reads!<br />
  14. What about sharding?<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />Sharding<br />
  15. What about sharding?<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />Sharding<br />Scalable Writes!<br />
  16. What about sharding?<br /> Development Bottleneck:<br />Need to write custom code<br />Collection Server<br />Data Source<br />Sharding<br />MySQL<br />Master<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />Sharding<br />Scalable Writes!<br />
  17. Key Value stores to the rescue?<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />User<br />Reporting Server<br />
  18. Key Value stores to the rescue?<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />User<br />Reporting Server<br />Scalable Writes!<br />
  19. Key Value stores to the rescue?<br /> Development Bottleneck:<br />Reporting is limited / hard<br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />User<br />Reporting Server<br />Scalable Writes!<br />
  20. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  21. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  22. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />Flexible Reports!<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  23. Can I Hadoop my way out of this?<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />Flexible Reports!<br />“Just” a Rails App<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  24. Can I Hadoop my way out of this?<br /> Development Bottleneck:<br />Too many systems!<br />MySQL<br />Master<br />MySQL<br />Master<br />Cassandra or<br />Voldemort<br />Collection Server<br />Data Source<br />Hadoop<br />MySQL<br />Master<br />Scalable Writes!<br />Flexible Reports!<br />“Just” a Rails App<br />MySQL<br />Master<br />User<br />Reporting Server<br />MySQL<br />Master<br />MySQL<br />Slave<br />
  25. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />
  26. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />Scalable Writes!<br />
  27. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />Scalable Writes!<br />Flexible Reporting!<br />
  28. MongoDB! <br />Collection Server<br />Data Source<br />MySQL<br />Master<br />MySQL<br />Master<br />MongoDB<br />User<br />Reporting Server<br />Scalable Writes!<br />“Just” a rails app<br />Flexible Reporting!<br />
  29. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />
  30. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />Sharding!<br />
  31. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />Sharding!<br />High Concurrency<br />
  32. MongoD<br />App Server<br />Data Source<br />Collection<br />MongoD<br />Load<br />Balancer<br />Passenger<br />Nginx<br />Mongos<br />Reporting<br />User<br />MongoD<br />Sharding!<br />High Concurrency<br />Scale-Out<br />
  33. MongoDBSharding<br />
  34. MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />
  35. MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />Mongos routes transactions to shards based on “shard key”<br />
  36. MongoDBSharding<br />Replica Sets let us scale storage & transaction capacity for each shard<br />Mongos routes transactions to shards based on “shard key”<br />Config servers store information about which shards exist<br />
  37. Inserting<br />3<br />2<br />Shard key == name<br />bob  Shard 2<br />1<br />insert { ‘name’ : bob }<br />Insert { ‘name’ : bob }<br />
  38. Querying<br />3<br />2<br />Shard key == name<br />bob  Shard 2<br />1<br />Query { ‘name’ : bob }<br />Query { ‘name’ : bob }<br />
  39. Map Reduce<br />2<br />2<br />2<br />2<br />1<br />Map-reduce( … ) <br />Map-reduce(…)<br />
  40. Working with Mongo<br />MongoMapper makes it look like ActiveRecord<br />Documents are more natural than rows in many cases<br />Map-Reduce rocks (but needs better support in rails)<br />http://www.flickr.com/photos/elhamalawy/2526783078/<br />
  41. Ruby<br />Mongo<br />
  42. Runs over all the objects in the views table, counting how many times a page was viewed<br />Adds up all the counts for a unique url / date combination<br />Run the map reduce job and return a collection containing the results<br />
  43. Results<br />Version 1 of our analytics system took 2 weeks with 1 engineer <br />We have since added a lot more complexity, but we did it incrementally<br />We replaced MySQL entirely with MongoDB<br />No need for joins, transactions <br />Every table is now a document collection<br />It’s fast! <br />63ms – Average response time for sending data to server<br />93ms – Average response time for displaying reports<br />

×