Social Analytics on MongoDB at MongoNYC

Published in: Technology
Transcript of "Social Analytics on MongoDB at MongoNYC"

  1. Social Analytics with MongoDB
     @BuddyMedia
  2. Disclaimer
     (image) + (image) = maybe not the best deck in the world
  3. What is MongoDB?
     Document Store.
     Schemaless.
     High performance.
  4. Why MongoDB?
     Months of testing
     Data Types
     Horizontal Scaling
     Replication
     Querying
     Atomicity
     Concurrency
  5. Everything in that last slide was a LIE.
  6. Same reason most of you do.
     It’s new and cool and we wanted to check it out.
     We become cool by association.
     But mostly because we like learning new things.
  7. That last slide was kind of a lie too.
     We started with Cassandra.
     Cassandra was written by Facebook and Facebook is really cool, we wanted to be as cool as them.
  8. Why Not Cassandra?
     Thrift.
     “Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml.”
     Eff that. We’re a startup.
  9. So MongoDB it Was.
     Also, MongoDB happened to be in NYC. We are in NYC.
     NYC is cool.
     Proof that NYC is cool.
  10. What You Should Know
      MongoDB is not relational.
      It’s also not schemaless, even though they love to say that (applications always have schemas/data models).
      Right tool for the right job:
        Logging
        Queues
        Aggregate Analytics
      Don’t get confused with ORM.
      Return what you need.
      Don’t worry about document size limits.
  11. Aggregate Analytics
      Lots of “stuff” happens at Buddy Media.
      Need to keep track of it all.
      Need it to be real time.
      Need to be able to group it by various levels and resolutions.
      Need to be able to create new metrics on the fly.
      Write heavy, read light.
  12. What does it look like?
  13. Architecture
  14. The Event Listener
      Node.js is the perfect event listener.
      Evented IO like Twisted or Event Machine.
      2 days of development (maybe ~100 lines of JS).
      0 lost events.
      0 downtime.
      Just don’t upgrade.
  15. Raw Event
      A Pageview
      {
        "_id" : ObjectId("4d8d0df101cddf2e6e0027af"),
        "created_date" : "2010-07-26 20:15:01",
        "data" : {
          "client_id" : "1034",
          "page_id" : "175"
        },
        "status" : {
          "state" : 0,
          "updated" : "2011-04-12 10:15:15"
        },
        "type" : "pageview"
      }
  16. Processing
      3 resolutions:
        Minute
        Hour
        Day
      1 event = 3 metric updates * number of groupings.
      "pageview": {
        "metrics": [
          { "name":"client.pageviews", "key":"client_id" },
          { "name":"page.pageviews", "key":"page_id" }
        ]
      }
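
The fan-out on slide 16 (one event becomes 3 updates per grouping) could be sketched roughly as below. This is only an illustration under assumptions: the helper names `period_start` and `metric_updates` are hypothetical, and the filter and `$inc` shapes are borrowed from the update call shown later in the deck. Nothing here talks to MongoDB; it only builds the pairs that would be sent.

```python
from datetime import datetime

# Config shape and field names come from the slides; everything else is hypothetical.
METRIC_CONFIG = {
    "pageview": {
        "metrics": [
            {"name": "client.pageviews", "key": "client_id"},
            {"name": "page.pageviews", "key": "page_id"},
        ]
    }
}

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def period_start(ts, period):
    """Truncate a datetime to the start of its minute, hour, or day."""
    if period == "minute":
        return ts.replace(second=0, microsecond=0)
    if period == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if period == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError("unknown period: %r" % period)


def metric_updates(event, config=METRIC_CONFIG):
    """Return one (filter, update) pair per grouping per resolution."""
    ts = datetime.strptime(event["created_date"], DATE_FORMAT)
    updates = []
    for metric in config[event["type"]]["metrics"]:
        group_value = event["data"][metric["key"]]
        for period in ("minute", "hour", "day"):
            spec = {
                "name": metric["name"],
                "period": period,
                "start_date": period_start(ts, period).strftime(DATE_FORMAT),
            }
            change = {"$inc": {"aggregates.%s" % group_value: 1}}
            updates.append((spec, change))
    return updates


# The raw pageview from slide 15, reduced to the fields used here.
event = {
    "created_date": "2010-07-26 20:15:01",
    "data": {"client_id": "1034", "page_id": "175"},
    "type": "pageview",
}
updates = metric_updates(event)
```

With two groupings configured for `pageview`, `updates` holds 2 * 3 = 6 pairs, each of which could be passed to an upsert.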
  17. Creating a Metric
      A pageview happened and I want to update metrics for the client the page belongs to.
      # Increment client 1034's counter for this minute; create the document if it does not exist yet.
      metrics.update(
        {
          'name': 'client.pageview',
          'period': 'minute',
          'start_date': '2010-05-12 12:50:00'
        },
        { '$inc': {'aggregates.1034': 1} },
        upsert=True
      )
  18. Completed Metric
      {
        "_id" : ObjectId("4da45cf6306a22719829b71b"),
        "aggregates" : {
          "1034" : 11
        },
        "end_date" : "2010-05-12 12:54:59",
        "name" : "client.pageview",
        "period" : "minute",
        "start_date" : "2010-05-12 12:50:00",
        "total" : 11
      }
  19. What about another client?
      If a second pageview comes in for a different client, we end up updating the exact same record. Thus our last metric becomes:
      {
        "_id" : ObjectId("4da45cf6306a22719829b71b"),
        "aggregates" : {
          "1034" : 1,
          "1213" : 1
        },
        "end_date" : "2010-05-12 12:54:59",
        "name" : "client.pageview",
        "period" : "minute",
        "start_date" : "2010-05-12 12:50:00",
        "total" : 11
      }
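
To see why two clients land in the same record, here is a small in-memory imitation of an upsert with `$inc`. It is a sketch, not MongoDB: `apply_inc_upsert` is a hypothetical helper that matches only on top-level field equality and supports only `$inc` on dotted paths, which is enough to mirror slides 17 to 19.

```python
def apply_inc_upsert(collection, spec, inc):
    """Imitate update(spec, {'$inc': inc}, upsert=True) on a list of dicts.

    Simplified on purpose: matching is plain equality on top-level fields.
    """
    for doc in collection:
        if all(doc.get(key) == value for key, value in spec.items()):
            break
    else:
        # No match: an upsert creates the document from the filter fields.
        doc = dict(spec)
        collection.append(doc)

    for path, amount in inc.items():
        *parents, leaf = path.split(".")
        target = doc
        for part in parents:
            target = target.setdefault(part, {})
        # $inc treats a missing field as 0.
        target[leaf] = target.get(leaf, 0) + amount
    return doc


metrics = []
spec = {
    "name": "client.pageview",
    "period": "minute",
    "start_date": "2010-05-12 12:50:00",
}
apply_inc_upsert(metrics, spec, {"aggregates.1034": 1})  # first client creates the record
apply_inc_upsert(metrics, spec, {"aggregates.1213": 1})  # second client reuses it
```

Because the filter contains only the metric name, period, and start date, both calls resolve to one document, and each client simply gets its own key under `aggregates`.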
  20. Some Queries
      1. Get pageviews for all clients that occurred on May 12 between 12:50 and 12:51 (1 document, n entries).
      db.metrics.find({
        name: "client.pageview",
        period: "minute",
        start_date: "2010-05-12 12:50:00"
      });
      2. Get pageviews for client 1034 that occurred on May 12 between 12:50 and 12:51 (1 document, 1 entry).
      db.metrics.find({
        name: "client.pageview",
        period: "minute",
        start_date: "2010-05-12 12:50:00"
      }, {"aggregates.1034": 1});
  21. More Queries
      1. Get pageviews for all clients that occurred on May 12 and graph by hour (24 documents, n entries).
      db.metrics.find({
        name: "client.pageview",
        period: "hour",
        start_date: /^2010-05-12/
      });
      2. Get pageviews for client 1034 that occurred on May 12 and graph by minute (1440 documents, 1 entry).
      db.metrics.find({
        name: "client.pageview",
        period: "minute",
        start_date: /^2010-05-12/
      }, {"aggregates.1034": 1});
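
Once a query like the ones on slide 21 returns its documents, turning them into points for a graph is a small step. The sketch below assumes the documents have the `start_date` and `aggregates` fields shown on the metric slides; `client_series` and the sample documents are hypothetical and exist only for illustration.

```python
def client_series(documents, client_id):
    """Return (start_date, count) pairs sorted by start_date.

    A bucket with no entry for the client counts as 0.
    """
    series = []
    for doc in documents:
        count = doc.get("aggregates", {}).get(client_id, 0)
        series.append((doc["start_date"], count))
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    return sorted(series)


# Made-up hourly documents, deliberately out of order.
sample_documents = [
    {"start_date": "2010-05-12 01:00:00", "aggregates": {"1034": 7}},
    {"start_date": "2010-05-12 00:00:00", "aggregates": {"1034": 3, "1213": 2}},
    {"start_date": "2010-05-12 02:00:00", "aggregates": {"1213": 5}},
]
points = client_series(sample_documents, "1034")
```

Storing dates as fixed-width strings, as the deck does, is what makes both the plain sort here and the date-prefix match in the queries work.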
  22. Let’s take a peek.
  23. @patr1cks
      @buddymedia