Platform Overview: Social Analytics on MongoDB
New York MongoDB User Group, Feb 24, 2011
Why MongoDB?
There was no real reason. It just looked cool, and we like messing with cool things because then we are cool by association.
OK, actually we started with Cassandra, cuz Facebook wrote Cassandra and Facebook is really cool.
Ya know what's not cool? Thrift is not cool. "Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml."
So MongoDB it was. Also, 10gen is in NYC, and what's cooler than NYC? Nothing.
Transition to Mongo: 3 Stages
Stage 1 – Started blindly writing code on a secret project. Data was NON-critical.
Stage 2 – Stage 1 worked out pretty well; we think we're pretty smart now. Let's go all-in.
Stage 3 – OK, now we are REALLY smart. Let's use it for analytics.
Stage 1: Non-Critical Project with Non-Critical Data
We thought it would be cool to let clients view their user activity. Mostly, we wanted to be able to prove to our project managers that it was the client who screwed up their content, not our platform.
Effectively a LOG application.
We didn't even read that.
Stage 1 – What We Learned
MongoDB is not MySQL. We re-learn this almost every day. You will too.
"Schemaless" is a bad way to think about it. Schema is INCREDIBLY important.
DBRefs for foreign keys are kind of nasty. If you *really* need FKs, it's usually easier to just store the IDs directly. You probably do this in MySQL anyway and don't bother with FK constraints; your app handles it.
You probably don't need references. MongoDB is not MySQL.
Use sub-documents. If you don't, you've just got rows and tables, and rows and tables is MySQL.
It's OK to store data in sub-docs that will change later. If it's not… you're probably trying to use the wrong tool.
Don't let that 4 MB document limit worry you too much. 4 MB is a lot.
Use the right tool for the job!!! Typical jobs include: logging, queues, aggregate analytics.
A BSON object is not an ORM object. DON'T take the whole document, alter it, and re-save it (the ORM way).
You don't need an ORM. You probably don't need a heavy abstraction layer. It sort of depends on what language you're using. If you're using PHP, you might want an abstraction layer. You might also want a new language…
[Slide images: PHP, Python, JavaScript console]
Most importantly: don't let people find out about secret side projects. BLAMO, you're in production.
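The sub-document and "don't re-save the whole document" advice from Stage 1 can be sketched in a few lines of plain Python. This is only an illustration: the `activity` document, its field names, and the `set_fields` helper are all hypothetical and do not come from the talk. The point is that related data lives inside one document, references are plain IDs rather than DBRefs, and a change is expressed as a small `$set` on a dotted path instead of a rewritten document.

```python
# Hypothetical activity-log document (field names are assumptions).
# Related data lives in sub-documents instead of separate "tables".
activity = {
    "client_id": 42,  # a plain ID used as a reference, no DBRef
    "page": {"id": 1001, "title": "Spring Sale"},
    "actions": [
        {"user": "alice", "verb": "edited", "field": "headline"},
        {"user": "bob", "verb": "published"},
    ],
}


def set_fields(path_to_value):
    """Build a $set update that touches only the named (dotted) fields."""
    return {"$set": dict(path_to_value)}


# Change one nested field without reading or re-saving the whole document.
update = set_fields({"page.title": "Summer Sale"})
print(update)
```

With a driver, an update document like this would be passed to the collection's update call together with a selector such as `{"client_id": 42}`; the rest of the stored document is left alone.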
Stage 2: Critical Data. Medium Volume. Big Spikes.
We manage hundreds of pages on Facebook, which account for hundreds of millions of fans. Those fans post to brand walls.
Let's build an app that lets the brand moderate that content.
Stage 2 – What We Learned
Modifier operators are dope. http://www.mongodb.org/display/DOCS/Updating
Remember to use $set. MySQL complains; Mongo happily destroys your document.
Tell your query what you want returned.
Be careful with 64-bit integers on 32-bit machines. (Facebook uses 64-bit IDs.) http://derickrethans.nl/64bit-ints-in-mongodb.html
Read everything you can about indexing. You will likely create two dozen indexes that never get used. http://kylebanker.com/blog/2010/09/21/the-joy-of-mongodb-indexes/
Make sure your indexes fit in memory.
Use replica sets. Seriously, use them.
.stats(), .explain(), the profiler, and mongostat are your friends.
  Got slow queries? Use .explain() + .stats() to figure out if it's using your indexes effectively. Try .hint().
  Got slow queries but can't find them? Use the profiler. Hell, you can query your queries!
  Still slow? Use mongostat to look for faults and locks. Faults = going to disk.
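The "$set or Mongo happily destroys your document" warning is easier to see with a toy model. The `apply_update` function below is a deliberately simplified, in-memory imitation of classic update semantics, written for this note. It is not MongoDB's implementation, it handles only top-level fields, and the `post` document is made up.

```python
import copy


def apply_update(doc, update):
    """Toy model of classic update semantics (top-level fields only)."""
    if any(key.startswith("$") for key in update):
        # Operator update: only the named fields are touched.
        result = copy.deepcopy(doc)
        for field, value in update.get("$set", {}).items():
            result[field] = value
        for field, amount in update.get("$inc", {}).items():
            result[field] = result.get(field, 0) + amount
        return result
    # No operators: the update document replaces everything except _id.
    replaced = copy.deepcopy(update)
    replaced["_id"] = doc["_id"]
    return replaced


# Hypothetical wall post awaiting moderation.
post = {"_id": 1, "author": "fan123", "text": "Love this brand!", "status": "pending"}

clobbered = apply_update(post, {"status": "approved"})      # author and text are gone
safe = apply_update(post, {"$set": {"status": "approved"}})  # only status changes

print(clobbered)
print(safe)
```

The first call models the mistake from the slide: without an operator, the new document replaces the old one and only `_id` survives.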
Stage 3 – Analytics
Lots of "stuff" happens at Buddy Media. We needed a structured way to keep track of all of that stuff.
Has to be flexible enough to handle different levels of aggregation.
Has to be near-real-time (1-minute aggregates).
Need to be able to add new "stuff" or aggregates on the fly.
Needs to handle lots of writes very fast.
Well, DUH! That's everyone's dream analytics system! What makes you special?
MongoDB does.
Upsert + $inc = [slide image]
*Remember earlier when I said modifier operators are awesome?
Stage 3 – Analytics
A metric:
    {
        "_id" : ObjectId("4d656bd84b4395dce2bb7110"),
        "aggregates" : {
            "site1" : 3,
            "site2" : 2
        },
        "type" : "pageviews",
        "period" : "minute",
        "start_date" : "2011-02-23 15:32:00"
    }
Stage 3 – Analytics: Storing the Metrics
Originally, we keyed an event on [type, period, start_date, object]. This got huge… fast.
  Think pageviews. If you have 1000 pages and use a unique document to store pageviews for each object, then you have a maximum of 1000 * 60 = 60,000 documents per hour. That's for ONE metric.
  It's not very Mongo-like (use sub-documents!).
  If I want to know the pageviews all 1000 pages got in an hour, by minute, I have to return and iterate over 60,000 documents.
Instead, we key on [type, period, start_date].
  Reduces the number of documents dramatically: 60 documents per hour instead of 60,000.
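The document-count arithmetic above can be checked with a tiny calculation. The function name and parameters are made up for this note; the numbers (1000 pages, minute-level aggregates) are the ones from the slide.

```python
MINUTES_PER_HOUR = 60


def max_documents_per_hour(pages, key_includes_object):
    """Upper bound on minute-level documents written per hour for one metric."""
    # Keyed on [type, period, start_date, object]: one document per page per minute.
    # Keyed on [type, period, start_date]: one document per minute, with pages
    # stored as fields inside the "aggregates" sub-document.
    documents_per_minute = pages if key_includes_object else 1
    return documents_per_minute * MINUTES_PER_HOUR


per_object_key = max_documents_per_hour(1000, key_includes_object=True)
per_metric_key = max_documents_per_hour(1000, key_includes_object=False)
print(per_object_key, per_metric_key)
```

These are maximums: a page with no pageviews in a given minute produces no document (or no field) for that minute.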
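Stage 3 – Analytics: Pulling Events off the Queue
    import datetime

    def getEvent(self, db):
        # Atomically claim the oldest unprocessed event (state 0 -> 1).
        # Assumption: the slide never shows where `now` comes from,
        # so the current UTC time is used here.
        now = datetime.datetime.utcnow()
        try:
            item = db.command(
                'findAndModify',
                'events',
                query={"status.state": 0},
                update={"$set": {"status": {"state": 1, "updated": now}}},
                sort={"created_date": 1})
            return item['value']
        except Exception:
            return None
One event at a time. No race conditions.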
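For readers on a current PyMongo, the same claim step is usually written with `Collection.find_one_and_update`, which by default returns the document as it was before the update, or `None` when nothing matches. The sketch below is not the code from the talk. `claim_next_event` is a hypothetical helper, and `InMemoryEvents` is a made-up stand-in that mimics just enough of that one method to run without a server; it says nothing about how MongoDB guarantees atomicity.

```python
import datetime


def claim_next_event(events, now):
    """Claim the oldest unprocessed event, or return None if the queue is empty."""
    return events.find_one_and_update(
        {"status.state": 0},
        {"$set": {"status": {"state": 1, "updated": now}}},
        sort=[("created_date", 1)],
    )


class InMemoryEvents:
    """Hypothetical stand-in for a collection; supports only the call above."""

    def __init__(self, docs):
        self.docs = docs

    def find_one_and_update(self, query, update, sort=None):
        wanted_state = query["status.state"]
        matches = [d for d in self.docs if d["status"]["state"] == wanted_state]
        if not matches:
            return None
        sort_key, direction = sort[0]
        matches.sort(key=lambda d: d[sort_key], reverse=(direction < 0))
        doc = matches[0]
        before = dict(doc)          # what the caller gets back
        doc.update(update["$set"])  # the stored copy is now marked as claimed
        return before


queue = InMemoryEvents([
    {"_id": 1, "created_date": "2011-02-23 15:32:05", "status": {"state": 0}},
    {"_id": 2, "created_date": "2011-02-23 15:32:01", "status": {"state": 0}},
])
now = datetime.datetime(2011, 2, 23, 15, 33)

first = claim_next_event(queue, now)
second = claim_next_event(queue, now)
third = claim_next_event(queue, now)
print(first["_id"], second["_id"], third)
```

Because the state check and the state change happen in one operation on the server, two workers asking at the same moment cannot both receive the same event.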
Stage 3 – Analytics: Events → Metrics
While we can only pull one event off the queue at a time, that doesn't mean we should process one event at a time.
Remember, our documents contain lots of "object" aggregates. We can update a whole bunch at once.
We pull up to 10k events off the queue. At 10k (or an empty queue), we process them by creating local documents in memory and adding each object to each document.
We then construct a single upsert per metric instead of per event.
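The batching step described above can be sketched without a database: count events per metric in memory, then emit one selector and one `$inc` document per metric. This is a rough illustration, not the team's code. `build_upserts` and the event shape (`type`, `period`, `start_date`, `object`) are assumptions modelled on the key fields named in the slides.

```python
from collections import defaultdict


def build_upserts(events):
    """Collapse raw events into one (selector, update) pair per metric document."""
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        metric_key = (event["type"], event["period"], event["start_date"])
        counts[metric_key][event["object"]] += 1

    upserts = []
    for (type_, period, start_date), per_object in counts.items():
        selector = {"type": type_, "period": period, "start_date": start_date}
        increments = {"aggregates.%s" % obj: n for obj, n in per_object.items()}
        upserts.append((selector, {"$inc": increments}))
    return upserts


def pageview(obj):
    """Hypothetical event: one pageview for `obj` in minute 15:32."""
    return {"type": "pageviews", "period": "minute",
            "start_date": "2011-02-23 15:32:00", "object": obj}


# Five raw events collapse into a single upsert.
events = [pageview("blue")] * 3 + [pageview("green")] * 2
upserts = build_upserts(events)
for selector, update in upserts:
    print(selector, update)
```

Each pair would then be sent as one upsert, so the number of writes depends on the number of distinct metric documents in the batch rather than on the number of events.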
Stage 3 – Analytics
Pageview events: all 5 occurred in minute 32 of 3pm.
    metric = {
        "type": "pageviews",
        "period": "minute",
        "start_date": "2011-02-23 15:32:00",
    }
    aggregates = {}
    aggregates['blue'] = 3
    aggregates['green'] = 2
    # Build one $inc entry per object, using dotted paths into "aggregates".
    incrementors = {}
    for agg in aggregates:
        incrementors['aggregates.%s' % agg] = aggregates[agg]
    # Third argument True = upsert: create the metric document if it is missing.
    db.metrics.update(metric, {'$inc': incrementors}, True)
Resulting document:
    > db.metrics.find({type: "pageviews", period: "minute", start_date: "2011-02-23 15:32:00"});
    {
        "_id" : ObjectId("4d656bd84b4395dce2bb7110"),
        "aggregates" : {
            "blue" : 3,
            "green" : 2
        },
        "type" : "pageviews",
        "period" : "minute",
        "start_date" : "2011-02-23 15:32:00"
    }
Baby Jesus
Stage 3 – Analytics
SQL:
    SELECT
        DATE_TRUNC('day', event_time) AS group_time,
        COUNT(DISTINCT event_transaction_id) AS counts
    FROM f_localevents
    WHERE 1=1
        AND module_id = '4ce420bc36913'
        AND event_name = 'polls.vote_submitted'
        AND event_time >= '2010-12-01 00:00:00'
        AND event_time < '2010-12-31 00:00:00'
    GROUP BY DATE_TRUNC('day', event_time)
    ORDER BY group_time;
    Time: 0 hrs, 0 mins, 33 secs, 264 ms
(holy shit)
MongoDB:
    db.metrics.find(
        {
            name: "module.polls.vote_submitted",
            period: "day",
            start_date: {
                "$gte": "2010-12-01 00:00:00",
                "$lte": "2010-12-31 23:59:59"
            }
        },
        {"aggregates.4ce420bc36913": 1}
    ).explain();
    {
        "cursor" : "BtreeCursor name_1_period_1_start_date_1",
        "nscanned" : 7,
        "nscannedObjects" : 7,
        "n" : 7,
        "millis" : 0,
        …
Stage 3 – What We Learned
You probably don't need sharding. But if there is one situation where sharding is going to come up quickly, it's on the cloud.
I can't say, "give me all pageviews for pages in category A." There's a way to do this, but we haven't quite figured it out yet. Our app handles it for now.
Fewer documents are always better. Find ways to combine data structures effectively.
And last but not least… our LEAST favorite thing in MongoDB…
    patrick@newdev:~$ mongo localhost
    MongoDB shell version: 1.6.4
    connecting to: localhost
    > use analtyics;
    switched to db analtyics
    >
Shameless Plug(s)
We are hiring from all walks of life: engineers, SysOps, product managers, UX designers.
Get to work on cool problems like this. (That makes you cool by association.) http://bddy.me/ia8gi3
Meet us at SXSW! http://www.facebook.com/event.php?eid=204744279542095
Even More Shameless…
@patr1cks @buddymedia

Social Analytics with MongoDB

  • 1. Platform Overview: Social analytics on MongoDB. New York MongoDB User Group, Feb 24, 2011
  • 3. Why MongoDB? There was no real reason. It just looked cool and we like messing with cool things because then we are cool by association.
  • 4. Why MongoDB? OK, actually we started with Cassandra cuz Facebook wrote Cassandra and Facebook is really cool.
  • 5. Why MongoDB? Ya know what’s not cool? Thrift is not cool. Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml.
  • 6. Why MongoDB? So MongoDB it was. Also, 10gen is in NYC and what’s cooler than NYC? Nothing.
  • 7. Transition to Mongo: 3 Stages. Stage 1 – Started blindly writing code on a secret project. Data was NON-Critical. Stage 2 – Stage 1 worked out pretty well, we think we’re pretty smart now. Let’s go all-in. Stage 3 – OK, now we are REALLY smart. Let’s use it for analytics.
  • 8. Stage 1: Non Critical Project w/ Non Critical Data. We thought it would be cool to let clients view their user activity. Mostly, we wanted to be able to prove to our Project Managers that it was the client who screwed up their content, not our platform. Effectively a LOG Application. We didn’t even read that.
  • 10. Stage 1 – What we learned. MongoDB is not MySQL: we re-learn this almost every day. You will too. “Schemaless” is a bad way to think about it. Schema is INCREDIBLY important. DBRefs for foreign keys are kind of nasty. If you *really* need FKs, it’s usually easier to just use the IDs naturally. You probably do this in MySQL anyway and don’t bother with FK constraints. Your app handles it. You probably don’t need references. MongoDB is not MySQL. Use sub documents. If you don’t, you’ve just got rows and tables, and rows and tables is MySQL. It’s OK to store data in sub docs that will change later. If it’s not… you’re probably trying to use the wrong tool. Don’t let that 4 MB document limit worry you too much. 4 MB is a lot. Use the right tool for the job!!! Typical jobs include: logging, queues, aggregate analytics. A BSON object is not an ORM object. DON’T take the whole document, alter it, and re-save it (ORM). You don’t need ORM. You probably don’t need a heavy abstraction layer. It sort of depends on what language you’re using. If you’re using PHP, you might want an abstraction layer. You might also want a new language… PHP Python JavaScript Console
  • 11. Stage 1 – What we learned. Most importantly: don’t let people find out about secret side projects. BLAMO, you’re in production.
  • 12. Stage 2: Critical Data. Medium Volume. Big Spikes. We manage hundreds of pages on Facebook which account for hundreds of millions of fans. Those fans post to brand walls. Let’s build an app which lets the brand moderate that content.
  • 14. Stage 2 – What we learned. Modifier operators are dope. http://www.mongodb.org/display/DOCS/Updating Remember to use $set. MySQL complains; Mongo happily destroys your document. Tell your query what you want returned. Be careful with 64-bit integers on 32-bit machines (Facebook uses 64-bit IDs). http://derickrethans.nl/64bit-ints-in-mongodb.html Read everything you can about indexing. You will likely create 2 dozen indexes that never get used. http://kylebanker.com/blog/2010/09/21/the-joy-of-mongodb-indexes/ Make sure your indexes fit in memory. Use Replica Sets. Seriously, use them. .stats(), .explain(), the profiler and mongostat are your friends. Got slow queries? Use .explain() + .stats() to figure out if it’s using your indexes effectively. Try .hint(). Got slow queries but can’t find them? Use the profiler. Hell, you can query your queries! Still slow? Use mongostat to look for faults and locks. Faults = going to disk.
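The "$set" warning on slide 14 is easier to see with a concrete case. The sketch below is a toy in-memory model of update semantics, not MongoDB or a driver: `apply_update` and the `post` document are hypothetical names made up for illustration, and only top-level fields are handled. It assumes the behaviour the slide describes, where an update document without modifier operators replaces everything except `_id`.

```python
import copy

def apply_update(doc, update):
    """Toy model of an update: replacement vs. modifier operators (top-level fields only)."""
    has_operators = any(key.startswith("$") for key in update)
    if not has_operators:
        # No modifier operators: the update document replaces everything but _id.
        replaced = copy.deepcopy(update)
        replaced["_id"] = doc["_id"]
        return replaced
    result = copy.deepcopy(doc)
    # $set overwrites only the named fields.
    for field, value in update.get("$set", {}).items():
        result[field] = value
    # $inc adds to the named fields, starting from 0 if a field is missing.
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount
    return result

# Hypothetical wall post waiting for moderation.
post = {"_id": 1, "author": "fan42", "text": "hello", "status": "pending"}

clobbered = apply_update(post, {"status": "approved"})          # forgot $set
patched = apply_update(post, {"$set": {"status": "approved"}})  # used $set

print(clobbered)
print(patched)
```

Without `$set` the author and text fields are gone; with it, only `status` changes.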
  • 15. Stage 3 – Analytics. Lots of “stuff” happens at Buddy Media. We needed a structured way to keep track of all of that stuff. Has to be flexible enough to handle different levels of aggregation. Has to be near-real-time (1 minute aggregates). Need to be able to add new “stuff” or aggregates on-the-fly. Needs to handle lots of writes very fast.
  • 16. Stage 3 – Analytics. MongoDB Does. Well DUH! That’s everyone’s dream Analytics system! What makes you special?
  • 17. Stage 3 – Analytics. Upsert + $inc = *Remember earlier when I said Modifier Operators are awesome?
  • 18. Stage 3 – Analytics. A Metric:
      {
        "_id" : ObjectId("4d656bd84b4395dce2bb7110"),
        "aggregates" : { "site1" : 3, "site2" : 2 },
        "type" : "pageviews",
        "period" : "minute",
        "start_date" : "2011-02-23 15:32:00"
      }
  • 20. Stage 3 – Analytics: Storing the Metrics. Originally, we keyed an event on [type, period, start_date, object]. This got huge… fast. Think pageviews. If you have 1000 pages and use a unique document to store pageviews for each object, then you have up to 1000 * 60 = 60,000 documents per hour. That’s for ONE metric. It’s not very Mongo-like (use sub documents!). If I want to know the pageviews all 1000 pages got in an hour by minute, I have to return and iterate over 60,000 documents. Instead, we key on [type, period, start_date]. This reduces the number of documents dramatically: 60 documents per hour instead of 60,000.
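The document-count arithmetic on slide 20 can be checked in a few lines. This is only a sketch of the numbers quoted on the slide (1000 pages, minute-level metrics); the variable names are made up for illustration.

```python
PAGES = 1000            # objects tracked, from the slide's example
MINUTES_PER_HOUR = 60   # one metric document per minute

# Keyed on [type, period, start_date, object]: one document per page per minute.
docs_keyed_by_object = PAGES * MINUTES_PER_HOUR

# Keyed on [type, period, start_date]: one document per minute,
# with every page stored as a field of the "aggregates" sub document.
docs_keyed_by_period = MINUTES_PER_HOUR

reduction_factor = docs_keyed_by_object // docs_keyed_by_period

print(docs_keyed_by_object, docs_keyed_by_period, reduction_factor)
```

The saving scales with the number of objects: the document count per hour stays at 60 however many pages are added, while each document grows by one field per page.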
  • 21. Stage 3 – Analytics: Pulling Events off the Queue
      from datetime import datetime

      def getEvent(self, db):
          # Atomically claim the oldest unprocessed event (state 0 -> 1).
          now = datetime.utcnow()
          try:
              item = db.command(
                  'findAndModify',
                  'events',
                  query={"status.state": 0},
                  update={"$set": {"status": {"state": 1, "updated": now}}},
                  sort={"created_date": 1})
              return item['value']
          except Exception:
              return None
    One event at a time. No race conditions.
  • 22. Stage 3 – Analytics: Events → Metrics. While we can only pull one event off the Queue at a time, that doesn’t mean we should process 1 event at a time. Remember, our documents contain lots of “object” aggregates. We can update a whole bunch at once. We get 10k events off the Queue. At 10k (or an empty Queue) we process by creating local documents in memory, adding each object to each document. We then construct a single upsert per metric instead of per event.
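The batching step on slide 22 can be sketched as a pure function that folds a batch of events into one (query, update) pair per metric document. This is a minimal sketch under assumptions, not the authors' code: `build_upserts` and the event fields `type`, `object` and `created_date` are hypothetical, dates are assumed to be "YYYY-MM-DD HH:MM:SS" strings as in the metric on slide 18, and only minute-level metrics are handled.

```python
from collections import defaultdict

def build_upserts(events):
    """Fold events into one (query, update) pair per minute-level metric."""
    # counts[(type, period, start_date)][object] -> number of events
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        # Truncate "YYYY-MM-DD HH:MM:SS" to the start of its minute.
        start_date = event["created_date"][:16] + ":00"
        counts[(event["type"], "minute", start_date)][event["object"]] += 1

    upserts = []
    for (event_type, period, start_date), per_object in counts.items():
        query = {"type": event_type, "period": period, "start_date": start_date}
        update = {"$inc": {"aggregates.%s" % obj: n for obj, n in per_object.items()}}
        upserts.append((query, update))
    return upserts

# Five hypothetical pageview events, all inside the same minute.
events = [
    {"type": "pageviews", "object": "blue", "created_date": "2011-02-23 15:32:05"},
    {"type": "pageviews", "object": "green", "created_date": "2011-02-23 15:32:11"},
    {"type": "pageviews", "object": "blue", "created_date": "2011-02-23 15:32:30"},
    {"type": "pageviews", "object": "green", "created_date": "2011-02-23 15:32:41"},
    {"type": "pageviews", "object": "blue", "created_date": "2011-02-23 15:32:59"},
]

upserts = build_upserts(events)
for query, update in upserts:
    print(query, update)
```

Each pair would then be sent as a single update with the upsert flag set, so five events cost one write instead of five.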
  • 23. Stage 3 – Analytics. Pageview events: all 5 occurred in minute 32 of 3pm.
      metric = {
          "type": "pageviews",
          "period": "minute",
          "start_date": "2011-02-23 15:32:00",
      }
      aggregates = {"blue": 3, "green": 2}

      # Build one $inc entry per object in the batch.
      incrementors = {}
      for agg in aggregates:
          incrementors['aggregates.%s' % agg] = aggregates[agg]

      # Third argument True = upsert: create the metric document if it is missing.
      db.metrics.update(metric, {'$inc': incrementors}, True)
    Baby Jesus
      db.metrics.find({type:"pageviews", period:"minute", start_date:"2011-02-23 15:32:00"});
      {
        "_id" : ObjectId("4d656bd84b4395dce2bb7110"),
        "aggregates" : { "blue" : 3, "green" : 2 },
        "type" : "pageviews",
        "period" : "minute",
        "start_date" : "2011-02-23 15:32:00"
      }
  • 24. Stage 3 – Analytics. SQL vs. MongoDB.
    SQL:
      SELECT DATE_TRUNC('day', event_time) AS group_time
            ,COUNT(DISTINCT event_transaction_id) AS counts
      FROM f_localevents
      WHERE 1=1
        AND module_id = '4ce420bc36913'
        AND event_name = 'polls.vote_submitted'
        AND event_time >= '2010-12-01 00:00:00'
        AND event_time < '2010-12-31 00:00:00'
      GROUP BY DATE_TRUNC('day', event_time)
      ORDER BY group_time;
    Time: 0 hrs, 0 mins, 33 secs, 264 ms (holy shit)
    MongoDB:
      db.metrics.find(
        {
          name: "module.polls.vote_submitted",
          period: "day",
          start_date: { "$gte": "2010-12-01 00:00:00", "$lte": "2010-12-31 23:59:59" }
        },
        { "aggregates.4ce420bc36913": 1 }
      ).explain();
      { "cursor" : "BtreeCursor name_1_period_1_start_date_1", "nscanned" : 7, "nscannedObjects" : 7, "n" : 7, "millis" : 0, …
  • 25. Stage 3 – What we learned. You probably don’t need sharding. But if there is one situation where sharding is going to come up quickly, it’s on the cloud. I can’t say, “give me all pageviews for pages in category A.” There’s a way to do this, but we haven’t quite figured it out yet. Our app handles it for now. Fewer documents are always better. Find ways to combine data structures effectively. And last but not least… Our LEAST favorite thing in MongoDB…
  • 26. Stage 3 – What we learned
      patrick@newdev:~$ mongo localhost
      MongoDB shell version: 1.6.4
      connecting to: localhost
      > use analtyics;
      switched to db analtyics
      >
  • 27. Shameless Plug(s). We are hiring all walks of life. Engineers, SysOps, Product Managers, UX Designers. Get to work on cool problems like this. (That makes you cool by association). http://bddy.me/ia8gi3 Meet us at SXSW! http://www.facebook.com/event.php?eid=204744279542095
  • 28. Even More Shameless… @patr1cks @buddymedia