Appboy Analytics
Jon Hyman
NY MongoDB User Group, November 19, 2013
eBay NYC

@appboy @jon_hyman
A LITTLE BIT ABOUT US & APPBOY
(who we are and what we do)

Appboy is a mobile relationship management platform for apps
Jon Hyman
CIO :: @jon_hyman

Harvard
Bridgewater
Appboy improves engagement by helping you understand your app users
• IDENTIFY - Understand demographics, social and behavioral data
• SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location
• ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
Use Case: Customer engagement begins with onboarding

Urban Outfitters

textPlus

Shape Magazine
Agenda
• How to quickly store time series data in MongoDB using flexible schemas
• Learn how flexible schemas can easily provide breakdowns across dimensions
• Counting quickly: statistical analysis on top of MongoDB queries
What kinds of analytics does Appboy track?
• Lots of time series data
  • App opens over time
  • Events over time
  • Revenue over time
  • Marketing campaign stats and efficacy over time
What kinds of analytics does Appboy track?
• Breakdowns*
  • Device types
  • Device OS versions
  • Screen resolutions
  • Revenue by product

* We also care about this over time!
What kinds of analytics does Appboy track?
• User segment membership
  • How many users are in each segment?
  • How many can be emailed or reached via push notifications?
  • What is the average revenue per user in the segment?
  • Per paying user?
Pre-aggregated Analytics:

APP OPENS OVER TIME
Typical time series collection
Log a new row for each open received

{
  timestamp: 2013-11-14 00:00:00 UTC,
  app_id: App identifier
}

db.app_opens.find({app_id: A, timestamp: {$gte: date}})

Pro: Really, really simple. Easy to add attribution to users.
Con: You need to aggregate the data before drawing the chart; lots of documents read into memory, lots of dirty pages
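
To make the "Con" concrete: with one document per open, the chart data has to be computed at read time. The following is a minimal sketch in plain JavaScript (no database involved) of that grouping. `bucketOpensByHour` is a hypothetical helper name, and the ISO timestamp strings are an assumption made for illustration.

```javascript
// Hypothetical helper: group raw open events into per-hour counts (UTC).
// Every event in the requested range has to be read to build the chart.
function bucketOpensByHour(events) {
  const counts = {};
  for (const event of events) {
    const t = new Date(event.timestamp);
    // A key like "2013-11-14T00" identifies the day and hour of the open.
    const key = t.toISOString().slice(0, 13);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const sampleEvents = [
  { app_id: "A", timestamp: "2013-11-14T00:05:00Z" },
  { app_id: "A", timestamp: "2013-11-14T00:40:00Z" },
  { app_id: "A", timestamp: "2013-11-14T01:10:00Z" },
];
const hourly = bucketOpensByHour(sampleEvents);
```

The work grows with the number of opens, which is what the pre-aggregated layouts on the following slides avoid.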
Fewer documents with pre-aggregation iteration 1
Create a document that groups by the time period

{
  app_id: App identifier,
  date: Date of the document,
  hour: 0-23 based hour this document represents,
  opens: Number of opens this hour
}

db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens: 1}})

Pro: Really easy to draw histograms
Con: We never care about an hour by itself. We lose attribution.
Fewer documents with pre-aggregation iteration 2
Create a document by day and have each hour be a field

{
  app_id: App identifier,
  date: Date of the document,
  total_opens: Total number of opens this day,
  0: Number of opens at midnight,
  1: Number of opens at 1am,
  ...
  23: Number of opens at 11pm
}

db.app_opens.update(
  {date: D, app_id: A},
  {$inc: {"0": 1, total_opens: 1}}
)

Pro: Document count is low; easy to use the aggregation framework for longer spans; fast, since the document should be in the working set
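
As an illustration of how the hour field might be derived from an incoming open, here is a small sketch in plain JavaScript. `buildDailyOpenUpdate` is a hypothetical helper, not something from the talk. It assumes hours are bucketed in UTC and uses the field names from the document layout above.

```javascript
// Hypothetical helper: build the query and $inc update for one app open
// under the "one document per day, one field per hour" layout.
function buildDailyOpenUpdate(appId, openedAt) {
  // Truncate the timestamp to midnight UTC to identify the day's document.
  const day = new Date(Date.UTC(
    openedAt.getUTCFullYear(), openedAt.getUTCMonth(), openedAt.getUTCDate()));
  const hourField = String(openedAt.getUTCHours()); // "0" .. "23"
  return {
    query: { app_id: appId, date: day },
    update: { $inc: { [hourField]: 1, total_opens: 1 } },
  };
}

const op = buildDailyOpenUpdate("A", new Date("2013-11-14T13:45:00Z"));
```

`op.query` and `op.update` would then be passed to `db.app_opens.update(...)`, presumably with the upsert option enabled so that the first open of a day creates the document.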
Fewer documents with pre-aggregation iteration 2
• What about looking at different dimensions?
  • App opens by device type (e.g., how do iPads compare to iPhones?)
  • Demographics (gender, age group)
Solution!

FLEXIBLE SCHEMAS!
Fewer documents with pre-aggregation iteration 3
Dynamically add dimensions in the document

{
  app_id: App identifier,
  date: Date of the document,
  totals: {
    app_opens: Total number of opens this day,
    devices: {
      "iPad Air": Total number of opens on the iPad Air,
      "iPhone 4": Total number of opens on the iPhone 4
    },
    genders: {
      male: Total number of opens from male users,
      female: Total number of opens from female users
    },
    ...
  },
  0: {
    app_opens: Number of opens at midnight,
    devices: {
      "iPad Air": Number of opens on the iPad Air at midnight,
      "iPhone 4": Number of opens on the iPhone 4 at midnight
    },
    ...
  },
  ...
}

db.app_opens.update(
  {date: D, app_id: A},
  {$inc: {
    "totals.app_opens": 1,
    "totals.devices.iPad Air": 1,
    "0.app_opens": 1,
    "0.devices.iPad Air": 1
  }}
)
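
One way to build such an update without hard-coding the dimensions is sketched below in plain JavaScript. `buildDimensionInc` is a hypothetical helper. It assumes the nested layout shown above (`totals` plus one sub-document per hour) and uses dot notation to address the nested counters.

```javascript
// Hypothetical helper: build a $inc document for one app open, incrementing
// both the daily totals and the hour's sub-document for each dimension.
function buildDimensionInc(hour, dimensions) {
  const inc = {};
  for (const prefix of ["totals", String(hour)]) {
    inc[`${prefix}.app_opens`] = 1;
    for (const [dimension, value] of Object.entries(dimensions)) {
      // Dot notation addresses nested fields, e.g. "totals.devices.iPad Air".
      inc[`${prefix}.${dimension}.${value}`] = 1;
    }
  }
  return { $inc: inc };
}

const update = buildDimensionInc(0, { devices: "iPad Air", genders: "male" });
```

A new dimension such as an age group then only needs another entry in the `dimensions` object. Values containing a dot or starting with `$` would need escaping first, which this sketch does not handle.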
Pre-aggregated analytics
Pros
• Easily extensible to add other dimensions
• Still only using one document, therefore you can create charts very quickly
• You get breakdowns over a time period for free

Cons
• Pre-aggregated data has no attribution
• Have to know questions ahead of time

Follow up: What if we wanted to look at a graph by age group?
Pre-aggregated analytics summary
• Get started tracking time series data quickly
• You get breakdowns for free
• Adding dimensions is super simple
• No attribution, need to know questions ahead of time
• Don’t just rely on pre-aggregated analytics
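
To illustrate "breakdowns for free": once the daily documents carry a `totals` sub-document per dimension, a breakdown over a time period is just a sum over a handful of documents. This is a sketch in plain JavaScript. `sumBreakdown` is a hypothetical helper and the sample documents are made up for illustration.

```javascript
// Hypothetical helper: add up one dimension's daily totals across documents.
function sumBreakdown(dailyDocs, dimension) {
  const result = {};
  for (const doc of dailyDocs) {
    // Days that never saw this dimension simply contribute nothing.
    const counts = (doc.totals && doc.totals[dimension]) || {};
    for (const [value, count] of Object.entries(counts)) {
      result[value] = (result[value] || 0) + count;
    }
  }
  return result;
}

// Illustrative data only: two daily documents for one app.
const days = [
  { date: "2013-11-13", totals: { app_opens: 5, devices: { "iPad Air": 2, "iPhone 4": 3 } } },
  { date: "2013-11-14", totals: { app_opens: 4, devices: { "iPad Air": 4 } } },
];
const byDevice = sumBreakdown(days, "devices");
```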
Counting quickly:

USER SEGMENTATION & STATISTICAL ANALYSIS
User Segmentation
• A group of users who match some set of filters
Counting quickly
Appboy shows you segment membership in real-time as you add/edit/remove filters.

How do we do it quickly?

We estimate the population sizes of segments when using our web UI.
Counting quickly

Goal: Quickly get the count() of an arbitrary query

Problem: MongoDB counts are slow, especially unindexed ones
Counting quickly
10 million documents that represent people:
{
  favorite_color: "blue",
  age: 27,
  gender: "M",
  favorite_food: "pizza",
  city: "NYC",
  shoe_size: 11,
  attractiveness: 10,
  ...
}
• How many people like blue?
• How many live in NYC and love pizza?
• How many men have a shoe size less than 10?
Big Question: How do you estimate counts?

Answer: The same way news networks do it. With confidence.
Counting quickly
Add a random number in a known range to each document. Say, between 0 and 9999.

{
  random: 4583,
  favorite_color: "blue",
  age: 27,
  gender: "M",
  favorite_food: "pizza",
  city: "NYC",
  shoe_size: 11,
  attractiveness: 10,
  ...
}

Add an index on the random number:

db.users.ensureIndex({random: 1})
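
A minimal sketch of assigning that field when a document is created, in plain JavaScript. `withRandomBucket` is a hypothetical helper. The injectable `rng` parameter is an assumption added so the behavior can be checked deterministically.

```javascript
// Hypothetical helper: attach a uniformly chosen bucket in [0, buckets)
// to a document as its `random` field.
function withRandomBucket(doc, buckets = 10000, rng = Math.random) {
  return { ...doc, random: Math.floor(rng() * buckets) };
}

// With a fixed rng the bucket is predictable; in production use the default.
const user = withRandomBucket({ favorite_color: "blue", age: 27 }, 10000, () => 0.5);
```

The value is assigned once and never changes, so a given range of `random` values always selects the same sample of users.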
Counting quickly
Step 1: Get a random sample

I have 10 million documents. Of my 10,000 random “buckets”, I should expect each “bucket” to hold about 1,000 users.

E.g.,

db.users.find({random: 123}).count()   // ~1000
db.users.find({random: 9043}).count()  // ~1000
db.users.find({random: 4982}).count()  // ~1000
Counting quickly
Step 1: Get a random sample

Let’s take a random 100,000 users. Grab a random range that “holds” those users. These all work:

db.users.find({random: {$gt: 0, $lt: 101}})
db.users.find({random: {$gt: 503, $lt: 604}})
db.users.find({random: {$gt: 8938, $lt: 9039}})
db.users.find({$or: [
  {random: {$gt: 9955}},
  {random: {$lt: 56}}
]})

Tip: Limit $maxScan to 100,000 just to be safe
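
The range selection above, including the wrap-around case, could be generated along these lines. This is a sketch in plain JavaScript. `randomRangeFilter` is a hypothetical helper, and it uses inclusive `$gte`/`$lte` bounds instead of the exclusive `$gt`/`$lt` bounds on the slide, selecting the same buckets.

```javascript
// Hypothetical helper: build a filter on `random` that covers `width`
// consecutive buckets starting at `start`, wrapping past the last bucket.
function randomRangeFilter(start, width, buckets = 10000) {
  const end = start + width - 1; // inclusive last bucket before wrapping
  if (end < buckets) {
    return { random: { $gte: start, $lte: end } };
  }
  // The range runs off the end: take the tail and the remainder from 0.
  return {
    $or: [
      { random: { $gte: start } },
      { random: { $lte: end - buckets } },
    ],
  };
}

const plain = randomRangeFilter(504, 100);    // buckets 504..603
const wrapped = randomRangeFilter(9956, 100); // buckets 9956..9999 and 0..55
```

With about 1,000 users per bucket, a width of 100 buckets selects roughly 100,000 users wherever the range starts.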
Counting quickly
Step 2: Learn about that random sample

db.users.find(
  {
    random: {$gt: 0, $lt: 101},
    gender: "M",
    favorite_color: "blue",
    shoe_size: {$gt: 10}
  }
)
._addSpecial("$maxScan", 100000)
.explain()

Explain Result:

{
  nscannedObjects: 100000,
  n: 11302,
  ...
}
Counting quickly
Step 3: Do the math

Population: 10,000,000
Sample size: 100,000
Num matches: 11,302
Percentage of users who matched: 11.3%
Estimated total count: about 1,130,000 with 95% confidence (11.3% +/- 0.2 percentage points, or roughly +/- 20,000 users)
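
The slides do not say which formula produced the margin of error. The sketch below uses the standard normal-approximation interval for a sampled proportion, which reproduces the figures above. `estimateCount` is a hypothetical helper and `z = 1.96` is the usual value for 95% confidence.

```javascript
// Normal-approximation interval for a sampled proportion, scaled up to the
// population. Assumes the sample is random; z = 1.96 gives 95% confidence.
function estimateCount(population, sampleSize, matches, z = 1.96) {
  const p = matches / sampleSize;
  const standardError = Math.sqrt((p * (1 - p)) / sampleSize);
  const margin = z * standardError; // margin of error on the proportion
  return {
    proportion: p,
    marginOfError: margin,
    estimate: Math.round(p * population),
    low: Math.round((p - margin) * population),
    high: Math.round((p + margin) * population),
  };
}

const result = estimateCount(10000000, 100000, 11302);
```

Here `result.marginOfError` comes out just under 0.002, that is, about 0.2 percentage points around the 11.3% match rate.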
Counting quickly
Step 4: Optimize
• Limit $maxScan to (100,000/numShards) to be even faster
• Cache the random range for a few hours
• Add more RAM (or shards)
• Cache results to not hit the database for the same query
Counting quickly
Step 5: Improve
• Get more than one count: use the aggregation framework on top of the population’s sample size
• Work around all sorts of Mongo bugs :-(
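
For the "more than one count" idea, the per-group counts measured on the sample can be scaled to the population in the same way as a single count. This is a sketch in plain JavaScript. `scaleGroupCounts` is a hypothetical helper, and the input numbers are illustrative, standing in for the output of a grouping stage run over the sampled range.

```javascript
// Hypothetical helper: scale per-group counts observed in a random sample
// up to the full population.
function scaleGroupCounts(sampleCounts, sampleSize, population) {
  const factor = population / sampleSize;
  const scaled = {};
  for (const [group, count] of Object.entries(sampleCounts)) {
    scaled[group] = Math.round(count * factor);
  }
  return scaled;
}

// Illustrative numbers only: favorite colors counted within a 100,000-user sample.
const estimated = scaleGroupCounts({ blue: 11302, red: 8000 }, 100000, 10000000);
```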
Summarize
• Pre-aggregated analytics
  • Create a document that represents event occurrences in some time period
  • Takes full advantage of MongoDB’s flexible schemas
  • Not a catch-all for analytics; you should still store event data
Summarize
• Counting quickly
  • Estimate results of arbitrary queries using population sample sizes
  • Depending on your app, this could be a great way to keep response time predictable as you scale
Thanks! Questions?
jon@appboy.com

@appboy @jon_hyman

More Related Content

Similar to Appboy analytics - NYC MUG 11/19/13

Data Visualization
Data VisualizationData Visualization
Data Visualization
Vera Kovaleva
 
AWS re:Invent Hackathon
AWS re:Invent HackathonAWS re:Invent Hackathon
AWS re:Invent Hackathon
Amazon Web Services
 
AppSec Pipelines and Event based Security
AppSec Pipelines and Event based SecurityAppSec Pipelines and Event based Security
AppSec Pipelines and Event based Security
Matt Tesauro
 
穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scale穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scale
hdhappy001
 
UI and UX for Mobile Developers
UI and UX for Mobile DevelopersUI and UX for Mobile Developers
UI and UX for Mobile Developers
Mohamed Nabil, MSc.
 
Android development first steps
Android development   first stepsAndroid development   first steps
Android development first steps
christoforosnalmpantis
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop Together
MongoDB
 
Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015
kingsBSD
 
amansingh.docx
amansingh.docxamansingh.docx
amansingh.docx
ammusingh2409
 
Genn.ai introduction for Buzzwords
Genn.ai introduction for BuzzwordsGenn.ai introduction for Buzzwords
Genn.ai introduction for Buzzwords
Takeshi Nakano
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Amazon Web Services
 
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014The Hive
 
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Krist Wongsuphasawat
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics Applications
DataWorks Summit
 
An Introduction To Software Development - Software Development Midterm Review
An Introduction To Software Development - Software Development Midterm ReviewAn Introduction To Software Development - Software Development Midterm Review
An Introduction To Software Development - Software Development Midterm Review
Blue Elephant Consulting
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
CareerBuilder.com
 
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Paolo Corti
 
Un backend: pour tous vos objets connectés
Un backend: pour tous vos objets connectésUn backend: pour tous vos objets connectés
Un backend: pour tous vos objets connectés
Amazon Web Services
 
Klout as an Example Application of Topics-oriented NLP APIs
Klout as an Example Application of Topics-oriented NLP APIsKlout as an Example Application of Topics-oriented NLP APIs
Klout as an Example Application of Topics-oriented NLP APIs
Tyler Singletary
 

Similar to Appboy analytics - NYC MUG 11/19/13 (20)

Data Visualization
Data VisualizationData Visualization
Data Visualization
 
AWS re:Invent Hackathon
AWS re:Invent HackathonAWS re:Invent Hackathon
AWS re:Invent Hackathon
 
AppSec Pipelines and Event based Security
AppSec Pipelines and Event based SecurityAppSec Pipelines and Event based Security
AppSec Pipelines and Event based Security
 
穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scale穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scale
 
UI and UX for Mobile Developers
UI and UX for Mobile DevelopersUI and UX for Mobile Developers
UI and UX for Mobile Developers
 
Android development first steps
Android development   first stepsAndroid development   first steps
Android development first steps
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop Together
 
Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015
 
amansingh.docx
amansingh.docxamansingh.docx
amansingh.docx
 
Genn.ai introduction for Buzzwords
Genn.ai introduction for BuzzwordsGenn.ai introduction for Buzzwords
Genn.ai introduction for Buzzwords
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
 
APIs v2
APIs v2APIs v2
APIs v2
 
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
 
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics Applications
 
An Introduction To Software Development - Software Development Midterm Review
An Introduction To Software Development - Software Development Midterm ReviewAn Introduction To Software Development - Software Development Midterm Review
An Introduction To Software Development - Software Development Midterm Review
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
 
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
 
Un backend: pour tous vos objets connectés
Un backend: pour tous vos objets connectésUn backend: pour tous vos objets connectés
Un backend: pour tous vos objets connectés
 
Klout as an Example Application of Topics-oriented NLP APIs
Klout as an Example Application of Topics-oriented NLP APIsKlout as an Example Application of Topics-oriented NLP APIs
Klout as an Example Application of Topics-oriented NLP APIs
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Key Trends Shaping the Future of Infrastructure.pdf


Appboy analytics - NYC MUG 11/19/13

  • 1. Appboy Analytics Jon Hyman NY MongoDB User Group, November 19, 2013 eBay NYC @appboy @jon_hyman
  • 2. A LITTLE BIT ABOUT US & APPBOY (who we are and what we do) Appboy is a mobile relationship management platform for apps. Jon Hyman, CIO :: @jon_hyman. Harvard, Bridgewater
  • 3. Appboy improves engagement by helping you understand your app users • IDENTIFY - Understand demographics, social and behavioral data • SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location • ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
  • 4. Use Case: Customer engagement begins with onboarding Urban Outfitters textPlus Shape Magazine
  • 5. Agenda • How to quickly store time series data in MongoDB using flexible schemas • Learn how flexible schemas can easily provide breakdowns across dimensions • Counting quickly: statistical analysis on top of MongoDB queries
  • 6. What kinds of analytics does Appboy track? • Lots of time series data • App opens over time • Events over time • Revenue over time • Marketing campaign stats and efficacy over time
  • 7. What kinds of analytics does Appboy track? • Breakdowns* • Device types • Device OS versions • Screen resolutions • Revenue by product * We also care about this over time!
  • 8. What kinds of analytics does Appboy track? • User segment membership • How many users are in each segment? • How many can be emailed or reached via push notifications? • What is the average revenue per user in the segment? • Per paying user?
  • 10. Typical time series collection: log a new row for each open received
      {
        timestamp: 2013-11-14 00:00:00 UTC,
        app_id: App identifier
      }
      db.app_opens.find({app_id: A, timestamp: {$gte: date}})
    Pro: Really, really simple. Easy to add attribution to users.
    Con: You need to aggregate the data before drawing the chart; lots of documents read into memory, lots of dirty pages.
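To make the "Con" on this slide concrete, here is a minimal sketch of the aggregation step the application would have to run on every chart draw when each open is its own document. The helper name `bucketOpensByHour` and the sample documents are hypothetical and not taken from the talk; real code would iterate a cursor rather than an in-memory array.

```javascript
// Hypothetical helper: group raw open events into 24 per-hour counts (UTC).
function bucketOpensByHour(opens) {
  const counts = new Array(24).fill(0);
  for (const open of opens) {
    counts[open.timestamp.getUTCHours()] += 1;
  }
  return counts;
}

// Made-up documents in the shape shown on the slide.
const sampleOpens = [
  { app_id: "A", timestamp: new Date("2013-11-14T00:05:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T00:40:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T13:15:00Z") },
];

const hourly = bucketOpensByHour(sampleOpens);
console.log(hourly[0], hourly[13]);
```

Every matching document has to be read before a single bar can be drawn, which is the cost the pre-aggregated designs on the following slides try to avoid.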
  • 11. Fewer documents with pre-aggregation iteration 1: create a document that groups by the time period
      {
        app_id: App identifier,
        date: Date of the document,
        hour: 0-23 based hour this document represents,
        opens: Number of opens this hour
      }
      db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens: 1}})
    Pro: Really easy to draw histograms.
    Con: We never care about an hour by itself. We lose attribution.
  • 12. Fewer documents with pre-aggregation iteration 2: create a document by day and have each hour be a field
      {
        app_id: App identifier,
        date: Date of the document,
        total_opens: Total number of opens this day,
        0: Number of opens at midnight,
        1: Number of opens at 1am,
        ...
        23: Number of opens at 11pm
      }
      db.app_opens.update(
        {date: D, app_id: A},
        {$inc: {"0": 1, total_opens: 1}}
      )
    Pro: Document count is low; easy to use the aggregation framework for longer spans; fast, because the document should be in the working set.
  • 13. Fewer documents with pre-aggregation iteration 2 • What about looking at different dimensions? • App opens by device type (e.g., how do iPads compare to iPhones?) • Demographics (gender, age group)
  • 15. Fewer documents with pre-aggregation iteration 3: dynamically add dimensions in the document
      {
        app_id: App identifier,
        date: Date of the document,
        totals: {
          app_opens: Total number of opens this day,
          devices: {
            "iPad Air": Total number of opens on the iPad Air,
            "iPhone 4": Total number of opens on the iPhone 4
          },
          genders: {
            male: Total number of opens from male users,
            female: Total number of opens from female users
          },
          ...
        },
        0: {
          app_opens: Number of opens at midnight,
          devices: {
            "iPad Air": Number of opens on the iPad Air at midnight,
            "iPhone 4": Number of opens on the iPhone 4 at midnight
          },
          ...
        },
        ...
      }
      db.app_opens.update(
        {date: D, app_id: A},
        {$inc: {"totals.app_opens": 1, "totals.devices.iPad Air": 1, "0.app_opens": 1, "0.devices.iPad Air": 1}}
      )
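One way to generate the increments for the nested schema on this slide is to build dotted field paths for both the daily totals and the hour sub-document. This is a sketch under assumptions: `buildOpenIncrement` and the `dims` argument are hypothetical names, and it assumes dimension values never contain a `.` or start with `$`, since those would break the dotted paths.

```javascript
// Hypothetical helper: build the $inc document for one app open.
// `hour` is 0-23; `dims` maps a dimension name (e.g. "devices") to the
// value observed for this open (e.g. "iPad Air").
function buildOpenIncrement(hour, dims) {
  const inc = { "totals.app_opens": 1 };
  inc[`${hour}.app_opens`] = 1;
  for (const [dimension, value] of Object.entries(dims)) {
    inc[`totals.${dimension}.${value}`] = 1; // daily breakdown
    inc[`${hour}.${dimension}.${value}`] = 1; // hourly breakdown
  }
  return { $inc: inc };
}

const update = buildOpenIncrement(0, { devices: "iPad Air", genders: "female" });
console.log(JSON.stringify(update, null, 2));

// In the mongo shell this could then be applied with something like:
//   db.app_opens.update({date: D, app_id: A}, update, {upsert: true})
// (the upsert option is an assumption; the slide does not show it).
```

Because `$inc` creates missing fields, a new dimension or a new device name needs no schema change, which is what makes the document "dynamically" extensible.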
  • 16. Pre-aggregated analytics. Pros: • Easily extensible to add other dimensions • Still only using one document, therefore you can create charts very quickly • You get breakdowns over a time period for free. Cons: • Pre-aggregated data has no attribution • Have to know questions ahead of time. Follow up: What if we wanted to look at a graph by age group?
  • 17. Pre-aggregated analytics summary • Get started tracking time series data quickly • You get breakdowns for free • Adding dimensions is super simple • No attribution, need to know questions ahead of time • Don’t just rely on pre-aggregated analytics
  • 18. Counting quickly: USER SEGMENTATION & STATISTICAL ANALYSIS
  • 19. User Segmentation • A group of users who match some set of filters
  • 20. Counting quickly. Appboy shows you segment membership in real-time as you add/edit/remove filters. How do we do it quickly? We estimate the population sizes of segments when using our web UI.
  • 21. Counting quickly. Goal: Quickly get the count() of an arbitrary query. Problem: MongoDB counts are slow, especially unindexed ones.
  • 22. Counting quickly: 10 million documents that represent people
      {
        favorite_color: "blue",
        age: 27,
        gender: "M",
        favorite_food: "pizza",
        city: "NYC",
        shoe_size: 11,
        attractiveness: 10,
        ...
      }
  • 23. Counting quickly: 10 million documents that represent people
      {
        favorite_color: "blue",
        age: 27,
        gender: "M",
        favorite_food: "pizza",
        city: "NYC",
        shoe_size: 11,
        attractiveness: 10,
        ...
      }
    • How many people like blue? • How many live in NYC and love pizza? • How many men have a shoe size less than 10?
  • 24. Big Question: How do you estimate counts? Answer: The same way news networks do it. With confidence.
  • 25. Counting quickly. Add a random number in a known range to each document. Say, between 0 and 9999.
      {
        random: 4583,
        favorite_color: "blue",
        age: 27,
        gender: "M",
        favorite_food: "pizza",
        city: "NYC",
        shoe_size: 11,
        attractiveness: 10,
        ...
      }
    Add an index on the random number:
      db.users.ensureIndex({random: 1})
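Assigning the random value at document creation could look like the sketch below. `randomBucket` and `NUM_BUCKETS` are hypothetical names; the talk does not say how Appboy generates the number, so a uniform integer in 0..9999 is assumed.

```javascript
// Assumed bucket count, matching the 0..9999 range on the slide.
const NUM_BUCKETS = 10000;

// Hypothetical helper: pick a uniform integer bucket in [0, NUM_BUCKETS).
// `rng` is injectable so the function can be tested deterministically.
function randomBucket(rng = Math.random) {
  return Math.floor(rng() * NUM_BUCKETS);
}

// Attach the bucket once, when the user document is first created.
const user = {
  random: randomBucket(),
  favorite_color: "blue",
  age: 27,
};
console.log(user.random);
```

The value must stay fixed for the life of the document; re-rolling it would move users between buckets and make successive samples inconsistent.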
  • 26. Counting quickly. Step 1: Get a random sample. I have 10 million documents. Of my 10,000 random "buckets", I should expect each "bucket" to hold about 1,000 users. E.g.,
      db.users.find({random: 123}).count()    // about 1,000
      db.users.find({random: 9043}).count()   // about 1,000
      db.users.find({random: 4982}).count()   // about 1,000
  • 27. Counting quickly. Step 1: Get a random sample. Let's take a random 100,000 users. Grab a random range that "holds" those users (100 buckets of about 1,000 users each). These all work:
      db.users.find({random: {$gt: 0, $lt: 101}})
      db.users.find({random: {$gt: 503, $lt: 604}})
      db.users.find({random: {$gt: 8938, $lt: 9039}})
      db.users.find({$or: [
        {random: {$gt: 9955}},
        {random: {$lt: 56}}
      ]})
    Tip: Limit $maxScan to 100,000 just to be safe.
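The range queries on this slide, including the wrap-around `$or` case, can be produced by one small function. This is a sketch, not the talk's code: `sampleRangeQuery` is a hypothetical name, and it uses inclusive `$gte`/`$lte` bounds, which select the same integer buckets as the slide's exclusive `$gt`/`$lt` bounds.

```javascript
// Hypothetical helper: build a query selecting `width` consecutive buckets
// starting at `start`, wrapping past the top bucket when needed.
function sampleRangeQuery(start, width, numBuckets = 10000) {
  const end = start + width - 1; // last bucket included
  if (end < numBuckets) {
    return { random: { $gte: start, $lte: end } };
  }
  // The range runs off the top: take the tail plus the wrapped-around head.
  return {
    $or: [
      { random: { $gte: start } },
      { random: { $lte: end - numBuckets } },
    ],
  };
}

// 100 buckets of about 1,000 users each gives a sample of about 100,000.
const start = Math.floor(Math.random() * 10000);
console.log(JSON.stringify(sampleRangeQuery(start, 100)));
```

With `start = 9956` this yields buckets 9956..9999 plus 0..55, the same 100 buckets as the `$or` example on the slide.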
  • 28. Counting quickly. Step 2: Learn about that random sample
      db.users.find(
        {
          random: {$gt: 0, $lt: 101},
          gender: "M",
          favorite_color: "blue",
          shoe_size: {$gt: 10}
        }
      )
      ._addSpecial("$maxScan", 100000)
      .explain()
    Explain result:
      {
        nscannedObjects: 100000,
        n: 11302,
        ...
      }
  • 29. Counting quickly. Step 3: Do the math
      Population: 10,000,000
      Sample size: 100,000
      Num matches: 11,302
      Percentage of users who matched: 11.3%
      Estimated total count: 1,130,000 +/- 0.2% with 95% confidence
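The arithmetic on this slide can be reproduced with the usual normal approximation for a sampled proportion. The slide does not state which formula was used, so this is an assumption: margin of error = z * sqrt(p * (1 - p) / n), with z = 1.96 for 95% confidence and no finite-population correction. `estimateCount` is a hypothetical name.

```javascript
// Hypothetical helper: scale a sample match count up to the population and
// attach a confidence interval (normal approximation, assumed).
function estimateCount(population, sampleSize, matches, z = 1.96) {
  const p = matches / sampleSize; // observed proportion
  const marginOfError = z * Math.sqrt((p * (1 - p)) / sampleSize);
  return {
    proportion: p,
    marginOfError, // as a fraction of the population
    estimate: Math.round(p * population),
    low: Math.round((p - marginOfError) * population),
    high: Math.round((p + marginOfError) * population),
  };
}

// Numbers from the slide: 11,302 matches in a sample of 100,000.
const result = estimateCount(10000000, 100000, 11302);
console.log(result);
```

With the slide's inputs the proportion is 0.11302 and the margin of error comes out just under 0.002, which agrees with the slide's "+/- 0.2%" when that figure is read as percentage points of the population.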
  • 30. Counting quickly. Step 4: Optimize • Limit $maxScan to (100,000/numShards) to be even faster • Cache the random range for a few hours • Add more RAM (or shards) • Cache results to not hit the database for the same query
  • 31. Counting quickly. Step 5: Improve • Get more than one count: use the aggregation framework on top of the population's sample size • Work around all sorts of Mongo bugs :-(
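One reading of "get more than one count" is to run a `$match` on the sample range followed by a `$group`, then scale each group's count by population / sample size. The sketch below only builds the pipeline and scales made-up results; `sampledGroupPipeline`, `scaleCounts`, and the sample rows are hypothetical, and the talk does not show the actual pipeline.

```javascript
// Hypothetical helper: aggregation pipeline that counts sampled users
// per distinct value of `field`.
function sampledGroupPipeline(sampleQuery, field) {
  return [
    { $match: sampleQuery },
    { $group: { _id: `$${field}`, n: { $sum: 1 } } },
  ];
}

// Hypothetical helper: scale per-group sample counts up to the population.
function scaleCounts(groups, population, sampleSize) {
  const factor = population / sampleSize;
  return groups.map((g) => ({
    value: g._id,
    estimate: Math.round(g.n * factor),
  }));
}

const pipeline = sampledGroupPipeline(
  { random: { $gt: 0, $lt: 101 } },
  "favorite_color"
);
// In the mongo shell: db.users.aggregate(pipeline)

// Made-up rows in the shape $group returns, for illustration only.
const fakeGroups = [
  { _id: "blue", n: 30000 },
  { _id: "red", n: 70000 },
];
const scaled = scaleCounts(fakeGroups, 10000000, 100000);
console.log(scaled);
```

A single pass over the sample then answers a whole breakdown (every favorite color, every city) instead of one count per query.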
  • 32. Summarize • Pre-aggregated analytics • Create a document that represents event occurrences in some time period • Takes full advantage of MongoDB’s flexible schemas • Not a catch-all for analytics, you should still store event data
  • 33. Summarize • Counting quickly • Estimate results of arbitrary queries using population sample sizes • Depending on your app, this could be a great way to keep response time predictable as you scale