SlideShare a Scribd company logo
Data Analytics with MongoDB
Jane Fine
Director of Product Marketing, Analytics
jane.fine@mongodb.com
@janeuyvova
Bryan Reinero
Product Manager
bryan@mongodb.com
@bryanreinero
Data Analytics with MongoDB
Custom Code +
Charting Libraries
ETL +
3rd Party BI
Tools
MongoDB BI Connector +
3rd Party BI Tools
MongoDB
Charts
MongoDB
Compass
But First, Setup
MongoDB Atlas Cluster
MongoDB Connector for BI
Charts Server
MongoDB Compass
https://github.com/breinero/workshop
MongoDB Connector
for Business Intelligence
MongoDB BI Connector
Visualize and explore MongoDB
data in SQL-based BI tools:
Automatically discovers the schema
Translates complex SQL statements
issued by the BI tool into MongoDB
aggregation queries
Converts the results into a tabular format
for rendering inside the BI tool
Atlas Hosted BI Connector
...
MongoDB ODBC Driver
...
Schema Translation
{
_id:
ObjectId("5b9a6d75f68257c5a994558d"),
id: 2318,
listing_url:
"https://www.airbnb.com/rooms/2318",
name: "Casa Madrona",
summary: "Gorgeous 1917 Dutch
Colonial…”
calendar: [
{ listing_id: 2318,
date: "2019-08-15",
available: "t",
price: 296
},
{ listing_id: 2318,
date: "2019-08-14",
available: "t",
price: 296
}
,...
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| sea |
| sea_calendar |
| sea_ratings |
+----------------+
3 rows in set (0.06 sec)
Demo
● use $ mongo to examine data in MongoDB
● use $ mysql to examine tabular representation of collections
● explain()
● mysql -h dataanalyticsworkshop-biconnector-sshrq.mongodb.net -P
27015 -u demo -p --default-auth=mongosql_auth
Exercise 1: Analysing AirBnB Data
with BI Connector
Make a DSN connection to MongoDB
Import Data Into Excel
Identify the most expensive neighborhood group in Austin
Determine average nightly price for an Airbed, per neighborhood
MongoDB BI Connector - When to Use
Want to speak SQL to MongoDB
Multi data sources (not just MongoDB)
Business analysts
Reporting only
Powerful but you lose some benefits of schema flexibility
MongoDB Charts
Options for Visualizing MongoDB Data
Custom Code + Charting
Libraries
ETL + 3rd Party BI ToolsBI Connector + 3rd Party
BI Tools
Wouldn’t it be nice if...
You could visualize your MongoDB Data…
without needing to write your own code
without needing to move your data into a different repository
without needing to purchase and configure third-party tools
without losing the richness of the Document Model
MongoDB Charts beta
The fastest way to build
visualizations over your
MongoDB data
Built for the MongoDB Document
Model
Visualize live data from on-prem
or Atlas DB
Charts Basic Concepts
A data source is a reference to a MongoDB collection or view that
contains data you want to visualize.
A chart is a visualization of data from a single data source.
A dashboard is a collection of charts which you manage as a unit
(name, layout, sharing)
What to Expect
Charting Capabilities
Common chart types
Aggregation functions
Filtering
Sample Mode
Binning
Sorting
Document Model Support
Type handling
Polymorphic collections
Nested documents
Array reductions
Sharing and Permissions
All users log onto Charts with their own account
• Users can be managed by anyone with the UserAdmin role
Any user can add a data source
• They must have a valid connection URI to connect to the MongoDB instance
• A data source owner can choose to share with nobody, specific people or
everybody
Any user can add a dashboard
• By default, the dashboard is only visible to the creator
• A dashboard owner can choose to share with nobody, specific people or
everybody
Exercise 2: Analysing AirBnB Data
with Charts
Add a Data Source
Create a Dashboard
Add some charts
Multi-Series Stacked Bar Chart
Coloured Bar Chart
Review Scores Histogram
Area Chart with Binning
The fastest way to build visualizations over your MongoDB
data
Ad hoc analyses
Benefit from the Document Model
Collaboration
Self-service
Intuitive enough for domain experts, non-devs to use!
BREAK!
2 minutes
MongoDB Aggregation Framework
and Compass
Rich Queries
Point | Range | Geospatial | Faceted Search | Aggregations | JOINs | Graph Traversals
JSON Documents Tabular Key-Value Text GraphGeospatial
MongoDB Aggregation Framework
MongoDB
{ customer_id : 1,
first_name : "Mark",
last_name : "Smith",
city : "San Francisco",
phones: [ {
number : "1-212-777-1212",
type : "work"
},
{
number : "1-212-777-1213",
type : "cell"
}]
……...
Expressive
Queries
Find anyone with phone # “1-212…”
Check if the person with number “555…” is on the “do not call” list
Geospatial
Find the best offer for the customer at geo coordinates of 42nd St.
and 6th Ave
Text Search Find all tweets that mention the firm within the last 2 days
Aggregation
Count and sort number of customers by city, compute min, max, and
average spend
Native Binary
JSON Support
Add an additional phone number to Mark Smith’s record without
rewriting the document
Update just 2 phone numbers out of 10
Sort on the modified date
JOIN
($lookup)
Query for all San Francisco residences, lookup their transactions,
and sum the amount by person
Graph Queries
($graphLookup)
Query for all people within 3 degrees of separation from Mark
Rich query functionality
Advanced data processing
pipeline for transformations
and analytics
Multiple stages
Similar to a unix pipe
Rich Expressions
Collection
db.orders.aggregate( [
$match stage {$match: { status: "A" } },
$group stage { $group: { _id: "$cust_id",total: { $sum: "$amount" } }
}
] )
{
cust_id: "A123",
amount: 500,
status: "A",
}
{
cust_id: "A123",
amount: 250,
status: "A",
}
{
cust_id: "B212",
amount: 200,
status: "A",
}
{
cust_id: "A123",
amount: 300,
status: "D",
}
Orders
{
cust_id: "A123",
amount: 500,
status: "A",
}
{
cust_id: "A123",
amount: 250,
status: "A",
}
{
cust_id: "B212",
amount: 200,
status: "A",
}
{
id: "A123",
total: 750
}
{
id: "B212",
total: 200
}
$match $group
Aggregations
$match
$group
$facet
$geoNear
$graphLookup
Pipeline Stages
Mathematical
$add, $abs,
$substract,
$multiply, $divide,
$log, $log10,
$stdDevPop,
$stdDevSam, $avg,
$sqrt, $pow, $sum,
$zip, $convert, etc.
Array
$push, $reduce,
$reverseArray,
$addToSet,
$arrayElemAt,
$slice, etc.
Operators
$lookup
$project
$sort
$unwind
Conditionals
$and, $or, $eq, $lt,
$lte, $gt, $gte,
$cmp, $cond,
$switch, $in, etc
Date
$dateFromParts,
$dateToParts,
$dateFromString,
$dateToString,
$dayOfMonth,
$isoWeek, $minute,
$month, $year, etc.
String
$toUpper, $toLower,
$substr,
$strcasecmp,
$concat, $split, etc.
Laterals
$exp, $let, $literal,
$map, $type, etc
Aggregation Features
Developer / Data Analyst Tool
Data management and manipulation
document view
table view
Visual schema analyzer
with query builder
export to language
Aggregation pipeline builder
A good place to start
Exercise 3: Building Powerful
Aggregations
Explore data by doing some basic finds
Learn how to do aggregations you usually do in SQL
$match
$group
Learn how to do aggregations you cannot do in SQL
$objectToArray
$graphlookup
$geonear
Find or Count Specific Documents
4. Find how many listings have a real bed and are greater than
$100 but less than $200 per night by filling in the blanks
db.___.find( {
price: { ___: ___, $lt: ___ } } ,
{ ___ : "Real Bed" }).count()
db.austinListingsAndReviews.find( {
"price": { $gt: 100, $lt: 200 } } ,
{"bed_type" : "Real Bed"}).count()
Answer: 2892
Minimum price per night by number
of beds
Take a look at this example aggregation which computes minimum
price per night by number of beds
db.austinListingsAndReviews.aggregate( [
{
$group: {
_id: "$beds",
avg_by_beds: { $min: "$price" }
}
},
{ $sort: { avg_by_beds: 1 } }
] )
Average price per night by room type
Construct a very similar query to find average price per night
by room type by filling in the blanks.
db.austinListingsAndReviews.aggregate( [
{
$group: {
_id: "___",
avg_by_roomtype: { $avg: "___" }
}
}}
] )
ANSWER: Average price per night by
room type
Construct a very similar query to find average price per night
by room type by filling in the blanks.
db.austinListingsAndReviews.aggregate( [
{
$group: {
_id: "$room_type",
avg_by_roomtype: { $avg: "$price" }
}
}}
] )
Average price per night by suburb
Let’s practice some more.
This time write the query from scratch to find the average
price per night by suburb. Examine the document structure
first.
Hint: suburb is a field within a subdocument address; use dot
notation to reference it.
ANSWER: Average price per night by
suburb
db.austinListingsAndReviews.aggregate( [
{
$group: {
_id: "$address.suburb",
avg_by_suburb: { $avg: "$price" }
}
}
] )
Count and average price per night
Compute the count of properties and average price per night by
suburbs and number of bedrooms by filling in the blanks.
db.austinListingsAndReviews.aggregate([
{"$group" :
{
_id:{suburb:"$___", bedrooms:"$___"},
___:{$sum:1},
avg:{$___:"$price"}}
}
])
ANSWER: Count and average price
per night
db.austinListingsAndReviews.aggregate([
{"$group" :
{
_id:{suburb:"$address.suburb", bedrooms:"$bedrooms"},
count:{$sum:1},
avg:{$avg:"$price"}}
}
])
Add $match Stage
Use the query from previous step and add a $match stage to only
show those properties that have a pool
db.austinListingsAndReviews.aggregate([
{ $match : {amenities: "Pool"} },
{"$group" :
{
_id:{suburb:"$address.suburb",bedrooms:"$bedrooms"},
count:{$sum:1},
avg:{$avg:"$price"}}
}
])
Learn how to do aggregations you
cannot do in SQL
We want to determine average review scores for all properties
in Austin based on some amenities.
Remember that review_scores is a sub document. Luckily we have
$objectToArray to transform our object into an array which we
can then $unwind and $group.
Average review scores for all
properties in Austin with amenities
db.austinListingsAndReviews.aggregate([
{ $match : {
amenities: { $in: ["Cable TV", "Garden or backyard",
"Coffee maker"] } } },
{ $project: { scores: { $objectToArray: "$review_scores" } } },
{ $unwind: "$scores" },
{ $group: { _id: "$scores.k", average_score: { $avg:
"$scores.v" } } }
])
Compare two neighbourhoods
Compare two neighbourhoods - South Lamar and West Campus - and
decide based on average review where you would rather stay.
Assume you also want a place that has pool, wifi, and
air-conditioning, under $200
Hint: You will need to look up how to use $and operator for this
ANSWER: South Lamar
db.austinListingsAndReviews.aggregate([
{
$match: {
$and: [
{amenities: { $in: ["Pool", "Air conditioning", "Wifi"] } },
{price: { $lt: 200 } },
{"address.suburb": "South Lamar"}
]
}
},
{ $project: { scores: { $objectToArray: "$review_scores" } } },
{ $unwind: "$scores" },
{ $group: { _id: "$scores.k", average_score: { $avg: "$scores.v" } } }
])
ANSWER: West Campus
db.austinListingsAndReviews.aggregate([
{
$match: {
$and: [
{amenities: { $in: ["Pool", "Air conditioning", "Wifi"] } },
{price: { $lt: 200 } },
{"address.suburb": "West Campus"}
]
}
},
{ $project: { scores: { $objectToArray: "$review_scores" } } },
{ $unwind: "$scores" },
{ $group: { _id: "$scores.k", average_score: { $avg: "$scores.v" } } }
])
$geoNear
Use $geoNear to find airbnbs near location of your choice (or
near Alamo Drafthouse - 30.2988474,-97.7064386) and add the
same search criteria we used in in the prior exercise to only
show airbnbs that have pool, wifi, and air-conditioning, under
$200
ANSWER: $geoNear
db.austinListingsAndReviews.aggregate([
{
$geoNear: {
near: { type: "Point", coordinates: [ -97.7064386 , 30.2988474 ] },
distanceField: "dist.calculated",
maxDistance: 500,
query: { $and: [ {amenities: { $in: ["Pool", "Air conditioning", "Wifi"] } },
{price: { $lt: 200 } }]},
includeLocs: "dist.location",
num: 5,
spherical: true
}
}]).pretty()
$graphLookup
Let’s say we want to build a recommendation engine based on
users that reviewed the same properties.
We can use $graphLookup to run a recursive query that says:
“for a given reviewer, e.g. 7538 let’s find all listings
reviewed by users who also reviewed the same listings as 7538”
$graphLookup - example
db.austinListingsAndReviews.aggregate( [
{ $match: { "reviews.reviewer_id": "7538" } } ,{ $unwind: "$reviews" },{
$graphLookup: {
from: "austinListingsAndReviews",
startWith: "$reviews.reviewer_id",
connectFromField: "reviews.reviewer_id",
connectToField: "reviews.reviewer_id",
as: "reviewed_listings",
maxDepth: 1,
depthField: "depth",
restrictSearchWithMatch: { "reviews.reviewer_id": { $ne: "7538" } }
}},
{ $project: {
_id: 0,
listing: "$_id",
url: "$listing_url" }}]).pretty()
Now try $graphlookup yourself
Some host in Austin have multiple listings, for example,
host_id:100208662. Construct a $graphLookup query that finds
all the listings by that host.
ANSWER: $graphlookup
db.austinListingsAndReviews.aggregate( [
{ $match: { "host.host_id": "100208662" } },
{ $graphLookup:
{
from: "austinListingsAndReviews",
startWith: "$host.host_id",
connectFromField: "host.host_id",
connectToField: "host.host_id",
as: "multi-listing-hosts"
} }]).pretty()
Exploratory data analysis
Data preparation & basic manipulation
Data ingestion via JSON or CSV import
Day-to-day development/operations
Adding and understanding indexes
Adding validation rules
Authoring & troubleshooting aggregation pipelines
Viewing real-time server stats
10,000 → 1ft view of data
MongoDB Connector
for Apache Spark
Business Intelligence, Analytics, Machine Learning
Process data in MongoDB with the massive parallelism
of Spark, it's machine learning libraries, and streaming
API
● Process data “in place”, avoiding the
latency otherwise required by an
incremental ETL task.
● Reduced Operational Complexity and Faster
Time-To-Analytics
● Aggregation pre-filtering in combination with
secondary indexing means that an analytics
query only draws that data required
● Multiple Language APIs
JSON
JSON
JSON
JSON
JSON
JSON
JSON
JSON
JSON
JSON
JSON
Business Intelligence, Analytics, Machine Learning
Process data in MongoDB with the massive
parallelism of Spark, it's machine learning libraries,
and streaming API
● Process data “in place”, avoiding the latency
otherwise required by an incremental ETL task
● Aggregation pre-filtering in combination
with secondary indexing means that an
analytics query only draws that data
required
● Reads from secondaries isolate analytics
workload from business critical operations
● Shard aware for data locality
WRITE
READ
Primary
2ndary
2ndary
Business Intelligence, Analytics, Machine Learning
Process data in MongoDB with the massive
parallelism of Spark, it's machine learning libraries,
and streaming API
● Process data “in place”, avoiding the latency
otherwise required by an incremental ETL task
● Aggregation pre-filtering in combination with
secondary indexing means that an analytics
query only draws that data required
● Reads from secondaries isolate analytics
workload from business critical operations
● Shard aware for data locality
Business Intelligence, Analytics, Machine Learning
Process data in MongoDB with the massive
parallelism of Spark, it's machine learning libraries,
and streaming API
● Process data “in place”, avoiding the latency
otherwise required by an incremental ETL task
● Aggregation pre-filtering in combination with
secondary indexing means that an analytics
query only draws that data required
● Reads from secondaries isolate analytics
workload from business critical operations
● Multiple language APIs
Partners and Certifications
Coming soon,....
Using MongoDB Spark Connector
./spark-2.3.0/bin/spark-shell 
--conf 
"spark.mongodb.input.uri=mongodb+srv://demo:demo@dataanalyticsworks
hop-sshrq.mongodb.net/recommendation.ratings" 
--conf 
"spark.mongodb.output.uri=mongodb+srv://demo:demo@dataanalyticswork
shop-sshrq.mongodb.net/recommendation.perUser" 
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.3.0
Exercise 4: MongoDB Spark
Connector
https://community.cloud.databricks.com/?o=2374217986360057#notebook/42327714871690/command/42327714871707
Databricks
Whats Next?
● Github: https://github.com/breinero/workshop
● M121: The MongoDB Aggregation Framework -
https://university.mongodb.com/courses/M121/about
● Videos
○ https://www.mongodb.com/presentations/world18-data-analytics-with-mongodb
○ Charts -
https://www.mongodb.com/presentations/world18-bringing-data-to-life-with-m
ongodb-charts
○ Spark -
https://www.mongodb.com/presentations/analytics-and-machine-learning-with-
spark-and-mongodb-bryan-reinero
Thank You!

More Related Content

What's hot

What's hot (20)

Power query
Power queryPower query
Power query
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
E-Book 25 Tips and Tricks MS Excel Functions & Formulaes
E-Book 25 Tips and Tricks MS Excel Functions & FormulaesE-Book 25 Tips and Tricks MS Excel Functions & Formulaes
E-Book 25 Tips and Tricks MS Excel Functions & Formulaes
 
ppt on data tab in ms.excel
ppt on data tab in ms.excelppt on data tab in ms.excel
ppt on data tab in ms.excel
 
DAX (Data Analysis eXpressions) from Zero to Hero
DAX (Data Analysis eXpressions) from Zero to HeroDAX (Data Analysis eXpressions) from Zero to Hero
DAX (Data Analysis eXpressions) from Zero to Hero
 
Workshop: Make the Most of Customer Data Platforms - David Raab
Workshop: Make the Most of Customer Data Platforms - David RaabWorkshop: Make the Most of Customer Data Platforms - David Raab
Workshop: Make the Most of Customer Data Platforms - David Raab
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Google Sheets in Python with gspread
Google Sheets in Python with gspreadGoogle Sheets in Python with gspread
Google Sheets in Python with gspread
 
Intro to DAX Patterns
Intro to DAX PatternsIntro to DAX Patterns
Intro to DAX Patterns
 
Microsoft Access.ppt
Microsoft Access.pptMicrosoft Access.ppt
Microsoft Access.ppt
 
Data analysis and Data Visualization using Microsoft Excel
Data analysis and Data Visualization using Microsoft ExcelData analysis and Data Visualization using Microsoft Excel
Data analysis and Data Visualization using Microsoft Excel
 
SQL for interview
SQL for interviewSQL for interview
SQL for interview
 
Automate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayAutomate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile way
 
Data Analysis Expressions (DAX) Training
Data Analysis Expressions (DAX) TrainingData Analysis Expressions (DAX) Training
Data Analysis Expressions (DAX) Training
 
Data visualization with sql analytics
Data visualization with sql analyticsData visualization with sql analytics
Data visualization with sql analytics
 
The what, why, and how of master data management
The what, why, and how of master data managementThe what, why, and how of master data management
The what, why, and how of master data management
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Excelpresentationdatavalidation
ExcelpresentationdatavalidationExcelpresentationdatavalidation
Excelpresentationdatavalidation
 
Excel DATEDIFF Function
Excel DATEDIFF FunctionExcel DATEDIFF Function
Excel DATEDIFF Function
 

Similar to MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB

Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revised
MongoDB
 
Intro To Mongo Db
Intro To Mongo DbIntro To Mongo Db
Intro To Mongo Db
chriskite
 

Similar to MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB (20)

Data Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane FineData Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane Fine
 
MongoDB Meetup
MongoDB MeetupMongoDB Meetup
MongoDB Meetup
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop Connector
 
Simplifying & accelerating application development with MongoDB's intelligent...
Simplifying & accelerating application development with MongoDB's intelligent...Simplifying & accelerating application development with MongoDB's intelligent...
Simplifying & accelerating application development with MongoDB's intelligent...
 
Introduction to MongoDB and Workshop
Introduction to MongoDB and WorkshopIntroduction to MongoDB and Workshop
Introduction to MongoDB and Workshop
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
How to leverage what's new in MongoDB 3.6
How to leverage what's new in MongoDB 3.6How to leverage what's new in MongoDB 3.6
How to leverage what's new in MongoDB 3.6
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
 
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & HadoopMongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
MongoDB .local Chicago 2019: Best Practices for Working with IoT and Time-ser...
MongoDB .local Chicago 2019: Best Practices for Working with IoT and Time-ser...MongoDB .local Chicago 2019: Best Practices for Working with IoT and Time-ser...
MongoDB .local Chicago 2019: Best Practices for Working with IoT and Time-ser...
 
MongoDB
MongoDBMongoDB
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and ProfilingMongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and Profiling
 
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revised
 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational Awareness
 
Intro To Mongo Db
Intro To Mongo DbIntro To Mongo Db
Intro To Mongo Db
 
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
 
Online | MongoDB Atlas on GCP Workshop
Online | MongoDB Atlas on GCP Workshop Online | MongoDB Atlas on GCP Workshop
Online | MongoDB Atlas on GCP Workshop
 

More from MongoDB

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
 

Recently uploaded

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 

MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB

  • 2. Jane Fine Director of Product Marketing, Analytics jane.fine@mongodb.com @janeuyvova
  • 4. Data Analytics with MongoDB Custom Code + Charting Libraries ETL + 3rd Party BI Tools MongoDB BI Connector + 3rd Party BI Tools MongoDB Charts MongoDB Compass
  • 5. But First, Setup MongoDB Atlas Cluster MongoDB Connector for BI Charts Server MongoDB Compass https://github.com/breinero/workshop
  • 7. MongoDB BI Connector Visualize and explore MongoDB data in SQL-based BI tools: Automatically discovers the schema Translates complex SQL statements issued by the BI tool into MongoDB aggregation queries Converts the results into a tabular format for rendering inside the BI tool
  • 8. Atlas Hosted BI Connector ...
  • 10. Schema Translation { _id: ObjectId("5b9a6d75f68257c5a994558d"), id: 2318, listing_url: "https://www.airbnb.com/rooms/2318", name: "Casa Madrona", summary: "Gorgeous 1917 Dutch Colonial…” calendar: [ { listing_id: 2318, date: "2019-08-15", available: "t", price: 296 }, { listing_id: 2318, date: "2019-08-14", available: "t", price: 296 } ,... mysql> show tables; +----------------+ | Tables_in_test | +----------------+ | sea | | sea_calendar | | sea_ratings | +----------------+ 3 rows in set (0.06 sec)
  • 11. Demo ● use $ mongo to examine data in MongoDB ● use $ mysql to examine tabular representation of collections ● explain() ● mysql -h dataanalyticsworkshop-biconnector-sshrq.mongodb.net -P 27015 -u demo -p --default-auth=mongosql_auth
  • 12. Exercise 1: Analysing AirBnB Data with BI Connector Make a DSN connection to MongoDB Import Data Into Excel Identify the most expensive neighborhood group in Austin Determine average nightly price for an Airbed, per neighborhood
  • 13. MongoDB BI Connector - When to Use Want to speak SQL to MongoDB Multi data sources (not just MongoDB) Business analysts Reporting only Powerful but you lose some benefits of schema flexibility
  • 15. Options for Visualizing MongoDB Data Custom Code + Charting Libraries ETL + 3rd Party BI ToolsBI Connector + 3rd Party BI Tools
  • 16. Wouldn’t it be nice if... You could visualize your MongoDB Data… without needing to write your own code without needing to move your data into a different repository without needing to purchase and configure third-party tools without losing the richness of the Document Model
  • 17. MongoDB Charts beta The fastest way to build visualizations over your MongoDB data Built for the MongoDB Document Model Visualize live data from on-prem or Atlas DB
  • 18. Charts Basic Concepts A data source is a reference to a MongoDB collection or view that contains data you want to visualize. A chart is a visualization of data from a single data source. A dashboard is a collection of charts which you manage as a unit (name, layout, sharing)
  • 19. What to Expect Charting Capabilities Common chart types Aggregation functions Filtering Sample Mode Binning Sorting Document Model Support Type handling Polymorphic collections Nested documents Array reductions
  • 20. Sharing and Permissions All users log onto Charts with their own account • Users can be managed by anyone with the UserAdmin role Any user can add a data source • They must have a valid connection URI to connect to the MongoDB instance • A data source owner can choose to share with nobody, specific people or everybody Any user can add a dashboard • By default, the dashboard is only visible to the creator • A dashboard owner can choose to share with nobody, specific people or everybody
  • 21. Exercise 2: Analysing AirBnB Data with Charts Add a Data Source Create a Dashboard Add some charts Multi-Series Stacked Bar Chart Coloured Bar Chart Review Scores Histogram Area Chart with Binning
  • 22. The fastest way to build visualizations over your MongoDB data Ad hoc analyses Benefit from the Document Model Collaboration Self-service Intuitive enough for domain experts, non-devs to use!
  • 25. Rich Queries Point | Range | Geospatial | Faceted Search | Aggregations | JOINs | Graph Traversals JSON Documents Tabular Key-Value Text GraphGeospatial MongoDB Aggregation Framework
  • 26. MongoDB { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { number : "1-212-777-1212", type : "work" }, { number : "1-212-777-1213", type : "cell" }] ……... Expressive Queries Find anyone with phone # “1-212…” Check if the person with number “555…” is on the “do not call” list Geospatial Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave Text Search Find all tweets that mention the firm within the last 2 days Aggregation Count and sort number of customers by city, compute min, max, and average spend Native Binary JSON Support Add an additional phone number to Mark Smith’s record without rewriting the document Update just 2 phone numbers out of 10 Sort on the modified date JOIN ($lookup) Query for all San Francisco residences, lookup their transactions, and sum the amount by person Graph Queries ($graphLookup) Query for all people within 3 degrees of separation from Mark Rich query functionality
  • 27. Advanced data processing pipeline for transformations and analytics Multiple stages Similar to a unix pipe Rich Expressions Collection db.orders.aggregate( [ $match stage {$match: { status: "A" } }, $group stage { $group: { _id: "$cust_id",total: { $sum: "$amount" } } } ] ) { cust_id: "A123", amount: 500, status: "A", } { cust_id: "A123", amount: 250, status: "A", } { cust_id: "B212", amount: 200, status: "A", } { cust_id: "A123", amount: 300, status: "D", } Orders { cust_id: "A123", amount: 500, status: "A", } { cust_id: "A123", amount: 250, status: "A", } { cust_id: "B212", amount: 200, status: "A", } { id: "A123", total: 750 } { id: "B212", total: 200 } $match $group Aggregations
  • 28. $match $group $facet $geoNear $graphLookup Pipeline Stages Mathematical $add, $abs, $substract, $multiply, $divide, $log, $log10, $stdDevPop, $stdDevSam, $avg, $sqrt, $pow, $sum, $zip, $convert, etc. Array $push, $reduce, $reverseArray, $addToSet, $arrayElemAt, $slice, etc. Operators $lookup $project $sort $unwind Conditionals $and, $or, $eq, $lt, $lte, $gt, $gte, $cmp, $cond, $switch, $in, etc Date $dateFromParts, $dateToParts, $dateFromString, $dateToString, $dayOfMonth, $isoWeek, $minute, $month, $year, etc. String $toUpper, $toLower, $substr, $strcasecmp, $concat, $split, etc. Laterals $exp, $let, $literal, $map, $type, etc Aggregation Features
  • 29. Developer / Data Analyst Tool Data management and manipulation document view table view Visual schema analyzer with query builder export to language Aggregation pipeline builder A good place to start
  • 30. Exercise 3: Building Powerful Aggregations Explore data by doing some basic finds Learn how to do aggregations you usually do in SQL $match $group Learn how to do aggregations you cannot do in SQL $objectToArray $graphlookup $geonear
  • 31. Find or Count Specific Documents 4. Find how many listings have a real bed and are greater than $100 but less than $200 per night by filling in the blanks db.___.find( { price: { ___: ___, $lt: ___ } } , { ___ : "Real Bed" }).count() db.austinListingsAndReviews.find( { "price": { $gt: 100, $lt: 200 } } , {"bed_type" : "Real Bed"}).count() Answer: 2892
  • 32. Minimum price per night by number of beds Take a look at this example aggregation which computes minimum price per night by number of beds db.austinListingsAndReviews.aggregate( [ { $group: { _id: "$beds", avg_by_beds: { $min: "$price" } } }, { $sort: { avg_by_beds: 1 } } ] )
  • 33. Average price per night by room type Construct a very similar query to find average price per night by room type by filling in the blanks. db.austinListingsAndReviews.aggregate( [ { $group: { _id: "___", avg_by_roomtype: { $avg: "___" } } }} ] )
  • 34. ANSWER: Average price per night by room type Construct a very similar query to find average price per night by room type by filling in the blanks. db.austinListingsAndReviews.aggregate( [ { $group: { _id: "$room_type", avg_by_roomtype: { $avg: "$price" } } }} ] )
  • 35. Average price per night by suburb Let’s practice some more. This time write the query from scratch to find the average price per night by suburb. Examine the document structure first. Hint: suburb is a field within a subdocument address; use dot notation to reference it.
  • 36. ANSWER: Average price per night by suburb db.austinListingsAndReviews.aggregate( [ { $group: { _id: "$address.suburb", avg_by_suburb: { $avg: "$price" } } } ] )
  • 37. Count and average price per night Compute the count of properties and average price per night by suburbs and number of bedrooms by filling in the blanks. db.austinListingsAndReviews.aggregate([ {"$group" : { _id:{suburb:"$___", bedrooms:"$___"}, ___:{$sum:1}, avg:{$___:"$price"}} } ])
  • 38. ANSWER: Count and average price per night db.austinListingsAndReviews.aggregate([ {"$group" : { _id:{suburb:"$address.suburb", bedrooms:"$bedrooms"}, count:{$sum:1}, avg:{$avg:"$price"}} } ])
  • 39. Add $match Stage Use the query from previous step and add a $match stage to only show those properties that have a pool db.austinListingsAndReviews.aggregate([ { $match : {amenities: "Pool"} }, {"$group" : { _id:{suburb:"$address.suburb",bedrooms:"$bedrooms"}, count:{$sum:1}, avg:{$avg:"$price"}} } ])
  • 40. Learn how to do aggregations you cannot do in SQL We want to determine average review scores for all properties in Austin based on some amenities. Remember that review_scores is a sub document. Luckily we have $objectToArray to transform our object into an array which we can then $unwind and $group.
  • 41. Average review scores for all properties in Austin with amenities db.austinListingsAndReviews.aggregate([ { $match : { amenities: { $in: ["Cable TV", "Garden or backyard", "Coffee maker"] } } }, { $project: { scores: { $objectToArray: "$review_scores" } } }, { $unwind: "$scores" }, { $group: { _id: "$scores.k", average_score: { $avg: "$scores.v" } } } ])
  • 42. Compare two neighbourhoods Compare two neighbourhoods - South Lamar and West Campus - and decide based on average review where you would rather stay. Assume you also want a place that has pool, wifi, and air-conditioning, under $200 Hint: You will need to look up how to use $and operator for this
  • 43. ANSWER: South Lamar db.austinListingsAndReviews.aggregate([ { $match: { $and: [ {amenities: { $in: ["Pool", "Air conditioning", "Wifi"] } }, {price: { $lt: 200 } }, {"address.suburb": "South Lamar"} ] } }, { $project: { scores: { $objectToArray: "$review_scores" } } }, { $unwind: "$scores" }, { $group: { _id: "$scores.k", average_score: { $avg: "$scores.v" } } } ])
  • 44. ANSWER: West Campus db.austinListingsAndReviews.aggregate([ { $match: { $and: [ {amenities: { $in: ["Pool", "Air conditioning", "Wifi"] } }, {price: { $lt: 200 } }, {"address.suburb": "West Campus"} ] } }, { $project: { scores: { $objectToArray: "$review_scores" } } }, { $unwind: "$scores" }, { $group: { _id: "$scores.k", average_score: { $avg: "$scores.v" } } } ])
  • 45. $geoNear Use $geoNear to find airbnbs near location of your choice (or near Alamo Drafthouse - 30.2988474,-97.7064386) and add the same search criteria we used in in the prior exercise to only show airbnbs that have pool, wifi, and air-conditioning, under $200
  • 46. ANSWER: $geoNear db.austinListingsAndReviews.aggregate([ { $geoNear: { near: { type: "Point", coordinates: [ -97.7064386 , 30.2988474 ] }, distanceField: "dist.calculated", maxDistance: 500, query: { $and: [ {amenities: { $in: ["Pool", "Air conditioning", "Wifi"] } }, {price: { $lt: 200 } }]}, includeLocs: "dist.location", num: 5, spherical: true } }]).pretty()
  • 47. $graphLookup Let’s say we want to build a recommendation engine based on users that reviewed the same properties. We can use $graphLookup to run a recursive query that says: “for a given reviewer, e.g. 7538 let’s find all listings reviewed by users who also reviewed the same listings as 7538”
  • 48. $graphLookup - example db.austinListingsAndReviews.aggregate( [ { $match: { "reviews.reviewer_id": "7538" } } ,{ $unwind: "$reviews" },{ $graphLookup: { from: "austinListingsAndReviews", startWith: "$reviews.reviewer_id", connectFromField: "reviews.reviewer_id", connectToField: "reviews.reviewer_id", as: "reviewed_listings", maxDepth: 1, depthField: "depth", restrictSearchWithMatch: { "reviews.reviewer_id": { $ne: "7538" } } }}, { $project: { _id: 0, listing: "$_id", url: "$listing_url" }}]).pretty()
  • 49. Now try $graphlookup yourself Some host in Austin have multiple listings, for example, host_id:100208662. Construct a $graphLookup query that finds all the listings by that host.
  • 50. ANSWER: $graphlookup db.austinListingsAndReviews.aggregate( [ { $match: { "host.host_id": "100208662" } }, { $graphLookup: { from: "austinListingsAndReviews", startWith: "$host.host_id", connectFromField: "host.host_id", connectToField: "host.host_id", as: "multi-listing-hosts" } }]).pretty()
  • 51. Exploratory data analysis Data preparation & basic manipulation Data ingestion via JSON or CSV import Day-to-day development/operations Adding and understanding indexes Adding validation rules Authoring & troubleshooting aggregation pipelines Viewing real-time server stats 10,000 → 1ft view of data
  • 53. Business Intelligence, Analytics, Machine Learning Process data in MongoDB with the massive parallelism of Spark, it's machine learning libraries, and streaming API ● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task. ● Reduced Operational Complexity and Faster Time-To-Analytics ● Aggregation pre-filtering in combination with secondary indexing means that an analytics query only draws that data required ● Multiple Language APIs
  • 54. JSON JSON JSON JSON JSON JSON JSON JSON JSON JSON JSON Business Intelligence, Analytics, Machine Learning Process data in MongoDB with the massive parallelism of Spark, it's machine learning libraries, and streaming API ● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task ● Aggregation pre-filtering in combination with secondary indexing means that an analytics query only draws that data required ● Reads from secondaries isolate analytics workload from business critical operations ● Shard aware for data locality
  • 55. WRITE READ Primary 2ndary 2ndary Business Intelligence, Analytics, Machine Learning Process data in MongoDB with the massive parallelism of Spark, it's machine learning libraries, and streaming API ● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task ● Aggregation pre-filtering in combination with secondary indexing means that an analytics query only draws that data required ● Reads from secondaries isolate analytics workload from business critical operations ● Shard aware for data locality
  • 56. Business Intelligence, Analytics, Machine Learning Process data in MongoDB with the massive parallelism of Spark, it's machine learning libraries, and streaming API ● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task ● Aggregation pre-filtering in combination with secondary indexing means that an analytics query only draws that data required ● Reads from secondaries isolate analytics workload from business critical operations ● Multiple language APIs
  • 58. Using MongoDB Spark Connector ./spark-2.3.0/bin/spark-shell --conf "spark.mongodb.input.uri=mongodb+srv://demo:demo@dataanalyticsworks hop-sshrq.mongodb.net/recommendation.ratings" --conf "spark.mongodb.output.uri=mongodb+srv://demo:demo@dataanalyticswork shop-sshrq.mongodb.net/recommendation.perUser" --packages org.mongodb.spark:mongo-spark-connector_2.11:2.3.0
  • 59. Exercise 4: MongoDB Spark Connector https://community.cloud.databricks.com/?o=2374217986360057#notebook/42327714871690/command/42327714871707 Databricks
  • 60. Whats Next? ● Github: https://github.com/breinero/workshop ● M121: The MongoDB Aggregation Framework - https://university.mongodb.com/courses/M121/about ● Videos ○ https://www.mongodb.com/presentations/world18-data-analytics-with-mongodb ○ Charts - https://www.mongodb.com/presentations/world18-bringing-data-to-life-with-m ongodb-charts ○ Spark - https://www.mongodb.com/presentations/analytics-and-machine-learning-with- spark-and-mongodb-bryan-reinero