www.centralinventions.com
Building Your First Data Science App in MongoDB
Robyn Allen | @enrobyn | MDBW 2017
Outline
Motivation
Math app summary
Database schema
PyMongo queries
Math Literacy
+
Data Literacy
Math app goals
Create a framework for student-accessible data science
Improve STEM learning outcomes
Increase math literacy!!!!!
screenshot
Data
Difficulty
Speed of response
Timestamp
Result
Science
aptitude?
confidence?
fatigue?
improvement?
Is 2073 divisible by 3?
operand1: 2073
operator: "%"
operand2: 3
user_guess
correct
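A minimal sketch of how these fields could fit together. It assumes that `correct` records whether `user_guess` matches the actual divisibility of `operand1` by `operand2`; the sample records in this deck are consistent with that reading, but the app's own grading logic is not shown. The helper name `grade_problem` is hypothetical.

```python
def grade_problem(operand1, operand2, user_guess):
    """Build a problem record shaped like the ones in this deck.

    Assumption: the user answers True/False to "is operand1 divisible
    by operand2?", and `correct` is True when the guess matches.
    """
    is_divisible = (operand1 % operand2 == 0)
    return {
        "operand1": operand1,
        "operator": "%",
        "operand2": operand2,
        "user_guess": user_guess,
        "correct": user_guess == is_divisible,
    }


# 2073 is divisible by 3 (2 + 0 + 7 + 3 = 12), so guessing False is wrong.
record = grade_problem(2073, 3, user_guess=False)
print(record["correct"])
```

Timestamps (`start_time`, `end_time`) are left out here because they depend on when the problem was shown and answered.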
schema
detail
{
    ...,
    "all_problems" : [
        {
            "operand1" : 2073,
            "user_guess" : false,
            "correct" : false,
            "operand2" : 3,
            "start_time" : NumberLong("1497055796831"),
            "operator" : "%",
            "end_time" : NumberLong("1497055798985")
        },
        {
            "operand1" : 77,
            "correct" : true,
            "user_guess" : false,
            "operand2" : 4,
            "start_time" : NumberLong("1497055827450"),
            "operator" : "%",
            "end_time" : NumberLong("1497055828629")
        },
        ...
    ],
    ...
}
{
    "_id" : ObjectId("593b42366523ec06eed182b9"),
    "session_start" : NumberLong("1497055796716"),
    "uuid" : "urn:uuid:55e72720-c0b1-4e81-89d6-ac1896b06661",
    "subtopic" : "divisibility superpowers",
    "all_problems" : [
        {
            "operand1" : 14,
            "correct" : true,
            "user_guess" : false,
            "operand2" : 5,
            "start_time" : NumberLong("1497055834697"),
            "operator" : "%",
            "end_time" : NumberLong("1497055835953")
        },
        {
            "operand1" : 24,
            "correct" : true,
            "user_guess" : true,
            "operand2" : 2,
            "start_time" : NumberLong("1497055828630"),
            "operator" : "%",
            "end_time" : NumberLong("1497055830491")
        },
        {
            "operand1" : 69,
            "correct" : true,
            "user_guess" : false,
            "operand2" : 2,
            "start_time" : NumberLong("1497055824300"),
            "operator" : "%",
            "end_time" : NumberLong("1497055825997")
        },
        {
            "operand1" : 26,
            "correct" : true,
            "user_guess" : false,
            "operand2" : 5,
            "start_time" : NumberLong("1497055796831"),
            "operator" : "%",
            "end_time" : NumberLong("1497055798985")
        },
        {
            "operand1" : 67,
            "correct" : true,
            "user_guess" : false,
            "operand2" : 2,
            "start_time" : NumberLong("1497055814628"),
            "operator" : "%",
            "end_time" : NumberLong("1497055816652")
        },
        {
            "operand1" : 31,
            "correct" : true,
            "user_guess" : false,
            "operand2" : 4,
            "start_time" : NumberLong("1497055802959"),
            "operator" : "%",
            "end_time" : NumberLong("1497055804802")
        }
    ],
    ...
}
MongoDB quick-look
MongoDB is a NoSQL database
Data is stored in documents
The schema can change! (even between documents)
PyMongo is the recommended Python driver
Document model
Database
Collection
Document(s)
Collection
Document(s)
...
documents < collections < databases
from pymongo import MongoClient
from secure import MONGO_USERNAME, MONGO_PASSWORD

# SET UP AND AUTHENTICATE THE CONNECTION
# (Database.authenticate() was removed in PyMongo 4, so the
# credentials are passed to MongoClient instead)
client = MongoClient("localhost", 27017,
                     username=MONGO_USERNAME,
                     password=MONGO_PASSWORD,
                     authSource="aprender",
                     authMechanism="SCRAM-SHA-1")
db = client["aprender"]

# collections (these live in the database, not on the client)
mathcards = db["mathcards"]
users = db["users"]
# find_one() returns one mathcard --> DICTIONARY
this_card = db.mathcards.find_one()

# find() returns 1+ mathcard(s) --> CURSOR
all_cards = db.mathcards.find()
for card in all_cards:
    for key in card.keys():
        problem_data = card[key]
        do_some_stuff(problem_data)
Queries, projections, etc. are documents
A document is like a Python dictionary
Example:
{ "uuid": some_uuid }
Usage:
.find({ "uuid": some_uuid })
# get data from a certain user
some_uuid = "urn:uuid:3f810ea0-3d27-43cc-87d7-0501635b3000"
my_data = db.mathcards.find(
    { "uuid": some_uuid }
)

# cards greater than or equal to a certain timestamp
todays_cards = db.mathcards.find(
    { "session_start":
        { "$gte": 1493753538942 }
    } )
AGGREGATION PIPELINES
MongoDB Aggregation Pipelines
A list of one or more stages
Very similar to UNIX pipes
Documents pass from one stage to the next
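To make the "documents pass from one stage to the next" idea concrete, here is a rough plain-Python analogy using chained generators. It is only an illustration of the data flow, not how MongoDB implements stages; the function names and the sample data are hypothetical.

```python
def match(docs, predicate):
    """Pass along only the documents that satisfy the predicate."""
    return (doc for doc in docs if predicate(doc))


def project(docs, make_fields):
    """Reshape each incoming document into a new one."""
    return (make_fields(doc) for doc in docs)


def count(docs, name):
    """Collapse the stream into one document holding the total."""
    yield {name: sum(1 for _ in docs)}


# Hypothetical sample data (not from the talk's database).
sessions = [
    {"session_start": 100, "all_problems": [1, 2, 3]},
    {"session_start": 200, "all_problems": [1]},
    {"session_start": 300, "all_problems": [1, 2]},
]

# Two stages chained: filter first, then reshape what survived.
recent = match(sessions, lambda d: d["session_start"] >= 200)
sized = project(recent, lambda d: {"session_probs": len(d["all_problems"])})
result = list(sized)
print(result)

# A different second stage: count what the first stage let through.
total = list(count(match(sessions, lambda d: d["session_start"] >= 200),
                   "total_sessions"))
print(total)
```

As with UNIX pipes, each stage only sees the output of the stage before it, so the order of stages matters.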
[Diagram: overall concept of the aggregation pipeline. ALL DOCUMENTS -> $match -> SOME DOCUMENTS -> $group -> RESULTS OF INTEREST]
[Diagram, task1a: all docs -> $match -> docs w/ specified start time -> $count -> result: the total number of docs which entered this stage]
[Diagram, task2a: all docs -> $match -> docs w/ specified start time -> $project -> result: docs w/ new info (number of probs solved, by session)]
$match example
pipeline1 = [
    # "session_start" is the name of the key; {"$gte": this_morning} is the criteria
    {"$match": {"session_start": {"$gte": this_morning}}},
]
.aggregate() syntax
cursor1 = db.mathcards.aggregate(pipeline1)
for doc in cursor1:
    print(doc)
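The slides use `this_morning` without defining it. One possible definition, assuming it means local midnight of the current day expressed in epoch milliseconds (the same unit as the `session_start` values in the sample documents), is sketched here. The helper name `start_of_day_ms` is hypothetical.

```python
from datetime import datetime, time


def start_of_day_ms(now=None):
    """Return local midnight of the given day as epoch milliseconds."""
    now = now or datetime.now()
    midnight = datetime.combine(now.date(), time.min)
    return int(midnight.timestamp() * 1000)


# Assumption: "this morning" means the start of today in local time.
this_morning = start_of_day_ms()
print(this_morning)
```

Milliseconds are used because `session_start` is stored as a 13-digit NumberLong; a value in seconds would make the `$gte` comparison match nearly every document.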
Let’s code!!!
github.com/enrobyn/pymongo-tutorial
$count example
pipeline1a = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    {"$count": "total_sessions_today"}
]
# task1a
$project example
pipeline1b = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    # "session_start" is the name of the key; 1 is the display flag
    {"$project": {"session_start": 1}}
]
# task1b
$project
pipeline2a = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    # fill in the blank with the name of the list of problems
    {"$project": {"session_probs": {"$size": "$___________"}}}
]
# task2a
$project
pipeline2a = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    {"$project": {"session_probs": {"$size": "$all_problems"}}}
]
# task2a
$group
pipeline2b = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    {"$project": {"session_probs": {"$size": "$all_problems"}}},
    {"$group": {
        "_id": None,
        "total_problems": {"$sum": "_____________"}
    }}
]
# task2b
$group
pipeline2b = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    {"$project": {"session_probs": {"$size": "$all_problems"}}},
    {"$group": {
        "_id": None,
        "total_problems": {"$sum": "$session_probs"}
    }}
]
# task2b
$group
pipeline3 = [
    # no $match stage here, so the total covers all sessions
    {"$project": {"session_probs": {"$size": "$all_problems"}}},
    {"$group": {
        "_id": None,
        "total_problems": {"$sum": "$session_probs"}
    }}
]
# task3
$avg
pipeline4 = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    {"$project": {"session_probs": {"$size": "$all_problems"}}},
    {"$group": {
        "_id": None,
        "avg_num_probs": {"______": "__________"}
    }}
]
# task4
$avg
pipeline4 = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    {"$project": {"session_probs": {"$size": "$all_problems"}}},
    {"$group": {
        "_id": None,
        "avg_num_probs": {"$avg": "$session_probs"}
    }}
]
# task4
$stdDevSamp
pipeline5 = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    {"$project": {"session_probs": {"$size": "$all_problems"}}},
    {"$group": {
        "_id": None,
        "std_dev_num_probs": {"_________": "_________"}
    }}
]
# task5
$stdDevSamp
pipeline5 = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    {"$project": {"session_probs": {"$size": "$all_problems"}}},
    {"$group": {
        "_id": None,
        "std_dev_num_probs": {"$stdDevSamp": "$session_probs"}
    }}
]
# task5
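As a cross-check on what `$avg` and `$stdDevSamp` compute, the same statistics can be reproduced in plain Python with the standard library. This is a sketch on hypothetical per-session counts, not output from the talk's database.

```python
import statistics

# Hypothetical "session_probs" values, one per session.
session_probs = [6, 8, 10]

# $avg corresponds to the arithmetic mean.
avg_num_probs = statistics.mean(session_probs)

# $stdDevSamp is the sample standard deviation (divides by n - 1),
# which is what statistics.stdev computes.
std_dev_num_probs = statistics.stdev(session_probs)

# For comparison: the population version (divides by n), which is
# what MongoDB's $stdDevPop computes.
pop_std_dev = statistics.pstdev(session_probs)

print(avg_num_probs, std_dev_num_probs, pop_std_dev)
```

The sample version is the usual choice when the sessions in the pipeline are treated as a sample of a larger population of sessions.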
Individual work time
Search for tasks in the .py file
Take a moment to write one or more pipeline stages
Check end of file comments if stuck
Multi-stage aggregation pipelines
task6: Response time by operand2 [2,3,4,5,6,9] for one user
task7: Percent accuracy (“score”) by operand2 for one user
task8: Retrieve, for one user, operand2 w/ lowest score
task9: Retrieve, for one user, operand2 w/ fastest time
task10: Retrieve operand2 which challenged the most users
[Diagram, task6: all docs -> $match -> docs w/ a certain uuid -> $unwind -> now every array element in all_problems is a doc! -> $project -> add two fields to the docs (suppress others) -> $group -> group by operand2, get time spent -> RESULTS]
$unwind # task6
pipeline6 = [
    { "$match" :          # $match on uuid_of_interest
    { "$unwind" :         # $unwind array of problems
    { "$project": {
        "operand2":       # use dot notation
        "time_spent":     # compute time spent
      }
    },
    { "$group": {
        "_id":            # group on operand2
        "avg_time_spent": # compute $avg
      }
    }
]
$unwind # task6
pipeline6 = [
    { "$match" : {"uuid" : uuid_of_interest } },
    { "$unwind" : "$all_problems" },
    { "$project": {
        "operand2": "$all_problems.operand2",
        "time_spent": {"$subtract": ["$all_problems.end_time",
                                     "$all_problems.start_time"]},
        "session_start": 1,
        "_id": 0}
    },
    { "$group": {
        "_id": {"operand2": "$operand2"},
        "avg_time_spent": {"$avg": "$time_spent"},
      }
    }
]
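To see what the $unwind, $project and $group stages do to a single session, the same computation can be sketched in plain Python. The data below is the sample session from the schema slides, trimmed to the fields used; the function name `avg_time_by_operand2` is hypothetical, and this is an illustration of the logic rather than a replacement for the pipeline.

```python
from collections import defaultdict

# Sample session from the schema slides, trimmed to the fields used here.
session = {
    "uuid": "urn:uuid:55e72720-c0b1-4e81-89d6-ac1896b06661",
    "all_problems": [
        {"operand2": 5, "start_time": 1497055834697, "end_time": 1497055835953},
        {"operand2": 2, "start_time": 1497055828630, "end_time": 1497055830491},
        {"operand2": 2, "start_time": 1497055824300, "end_time": 1497055825997},
        {"operand2": 5, "start_time": 1497055796831, "end_time": 1497055798985},
        {"operand2": 2, "start_time": 1497055814628, "end_time": 1497055816652},
        {"operand2": 4, "start_time": 1497055802959, "end_time": 1497055804802},
    ],
}


def avg_time_by_operand2(sessions):
    """Average response time in milliseconds, keyed by operand2."""
    times = defaultdict(list)
    for doc in sessions:
        for problem in doc["all_problems"]:                  # like $unwind
            time_spent = problem["end_time"] - problem["start_time"]  # like $subtract
            times[problem["operand2"]].append(time_spent)    # like $group on operand2
    return {op2: sum(ts) / len(ts) for op2, ts in times.items()}  # like $avg


averages = avg_time_by_operand2([session])
print(averages)
```

Running the work inside the aggregation pipeline instead keeps the per-problem documents on the database server, which is the point the talk makes in its conclusion.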
$addFields # task7
pipeline7a = [
    { "$match" : ...
    { "$unwind" : ...
    { "$group": {
        "_id":             # $group on operand2
        "total_attempted": # $sum
        "total_correct":
      }
    },
    { "$addFields": {
        "percent_accuracy":
      }
    }
]
$group for task7 # task7
{"$group": {
    "_id": "$all_problems.operand2",
    "total_attempted": {"$sum": 1},
    "total_correct": {"$sum": {"$cond": ["$all_problems.correct", 1, 0]}}
  }
}
$addFields for task7 # task7
{"$addFields": {
    "percent_accuracy": {"$divide": ["$total_correct",
                                     "$total_attempted"]}
  }
}
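The counting trick in the $group stage (`{"$sum": 1}` for attempts, `$sum` of a `$cond` for correct answers) followed by the `$divide` in $addFields can be mirrored in plain Python. This is a sketch on hypothetical problem records; `accuracy_by_operand2` is a made-up name.

```python
def accuracy_by_operand2(problems):
    """Group problems by operand2 and compute the fraction answered correctly."""
    groups = {}
    for problem in problems:
        group = groups.setdefault(
            problem["operand2"], {"total_attempted": 0, "total_correct": 0})
        group["total_attempted"] += 1                    # {"$sum": 1}
        group["total_correct"] += 1 if problem["correct"] else 0  # $sum of $cond
    for group in groups.values():                        # like $addFields
        group["percent_accuracy"] = (
            group["total_correct"] / group["total_attempted"])
    return groups


# Hypothetical unwound problem records (not from the talk's data).
problems = [
    {"operand2": 3, "correct": True},
    {"operand2": 3, "correct": False},
    {"operand2": 3, "correct": True},
    {"operand2": 3, "correct": False},
    {"operand2": 5, "correct": True},
]

scores = accuracy_by_operand2(problems)
print(scores)
```

Despite its name, `percent_accuracy` is a fraction between 0 and 1, as it is in the results shown in the deck.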
task8 hints # task8
We want to find the operand2
which had the lowest score...
What stage(s) could you add to
pipeline7 in order to solve this?
$sort and $limit # task8
pipeline7.extend([
    {"$sort": {"percent_accuracy": 1}},
    {"$limit": 1}
])
task9 hints # task9
We want to find the operand2
which had the fastest time...
What stage(s) could you add to
pipeline6 in order to solve this?
$sort and $limit # task9
pipeline6.extend([
    {"$sort": {"avg_time_spent": 1}},
    {"$limit": 1}
])
Check out pipeline7a... # task10
Hint: Add a $sort stage to 7a
{"$sort" : {"percent_accuracy": 1} }
Results! # task10
{'percent_accuracy': 0.7724425887265136, 'total_correct': 740, 'total_attempted': 958, '_id': {'op2': 3}}
{'percent_accuracy': 0.8316151202749141, 'total_correct': 726, 'total_attempted': 873, '_id': {'op2': 6}}
{'percent_accuracy': 0.8428731762065096, 'total_correct': 751, 'total_attempted': 891, '_id': {'op2': 9}}
{'percent_accuracy': 0.8562564632885212, 'total_correct': 828, 'total_attempted': 967, '_id': {'op2': 4}}
{'percent_accuracy': 0.9286510590858417, 'total_correct': 833, 'total_attempted': 897, '_id': {'op2': 5}}
{'percent_accuracy': 0.9333333333333333, 'total_correct': 882, 'total_attempted': 945, '_id': {'op2': 2}}
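The ordering above can be reproduced offline from the counts alone, which also shows what the `$sort` and `$limit` stages contribute. This sketch copies the `total_correct` and `total_attempted` values from the results, starts from an arbitrary order, and applies the plain-Python equivalents of the two stages.

```python
# Counts copied from the task10 results, deliberately listed by operand2.
results = [
    {"_id": {"op2": 2}, "total_correct": 882, "total_attempted": 945},
    {"_id": {"op2": 3}, "total_correct": 740, "total_attempted": 958},
    {"_id": {"op2": 4}, "total_correct": 828, "total_attempted": 967},
    {"_id": {"op2": 5}, "total_correct": 833, "total_attempted": 897},
    {"_id": {"op2": 6}, "total_correct": 726, "total_attempted": 873},
    {"_id": {"op2": 9}, "total_correct": 751, "total_attempted": 891},
]

for row in results:
    row["percent_accuracy"] = row["total_correct"] / row["total_attempted"]

# {"$sort": {"percent_accuracy": 1}} sorts ascending.
ranked = sorted(results, key=lambda row: row["percent_accuracy"])

# {"$limit": 1} keeps only the first document.
hardest = ranked[:1]

print([row["_id"]["op2"] for row in ranked])
print(hardest[0]["_id"]["op2"])
```

Divisibility by 3 comes out as the lowest-scoring operand2 and divisibility by 2 as the highest, matching the order of the results.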
Conclusion
PyMongo = easy to learn
You can learn PyMongo
The aggregation pipeline enables you to run data science code efficiently on your database servers without needing to move any data
DATA LITERACY
THANK YOU
Resources
Asya Kamsky's talk! 4:30PM WED. in Grand Ballroom
“Powerful Analysis with the Aggregation Pipeline”
MongoDB University (free!)
https://university.mongodb.com/
Aggregation Pipeline Quick Reference
https://docs.mongodb.com/manual/meta/aggregation-quick-reference/
MongoDB Day-long conferences
Operators useful for aggregation pipelines
$sort
$group
$map
$addFields
$let
$cond
$min
$max
$unwind
$limit
$project
$match
$push
$addToSet
$first
$sum
$eq
$divide
$multiply
$gt $lt $gte $lte
$cond
{"$cond": {
    "if": {"$and": [
        {"$eq": ["$$nth.label", "tryHarder"]},
        {"$eq": ["$$nth.user_response", "yes"]},
        {"$eq": ["$$nplus1.label", "tryHarder"]},
        {"$eq": ["$$nplus1.user_response", "yes"]}
    ]},
    "then": True,
    "else": False
}}
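Read as ordinary code, this `$cond` is an if/else over four equality checks joined by `and`. The sketch below spells that out in plain Python. It assumes `$$nth` and `$$nplus1` are variables bound to two consecutive entries of an `interventions` array (the binding, for example via `$let` or `$map`, is not shown on the slide); the function name and sample data are hypothetical.

```python
def accepted_twice(nth, nplus1):
    """True when two consecutive interventions are both "tryHarder"
    prompts that the user answered with "yes"."""
    return (nth["label"] == "tryHarder"                # {"$eq": ["$$nth.label", ...]}
            and nth["user_response"] == "yes"
            and nplus1["label"] == "tryHarder"
            and nplus1["user_response"] == "yes")


# Hypothetical interventions, shaped like the sample intervention document.
interventions = [
    {"label": "tryHarder", "user_response": "yes"},
    {"label": "tryHarder", "user_response": "yes"},
    {"label": "tryHarder", "user_response": "no"},
]

# Pair each intervention with the one that follows it.
flags = [accepted_twice(a, b)
         for a, b in zip(interventions, interventions[1:])]
print(flags)
```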
$eq
{"$eq": ["$$nth.label", "tryHarder"]},
{"$eq": ["$$nth.user_response", "yes"]},
{"$eq": ["$$nplus1.label", "tryHarder"]},
{"$eq": ["$$nplus1.user_response", "yes"]}
$and
{"$and": [ {},{},{},{} ]}
Sample intervention
{
    "_id" : ObjectId("593b42366523ec06eed182b9"),
    "session_start" : NumberLong("1497055796716"),
    "uuid" : "urn:uuid:55e72720-c0b1-4e81-89d6-ac1896b06661",
    "subtopic" : "divisibility superpowers",
    "all_problems" : [...],
    "interventions" : [
        {
            "deploy_time" : NumberLong("1497055817986"),
            "label" : "tryHarder",
            "user_response" : "no",
            "dismiss_time" : 2939
        }
    ]
}

MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 

Recently uploaded (20)

Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Artificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic WarfareArtificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic Warfare
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 

Building Your First Data Science Application in MongoDB

  • 1. www.centralinventions.com Building Your First Data Science App in MongoDB Robyn Allen | @enrobyn | MDBW 2017
  • 4. Math app goals: create a framework for student-accessible data science; improve STEM learning outcomes; increase math literacy!!!!!
  • 12. schema detail { ..., { "operand1" : 2073, "user_guess" : false, "correct" : false, "operand2" : 3, "start_time": NumberLong("1497055796831"), "operator" : "%", "end_time" : NumberLong("1497055798985") }, ... }
  • 13. { ..., "all_problems" : [ { "operand1" : 2073, "user_guess" : false, "correct" : false, "operand2" : 3, "start_time": NumberLong("1497055796831"), "operator" : "%", "end_time" : NumberLong("1497055798985") }, ... schema detail
  • 14. { ..., "all_problems" : [ { "operand1" : 2073, "user_guess" : false, "correct" : false, "operand2" : 3, "start_time": NumberLong("1497055796831"), "operator" : "%", "end_time" : NumberLong("1497055798985") }, { "operand1" : 77, "correct" : true, "user_guess" : false, "operand2" : 4, "start_time" : NumberLong("1497055827450"), "operator" : "%", "end_time" : NumberLong("1497055828629") },
  • 15. { "all_problems" : [{ "operand1" : 14, "correct" : true, "user_guess" : false, "operand2" : 5, "start_time" : NumberLong("1497055834697"), "operator" : "%", "end_time" : NumberLong("1497055835953") }, { "operand1" : 24, "correct" : true, "user_guess" : true, "operand2" : 2, "start_time" : NumberLong("1497055828630"), "operator" : "%", "end_time" : NumberLong("1497055830491") }, { "operand1" : 69, "correct" : true, "user_guess" : false, "operand2" : 2, "start_time" : NumberLong("1497055824300"), "operator" : "%", "end_time" : NumberLong("1497055825997") }, { "operand1" : 26, "correct" : true, "user_guess" : false, "operand2" : 5, "start_time" : NumberLong("1497055796831"), "operator" : "%", "end_time" : NumberLong("1497055798985") }, { "operand1" : 67, "correct" : true, "user_guess" : false, "operand2" : 2, "start_time" : NumberLong("1497055814628"), "operator" : "%", "end_time" : NumberLong("1497055816652") }, { "operand1" : 31, "correct" : true, "user_guess" : false, "operand2" : 4, "start_time" : NumberLong("1497055802959"), "operator" : "%", "end_time" : NumberLong("1497055804802") } ], ...
  • 16-18. { "_id" : ObjectId("593b42366523ec06eed182b9"), "session_start" : NumberLong("1497055796716"), "uuid" : "urn:uuid:55e72720-c0b1-4e81-89d6-ac1896b06661", "subtopic" : "divisibility superpowers", "all_problems" : [{ "operand1" : 14, "correct" : true, "user_guess" : false, "operand2" : 5, "start_time" : NumberLong("1497055834697"), "operator" : "%", "end_time" : NumberLong("1497055835953") }, { "operand1" : 24, "correct" : true, "user_guess" : true, "operand2" : 2, "start_time" : NumberLong("1497055828630"), "operator" : "%", "end_time" : NumberLong("1497055830491") }, { "operand1" : 69, "correct" : true, "user_guess" : false, "operand2" : 2, "start_time" : NumberLong("1497055824300"), "operator" : "%", "end_time" : NumberLong("1497055825997") }, { "operand1" : 26, "correct" : true, "user_guess" : false, "operand2" : 5, "start_time" : NumberLong("1497055796831"), "operator" : "%", "end_time" : NumberLong("1497055798985") }, { "operand1" : 67, "correct" : true, "user_guess" : false, "operand2" : 2, "start_time" : NumberLong("1497055814628"), "operator" : "%", "end_time" : NumberLong("1497055816652") }, { "operand1" : 31, "correct" : true, "user_guess" : false, "operand2" : 4, "start_time" : NumberLong("1497055802959"), "operator" : "%", "end_time" : NumberLong("1497055804802") } ],
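The data fields the talk lists (difficulty, speed of response, timestamp, result) can be read straight off one entry of `all_problems`. The following is a minimal sketch in plain Python, assuming the `start_time` and `end_time` values are epoch milliseconds (they look that way, but the slides do not say so); the helper name `summarize_problem` is hypothetical.

```python
# One entry of "all_problems", copied from the sample document
# (NumberLong values written as plain Python ints).
problem = {
    "operand1": 24,
    "operator": "%",
    "operand2": 2,
    "user_guess": True,
    "correct": True,
    "start_time": 1497055828630,
    "end_time": 1497055830491,
}


def summarize_problem(p):
    """Hypothetical helper: derive response time and the true answer."""
    # Assumption: timestamps are epoch milliseconds.
    response_ms = p["end_time"] - p["start_time"]
    # "Is operand1 divisible by operand2?" is true when the remainder is 0.
    divisible = (p["operand1"] % p["operand2"]) == 0
    return {
        "response_ms": response_ms,
        "divisible": divisible,
        # Recomputed check; it should agree with the stored "correct" flag.
        "guess_was_right": p["user_guess"] == divisible,
    }


summary = summarize_problem(problem)
print(summary)
```

Storing raw start and end times rather than a precomputed duration keeps the document flexible: the same fields support both "how fast?" and "when?" questions later.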
  • 19. MongoDB quick-look: MongoDB is a NoSQL database. Data is stored in documents. The schema can change (even between documents)! PyMongo is the recommended Python driver.
  • 22. Collections
      from pymongo import MongoClient
      # SET UP THE CONNECTION
      client = MongoClient("localhost", 27017)
      db = client["aprender"]
      # collections live on the database object, not on the client
      mathcards = db["mathcards"]
      users = db["users"]
  • 23. Authentication
      from pymongo import MongoClient
      from secure import MONGO_USERNAME, MONGO_PASSWORD
      # SET UP THE CONNECTION
      client = MongoClient("localhost", 27017)
      db = client["aprender"]
      mathcards = db["mathcards"]
      users = db["users"]
      # AUTHENTICATE THE CONNECTION
      # (PyMongo 3.x API; Database.authenticate() was removed in PyMongo 4.0)
      client.aprender.authenticate(MONGO_USERNAME, MONGO_PASSWORD, mechanism='SCRAM-SHA-1')
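`Database.authenticate()` was removed in PyMongo 4.0, where credentials are given to `MongoClient` itself, for example through a connection URI. The sketch below only builds such a URI with the standard library, so it runs without a server. The helper name `build_mongo_uri`, the example credentials, and the choice of `aprender` as the authentication database are assumptions for illustration.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host="localhost", port=27017,
                    auth_db="aprender", mechanism="SCRAM-SHA-1"):
    """Hypothetical helper: build a MongoDB connection URI with credentials."""
    # Percent-encode credentials so characters like "@" or ":" are safe.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb://{user}:{pwd}@{host}:{port}/"
        f"?authSource={auth_db}&authMechanism={mechanism}"
    )


# Made-up credentials, for illustration only.
uri = build_mongo_uri("student", "p@ss:word")
print(uri)

# With PyMongo installed and a server running, the URI would be used as:
#   from pymongo import MongoClient
#   client = MongoClient(uri)
```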
  • 24. find_one()
      # find_one() returns one mathcard --> DICTIONARY
      this_card = db.mathcards.find_one()
  • 25. find()
      # find_one() returns one mathcard --> DICTIONARY
      this_card = db.mathcards.find_one()
      # find() returns 1+ mathcard(s) --> CURSOR
      all_cards = db.mathcards.find()
  • 26. Iterating over a cursor
      # find_one() returns one mathcard --> DICTIONARY
      this_card = db.mathcards.find_one()
      # find() returns 1+ mathcard(s) --> CURSOR
      all_cards = db.mathcards.find()
      for card in all_cards:
          for key in card.keys():
              problem_data = card[key]
              do_some_stuff(problem_data)
  • 27. Queries, projections, etc. are documents. A document is like a Python dictionary. Example: { "uuid": some_uuid }
  • 28. Queries, projections, etc. are documents. A document is like a Python dictionary. Example: { "uuid": some_uuid } Usage: .find({ "uuid": some_uuid })
  • 29. Query by user
      # get data from a certain user
      some_uuid = "urn:uuid:3f810ea0-3d27-43cc-87d7-0501635b3000"
      my_data = db.mathcards.find({"uuid": some_uuid})
  • 30. Query by timestamp
      '''cards greater than or equal to a certain timestamp'''
      todays_cards = db.mathcards.find({"session_start": {"$gte": 1493753538942}})
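Because a query is just a dictionary, its meaning can be illustrated without a database. The following is a toy sketch, not how MongoDB evaluates queries: the function `matches` is hypothetical, supports only plain equality and `$gte`, and the short uuids are made up.

```python
def matches(doc, query):
    """Hypothetical mini-matcher: supports equality and $gte only."""
    for key, condition in query.items():
        value = doc.get(key)
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gte": 1493753538942}
            for op, operand in condition.items():
                if op == "$gte":
                    if value is None or value < operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != condition:
            # Plain form, e.g. {"uuid": some_uuid}
            return False
    return True


# Made-up documents for illustration.
cards = [
    {"uuid": "urn:uuid:aaa", "session_start": 1493753538000},
    {"uuid": "urn:uuid:bbb", "session_start": 1493753539000},
    {"uuid": "urn:uuid:aaa", "session_start": 1497055796716},
]

by_user = [c for c in cards if matches(c, {"uuid": "urn:uuid:aaa"})]
recent = [c for c in cards
          if matches(c, {"session_start": {"$gte": 1493753538942}})]
print(len(by_user), len(recent))
```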
  • 32. MongoDB Aggregation Pipelines: a list of one or more stages; very similar to UNIX pipes; documents pass from one stage to the next.
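The UNIX-pipe analogy can be sketched in plain Python with generators: each stage consumes documents and yields documents. This is only a toy model of the idea, not MongoDB's implementation; the names `match_stage`, `count_stage`, and `run_pipeline` are hypothetical, and the sample timestamps are made up.

```python
def match_stage(docs, predicate):
    """Pass along only the documents that satisfy the predicate (like $match)."""
    return (d for d in docs if predicate(d))


def count_stage(docs, field_name):
    """Consume all incoming documents and emit one document with their count
    (like $count; a simplification of the real stage)."""
    yield {field_name: sum(1 for _ in docs)}


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next, like `a | b | c` in a shell."""
    for stage in stages:
        docs = stage(docs)
    return list(docs)


# Made-up session documents.
sessions = [
    {"session_start": 100},
    {"session_start": 250},
    {"session_start": 300},
]

result = run_pipeline(sessions, [
    lambda docs: match_stage(docs, lambda d: d["session_start"] >= 200),
    lambda docs: count_stage(docs, "total_sessions"),
])
print(result)
```

Putting the filtering stage first means later stages see fewer documents, which is the same reason the example pipelines in the talk start with `$match`.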
  • 33. Overall concept of the aggregation pipeline: all documents -> $match -> some documents -> $group -> results of interest
  • 34. task1a: all docs -> $match -> docs w/ specified start time -> $count -> result: the total number of docs which entered this stage
  • 35. task2a: all docs -> $match -> docs w/ specified start time -> $project -> result: docs w/ new info (number of probs solved, by session)
  • 36. $match example
      pipeline1 = [
          {"$match": {"session_start": {"$gte": this_morning}}},
      ]
    "session_start" is the name of the key; {"$gte": this_morning} is the criteria.
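The pipelines use a variable `this_morning` that the slides never define. One way it could be computed is sketched below, assuming `session_start` is stored as epoch milliseconds and that "today" is judged in UTC; the helper name `start_of_day_ms` is hypothetical.

```python
from datetime import datetime, timezone


def start_of_day_ms(now):
    """Hypothetical helper: epoch milliseconds at midnight of the day containing `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # timestamp() returns seconds as a float; the sample data stores milliseconds.
    return int(midnight.timestamp() * 1000)


# Fixed example: the sample session started at 1497055796716 ms (10 June 2017, UTC).
sample = datetime.fromtimestamp(1497055796716 / 1000, tz=timezone.utc)
print(start_of_day_ms(sample))

# In practice (assumption: UTC; pass a local timezone instead if preferred):
this_morning = start_of_day_ms(datetime.now(timezone.utc))
```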
  • 37. .aggregate() syntax
      cursor1 = db.mathcards.aggregate(pipeline1)
      for doc in cursor1:
          print(doc)
  • 39. $count example (task1a)
      pipeline1a = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$count": "total_sessions_today"},
      ]
  • 40. $project example (task1b)
      pipeline1b = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$project": {"session_start": 1}},
      ]
    "session_start" is the name of the key; 1 is the display flag.
  • 41-42. $project (task2a)
      pipeline2a = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$project": {"session_probs": {"$size": "$___________"}}},
      ]
    The blank is the name of the list of problems.
  • 43. $project (task2a)
      pipeline2a = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$project": {"session_probs": {"$size": "$all_problems"}}},
      ]
    The blank is filled with $all_problems.
  • 44. $group (task2b)
      pipeline2b = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$project": {"session_probs": {"$size": "$all_problems"}}},
          {"$group": {"_id": None, "total_problems": {"$sum": "_____________"}}},
      ]
  • 45. $group (task2b)
      pipeline2b = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$project": {"session_probs": {"$size": "$all_problems"}}},
          {"$group": {"_id": None, "total_problems": {"$sum": "$session_probs"}}},
      ]
  • 46. $group (task3)
      pipeline2b = [
          {"$project": {"session_probs": {"$size": "$all_problems"}}},
          {"$group": {"_id": None, "total_problems": {"$sum": "$session_probs"}}},
      ]
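What the `$match` / `$project` / `$group` pipeline computes can be checked by hand on a few documents. The sketch below mimics each stage with plain Python over made-up sessions (only the array lengths matter here); it illustrates the logic and is not a substitute for running the pipeline on the server. The cutoff value for `this_morning` is an assumption.

```python
# Made-up sessions; the problem entries are left empty because only
# the length of each "all_problems" array is used.
sessions = [
    {"session_start": 1497055796716, "all_problems": [{}, {}, {}, {}, {}, {}]},
    {"session_start": 1497055900000, "all_problems": [{}, {}]},
    {"session_start": 1490000000000, "all_problems": [{}, {}, {}]},
]
this_morning = 1497052800000  # assumed cutoff (epoch milliseconds)

# $match: keep sessions that started at or after the cutoff.
matched = [s for s in sessions if s["session_start"] >= this_morning]

# $project with $size: one new field holding the length of the array.
projected = [{"session_probs": len(s["all_problems"])} for s in matched]

# $group with "_id": None and $sum: collapse everything into one document.
grouped = [{
    "_id": None,
    "total_problems": sum(p["session_probs"] for p in projected),
}]
print(grouped)

# task3 drops the $match stage, so every session is counted.
total_all = sum(len(s["all_problems"]) for s in sessions)
print(total_all)
```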
  • 47. $avg (task4)
      pipeline4 = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$project": {"session_probs": {"$size": "$all_problems"}}},
          {"$group": {"_id": None, "avg_num_probs": {"______": "__________"}}},
      ]
  • 48. $avg (task4)
      pipeline4 = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$project": {"session_probs": {"$size": "$all_problems"}}},
          {"$group": {"_id": None, "avg_num_probs": {"$avg": "$session_probs"}}},
      ]
  • 49. $stdDevSamp (task5)
      pipeline4 = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$project": {"session_probs": {"$size": "$all_problems"}}},
          {"$group": {"_id": None, "std_dev_num_probs": {"_________": "_________"}}},
      ]
  • 50. $stdDevSamp (task5)
      pipeline4 = [
          {"$match": {"session_start": {"$gte": this_morning}}},
          {"$project": {"session_probs": {"$size": "$all_problems"}}},
          {"$group": {"_id": None, "std_dev_num_probs": {"$stdDevSamp": "$session_probs"}}},
      ]
  • 51. Individual work time Search for tasks in the .py file Take a moment to write one or more pipeline stages Check end of file comments if stuck
  • 52. Multi-stage aggregation pipelines task6: Response time by operand2 [2,3,4,5,6,9] for one user task7: Percent accuracy (“score”) by operand2 for one user task8: Retrieve, for one user, operand2 w/ lowest score task9: Retrieve, for one user, operand2 w/ fastest time task10: Retrieve operand2 which challenged the most users
  • 53. $match # task6 all docs docs w/ a certain uuid
  • 55. $unwind $match # task6 all docs docs w/ a certain uuid now every array element in all_problems is a doc!
  • 57. $project $unwind $match # task6 all docs docs w/ a certain uuid now every array element in all_problems is a doc! add two fields to the docs (suppress others)
  • 58. $group $project $unwind $match # task6 all docs docs w/ a certain uuid now every array element in all_problems is a doc! add two fields to the docs (suppress others)
  • 59. RESULTS $group $project $unwind $match # task6 all docs docs w/ a certain uuid now every array element in all_problems is a doc! add two fields to the docs (suppress others) group by operand2, get time spent
  • 60. $unwind # task6 pipeline6 = [ { "$match" : # $match on uuid_of_interest { "$unwind" : # $unwind array of problems { "$project": { "operand2": # use dot notation "time_spent": # compute time spent }, { "$group":{ "_id": # group on operand2 "avg_time_spent": # compute $avg } } ]
  • 61. $unwind # task6 pipeline6 = [ { "$match" : {"uuid" : uuid_of_interest } }, { "$unwind" : "$all_problems" }, { "$project": { "operand2": "$all_problems.operand2", "time_spent": {"$subtract": ["$all_problems.end_time", "$all_problems.start_time"]}, "session_start":1, "_id":0} }, { "$group":{ "_id": {"operand2": "$operand2"}, "avg_time_spent": {"$avg": "$time_spent"}, } } ]
  • 62. $addFields # task7 pipeline7a = [ { "$match" : ... { "$unwind" : ... { "$group":{ "_id": # $group on operand2 "total_attempted": # $sum "total_correct": } }, { "$addFields":{ "percent_accuracy": } } ]
  • 63. $group for task7 # task7 {"$group":{ "_id": "$all_problems.operand2", "total_attempted": {"$sum":1}, "total_correct": {"$sum": { "$cond": ["$all_problems.correct", 1, 0] } } } }
  • 64. $addFields for task7 # task7 {"$addFields":{ "percent_accuracy": {"$divide": ["$total_correct", "$total_attempted"] } } }
  • 65. task8 hints # task8 We want to find the operand2 which had the lowest score... What stage(s) could you add to pipeline7 in order to solve this?
  • 66. $sort and $limit # task8 pipeline7.extend([ {"$sort": {"percent_accuracy": 1}}, {"$limit": 1} ])
  • 67. task9 hints # task9 We want to find the operand2 which had the fastest time... What stage(s) could you add to pipeline6 in order to solve this?
  • 68. $sort and $limit # task9 pipeline6.extend([ {"$sort": {"avg_time_spent": 1}}, {"$limit": 1} ])
  • 69. Check out pipeline7a... # task10 Hint: Add a $sort stage to 7a
  • 70. Check out pipeline7a... # task10 Hint: Add a $sort stage to 7a {"$sort" : {"percent_accuracy": 1} }
  • 71. Results! # task10 {'percent_accuracy': 0.7724425887265136, 'total_correct': 740, 'total_attempted': 958, '_id': {'op2': 3}} {'percent_accuracy': 0.8316151202749141, 'total_correct': 726, 'total_attempted': 873, '_id': {'op2': 6}} {'percent_accuracy': 0.8428731762065096, 'total_correct': 751, 'total_attempted': 891, '_id': {'op2': 9}} {'percent_accuracy': 0.8562564632885212, 'total_correct': 828, 'total_attempted': 967, '_id': {'op2': 4}} {'percent_accuracy': 0.9286510590858417, 'total_correct': 833, 'total_attempted': 897, '_id': {'op2': 5}} {'percent_accuracy': 0.9333333333333333, 'total_correct': 882, 'total_attempted': 945, '_id': {'op2': 2}}
  • 74. Conclusion PyMongo = easy to learn You can learn PyMongo The aggregation pipeline enables you to run data science code efficiently on your database servers without needing to move any data
  • 77. Resources Asya Kamsky's talk! 4:30PM WED. in Grand Ballroom “Powerful Analysis with the Aggregation Pipeline” MongoDB University (free!) https://university.mongodb.com/ Aggregation Pipeline Quick Reference https://docs.mongodb.com/manual/meta/aggregation-quick-reference/ MongoDB Day-long conferences
  • 78. Operators useful for aggregation pipelines $sort $group $map $addFields $let $cond $min $max $unwind $limit $project $match $push $addToSet $first $sum $eq $divide $multiply $gt $lt $gte $lte
  • 79. $cond {"$cond":{ "if": {"$and":[ {"$eq": ["$$nth.label", "tryHarder"]}, {"$eq": ["$$nth.user_response", "yes"]}, {"$eq": ["$$nplus1.label", "tryHarder"]}, {"$eq": ["$$nplus1.user_response", "yes"]} ]}, "then": True, "else": False }}
  • 81. {"$eq": ["$$nth.label", "tryHarder"]}, {"$eq": ["$$nth.user_response", "yes"]}, {"$eq": ["$$nplus1.label", "tryHarder"]}, {"$eq": ["$$nplus1.user_response", "yes"]} $eq
  • 83. { "_id" : ObjectId("593b42366523ec06eed182b9"), "session_start" : NumberLong("1497055796716"), "uuid" : "urn:uuid:55e72720-c0b1-4e81-89d6-ac1896b06661", "subtopic" : "divisibility superpowers", "all_problems" : [...], "interventions" : [ { "deploy_time" : NumberLong("1497055817986"), "label" : "tryHarder", "user_response" : "no", "dismiss_time" : 2939 } ] }
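
The `$match` slides compare `session_start` against a variable named `this_morning`, but the transcript never shows how it is defined. The sketch below is one possible definition, assuming `session_start` holds milliseconds since the Unix epoch (the `NumberLong` values in the schema slides look like millisecond timestamps) and that "this morning" means local midnight. The helper names `start_of_today_ms` and `run_pipeline` are hypothetical and not from the talk; `pipeline2b` is copied from the task2b slide.

```python
from datetime import datetime, time


def start_of_today_ms(now=None):
    """Return local midnight of the given day as integer milliseconds since the epoch.

    Assumption: session_start is stored as epoch milliseconds.
    """
    now = now or datetime.now()
    midnight = datetime.combine(now.date(), time.min)
    return int(midnight.timestamp() * 1000)


this_morning = start_of_today_ms()

# Pipeline from the task2b slide: total problems solved across today's sessions.
pipeline2b = [
    {"$match": {"session_start": {"$gte": this_morning}}},
    {"$project": {"session_probs": {"$size": "$all_problems"}}},
    {"$group": {"_id": None, "total_problems": {"$sum": "$session_probs"}}},
]


def run_pipeline(collection, pipeline):
    """Run a pipeline on a PyMongo collection (hypothetical helper) and return the docs."""
    return list(collection.aggregate(pipeline))
```

With a live connection, a call such as `run_pipeline(db.mathcards, pipeline2b)` would return at most one document, because `"_id": None` places every matched session in a single group.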

Editor's Notes

  1. From docs: “you can specify an _id value of null to calculate accumulated values for all the input documents as a whole”
  2. “Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.”
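
To make the two quoted behaviours concrete, here is a small pure-Python emulation of `$unwind` followed by `$group`, run on in-memory dictionaries rather than on a MongoDB server. It is a simplified sketch, not how the server implements these stages: documents with a missing array are skipped, and the other edge cases the real `$unwind` handles are ignored. The operand and timestamp values come from the schema slides; the `uuid` value and the way problems are split across the two sessions are illustrative assumptions.

```python
from collections import defaultdict

# Two session documents for one (hypothetical) user.
sessions = [
    {"uuid": "user-a", "all_problems": [
        {"operand2": 3, "correct": False,
         "start_time": 1497055796831, "end_time": 1497055798985},
        {"operand2": 4, "correct": True,
         "start_time": 1497055827450, "end_time": 1497055828629},
    ]},
    {"uuid": "user-a", "all_problems": [
        {"operand2": 2, "correct": True,
         "start_time": 1497055828630, "end_time": 1497055830491},
    ]},
]


def unwind(docs, field):
    """Yield one copy of each doc per array element, with the array replaced by that element."""
    for doc in docs:
        for element in doc.get(field, []):
            yield {**doc, field: element}


def avg_time_by_operand2(docs):
    """Emulate task6: $unwind, then $group on operand2 with an $avg of time spent (ms)."""
    times = defaultdict(list)
    for doc in unwind(docs, "all_problems"):
        problem = doc["all_problems"]
        times[problem["operand2"]].append(problem["end_time"] - problem["start_time"])
    return {operand2: sum(v) / len(v) for operand2, v in times.items()}


unwound = list(unwind(sessions, "all_problems"))

# Grouping with _id None (null) puts every document in one group,
# so a $sum of 1 is simply the number of unwound documents.
total_problems = len(unwound)

print(total_problems)
print(avg_time_by_operand2(sessions))
```

After unwinding, each problem is its own document that still carries the parent session's `uuid`, which is why a later `$group` can key on `all_problems.operand2` directly.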