SlideShare a Scribd company logo
mongoDB Project
Relational databases & Document-Oriented databases
Athens University of Economics and Business
Dpt. Of Management Science and Technology
Prof. Damianos Chatziantoniou
| lkoutsokera@gmail.com
| stratos.gounidellis@gmail.com
Lamprini Koutsokera (8130074)
Stratos Gounidellis (8130029)
BDSMasters
SQL Server vs. mongoDB
2
Description Microsoft’s relational DBMS One of the most popular
document stores
Database model Relational DBMS Document store
Implementation language C++ C ++
Data scheme yes schema-free
Triggers yes no
Replication methods yes, depending the SQL-Server Edition Master-slave replication
Partitioning methods tables can be distributed across Sharding
several files, sharding through
federation
References: [1]
From theory to practice
3
Tools
4
Required installations
5
References: [2]
1. Determine which MongoDB build you need.
2. Download MongoDB for Windows
3. Install MongoDB Community Edition.
4. Set up the MongoDB environment.
Mongodb installation
Required python packages installation
From software configuration to coding
6
Python coding [1] – table parsing
7
Part 1 - Queries and the Aggregation Pipeline [1]
. . .
load() function
mongo shell
prep.js file
References: [3]
Python coding [1] – table parsing
8
Query 1 : How many students in your database are currently taking at least 1 class (i.e. have a class with
a course_status of “In Progress”)?
Part 1 - Queries and the Aggregation Pipeline [2]
Query 2 : Produce a grouping of the documents that contains the name of each home city and the
number of students enrolled from that home city.
Python coding [1] – table parsing
9
Query 3 : Which hobby or hobbies are the most popular?
Part 1 - Queries and the Aggregation Pipeline [3]
the most popular
the top 5 popular
Python coding [1] – table parsing
10
Query 4 : What is the GPA (ignoring dropped
classes and in progress classes) of the best student?
Part 1 - Queries and the Aggregation Pipeline [4]
Query 5 : Which student has the largest number of
grade 10’s?
Python coding [1] – table parsing
11
Query 6 : Which class has the highest average
GPA?
Part 1 - Queries and the Aggregation Pipeline [5]
Query 7 : Which class has been dropped the most
number of times?
Python coding [1] – table parsing
12
Query 8 : Produce of a count of classes that have been COMPLETED by class type. The class type is found
by taking the first letter of the course code so that M102 has type M.
Part 1 - Queries and the Aggregation Pipeline [6]
Python coding [1] – table parsing
13
Query 9 : Produce a transformation of the documents so that the documents now have an additional boolean
field called “hobbyist” that is true when the student has more than 3 hobbies and false otherwise.
Part 1 - Queries and the Aggregation Pipeline [7]
Python coding [1] – table parsing
14
Query 10 : Produce a transformation of the documents so that the documents now have an additional field that
contains the number of classes that the student has completed.
Part 1 - Queries and the Aggregation Pipeline [8]
Python coding [1] – table parsing
15
Query 11 : Produce a transformation of the documents in
the collection so that they look like the following output.
The GPA is the average grade of all the completed
classes. The other two computed fields are the number of
classes currently in progress and the number of classes
dropped. No other fields should be in there. No other fields
should be present.
Part 1 - Queries and the Aggregation Pipeline [9]
Python coding [1] – table parsing
16
Query 12 : Produce a NEW collection (HINT: Use $out in the aggregation pipeline) so that the new documents
in this correspond to the classes on offer. The structure of the documents should be like the following output.
The _id field should be the course code.
The course_title is what it was before. The numberOfDropouts is the number of students who dropped out. The
numberOfTimesCompleted is the number of students that completed this class. The currentlyRegistered array is
an array of ObjectID’s corresponding to the students who are currently taking the class. Finally, for the students
that completed the class, the maxGrade, minGrade and avgGrade are the summary statistics for that class.
Part 1 - Queries and the Aggregation Pipeline [10]
Python coding [1] – table parsing
17
Part 1 - Queries and the Aggregation Pipeline [11]
Python coding [1] – table parsing
18
Part 2 - Python & MongoDB [1]
python_mongodb.py: Implement simple operations on mongo database.
Connect to mongo database and collection. Connect to mongo database and collection and insert
a record.
Python coding [1] – table parsing
19
Part 2 - Python & MongoDB [2]
Connect to mongo database and collection and insert
multiple records.
Connect to mongo database and collection and print
its content.
python_mongodb.py: Implement simple operations on mongo database.
Python coding [1] – table parsing
20
Part 2 - Python & MongoDB [3]
Connect to mongo database and collection and update
Its documents.
Connect to mongo database and collection and print
specific field.
python_mongodb.py: Implement simple operations on mongo database.
Python coding [1] – table parsing
21
Part 2 - Python & MongoDB [4]
Connect to mongo database and collection and convert the
collection to a dataframe.
Connect to mongo database and collection and import
data from a dataframe.
python_mongodb.py: Implement simple operations on mongo database.
Python coding [1] – table parsing
22
Part 2 - Python & MongoDB [5]
1. Clone this repository:
2. Install the required python packages.
3. Run python_mongodb.py to implement basic operations (insert_one, insert_many,
update, delete_one, delete_many, etc.) on mongodb.
python_mongodb.py: Implement simple operations on mongo database.
Python coding [1] – table parsing
23
Part 2 - Python & MongoDB [6]
Output
Python coding [1] – table parsing
24
Part 3 - MapReduce (Word Count) [1]
MapReduce 1 : Write a map reduce job on the
students collection similar to the classic word
count example. More specifically, implement a
word count using the course title field as the text.
In addition, exclude stop words from this list. You
should find/write your own list of stop words. (Stop
words are the common words in the English
language like “a”, “in”, “to”, “the”, etc.)
References: [4,5]
key value
[word1, 1]
[word2, 1]
[word3, 1]
[word2, 1]
[word4, 1]
. . . . . . . .
[wordx, 1]
Mapper
map
map
map
.
.
.
25
Part 3 - MapReduce (Word Count) [2]
course_title
[Predictive Modeling]
[MongoDB Operations]
[Hadoop and MapReduce]
[Data Mining]
[MongoDB Operations]
[Data Mining]
. . . . . . . . . . . . . . . . . . . . . .
Reducer
reduce
reduce
reduce
.
.
.
26
Part 3 - MapReduce (Word Count) [3]
key value
[word1, count1]
[word2, count2]
[word3, count3]
. . . . . . . . . . . . .
key value
[word1, {1, 1, 1, …} ]
[word2, {1, 1, 1, …} ]
[word3, {1, 1, 1, …} ]
. . . . . . . . . . . . . . . . .
Python coding [1] – table parsing
27
Part 3 - MapReduce (Average grade) [1]
MapReduce 2 : Write a map reduce job on the students
collection whose goal is to compute average
GPA scores for completed courses by
home city and by course type (M, B, P, etc.).
References: [4,5]
key value
[{home_city: Athina, course_type: M},
{count: 1, sum: 8}]
[{home_city: Chania, course_type: P},
{count: 1, sum: 6}]
[{home_city: Thyra, course_type: V},
{count: 1, sum: 3}]
[{home_city: Arta, course_type: M},
{count: 1, sum: 10}]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Mapper
map
map
map
.
.
.
28
Part 3 - MapReduce (Average grade) [2]
(course_code, home_city), grade
[(S201, Athina), 10]
[(M101, Mytilini), 9]
[(S202, Kavala), 3]
[(D102, Chania), 5]
[(P103, Athina), 6]
[(P101, Arta), 10]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Reducer
reduce
reduce
reduce
.
.
.
29
Part 3 - MapReduce (Average grade) [3]
key value
[{home_city: Athina, course_type: M},
[{count: 1, sum: 8}, [{count: 1, sum:
3}, [{count: 1, sum: 7}]
[{home_city: Chania, course_type: P},
{count: 1, sum: 6} , [{count: 1, sum:
5}, [{count: 1, sum: 4}]]
[{home_city: Thyra, course_type: V},
{count: 1, sum: 3} , [{count: 1, sum:
7}, [{count: 1, sum: 10}]]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
key value
[{home_city: Athina, course_type: M},
{count: 22, sum: 80}]
[{home_city: Chania, course_type: P},
{count: 8, sum: 100}]
[{home_city: Thyra, course_type: V},
{count: 47, sum: 300}]
[{home_city: Arta, course_type: M},
{count: 19, sum: 150}]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Finalizer
finalize
finalize
finalize
.
.
.
30
Part 3 - MapReduce (Average grade) [4]
key value
[{home_city: Athina, course_type: M},
{count: 22, sum: 80}]
[{home_city: Chania, course_type: P},
{count: 8, sum: 100}]
[{home_city: Thyra, course_type: V},
{count: 47, sum: 300}]
[{home_city: Arta, course_type: M},
{count: 19, sum: 150}]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
key value
[{home_city: Athina, course_type: M},
{avg: 8.5012}]
[{home_city: Chania, course_type: P},
{avg: 5.5314}]
[{home_city: Thyra, course_type: V},
{avg: 7.7713}]
[{home_city: Arta, course_type: M},
{avg: 6.4344}]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
References
[1] Db-engines.com. (n.d.). System Properties Comparison Microsoft SQL Server vs. MongoDB vs. Oracle NoSQL
[online] Available at: https://db-engines.com/en/system/Microsoft+SQL+Server%3BMongoDB%3BOracle+NoSQL
[Accessed 5 May 2017].
[2] Install MongoDB Community Edition on Windows — MongoDB Manual 3.4.
https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/t [Accessed 2 May 2017].
[3] Aggregation Pipeline — MongoDB Manual 3.4 https://docs.mongodb.com/manual/core/aggregation-pipeline/
[Accessed 2 May 2017].
[4] Map-Reduce Examples — MongoDB Manual 3.4 https://docs.mongodb.com/manual/tutorial/map-reduce-
examples/ [Accessed 2 May 2017].
[5] Map-Reduce — MongoDB Manual 3.4 https://docs.mongodb.com/manual/core/map-reduce/ [Accessed 2 May
2017].
| lkoutsokera@gmail.com
| stratos.gounidellis@gmail.com
Lamprini Koutsokera (8130074)
Stratos Gounidellis (8130029)
BDSMasters

More Related Content

What's hot

Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluation
avniS
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Serban Tanasa
 
Visualization-Driven Data Aggregation
Visualization-Driven Data AggregationVisualization-Driven Data Aggregation
Visualization-Driven Data Aggregation
Zbigniew Jerzak
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
Krish_ver2
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)
Lynn Cherny
 
CS375 Presentation-binary sort.pptx
CS375 Presentation-binary sort.pptxCS375 Presentation-binary sort.pptx
CS375 Presentation-binary sort.pptx
Liyu Ying
 
DBMS 9 | Extendible Hashing
DBMS 9 | Extendible HashingDBMS 9 | Extendible Hashing
DBMS 9 | Extendible Hashing
Mohammad Imam Hossain
 
Map reduce: beyond word count
Map reduce: beyond word countMap reduce: beyond word count
Map reduce: beyond word count
Jeff Patti
 
Query evaluation and optimization
Query evaluation and optimizationQuery evaluation and optimization
Query evaluation and optimization
lavanya marichamy
 
Python networkx library quick start guide
Python networkx library quick start guidePython networkx library quick start guide
Python networkx library quick start guide
Universiti Technologi Malaysia (UTM)
 
Hadoop in sigmod 2011
Hadoop in sigmod 2011Hadoop in sigmod 2011
Hadoop in sigmod 2011
Bin Cai
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
Yanchang Zhao
 
Dremel interactive analysis of web scale datasets
Dremel interactive analysis of web scale datasetsDremel interactive analysis of web scale datasets
Dremel interactive analysis of web scale datasets
Carl Lu
 
8 query processing and optimization
8 query processing and optimization8 query processing and optimization
8 query processing and optimization
Kumar
 
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Alexander Hendorf
 
2 R Tutorial Programming
2 R Tutorial Programming2 R Tutorial Programming
2 R Tutorial Programming
Sakthi Dasans
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...
Madan Golla
 
5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Sampling5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Sampling
krishna singh
 
Ahad Butt
Ahad ButtAhad Butt
Ahad Butt
AhadButt
 
Chapter15
Chapter15Chapter15
Chapter15
gourab87
 

What's hot (20)

Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluation
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
Visualization-Driven Data Aggregation
Visualization-Driven Data AggregationVisualization-Driven Data Aggregation
Visualization-Driven Data Aggregation
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)
 
CS375 Presentation-binary sort.pptx
CS375 Presentation-binary sort.pptxCS375 Presentation-binary sort.pptx
CS375 Presentation-binary sort.pptx
 
DBMS 9 | Extendible Hashing
DBMS 9 | Extendible HashingDBMS 9 | Extendible Hashing
DBMS 9 | Extendible Hashing
 
Map reduce: beyond word count
Map reduce: beyond word countMap reduce: beyond word count
Map reduce: beyond word count
 
Query evaluation and optimization
Query evaluation and optimizationQuery evaluation and optimization
Query evaluation and optimization
 
Python networkx library quick start guide
Python networkx library quick start guidePython networkx library quick start guide
Python networkx library quick start guide
 
Hadoop in sigmod 2011
Hadoop in sigmod 2011Hadoop in sigmod 2011
Hadoop in sigmod 2011
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
Dremel interactive analysis of web scale datasets
Dremel interactive analysis of web scale datasetsDremel interactive analysis of web scale datasets
Dremel interactive analysis of web scale datasets
 
8 query processing and optimization
8 query processing and optimization8 query processing and optimization
8 query processing and optimization
 
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
 
2 R Tutorial Programming
2 R Tutorial Programming2 R Tutorial Programming
2 R Tutorial Programming
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...
 
5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Sampling5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Sampling
 
Ahad Butt
Ahad ButtAhad Butt
Ahad Butt
 
Chapter15
Chapter15Chapter15
Chapter15
 

Similar to mongoDB Project: Relational databases & Document-Oriented databases

MongoDB Project: Relational databases to Document-Oriented databases
MongoDB Project: Relational databases to Document-Oriented databasesMongoDB Project: Relational databases to Document-Oriented databases
MongoDB Project: Relational databases to Document-Oriented databases
Lamprini Koutsokera
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using Python
NishantKumar1179
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learning
my6305874
 
Computer Science CS Project Matrix CBSE Class 12th XII .pdf
Computer Science CS Project Matrix CBSE Class 12th XII .pdfComputer Science CS Project Matrix CBSE Class 12th XII .pdf
Computer Science CS Project Matrix CBSE Class 12th XII .pdf
PranavAnil9
 
Python Manuel-R2021.pdf
Python Manuel-R2021.pdfPython Manuel-R2021.pdf
Python Manuel-R2021.pdf
RamprakashSingaravel1
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
Julian Hyde
 
SE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUSSE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUS
nikshaikh786
 
Python-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptxPython-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptx
ParveenShaik21
 
More on Pandas.pptx
More on Pandas.pptxMore on Pandas.pptx
More on Pandas.pptx
VirajPathania1
 
Python-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptxPython-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptx
Sandeep Singh
 
Python for Data Analysis.pdf
Python for Data Analysis.pdfPython for Data Analysis.pdf
Python for Data Analysis.pdf
JulioRecaldeLara1
 
Python-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptxPython-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptx
tangadhurai
 
Python for data analysis
Python for data analysisPython for data analysis
Python for data analysis
Savitribai Phule Pune University
 
CBSE Class 12 Computer Science(083) Sample Question Paper 2020-21
CBSE Class 12 Computer Science(083) Sample Question Paper 2020-21CBSE Class 12 Computer Science(083) Sample Question Paper 2020-21
CBSE Class 12 Computer Science(083) Sample Question Paper 2020-21
chinthala Vijaya Kumar
 
Mmt 001
Mmt 001Mmt 001
Mmt 001
sujatam8
 
XII - 2022-23 - IP - RAIPUR (CBSE FINAL EXAM).pdf
XII -  2022-23 - IP - RAIPUR (CBSE FINAL EXAM).pdfXII -  2022-23 - IP - RAIPUR (CBSE FINAL EXAM).pdf
XII - 2022-23 - IP - RAIPUR (CBSE FINAL EXAM).pdf
KrishnaJyotish1
 
BDS_QA.pdf
BDS_QA.pdfBDS_QA.pdf
BDS_QA.pdf
NikunjaParida1
 
Python-for-Data-Analysis.pdf
Python-for-Data-Analysis.pdfPython-for-Data-Analysis.pdf
Python-for-Data-Analysis.pdf
ssuser598883
 
Chapter 16-spreadsheet1 questions and answer
Chapter 16-spreadsheet1  questions and answerChapter 16-spreadsheet1  questions and answer
Chapter 16-spreadsheet1 questions and answer
RaajTech
 
Study material ip class 12th
Study material ip class 12thStudy material ip class 12th
Study material ip class 12th
animesh dwivedi
 

Similar to mongoDB Project: Relational databases & Document-Oriented databases (20)

MongoDB Project: Relational databases to Document-Oriented databases
MongoDB Project: Relational databases to Document-Oriented databasesMongoDB Project: Relational databases to Document-Oriented databases
MongoDB Project: Relational databases to Document-Oriented databases
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using Python
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learning
 
Computer Science CS Project Matrix CBSE Class 12th XII .pdf
Computer Science CS Project Matrix CBSE Class 12th XII .pdfComputer Science CS Project Matrix CBSE Class 12th XII .pdf
Computer Science CS Project Matrix CBSE Class 12th XII .pdf
 
Python Manuel-R2021.pdf
Python Manuel-R2021.pdfPython Manuel-R2021.pdf
Python Manuel-R2021.pdf
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
SE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUSSE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUS
 
Python-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptxPython-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptx
 
More on Pandas.pptx
More on Pandas.pptxMore on Pandas.pptx
More on Pandas.pptx
 
Python-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptxPython-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptx
 
Python for Data Analysis.pdf
Python for Data Analysis.pdfPython for Data Analysis.pdf
Python for Data Analysis.pdf
 
Python-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptxPython-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptx
 
Python for data analysis
Python for data analysisPython for data analysis
Python for data analysis
 
CBSE Class 12 Computer Science(083) Sample Question Paper 2020-21
CBSE Class 12 Computer Science(083) Sample Question Paper 2020-21CBSE Class 12 Computer Science(083) Sample Question Paper 2020-21
CBSE Class 12 Computer Science(083) Sample Question Paper 2020-21
 
Mmt 001
Mmt 001Mmt 001
Mmt 001
 
XII - 2022-23 - IP - RAIPUR (CBSE FINAL EXAM).pdf
XII -  2022-23 - IP - RAIPUR (CBSE FINAL EXAM).pdfXII -  2022-23 - IP - RAIPUR (CBSE FINAL EXAM).pdf
XII - 2022-23 - IP - RAIPUR (CBSE FINAL EXAM).pdf
 
BDS_QA.pdf
BDS_QA.pdfBDS_QA.pdf
BDS_QA.pdf
 
Python-for-Data-Analysis.pdf
Python-for-Data-Analysis.pdfPython-for-Data-Analysis.pdf
Python-for-Data-Analysis.pdf
 
Chapter 16-spreadsheet1 questions and answer
Chapter 16-spreadsheet1  questions and answerChapter 16-spreadsheet1  questions and answer
Chapter 16-spreadsheet1 questions and answer
 
Study material ip class 12th
Study material ip class 12thStudy material ip class 12th
Study material ip class 12th
 

Recently uploaded

Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 

Recently uploaded (20)

Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 

mongoDB Project: Relational databases & Document-Oriented databases

  • 1. mongoDB Project Relational databases & Document-Oriented databases Athens University of Economics and Business Dpt. Of Management Science and Technology Prof. Damianos Chatziantoniou | lkoutsokera@gmail.com | stratos.gounidellis@gmail.com Lamprini Koutsokera (8130074) Stratos Gounidellis (8130029) BDSMasters
  • 2. SQL Server vs. mongoDB 2 Description Microsoft’s relational DBMS One of the most popular document stores Database model Relational DBMS Document store Implementation language C++ C ++ Data scheme yes schema-free Triggers yes no Replication methods yes, depending the SQL-Server Edition Master-slave replication Partitioning methods tables can be distributed across Sharding several files, sharding through federation References: [1]
  • 3. From theory to practice 3
  • 5. Required installations 5 References: [2] 1. Determine which MongoDB build you need. 2. Download MongoDB for Windows 3. Install MongoDB Community Edition. 4. Set up the MongoDB environment. Mongodb installation Required python packages installation
  • 7. Python coding [1] – table parsing 7 Part 1 - Queries and the Aggregation Pipeline [1] . . . load() function mongo shell prep.js file References: [3]
  • 8. Python coding [1] – table parsing 8 Query 1 : How many students in your database are currently taking at least 1 class (i.e. have a class with a course_status of “In Progress”)? Part 1 - Queries and the Aggregation Pipeline [2] Query 2 : Produce a grouping of the documents that contains the name of each home city and the number of students enrolled from that home city.
  • 9. Python coding [1] – table parsing 9 Query 3 : Which hobby or hobbies are the most popular? Part 1 - Queries and the Aggregation Pipeline [3] the most popular the top 5 popular
  • 10. Python coding [1] – table parsing 10 Query 4 : What is the GPA (ignoring dropped classes and in progress classes) of the best student? Part 1 - Queries and the Aggregation Pipeline [4] Query 5 : Which student has the largest number of grade 10’s?
  • 11. Python coding [1] – table parsing 11 Query 6 : Which class has the highest average GPA? Part 1 - Queries and the Aggregation Pipeline [5] Query 7 : Which class has been dropped the most number of times?
  • 12. Python coding [1] – table parsing 12 Query 8 : Produce of a count of classes that have been COMPLETED by class type. The class type is found by taking the first letter of the course code so that M102 has type M. Part 1 - Queries and the Aggregation Pipeline [6]
  • 13. Python coding [1] – table parsing 13 Query 9 : Produce a transformation of the documents so that the documents now have an additional boolean field called “hobbyist” that is true when the student has more than 3 hobbies and false otherwise. Part 1 - Queries and the Aggregation Pipeline [7]
  • 14. Python coding [1] – table parsing 14 Query 10 : Produce a transformation of the documents so that the documents now have an additional field that contains the number of classes that the student has completed. Part 1 - Queries and the Aggregation Pipeline [8]
  • 15. Python coding [1] – table parsing 15 Query 11 : Produce a transformation of the documents in the collection so that they look like the following output. The GPA is the average grade of all the completed classes. The other two computed fields are the number of classes currently in progress and the number of classes dropped. No other fields should be in there. No other fields should be present. Part 1 - Queries and the Aggregation Pipeline [9]
  • 16. Python coding [1] – table parsing 16 Query 12 : Produce a NEW collection (HINT: Use $out in the aggregation pipeline) so that the new documents in this correspond to the classes on offer. The structure of the documents should be like the following output. The _id field should be the course code. The course_title is what it was before. The numberOfDropouts is the number of students who dropped out. The numberOfTimesCompleted is the number of students that completed this class. The currentlyRegistered array is an array of ObjectID’s corresponding to the students who are currently taking the class. Finally, for the students that completed the class, the maxGrade, minGrade and avgGrade are the summary statistics for that class. Part 1 - Queries and the Aggregation Pipeline [10]
  • 17. Python coding [1] – table parsing 17 Part 1 - Queries and the Aggregation Pipeline [11]
  • 18. Python coding [1] – table parsing 18 Part 2 - Python & MongoDB [1] python_mongodb.py: Implement simple operations on mongo database. Connect to mongo database and collection. Connect to mongo database and collection and insert a record.
  • 19. Python coding [1] – table parsing 19 Part 2 - Python & MongoDB [2] Connect to mongo database and collection and insert multiple records. Connect to mongo database and collection and print its content. python_mongodb.py: Implement simple operations on mongo database.
  • 20. Python coding [1] – table parsing 20 Part 2 - Python & MongoDB [3] Connect to mongo database and collection and update Its documents. Connect to mongo database and collection and print specific field. python_mongodb.py: Implement simple operations on mongo database.
  • 21. Python coding [1] – table parsing 21 Part 2 - Python & MongoDB [4] Connect to mongo database and collection and convert the collection to a dataframe. Connect to mongo database and collection and import data from a dataframe. python_mongodb.py: Implement simple operations on mongo database.
  • 22. Python coding [1] – table parsing 22 Part 2 - Python & MongoDB [5] 1. Clone this repository: 2. Install the required python packages. 3. Run python_mongodb.py to implement basic operations (insert_one, insert_many, update, delete_one, delete_many, etc.) on mongodb. python_mongodb.py: Implement simple operations on mongo database.
  • 23. Python coding [1] – table parsing 23 Part 2 - Python & MongoDB [6] Output
  • 24. Python coding [1] – table parsing 24 Part 3 - MapReduce (Word Count) [1] MapReduce 1 : Write a map reduce job on the students collection similar to the classic word count example. More specifically, implement a word count using the course title field as the text. In addition, exclude stop words from this list. You should find/write your own list of stop words. (Stop words are the common words in the English language like “a”, “in”, “to”, “the”, etc.) References: [4,5]
  • 25. key value [word1, 1] [word2, 1] [word3, 1] [word2, 1] [word4, 1] . . . . . . . . [wordx, 1] Mapper map map map . . . 25 Part 3 - MapReduce (Word Count) [2] course_title [Predictive Modeling] [MongoDB Operations] [Hadoop and MapReduce] [Data Mining] [MongoDB Operations] [Data Mining] . . . . . . . . . . . . . . . . . . . . . .
  • 26. Reducer reduce reduce reduce . . . 26 Part 3 - MapReduce (Word Count) [3] key value [word1, count1] [word2, count2] [word3, count3] . . . . . . . . . . . . . key value [word1, {1, 1, 1, …} ] [word2, {1, 1, 1, …} ] [word3, {1, 1, 1, …} ] . . . . . . . . . . . . . . . . .
  • 27. Python coding [1] – table parsing 27 Part 3 - MapReduce (Average grade) [1] MapReduce 2 : Write a map reduce job on the students collection whose goal is to compute average GPA scores for completed courses by home city and by course type (M, B, P, etc.). References: [4,5]
  • 28. key value [{home_city: Athina, course_type: M}, {count: 1, sum: 8}] [{home_city: Chania, course_type: P}, {count: 1, sum: 6}] [{home_city: Thyra, course_type: V}, {count: 1, sum: 3}] [{home_city: Arta, course_type: M}, {count: 1, sum: 10}] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mapper map map map . . . 28 Part 3 - MapReduce (Average grade) [2] (course_code, home_city), grade [(S201, Athina), 10] [(M101, Mytilini), 9] [(S202, Kavala), 3] [(D102, Chania), 5] [(P103, Athina), 6] [(P101, Arta), 10] . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  • 29. Reducer reduce reduce reduce . . . 29 Part 3 - MapReduce (Average grade) [3] key value [{home_city: Athina, course_type: M}, [{count: 1, sum: 8}, [{count: 1, sum: 3}, [{count: 1, sum: 7}] [{home_city: Chania, course_type: P}, {count: 1, sum: 6} , [{count: 1, sum: 5}, [{count: 1, sum: 4}]] [{home_city: Thyra, course_type: V}, {count: 1, sum: 3} , [{count: 1, sum: 7}, [{count: 1, sum: 10}]] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . key value [{home_city: Athina, course_type: M}, {count: 22, sum: 80}] [{home_city: Chania, course_type: P}, {count: 8, sum: 100}] [{home_city: Thyra, course_type: V}, {count: 47, sum: 300}] [{home_city: Arta, course_type: M}, {count: 19, sum: 150}] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  • 30. Finalizer finalize finalize finalize . . . 30 Part 3 - MapReduce (Average grade) [4] key value [{home_city: Athina, course_type: M}, {count: 22, sum: 80}] [{home_city: Chania, course_type: P}, {count: 8, sum: 100}] [{home_city: Thyra, course_type: V}, {count: 47, sum: 300}] [{home_city: Arta, course_type: M}, {count: 19, sum: 150}] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . key value [{home_city: Athina, course_type: M}, {avg: 8.5012}] [{home_city: Chania, course_type: P}, {avg: 5.5314}] [{home_city: Thyra, course_type: V}, {avg: 7.7713}] [{home_city: Arta, course_type: M}, {avg: 6.4344}] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  • 31. References [1] Db-engines.com. (n.d.). System Properties Comparison Microsoft SQL Server vs. MongoDB vs. Oracle NoSQL [online] Available at: https://db-engines.com/en/system/Microsoft+SQL+Server%3BMongoDB%3BOracle+NoSQL [Accessed 5 May 2017]. [2] Install MongoDB Community Edition on Windows — MongoDB Manual 3.4. https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/t [Accessed 2 May 2017]. [3] Aggregation Pipeline — MongoDB Manual 3.4 https://docs.mongodb.com/manual/core/aggregation-pipeline/ [Accessed 2 May 2017]. [4] Map-Reduce Examples — MongoDB Manual 3.4 https://docs.mongodb.com/manual/tutorial/map-reduce- examples/ [Accessed 2 May 2017]. [5] Map-Reduce — MongoDB Manual 3.4 https://docs.mongodb.com/manual/core/map-reduce/ [Accessed 2 May 2017]. | lkoutsokera@gmail.com | stratos.gounidellis@gmail.com Lamprini Koutsokera (8130074) Stratos Gounidellis (8130029) BDSMasters