MongoDB Online Trainings
What is MongoDB ?
Non Relational
> Alternative to traditional RDBMS as workload and data volume needs change
Document Oriented Database
> Simple human readable JSON Data model
Dynamic Schemas
> Flexible schemas, agile development, developer friendly
Built-in Replication for High Availability
> Provides data redundancy; automatically recovers from node failures
Built-in Sharding for Horizontal Scalability
> Auto partition; supports very large data sets and high throughput operations
High Performance
> Up to 5X – 10X faster than a traditional RDBMS, depending on the workload
Document Oriented Database
What it does not mean:
What it does mean:
- Store "JSON" Objects
- JSON stands for JavaScript Object Notation
- JSON is a lightweight data-interchange format, similar to XML
- JSON is language independent
- JSON is "self-describing" and easy to understand
- Example of a simple JSON Document:
{
    "firstName" : "dbversity",
    "lastName" : ".com",
    "deals" : ["technology", "certifications"]
}
Documents = Rows in RDBMS
> Store entire document directly into MongoDB rather than breaking data into multiple tables
> Store documents of varying types in the same collection (Polymorphism)
What can we do with documents ?
Remember: Documents are similar to Rows in RDBMS
{
    "firstName" : "dbversity",
    "lastName" : ".com",
    "DBs" : ["MongoDB", "MariaDB"]
}
> Embed documents within documents
> Embed arrays within documents
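To make embedding concrete, here is a minimal sketch of one document that holds both an embedded document and embedded arrays. The `address` and `courses` fields and their values are hypothetical and added only for illustration; the deck itself shows only `firstName`, `lastName` and `DBs`.

```javascript
// One document holding what an RDBMS would typically spread across several tables.
// `address` and `courses` are hypothetical fields used only for illustration.
const person = {
  firstName: "dbversity",
  lastName: ".com",
  DBs: ["MongoDB", "MariaDB"],                    // embedded array of scalars
  address: { city: "ExampleCity", country: "XX" }, // embedded document
  courses: [                                       // embedded array of documents
    { name: "MongoDB basics", hours: 8 },
    { name: "Sharding", hours: 4 },
  ],
};

// Nested values are reached with ordinary property access in JavaScript;
// in a MongoDB query the same paths are written in dot notation, e.g. "address.city".
const totalHours = person.courses.reduce((sum, c) => sum + c.hours, 0);
console.log(person.address.city, person.DBs.length, totalHours);
```

Because the related data travels with the document, reading a person needs no join.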
NoSQL Document Oriented DB with Dynamic Schemas
[Figure: RDBMS vs. MongoDB]
Features of Dynamic Schemas
Data Model can evolve easily
Faster Time To Market
Performance can be delivered at scale as it reduces the need for joins & disk seeks
Comparing RDBMS and MongoDB Terms
RDBMS Terms/Concepts | MongoDB Terms/Concepts
Database | Database
Table | Collection
Row | Document or BSON Document
Column | Field
Index | Index
Table Joins | Embedded documents and linking
Primary key: specify any unique column or column combination as primary key. | Primary key: in MongoDB, the primary key is automatically set to the _id field.
Aggregation (e.g. group by) | Aggregation pipeline
Availability
Built in Replication for HA and Failover
> Automatic Replication
> Read Scalability
> Data redundancy
Business Needs | Replica Set Benefits
High Availability | Automatic failover
Disaster Recovery | Hot backups offsite
Maintenance | Rolling Upgrades
Low Latency | Locate data near users
Workload Isolation | Read from non-primary replicas
Data Privacy | Restrict data to physical location
Data Consistency | Tunable Consistency
Replica Sets
[Diagram: MongoDB Driver and replica set members (Primary, Secondary)]
Case 1 – All nodes in the replica set are up and running
Case 2 – Primary goes down; requests automatically fail over to a Secondary
End user is unaware of any failures and is able to get results even if the primary goes down
MongoDB automatically chooses a new Primary
Key Benefits
> No downtime required for
• Maintenance
• Upgrades
• Node Failures
• Data Center outages
> Load balance read requests
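The failover case can be pictured with a toy model. The sketch below is not MongoDB's election protocol, which also weighs factors such as member priority and how up to date each member is. It only illustrates the majority rule: a primary can be chosen only while a majority of voting members is reachable. The member names and the `electPrimary` helper are hypothetical.

```javascript
// Toy model of replica-set failover; NOT the real election algorithm.
function electPrimary(members) {
  const up = members.filter((m) => m.up);
  // A majority of all voting members must be reachable to have a primary.
  if (up.length <= members.length / 2) return null;
  // Keep the current primary if it is still up; otherwise promote a healthy member.
  const current = up.find((m) => m.role === "primary");
  return (current || up[0]).name;
}

const allUp = [
  { name: "node1", role: "primary", up: true },
  { name: "node2", role: "secondary", up: true },
  { name: "node3", role: "secondary", up: true },
];
const primaryDown = allUp.map((m) => (m.name === "node1" ? { ...m, up: false } : m));
const onlyOneLeft = primaryDown.map((m) => (m.name === "node2" ? { ...m, up: false } : m));

console.log(electPrimary(allUp));       // Case 1: node1 stays primary
console.log(electPrimary(primaryDown)); // Case 2: node2 takes over
console.log(electPrimary(onlyOneLeft)); // no majority left: null
```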
Scalability
Built in Sharding for Horizontal Scalability
> Automatically partitions data
> Write Scalability
> Multiple parallel writes
[Diagram: MongoS in front of shards holding key ranges A - F, G - P, Q - Z]
Business Benefits due to Sharding
Increase or decrease capacity as you go
Automatic balancing
Three types: Hash-based, Range-based, Tag-aware
Sharding
• Sharding divides data and distributes it over multiple servers, or shards. Each shard is an independent database, and collectively, the shards make up a single logical database.
• Sharding reduces the number of operations each shard handles. Each shard processes fewer operations as the cluster grows. As a result, a cluster can increase capacity and throughput horizontally. For example, to insert data, the application only needs to access the shard responsible for that record.
• Sharding reduces the amount of data that each server needs to store. Each shard stores less data as the cluster grows. For example, if a database has a 1 terabyte data set, and there are 4 shards, then each shard might hold only 256GB of data. If there are 40 shards, then each shard might hold only 25GB of data.
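The per-shard figures follow from a simple even-split estimate. The sketch below reproduces that arithmetic, assuming 1 terabyte is taken as 1024 GB and that data is spread perfectly evenly, which real clusters only approximate. The `perShardGB` helper is hypothetical.

```javascript
// Even-split estimate: each shard holds roughly total / shardCount.
// Real clusters are rarely perfectly balanced, so treat this as a rough guide.
function perShardGB(totalGB, shardCount) {
  return totalGB / shardCount;
}

const oneTerabyteGB = 1024; // 1 TB expressed in GB (binary units, an assumption)
console.log(perShardGB(oneTerabyteGB, 4));  // 256
console.log(perShardGB(oneTerabyteGB, 40)); // 25.6, which the text rounds down to 25
```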
Sharding
[Diagram: End User, two mongos routers, Shard 1, Shard 2, Shard 3, Shard 4]
Sharding distributes IO workload for read and write scalability
MongoDB automatically chunks and migrates documents based on the shard key to balance the data distribution across the cluster
Mongos routes queries only to the shards that can satisfy the query
Key Range: 0..40, 41..120
Key Range (Shard 1 to Shard 4): 0..25, 26..40, 41..75, 76..120
Key Benefits
> Near-linear performance scaling for reads and writes
> No need to manually define the key range of each shard
> Automatic data redistribution and balancing
> Supports high user concurrency
READS/QUERIES
• By Shard Key: Routed
• By non-Shard key: Scatter gather
WRITES
• Inserts: Requires Shard Key, routed
• Remove: Routed or Scattered
• Update: Routed or Scattered
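The difference between a routed and a scatter-gather read can be sketched in a few lines. This is an illustrative model only, not how mongos is implemented: `shardRanges` mirrors the four key ranges listed for this slide, and `route` and the `key` field are hypothetical names.

```javascript
// Hypothetical chunk map mirroring the four key ranges on the slide.
// Bounds are treated as inclusive here because the slide lists them that way.
const shardRanges = [
  { name: "Shard 1", min: 0, max: 25 },
  { name: "Shard 2", min: 26, max: 40 },
  { name: "Shard 3", min: 41, max: 75 },
  { name: "Shard 4", min: 76, max: 120 },
];

// Returns the shards a query must visit.
// `query.key` stands in for an equality match on the shard key.
function route(query) {
  if (query.key === undefined) {
    // No shard key in the query: scatter to every shard, then gather the results.
    return shardRanges.map((s) => s.name);
  }
  // Shard key present: only the owning shard is contacted.
  return shardRanges
    .filter((s) => query.key >= s.min && query.key <= s.max)
    .map((s) => s.name);
}

console.log(route({ key: 30 }));     // [ 'Shard 2' ]
console.log(route({ status: "A" })); // all four shards
```

This is why queries that include the shard key scale better: each one touches a single shard instead of all of them.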
Sharded Configuration
Config servers store the cluster's metadata.
Shards store the data. They provide high availability and data consistency. In a production sharded cluster, each shard is a replica set.
Query Routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards. A client sends requests to one query router. Most sharded clusters have many query routers.
MongoDB Architecture
[Diagram: Application, Driver, Router-1 ... Router-N (mongos), Shard-1 ... Shard-N (each a Replica Set of mongod: Primary, Secondary, A), Config1, Config2, Config3]
Replica Set (mongod)
> Redundancy of Data
> Automatic Failover
> Read Scalability (Distributed Reads)
> Automatic Leader Election
Shards
> Automatically Partitions Data
> Write Scalability (Distributed writes)
Routers (mongos)
> Aggregates queries across shards
> Can have 1 or as many as needed; they are lightweight processes
Config servers
> Stores Meta Data
> System is up as long as 1/3 are up
> If anyone is down, Metadata goes read-only
Sharding Types
Range Based Sharding
Hash Based Sharding
Tag Aware Sharding
Hash-based Sharding:
db.collection.createIndex( { _id: "hashed" } )
Tag Aware Sharding:
sh.addShardTag("shard0000", "NYC")
sh.addShardTag("shard0002", "SFO")
sh.addShardTag("shard0002", "NRT")
sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")
sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, "SFO")
sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, "NRT")
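The practical difference between range-based and hash-based sharding shows up with monotonically increasing keys. The sketch below is a simplified model with three hypothetical shards. `hashShard` uses a toy arithmetic stand-in for a hash, not the function MongoDB's hashed indexes use, and the range width of 100 is an arbitrary assumption.

```javascript
const SHARD_COUNT = 3;

// Range-based: contiguous key ranges of width 100 map to the same shard.
function rangeShard(key) {
  return Math.min(Math.floor(key / 100), SHARD_COUNT - 1);
}

// Hash-based: a toy stand-in for a hash function (an assumption, not MongoDB's).
function hashShard(key) {
  return (key * 31 + 7) % SHARD_COUNT;
}

// Twelve sequential inserts, e.g. an auto-incrementing key.
const insertKeys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
const countPerShard = (fn) =>
  insertKeys.reduce((acc, k) => { acc[fn(k)] += 1; return acc; }, new Array(SHARD_COUNT).fill(0));

console.log("range:", countPerShard(rangeShard)); // [ 12, 0, 0 ]: one shard takes every write
console.log("hash: ", countPerShard(hashShard));  // [ 4, 4, 4 ]: writes are spread out
```

The trade-off is that range-based sharding keeps neighbouring keys together, which helps range queries, while hashing spreads them and helps write distribution.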
What is Chunk ?
A chunk is a contiguous range of data from a particular collection.
Chunks are described as a triple of "Collection, minKey, and maxKey".
Thus, the shard key K of a given document assigns that document to the chunk where minKey <= K < maxKey (the lower bound is inclusive, the upper bound exclusive).
Chunks default to 64 MB / 100,000 objects.
If a chunk gets too large (> 64 MB/chunk), it is split into two new chunks.
Splitting happens on the basis of the median of the chunk's keys.
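Chunk lookup and splitting can be sketched as follows. The model uses half-open ranges (lower bound inclusive, upper bound exclusive) so that a boundary key belongs to exactly one chunk. `MAX_KEYS` is a tiny hypothetical limit standing in for the 64 MB threshold, and both helper functions are illustrative rather than MongoDB internals.

```javascript
// A chunk is { min, max } over shard key values; key K belongs to the chunk
// where min <= K < max.
function findChunk(chunkList, key) {
  return chunkList.find((c) => key >= c.min && key < c.max);
}

// Split a chunk at the median of the keys it currently holds.
const MAX_KEYS = 4; // hypothetical stand-in for the size threshold
function splitIfTooLarge(chunk, keys) {
  const inChunk = keys
    .filter((k) => k >= chunk.min && k < chunk.max)
    .sort((a, b) => a - b);
  if (inChunk.length <= MAX_KEYS) return [chunk];
  const median = inChunk[Math.floor(inChunk.length / 2)];
  return [
    { min: chunk.min, max: median },
    { min: median, max: chunk.max },
  ];
}

const chunkKeys = [3, 8, 15, 21, 30, 42];
const chunks = splitIfTooLarge({ min: 0, max: 100 }, chunkKeys);
console.log(chunks);                // two chunks: [0, 21) and [21, 100)
console.log(findChunk(chunks, 21)); // { min: 21, max: 100 }
```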
When sorting is specified, the relevant shards sort locally (on the shards themselves), and then mongos merges the results. Thus the mongos resource usage is not terribly high.
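The merge step is cheap because each shard's results arrive already sorted, so the router only needs to pick the smallest head element repeatedly. The sketch below illustrates that idea; `mergeSorted` and the sample documents are hypothetical, and this is not the mongos implementation.

```javascript
// Each shard returns its own results already sorted; the router only merges them.
function mergeSorted(shardResults, compare) {
  const cursors = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (cursors[i] >= shardResults[i].length) continue; // this shard is exhausted
      if (
        best === -1 ||
        compare(shardResults[i][cursors[i]], shardResults[best][cursors[best]]) < 0
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(shardResults[best][cursors[best]++]);
  }
}

const perShard = [
  [{ user_id: "a1" }, { user_id: "d4" }], // sorted results from one shard
  [{ user_id: "b2" }, { user_id: "c3" }], // sorted results from another shard
];
const byUserId = (x, y) => (x.user_id < y.user_id ? -1 : x.user_id > y.user_id ? 1 : 0);
console.log(mergeSorted(perShard, byUserId).map((d) => d.user_id)); // [ 'a1', 'b2', 'c3', 'd4' ]
```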
Drivers Support
RDBMS v/s MongoDB

Data Analysis
RDBMS:
Data Definition
1. Defining tables and columns.
2. Create relationships.
3. Define data types.
Data Integrity
1. Data Normalization
2. Maintain Referential Integrity
3. Validating Data
MongoDB:
None to Minimal

Development (1st Iteration)
RDBMS:
1. CREATE DATABASE (Typically done by DBA – more co-ordination required)
2. CREATE TABLE (knowledge of specific columns needed before creating tables)
3. INSERT Data
Bond Data
- INSERT INTO BOND (b1, b2..) values (val_b1,val_b2…)
FX Data
- INSERT INTO FX (f1, f2..) values (val_fx1,val_fx2…)
MongoDB:
1. use mydb
2. INSERT Data
Bond Data
- db.mytable.insert ( {b1:val_b1, b2:val_b2 } )
FX Data
- db.mytable.insert ( {f1:val_fx1, f2:val_fx2 } )
(creates the database and collection if not present)

Development (2nd, 3rd... Iteration)
RDBMS:
New Data
1. ALTER TABLE to add <new columns> to the tables
2. Ensure that new columns do not break the existing code such as application code, stored procedures etc.
3. Need to obtain a maintenance window for adding new columns as it will most probably lock the entire table.
4. One-Many or Many-Many relation needs multiple tables
MongoDB:
None
Compare Create, Alter, Drop

SQL Schema Statement:
CREATE TABLE users (
    id MEDIUMINT NOT NULL AUTO_INCREMENT,
    user_id Varchar(30),
    age Number,
    status char(1),
    PRIMARY KEY (id)
)
MongoDB Schema Statement:
[Explicitly create a collection]
db.createCollection("users")
OR
[Implicitly created on first insert() operation]
db.users.insert( {
    user_id: "abc123",
    age: 55,
    status: "A"
} )

SQL Schema Statement:
ALTER TABLE users
ADD join_date DATETIME
MongoDB Schema Statement:
db.users.update(
    { },
    { $set: { join_date: new Date() } },
    { multi: true }
)

SQL Schema Statement:
ALTER TABLE users
DROP COLUMN join_date
MongoDB Schema Statement:
db.users.update(
    { },
    { $unset: { join_date: "" } },
    { multi: true }
)

SQL Schema Statement:
CREATE INDEX idx_user_id_asc_age_desc
ON users(user_id, age DESC)
MongoDB Schema Statement:
db.users.createIndex( { user_id: 1, age: -1 } )

SQL Schema Statement:
DROP TABLE users
MongoDB Schema Statement:
db.users.drop()
Compare Selects

SQL SELECT Statement:
SELECT *
FROM users
MongoDB Equivalent:
db.users.find()

SQL SELECT Statement:
SELECT id, user_id, status
FROM users
MongoDB Equivalent:
db.users.find(
    { },
    { user_id: 1, status: 1 }
)

SQL SELECT Statement:
SELECT user_id, status
FROM users
WHERE status = "A"
MongoDB Equivalent:
db.users.find(
    { status: "A" },
    { user_id: 1, status: 1, _id: 0 }
)

SQL SELECT Statement:
SELECT *
FROM users
WHERE status != "A"
MongoDB Equivalent:
db.users.find(
    { status: { $ne: "A" } }
)

SQL SELECT Statement:
SELECT *
FROM users
WHERE status = "A"
AND age = 50
MongoDB Equivalent:
db.users.find(
    { status: "A",
      age: 50 }
)

SQL SELECT Statement:
SELECT *
FROM users
WHERE status = "A"
OR age = 50
MongoDB Equivalent:
db.users.find(
    { $or: [ { status: "A" } ,
             { age: 50 } ] }
)

SQL SELECT Statement:
SELECT *
FROM users
WHERE user_id like "bc%"
MongoDB Equivalent:
db.users.find(
    { user_id: /^bc/ }
)

SQL SELECT Statement:
SELECT *
FROM users
WHERE status = "A"
ORDER BY user_id ASC
MongoDB Equivalent:
db.users.find( { status: "A" } ).sort( { user_id: 1 } )

SQL SELECT Statement:
SELECT COUNT(*)
FROM users
MongoDB Equivalent:
db.users.count()
or
db.users.find().count()

SQL SELECT Statement:
SELECT DISTINCT(status)
FROM users
MongoDB Equivalent:
db.users.distinct( "status" )

SQL SELECT Statement:
SELECT *
FROM users
LIMIT 5
SKIP 10
MongoDB Equivalent:
db.users.find().limit(5).skip(10)

SQL SELECT Statement:
EXPLAIN SELECT *
FROM users WHERE status = "A"
MongoDB Equivalent:
db.users.find( { status: "A" } ).explain()
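To see how a query document is evaluated against a document, the sketch below implements a minimal in-memory matcher for a small subset of the syntax used here: equality, regular expressions, `$ne`, `$gt` and top-level `$or`. It is an illustration only, not the MongoDB query engine. The `matches` and `find` helpers are hypothetical, as is the third sample user.

```javascript
// Minimal in-memory matcher for a small subset of MongoDB query syntax.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (field === "$or") return cond.some((sub) => matches(doc, sub));
    if (cond instanceof RegExp) return cond.test(doc[field]);
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, value]) => {
        if (op === "$ne") return doc[field] !== value;
        if (op === "$gt") return doc[field] > value;
        throw new Error("unsupported operator: " + op);
      });
    }
    return doc[field] === cond; // plain equality, like WHERE field = value
  });
}

const users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "bce777", age: 50, status: "D" }, // hypothetical extra row
];
const find = (query) => users.filter((u) => matches(u, query)).map((u) => u.user_id);

console.log(find({ status: "A", age: 45 }));                // [ 'bcd001' ]
console.log(find({ $or: [{ status: "D" }, { age: 55 }] })); // [ 'abc123', 'bce777' ]
console.log(find({ user_id: /^bc/ }));                      // [ 'bcd001', 'bce777' ]
console.log(find({ status: { $ne: "A" } }));                // [ 'bce777' ]
```

Several fields in one query document are combined with AND, which is why `{ status: "A", age: 45 }` needs no explicit operator.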
Compare Insert, Delete, Updates

SQL INSERT Statement:
INSERT INTO users(user_id,
                  age,
                  status)
VALUES ("bcd001",
        45,
        "A")
MongoDB Equivalent:
db.users.insert( {
    user_id: "bcd001",
    age: 45,
    status: "A"
} )

SQL Update Statement:
UPDATE users
SET status = "C"
WHERE age > 25
MongoDB Equivalent:
db.users.update(
    { age: { $gt: 25 } },
    { $set: { status: "C" } },
    { multi: true }
)

SQL Update Statement:
UPDATE users
SET age = age + 3
WHERE status = "A"
MongoDB Equivalent:
db.users.update(
    { status: "A" } ,
    { $inc: { age: 3 } },
    { multi: true }
)

SQL Delete Statement:
DELETE FROM users
WHERE status = "D"
MongoDB Equivalent:
db.users.remove( { status: "D" } )

SQL Delete Statement:
DELETE FROM users
MongoDB Equivalent:
db.users.remove( { } )
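The effect of `$set` and `$inc` with `multi: true` can be imitated on a plain array, which makes the semantics easy to check by hand. This is an in-memory stand-in, not the MongoDB update engine. `updateMany` here is a hypothetical local helper that takes a predicate function instead of a query document, and the ages are chosen for the example.

```javascript
// In-memory stand-in for update(query, { $set / $inc }, { multi: true }).
function updateMany(docs, predicate, update) {
  let modified = 0;
  for (const doc of docs) {
    if (!predicate(doc)) continue;
    // $set overwrites (or creates) the named fields.
    for (const [field, value] of Object.entries(update.$set || {})) doc[field] = value;
    // $inc adds to the field, treating a missing field as 0.
    for (const [field, delta] of Object.entries(update.$inc || {})) {
      doc[field] = (doc[field] || 0) + delta;
    }
    modified += 1;
  }
  return modified;
}

const people = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 20, status: "A" },
];

// UPDATE users SET status = "C" WHERE age > 25
const setCount = updateMany(people, (d) => d.age > 25, { $set: { status: "C" } });
// UPDATE users SET age = age + 3 WHERE status = "A"
const incCount = updateMany(people, (d) => d.status === "A", { $inc: { age: 3 } });
console.log(setCount, incCount, people);
```

After both calls the first user has status "C" and the second user, still status "A", has age 23.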
Fully Featured Queries
MongoDB – Rolling Upgrades, No Downtime
Replica Sets
[Diagram legend: Primary, Secondary]
Perform upgrades
Promotes as the Primary after syncing up.
Other members can now perform upgrades as well.
All upgrades/maintenance work completed without any downtime.
Aggregation Framework
Other Interesting Features
> Capped Collections
– similar to circular buffers
> Tailable Cursors
– similar to unix tail -f
> TTL (Time To Live) for Collection
– remove data after specified time
– { expireAfterSeconds: n }
> Write Concerns
> findAndModify()
– atomically modifies and returns doc
> Read preference
– primary
– primaryPreferred
– secondary
– secondaryPreferred
– nearest
> Text Search
– tokenizes and stems the search terms
– assigns scores
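The comparison of capped collections to circular buffers can be illustrated with a small class. This is a sketch of the idea only: real capped collections are bounded by size in bytes (optionally also by document count), whereas the hypothetical `CappedBuffer` below counts documents to keep the example short.

```javascript
// Fixed-size, insertion-ordered store: once full, the oldest entry is dropped.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) this.docs.shift(); // discard the oldest
  }
  find() {
    return [...this.docs]; // natural (insertion) order
  }
}

const eventLog = new CappedBuffer(3);
for (let i = 1; i <= 5; i++) eventLog.insert({ seq: i });
console.log(eventLog.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```

This behaviour suits logs and recent-activity feeds, where only the latest entries matter.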
Schema Design Criteria
How can we manipulate data ?
Dynamic Queries
Secondary indexes
Atomic updates
Map Reduce
Access Patterns :
Read/Write Ratio
Types of Updates
Types of Queries
Data life cycle
A simple start …
Map the documents to your Application
book = { author : "srinivas",
         date : new Date(),
         text : "Dbversity best practices",
         tags : [ "database" , "technology" ] }
> db.books.save(book)
Thank You !

DBVersity MongoDB Online Training Presentations

  • 2. What is MongoDB ? Non Relational > Alternative to traditional RDBMS as workload and data volume needs change Document Oriented Database > Simple human readable JSON Data model Dynamic Schemas > Flexible schemas, agile development, developer friendly Built-in Replication for High Availability > Provides data redundancy; automatically recovers from node failures Built-in Sharding for Horizontal Scalability > Auto partition; supports very large data sets and high throughput operations High Performance > 5X – 10X times faster than traditional RDMBS
  • 3. Document Oriented Database What it does not mean: What it does means: - Store “JSON” Objects - JSON stands for JavaScript Object Notation - JSON is lightweight data interchange format similar to XML - JSON is language independent - JSON is "self-describing" and easy to understand - Example of simple JSON Document: { “firstName” : “dbversity”, “lastName” : “.com”, “deals” : [“technology”, “certifications”] }
  • 4. Documents = Rows in RDBMS > Store entire document directly into MongoDB rather than breaking data into multiple tables > Store documents of varying types in the same collection (Polymorphism) What can we do with documents ? Remember: Documents are similar to Rows in RDBMS { “firstName” : “dbversity”, “lastName” : “.com”, “DBs” : [“MongoDB”, “MariaDB”] }
  • 5. Documents = Rows in RDBMS What can we do with documents ? Remember: Documents are similar to Rows in RDBMS > Embed documents within documents > Embed arrays within documents
  • 6. NoSQL Document Oriented DB with Dynamic Schemas RDBMS MongoDB
  • 7. Features of Dynamic Schemas Data Model can evolve easily Faster Time To Market Performance can be delivered at scale as it reduces the need for joins & disk seeks
  • 8. RDBMS Terms/Concepts MongoDB Terms/Concepts Database Database Table Collection Row Document or BSON Document Column Field Index Index Table Joins Embedded documents and linking Primary key Specify any unique column or column combination as primary key. Primary key In MongoDB, the primary key is automatically set to the _id field. Aggregation (e.g. group by) Aggregation pipeline Comparing RDBMS and MongoDB Terms
  • 9.  Automatic Replication  Read Scalability  Data redundancy Business Needs Replica Set Benefits High Availability Automatic failover Disaster Recovery Hot backups offsite Maintenance Rolling Upgrades Low Latency Locate data near users Workload Isolation Read from non-primary replicas Data Privacy Res data to physical location Data Consistency Tunable Consistency Built in Replication for HA and Failover Availability
  • 10. Replica Sets MongoDBDriver Case 1 – All Nodes in the replica set are up and running Case 2 – Primary goes down, request automatic failover to Secondary SecondaryPrimary Secondary Primary Key Benefits  No downtime required for • Maintenance • Upgrade`s • Node Failures • Data Center outages  Load balance read requests End user is unaware of any failures and is able to get results even If the primary goes down MongoDB automatically chooses a new Primary 10
  • 11. Scalability Built in Sharding for Horizontal Scalability  Automatically partitions data  Write Scalability  Multiple parallel writes Q - Z G - P A - F MongoS Business Benefits due to Sharding Increases or decrease capacity as you go Automatic balancing Three types: Hash-based, Range-based, Tag-aware
  • 12. Sharding • Sharding divides data and distributes the it over multiple servers, or shards. Each shard is an independent database, and collectively, the shards make up a single logical database. • Sharding reduces the number of operations each shard handles. Each shard processes fewer operations as the cluster grows. As a result, a cluster can increase capacity and throughput horizontally. For example, to insert data, the application only needs to access the shard responsible for that record. • Sharding reduces the amount of data that each server needs to store. Each shard stores less data as the cluster grows. For example, if a database has a 1 terabyte data set, and there are 4 shards, then each shard might hold only 256GB of data. If there are 40 shards, then each shard might hold only 25GB of data.
  • 13. Sharding mongos mongos Shard 1 Shard 2 Shard 3 Shard 4 MongoDB automatically chunk and migrate documents based upon the shard key to balance the data Distribution across the cluster Mongos routes queries only to the shards that can satisfy the query End User Sharding distributes IO workload for read and write scalability Key Range 0..25 Key Range 26..40 Key Range 41..75 Key Range 76..120 Key Benefits  Near-linear performance scaling for reads and writes  No need to manually define the key range of each shard  Automatic data redistribution and balancing  Supports high user concurrency READS/QUERIES • By Shard Key: Routed • By non-Shard key: Scatter gather WRITES • Inserts: Requires Shard Key, routed • Remove: Routed or Scattered • Update: Routed or Scattered Key Range 0..40 Key Range 41..120 13
  • 14. Sharded Configuration Config servers store cluster’s metadata Shards store the data. They provide high availability and data consistency. in a production sharded cluster, each shard is a replica set. Query Routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards. A client sends requests to one query router. Most sharded clusters have many query routers.
  • 15. MongoDB Architecture Primary Secondary A Primary Secondary A Primary Secondary A Primary Secondary A Config1 Config2 Config3 Router-1 Router-2 Router-3 Router-N Driver Application Shard-1 Shard-2 Shard-3 Shard-N - - - - - - - - - - Replica Set mongos mongo d mongo d Redundancy of Data Automatic Failover Read Scalability (Distributed Reads) Automatic Leader Election Automatically Partitions Data Write Scalability (Distributed writes) Aggregates queries across shards Can have 1 or as much needed and are lightweight processes Stores Meta Data System is up as long as 1/3 are up If anyone is down, Metadata goes read-only
  • 16. Sharding Types Range Based Sharding Hash Based Sharding Tag Aware Sharding
  • 17. Sharding Types Hash-based Sharding :- db.collection.createIndex( { _id: "hashed" } ) Tag Aware sharding : sh.addShardTag("shard0000", "NYC") sh.addShardTag("shard0002", "SFO") sh.addShardTag("shard0002", "NRT") sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC") sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, “SFO") sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, “NRT")
  • 18. What is Chunk ? A chunk is a contiguous range of data from a particular collection. Chunks are described as a triple of ”Collection, minKey, and maxKey”. Thus, the shard key K of a given document assigns that document to the chunk where minKey <= K <= maxKey. Chunks default to 64 MB/ 1,00,000 Objects. If a Chunk gets too large ( >64 MB/chunk), it split into two new chunks. Splitting happens on the basis of Median of the chunks number. When sorting is specified, the relevant shards sort locally (in the Shards itself. ), and then mongos merges the results. Thus the mongos resource usage is not terribly high.
  • 20. RDBMS v/s MongoDB
      RDBMS
        Data Analysis
          Data Definition: 1. Define tables and columns. 2. Create relationships. 3. Define data types.
          Data Integrity: 1. Data normalization. 2. Maintain referential integrity. 3. Validate data.
        Development (1st iteration)
          1. CREATE DATABASE (typically done by a DBA; more co-ordination required)
          2. CREATE TABLE (knowledge of specific columns needed before creating tables)
          3. INSERT data
             Bond data - INSERT INTO BOND (b1, b2..) VALUES (val_b1, val_b2…)
             FX data - INSERT INTO FX (f1, f2..) VALUES (val_fx1, val_fx2…)
        Development (2nd, 3rd... iteration): new data
          1. ALTER TABLE to add the new columns to the tables.
          2. Ensure that the new columns do not break existing code such as application code, stored procedures, etc.
          3. Obtain a maintenance window for adding new columns, as it will most probably lock the entire table.
          4. One-to-many or many-to-many relations need multiple tables.
      MongoDB
        Data Analysis: none to minimal
        Development (1st iteration)
          1. use mydb
          2. Insert data (creates the database and collection if not present)
             Bond data - db.mytable.insert( { b1: val_b1, b2: val_b2 } )
             FX data - db.mytable.insert( { f1: val_fx1, f2: val_fx2 } )
        Development (2nd, 3rd... iteration): none
  • 21. Compare Create, Alter, Drop (SQL schema statements and their MongoDB equivalents)
      SQL: CREATE TABLE users ( id MEDIUMINT NOT NULL AUTO_INCREMENT, user_id VARCHAR(30), age NUMBER, status CHAR(1), PRIMARY KEY (id) )
      MongoDB: db.createCollection("users") [explicitly creates a collection], or db.users.insert( { user_id: "abc123", age: 55, status: "A" } ) [collection is created implicitly on the first insert() operation]
      SQL: ALTER TABLE users ADD join_date DATETIME
      MongoDB: db.users.update( { }, { $set: { join_date: new Date() } }, { multi: true } )
      SQL: ALTER TABLE users DROP COLUMN join_date
      MongoDB: db.users.update( { }, { $unset: { join_date: "" } }, { multi: true } )
      SQL: CREATE INDEX idx_user_id_asc_age_desc ON users(user_id, age DESC)
      MongoDB: db.users.createIndex( { user_id: 1, age: -1 } ) [ensureIndex() is the deprecated older name]
      SQL: DROP TABLE users
      MongoDB: db.users.drop()
  • 22. Compare Selects (SQL SELECT statements and their MongoDB equivalents)
      SQL: SELECT * FROM users
      MongoDB: db.users.find()
      SQL: SELECT id, user_id, status FROM users
      MongoDB: db.users.find( { }, { user_id: 1, status: 1 } )
      SQL: SELECT user_id, status FROM users WHERE status = "A"
      MongoDB: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )
      SQL: SELECT * FROM users WHERE status != "A"
      MongoDB: db.users.find( { status: { $ne: "A" } } )
      SQL: SELECT * FROM users WHERE status = "A" AND age = 50
      MongoDB: db.users.find( { status: "A", age: 50 } )
      SQL: SELECT * FROM users WHERE status = "A" OR age = 50
      MongoDB: db.users.find( { $or: [ { status: "A" }, { age: 50 } ] } )
      SQL: SELECT * FROM users WHERE user_id like "bc%"
      MongoDB: db.users.find( { user_id: /^bc/ } )
      SQL: SELECT * FROM users WHERE status = "A" ORDER BY user_id ASC
      MongoDB: db.users.find( { status: "A" } ).sort( { user_id: 1 } )
      SQL: SELECT COUNT(*) FROM users
      MongoDB: db.users.count() or db.users.find().count()
      SQL: SELECT DISTINCT(status) FROM users
      MongoDB: db.users.distinct( "status" )
      SQL: SELECT * FROM users LIMIT 5 SKIP 10
      MongoDB: db.users.find().limit(5).skip(10)
      SQL: EXPLAIN SELECT * FROM users WHERE status = "A"
      MongoDB: db.users.find( { status: "A" } ).explain()
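The way a query document maps onto a WHERE clause can be sketched with a tiny matcher over an in-memory array. This is an analogy in plain JavaScript, not the MongoDB query engine; `matches` and `find` are hypothetical helpers, and only equality, `$ne`, `$gt`, `$or`, and regular expressions are covered.

```javascript
// Illustrative only: a tiny subset of the query language, evaluated over an array.
const users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "xyz789", age: 50, status: "D" },
];

function matches(doc, query) {
  // Top-level fields in a query document are combined with AND.
  return Object.entries(query).every(([field, cond]) => {
    if (field === "$or") return cond.some((sub) => matches(doc, sub));
    if (cond instanceof RegExp) return cond.test(doc[field]);
    if (cond !== null && typeof cond === "object") {
      // Operator document such as { $ne: "A" } or { $gt: 25 }.
      return Object.entries(cond).every(([op, value]) => {
        if (op === "$ne") return doc[field] !== value;
        if (op === "$gt") return doc[field] > value;
        throw new Error("operator not covered by this sketch: " + op);
      });
    }
    return doc[field] === cond; // plain equality
  });
}

function find(collection, query = {}) {
  return collection.filter((doc) => matches(doc, query));
}

console.log(find(users, { status: "A", age: 50 }).length);               // AND
console.log(find(users, { $or: [{ status: "A" }, { age: 50 }] }).length); // OR
console.log(find(users, { user_id: /^bc/ }).map((u) => u.user_id));       // LIKE "bc%"
```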
  • 23. Compare Insert, Delete, Updates (SQL statements and their MongoDB equivalents)
      Insert
        SQL: INSERT INTO users(user_id, age, status) VALUES ("bcd001", 45, "A")
        MongoDB: db.users.insert( { user_id: "bcd001", age: 45, status: "A" } )
      Update
        SQL: UPDATE users SET status = "C" WHERE age > 25
        MongoDB: db.users.update( { age: { $gt: 25 } }, { $set: { status: "C" } }, { multi: true } )
        SQL: UPDATE users SET age = age + 3 WHERE status = "A"
        MongoDB: db.users.update( { status: "A" }, { $inc: { age: 3 } }, { multi: true } )
      Delete
        SQL: DELETE FROM users WHERE status = "D"
        MongoDB: db.users.remove( { status: "D" } )
        SQL: DELETE FROM users
        MongoDB: db.users.remove( { } ) [an empty query document is required to remove all documents]
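The effect of `$set`, `$inc`, and the `multi` flag can be pictured with a small in-memory stand-in. This is a sketch in plain JavaScript under stated assumptions: `update` here is a hypothetical helper, it takes a predicate function instead of a query document, and it is not the shell's `db.collection.update()`.

```javascript
// Illustrative only: mimics $set and $inc with the multi flag on an in-memory array.
function update(collection, predicate, change, options = {}) {
  let modified = 0;
  for (const doc of collection) {
    if (!predicate(doc)) continue;
    for (const [field, value] of Object.entries(change.$set || {})) doc[field] = value;
    for (const [field, step] of Object.entries(change.$inc || {})) {
      doc[field] = (doc[field] || 0) + step;
    }
    modified += 1;
    if (!options.multi) break; // without multi, only the first match changes
  }
  return modified;
}

const people = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "xyz789", age: 20, status: "D" },
];

// UPDATE users SET age = age + 3 WHERE status = "A"
const changed = update(people, (d) => d.status === "A", { $inc: { age: 3 } }, { multi: true });
// Same shape as SET status = "C" WHERE age > 25, but without multi: one document only.
const changedOnce = update(people, (d) => d.age > 25, { $set: { status: "C" } });
console.log(changed, changedOnce, people);
```

Leaving out `multi: true` is the usual reason an update that looks like a SQL `UPDATE ... WHERE` touches only one document.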
  • 25. MongoDB – Rolling Upgrades, No Downtime With replica sets, perform the upgrade on a secondary; after syncing up, it is promoted to primary. The other members can then be upgraded as well. All upgrade and maintenance work is completed without any downtime. [Diagram legend: Primary, Secondary.]
  • 27. Other Interesting Features
      - Capped collections: similar to circular buffers
      - Tailable cursors: similar to unix tail -f
      - TTL (Time To Live) for a collection: removes data after a specified time, { expireAfterSeconds: n }
      - Write concerns
      - findAndModify(): atomically modifies and returns a document
      - Read preference: primary, primaryPreferred, secondary, secondaryPreferred, nearest
      - Text search: tokenizes and stems the search terms; assigns scores
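The circular-buffer comparison for capped collections can be sketched in a few lines. This is an in-memory analogy in plain JavaScript, with the hypothetical class `CappedBuffer` limited by document count for simplicity; it does not use any MongoDB API.

```javascript
// Illustrative only: an in-memory analogy for a capped collection, capped by document count.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    // Once full, the oldest document is dropped to make room for the newest.
    if (this.docs.length > this.maxDocs) this.docs.shift();
  }
  find() {
    return [...this.docs]; // insertion order is preserved
  }
}

const log = new CappedBuffer(3);
for (let i = 1; i <= 5; i++) log.insert({ seq: i });
console.log(log.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```

The fixed capacity and insertion-order reads are what make this pattern a natural fit for logs and other "most recent N" data.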
  • 28. Schema Design Criteria
      How can we manipulate data? Dynamic queries; secondary indexes; atomic updates; map-reduce.
      Access patterns: read/write ratio; types of updates; types of queries; data life cycle.
  • 29. A simple start … Map the documents to your application:
      book = {
        author : "srinivas",
        date : new Date(),
        text : "Dbversity best practices",
        tags : [ "database", "technology" ]
      }
      > db.books.save(book)