2. What is MongoDB?
Non Relational
> Alternative to traditional RDBMS as workload and data volume needs
change
Document Oriented Database
> Simple human readable JSON Data model
Dynamic Schemas
> Flexible schemas, agile development, developer friendly
Built-in Replication for High Availability
> Provides data redundancy; automatically recovers from node failures
Built-in Sharding for Horizontal Scalability
> Auto partition; supports very large data sets and high throughput
operations
High Performance
> Claimed to be 5x to 10x faster than a traditional RDBMS for suitable workloads
3. Document Oriented Database
What it does not mean:
What it does mean:
- Store “JSON” Objects
- JSON stands for JavaScript Object Notation
- JSON is a lightweight data-interchange format, similar in role to XML
- JSON is language independent
- JSON is "self-describing" and easy to understand
- Example of simple JSON Document:
{
"firstName" : "dbversity",
"lastName" : ".com",
"deals" : ["technology", "certifications"]
}
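As a minimal sketch of what "language independent" and "self-describing" mean in practice, the example document above can be parsed from plain text and read back by field name. The variable names (`text`, `doc`, `roundTrip`) are assumptions made for illustration.

```javascript
// The slide's example document as plain JSON text (straight quotes are required).
const text =
  '{ "firstName": "dbversity", "lastName": ".com", "deals": ["technology", "certifications"] }';

// Any JSON parser can turn the text into a native object.
const doc = JSON.parse(text);

// Fields are addressed by the names carried inside the document itself.
console.log(doc.firstName);
console.log(doc.deals.length);

// Serializing again yields text that another language could consume.
const roundTrip = JSON.stringify(doc);
console.log(roundTrip);
```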
4. Documents = Rows in RDBMS
> Store entire document directly into MongoDB rather than breaking data into multiple tables
> Store documents of varying types in the same collection (Polymorphism)
What can we do with documents ?
Remember: Documents are similar to Rows in RDBMS
{
"firstName" : "dbversity",
"lastName" : ".com",
"DBs" : ["MongoDB", "MariaDB"]
}
5. Documents = Rows in RDBMS
What can we do with documents?
Remember: Documents are similar to Rows in RDBMS
> Embed documents within documents
> Embed arrays within documents
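Embedding documents and arrays can be sketched with a plain JavaScript object. Only `firstName`, `lastName`, and `DBs` come from the slides; the `address` and `orders` fields and their values are hypothetical, added purely for illustration.

```javascript
// Hypothetical document: address and orders are illustrative assumptions.
const customer = {
  firstName: "dbversity",
  lastName: ".com",
  // An embedded array (from the slide's example).
  DBs: ["MongoDB", "MariaDB"],
  // An embedded document (assumed for illustration).
  address: { city: "London", country: "UK" },
  // An array of embedded documents (also assumed).
  orders: [
    { item: "book", qty: 2 },
    { item: "pen", qty: 10 }
  ]
};

// Nested values are reached with ordinary property access.
const city = customer.address.city;

// Sub-documents in an array can be processed without a join.
const totalQty = customer.orders.reduce((sum, o) => sum + o.qty, 0);

console.log(city, totalQty);
```

In an RDBMS the address and orders would typically live in separate tables; here they travel with the parent document.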
7. Features of Dynamic Schemas
Data Model can evolve easily
Faster Time To Market
Performance can be delivered at scale as it reduces the need for
joins & disk seeks
8. Comparing RDBMS and MongoDB Terms
RDBMS Terms/Concepts          | MongoDB Terms/Concepts
Database                      | Database
Table                         | Collection
Row                           | Document or BSON Document
Column                        | Field
Index                         | Index
Table Joins                   | Embedded documents and linking
Primary key: specify any unique column or column combination as primary key. | Primary key: in MongoDB, the primary key is automatically set to the _id field.
Aggregation (e.g. group by)   | Aggregation pipeline
9. Automatic Replication
Built-in replication for HA and failover
Read Scalability
Data redundancy
Business Needs                | Replica Set Benefits
High Availability             | Automatic failover
Disaster Recovery             | Hot backups offsite
Maintenance                   | Rolling upgrades
Low Latency                   | Locate data near users
Workload Isolation            | Read from non-primary replicas
Data Privacy                  | Restrict data to a physical location
Data Consistency              | Tunable consistency
10. Replica Sets
[Diagram: MongoDB driver connected to a replica set of primary and secondary nodes]
Case 1 – All nodes in the replica set are up and running
Case 2 – Primary goes down; requests automatically fail over to a secondary
MongoDB automatically chooses a new primary
Key Benefits
No downtime required for
• Maintenance
• Upgrades
• Node failures
• Data center outages
Load balance read requests
End user is unaware of any failures and is able to get results even if the primary goes down
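The two cases on this slide can be sketched with a toy model. This is not MongoDB's election protocol: the member names, the `up` flag, and the "promote the first healthy secondary" rule are simplifying assumptions. The one rule taken from real replica sets is that a primary can only be chosen while a majority of members is reachable.

```javascript
// Toy model of replica set failover; NOT MongoDB's actual election algorithm.
// Each member is { name, role, up } (hypothetical structure).
function electPrimary(members) {
  const healthy = members.filter(m => m.up);
  // A primary can only exist while a majority of members is reachable.
  if (healthy.length <= members.length / 2) return null;
  // Case 1: the current primary is still up, nothing changes.
  const current = healthy.find(m => m.role === "primary");
  if (current) return current.name;
  // Case 2 (simplified): promote the first healthy secondary.
  const candidate = healthy.find(m => m.role === "secondary");
  return candidate ? candidate.name : null;
}

const members = [
  { name: "node1", role: "primary", up: true },
  { name: "node2", role: "secondary", up: true },
  { name: "node3", role: "secondary", up: true }
];

const case1 = electPrimary(members);   // all nodes up
members[0].up = false;                 // primary goes down
const case2 = electPrimary(members);   // failover to a secondary
members[1].up = false;                 // only one of three members left
const noMajority = electPrimary(members);

console.log(case1, case2, noMajority);
```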
11. Scalability
Built-in sharding for horizontal scalability
Automatically partitions data
Write scalability
Multiple parallel writes
[Diagram: mongos routing to shards holding key ranges A - F, G - P, and Q - Z]
Business Benefits due to Sharding
Increase or decrease capacity as you go
Automatic balancing
Three types of sharding: hash-based, range-based, tag-aware
12. Sharding
• Sharding divides data and distributes it
over multiple servers, or shards. Each shard
is an independent database, and collectively,
the shards make up a single logical database.
• Sharding reduces the number of operations
each shard handles. Each shard processes
fewer operations as the cluster grows. As a
result, a cluster can increase capacity and
throughput horizontally.
For example, to insert data, the application
only needs to access the shard responsible
for that record.
• Sharding reduces the amount of data that
each server needs to store. Each shard stores
less data as the cluster grows.
For example, if a database has a 1 terabyte
data set, and there are 4 shards, then each
shard might hold only 256GB of data. If there
are 40 shards, then each shard might hold
only about 25GB of data.
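The arithmetic behind this example is a simple division. The sketch below assumes 1 terabyte = 1024 GB and a perfectly balanced cluster; `gbPerShard` is a hypothetical helper name.

```javascript
// Even-distribution estimate: data per shard = total data / number of shards.
// Assumes a perfectly balanced cluster, which real clusters only approximate.
function gbPerShard(totalGb, shardCount) {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return totalGb / shardCount;
}

const totalGb = 1024; // 1 terabyte expressed in GB (assumption: 1 TB = 1024 GB)
console.log(gbPerShard(totalGb, 4));  // 256
console.log(gbPerShard(totalGb, 40)); // 25.6
```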
13. Sharding
[Diagram: End user -> mongos routers -> Shard 1, Shard 2, Shard 3, Shard 4]
Sharding distributes I/O workload for read and write scalability
MongoDB automatically chunks and migrates documents based on the shard key to balance the data distribution across the cluster
mongos routes queries only to the shards that can satisfy the query
Key ranges shown: 0..40 and 41..120; 0..25, 26..40, 41..75, and 76..120
Key Benefits
• Near-linear performance scaling for reads and writes
• No need to manually define the key range of each shard
• Automatic data redistribution and balancing
• Supports high user concurrency
READS/QUERIES
• By shard key: routed
• By non-shard key: scatter-gather
WRITES
• Inserts: require the shard key; routed
• Remove: routed or scattered
• Update: routed or scattered
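The difference between a routed query and a scatter-gather query can be sketched with a toy router over the four key ranges 0..25, 26..40, 41..75, and 76..120 from this slide. This is not the real mongos implementation; the `targetShards` function and the `userKey` field name are hypothetical.

```javascript
// Toy model of range-based routing; NOT the real mongos implementation.
// The inclusive key ranges are the ones shown on the slide.
const shards = [
  { name: "Shard 1", min: 0, max: 25 },
  { name: "Shard 2", min: 26, max: 40 },
  { name: "Shard 3", min: 41, max: 75 },
  { name: "Shard 4", min: 76, max: 120 }
];

// Returns the names of the shards a query must be sent to.
// `shardKey` is the (hypothetical) name of the shard key field.
function targetShards(query, shardKey) {
  if (!(shardKey in query)) {
    // No shard key in the query: every shard is asked (scatter-gather).
    return shards.map(s => s.name);
  }
  const k = query[shardKey];
  // Shard key present: only the shard owning that key is contacted (routed).
  return shards.filter(s => k >= s.min && k <= s.max).map(s => s.name);
}

console.log(targetShards({ userKey: 30 }, "userKey"));   // routed to one shard
console.log(targetShards({ status: "A" }, "userKey"));   // sent to all shards
```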
14. Sharded Configuration
Config servers store the cluster's metadata.
Shards store the data. They provide high availability and data consistency. In a production sharded cluster, each shard is a replica set.
Query routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards. A client sends requests to one query router. Most sharded clusters have many query routers.
15. MongoDB Architecture
[Diagram: Application -> Driver -> Router-1 ... Router-N (mongos) -> Shard-1 ... Shard-N, with config servers Config1, Config2, Config3; each shard is a replica set of mongod processes labeled Primary, Secondary, A]
Replica Set (mongod)
- Redundancy of data
- Automatic failover
- Read scalability (distributed reads)
- Automatic leader election
Shards
- Automatically partition data
- Write scalability (distributed writes)
Routers (mongos)
- Aggregate queries across shards
- Can have 1 or as many as needed; they are lightweight processes
Config servers
- Store metadata
- System is up as long as 1 of 3 is up
- If any one is down, metadata goes read-only
18. What is a Chunk?
A chunk is a contiguous range of data from a particular collection.
Chunks are described as a triple of "Collection, minKey, and maxKey".
Thus, the shard key K of a given document assigns that document to the
chunk where
minKey <= K < maxKey.
Chunks default to 64 MB / 100,000 objects.
If a chunk gets too large (> 64 MB), it is split into two new chunks.
The split point is the median shard key value in the chunk.
When sorting is specified, the relevant shards sort locally (on the shards
themselves), and then mongos merges the results. Thus the mongos resource
usage is not terribly high.
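Chunk lookup and median-based splitting can be sketched as follows. This is a toy model, not MongoDB's internal code: it assumes numeric shard keys and the half-open convention minKey <= K < maxKey, so that adjacent chunks never overlap. The helper names `findChunk` and `splitChunk` are hypothetical.

```javascript
// Toy model of chunks; names and structure are illustrative assumptions.
// A chunk covers shard key values K with minKey <= K < maxKey.
function findChunk(chunks, k) {
  return chunks.find(c => k >= c.minKey && k < c.maxKey);
}

// Splits a chunk at the median of the shard key values it currently holds.
// `keys` lists the shard key values of the documents stored in the chunk.
function splitChunk(chunk, keys) {
  const sorted = [...keys].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  if (median <= chunk.minKey || median >= chunk.maxKey) {
    throw new RangeError("cannot split: median lies on a chunk boundary");
  }
  return [
    { collection: chunk.collection, minKey: chunk.minKey, maxKey: median },
    { collection: chunk.collection, minKey: median, maxKey: chunk.maxKey }
  ];
}

const chunk = { collection: "users", minKey: 0, maxKey: 100 };
const [left, right] = splitChunk(chunk, [5, 12, 40, 41, 77, 90]);

console.log(left, right);
console.log(findChunk([left, right], 41)); // the boundary key belongs to the right chunk
```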
20. RDBMS v/s MongoDB
Data Analysis
RDBMS:
    Data Definition
    1. Define tables and columns.
    2. Create relationships.
    3. Define data types.
    Data Integrity
    1. Data normalization
    2. Maintain referential integrity
    3. Validate data
MongoDB:
    None to minimal
Development (1st iteration)
RDBMS:
    1. CREATE DATABASE
       (Typically done by a DBA; more coordination required)
    2. CREATE TABLE (knowledge of specific columns needed before creating tables)
    3. INSERT data
       Bond data
       - INSERT INTO BOND (b1, b2..) values (val_b1,val_b2…)
       FX data
       - INSERT INTO FX (f1, f2..) values (val_fx1,val_fx2…)
MongoDB:
    1. use mydb
    2. INSERT data
       Bond data
       - db.mytable.insert ( {b1:val_b1, b2:val_b2 } )
       FX data
       - db.mytable.insert ( {f1:val_fx1, f2:val_fx2 } )
       (creates the database and collection if not present)
Development (2nd, 3rd... iteration)
RDBMS:
    New data
    1. ALTER TABLE to add <new columns> to the tables
    2. Ensure that new columns do not break the existing code, such as application code, stored procedures, etc.
    3. Need to obtain a maintenance window for adding new columns, as it will most probably lock the entire table.
    4. One-to-many or many-to-many relations need multiple tables
MongoDB:
    None
21. Compare Create, Alter, Drop
SQL Schema Statements | MongoDB Schema Statements
SQL:
    CREATE TABLE users (
        id MEDIUMINT NOT NULL AUTO_INCREMENT,
        user_id Varchar(30),
        age Number,
        status char(1),
        PRIMARY KEY (id)
    )
MongoDB:
    [Explicitly create a collection]
    db.createCollection("users")
    OR
    [Implicitly created on first insert() operation]
    db.users.insert( {
        user_id: "abc123",
        age: 55,
        status: "A"
    } )
SQL:
    ALTER TABLE users
    ADD join_date DATETIME
MongoDB:
    db.users.update(
        { },
        { $set: { join_date: new Date() } },
        { multi: true }
    )
SQL:
    ALTER TABLE users
    DROP COLUMN join_date
MongoDB:
    db.users.update(
        { },
        { $unset: { join_date: "" } },
        { multi: true }
    )
SQL:
    CREATE INDEX idx_user_id_asc_age_desc
    ON users(user_id, age DESC)
MongoDB:
    db.users.ensureIndex( { user_id: 1, age: -1 } )
SQL:
    DROP TABLE users
MongoDB:
    db.users.drop()
22. Compare Selects
SQL SELECT Statements | MongoDB Equivalent
SQL:
    SELECT *
    FROM users
MongoDB:
    db.users.find()
SQL:
    SELECT id, user_id, status
    FROM users
MongoDB:
    db.users.find(
        { },
        { user_id: 1, status: 1 }
    )
SQL:
    SELECT user_id, status
    FROM users
    WHERE status = "A"
MongoDB:
    db.users.find(
        { status: "A" },
        { user_id: 1, status: 1, _id: 0 }
    )
SQL:
    SELECT *
    FROM users
    WHERE status != "A"
MongoDB:
    db.users.find(
        { status: { $ne: "A" } }
    )
SQL:
    SELECT *
    FROM users
    WHERE status = "A"
    AND age = 50
MongoDB:
    db.users.find(
        { status: "A",
          age: 50 }
    )
SQL:
    SELECT *
    FROM users
    WHERE status = "A"
    OR age = 50
MongoDB:
    db.users.find(
        { $or: [ { status: "A" } ,
                 { age: 50 } ] }
    )
SQL:
    SELECT *
    FROM users
    WHERE user_id like "bc%"
MongoDB:
    db.users.find(
        { user_id: /^bc/ }
    )
SQL:
    SELECT *
    FROM users
    WHERE status = "A"
    ORDER BY user_id ASC
MongoDB:
    db.users.find( { status: "A" } ).sort( { user_id: 1 } )
SQL:
    SELECT COUNT(*)
    FROM users
MongoDB:
    db.users.count()
    or
    db.users.find().count()
SQL:
    SELECT DISTINCT(status)
    FROM users
MongoDB:
    db.users.distinct( "status" )
SQL:
    SELECT *
    FROM users
    LIMIT 5
    SKIP 10
MongoDB:
    db.users.find().limit(5).skip(10)
SQL:
    EXPLAIN SELECT *
    FROM users WHERE status = "A"
MongoDB:
    db.users.find( { status: "A" } ).explain()
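To make the meaning of these query documents concrete, here is a toy in-memory matcher for a small subset of the operators used above (field equality, a regular expression, $ne, $gt, and a top-level $or). It is a sketch for illustration, not MongoDB's query engine; the `matches` and `find` helpers and the third sample user are hypothetical.

```javascript
// Toy matcher for a small subset of query operators; NOT MongoDB's query engine.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") {
      // $or: at least one sub-filter must match.
      return cond.some(sub => matches(doc, sub));
    }
    const value = doc[key];
    if (cond instanceof RegExp) {
      // Regular expression, e.g. /^bc/ for LIKE "bc%".
      return typeof value === "string" && cond.test(value);
    }
    if (cond !== null && typeof cond === "object") {
      // Operator document, e.g. { $ne: "A" } or { $gt: 25 }.
      return Object.entries(cond).every(([op, arg]) => {
        if (op === "$ne") return value !== arg;
        if (op === "$gt") return value > arg;
        throw new Error("unsupported operator: " + op);
      });
    }
    // Plain value: equality; several fields in one filter act as AND.
    return value === cond;
  });
}

// Sample data: the first two users appear in the slides, the third is made up.
const users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "xyz789", age: 50, status: "D" }
];

const find = filter => users.filter(u => matches(u, filter));

console.log(find({ status: "A", age: 45 }));
console.log(find({ $or: [{ status: "D" }, { age: 55 }] }));
console.log(find({ user_id: /^bc/ }));
```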
23. Compare Insert, Delete, Updates
SQL INSERT Statements | MongoDB Equivalent
SQL:
    INSERT INTO users(user_id, age, status)
    VALUES ("bcd001", 45, "A")
MongoDB:
    db.users.insert( {
        user_id: "bcd001",
        age: 45,
        status: "A"
    } )
SQL Update Statements | MongoDB Equivalent
SQL:
    UPDATE users
    SET status = "C"
    WHERE age > 25
MongoDB:
    db.users.update(
        { age: { $gt: 25 } },
        { $set: { status: "C" } },
        { multi: true }
    )
SQL:
    UPDATE users
    SET age = age + 3
    WHERE status = "A"
MongoDB:
    db.users.update(
        { status: "A" } ,
        { $inc: { age: 3 } },
        { multi: true }
    )
SQL Delete Statements | MongoDB Equivalent
SQL:
    DELETE FROM users
    WHERE status = "D"
MongoDB:
    db.users.remove( { status: "D" } )
SQL:
    DELETE FROM users
MongoDB:
    db.users.remove( { } )
25. MongoDB – Rolling Upgrades, No Downtime
Replica Sets
[Diagram legend: Primary, Secondary]
Perform upgrades on a secondary.
The upgraded member is promoted to primary after syncing up.
Other members can now perform upgrades as well.
All upgrades/maintenance work is completed without any downtime.
27. Other Interesting Features
Capped Collections
– similar to circular buffers
Tailable Cursors
– similar to unix tail -f
TTL (Time To Live) for Collection
– remove data after specified time
– { expireAfterSeconds: n }
Write Concerns
findAndModify()
– atomically modifies and returns doc
Read preference
– primary
– primaryPreferred
– secondary
– secondaryPreferred
– nearest
Text Search
– tokenizes and stems the search terms
– assigns scores
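The "similar to circular buffers" description of capped collections can be sketched with a small in-memory ring buffer. This is a toy model, not how MongoDB stores data: it caps by document count only (an assumption made for simplicity), and the `CappedBuffer` class name is hypothetical.

```javascript
// Toy in-memory model of a capped collection; NOT how MongoDB stores data.
class CappedBuffer {
  constructor(maxDocs) {
    if (!Number.isInteger(maxDocs) || maxDocs < 1) {
      throw new RangeError("maxDocs must be a positive integer");
    }
    this.maxDocs = maxDocs;
    this.slots = new Array(maxDocs);
    this.next = 0;  // index of the slot the next insert will use
    this.count = 0; // number of documents currently stored
  }

  insert(doc) {
    // Once the buffer is full, the oldest document is overwritten.
    this.slots[this.next] = doc;
    this.next = (this.next + 1) % this.maxDocs;
    this.count = Math.min(this.count + 1, this.maxDocs);
  }

  // Returns documents in insertion order, oldest first.
  toArray() {
    const start = this.count < this.maxDocs ? 0 : this.next;
    const out = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.maxDocs]);
    }
    return out;
  }
}

const log = new CappedBuffer(3);
["a", "b", "c", "d", "e"].forEach(msg => log.insert({ msg }));
console.log(log.toArray().map(d => d.msg)); // [ 'c', 'd', 'e' ]
```

Inserting five documents into a buffer of three leaves only the three most recent ones, in insertion order.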
28. Schema Design Criteria
How can we manipulate data?
Dynamic Queries
Secondary indexes
Atomic updates
Map Reduce
Access Patterns :
Read/Write Ratio
Types of Updates
Types of Queries
Data life cycle
29. A simple start …
Map the documents to your Application
book = { author : "srinivas",
date : new Date(),
text : "Dbversity best practices",
tags : [ "database" , "technology" ] }
> db.books.save(book)