MongoDB Concepts

Concepts of MongoDB
Juan Antonio Roy Couto
Twitter: @juanroycouto
Website: www.juanroy.es
September 2014

Contents
● Why?
● Who?
● DB Ranking
● Community
● Drivers
● Characteristics
● Shell
● Utilities
● Terms
● Indexes
● Schema design
● Replication
● Replica Set
● Failover
● Sharding
● Pre-splitting
● Questions?

Why?
Apps: Internet of Things, wearables, smart cities, cloud computing
MongoDB:
● Horizontal scalability
● Real-time analytics
● Better strategic decisions
● Unstructured data
● Reduced costs and time to market
● Faster development

Who?
Who provides MongoDB in the cloud?
http://www.mongodb.com/partners/list
Who is using MongoDB?
http://www.mongodb.com/who-uses-mongodb

DB Ranking
http://db-engines.com/en/ranking

Community
● 8 million+ downloads
● 200k+ education registrations
● 30k+ MongoDB User Group members

Drivers
http://docs.mongodb.org/ecosystem/drivers/
App ↔ Driver ↔ MongoDB
● C
● C++
● C#
● Java
● Node.js
● Perl
● PHP
● Python
● Ruby
● Scala

Characteristics
http://www.mongodbspain.com/en/2014/08/17/mongodb-characteristics-future/
● General purpose NoSQL database
● Document oriented (stores data as documents in BSON, i.e. Binary JSON)
● Schemaless (dynamic schema)
● Open source
● High availability (replica sets)
● Horizontal scalability (commodity servers)
● Aggregation framework
● Map Reduce
● Hadoop connector (for processing large volumes of data in batch)
● Native replication
● Auto sharding & load balancing
● Security
● Automatic failover
● JSON objects
● MMS (continuous monitoring in the cloud)
● Geospatial queries
● In-memory performance
● ACID compliant at the document level

Advanced characteristics
● GridFS (stores large files split into chunks: Chunk 1, Chunk 2, Chunk 3)
● TTL (special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time)
● Capped collections
● Index intersection
...
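The TTL behaviour listed above can be pictured with a small sketch. This is a toy model in plain JavaScript, not MongoDB's implementation: the function `removeExpired` and the sample `sessions` data are hypothetical, and only the parameter name `expireAfterSeconds` is borrowed from the TTL index option.

```javascript
// Toy model of TTL expiry: drop documents whose date field is older than
// expireAfterSeconds. Documents without a date in that field never expire.
function removeExpired(docs, field, expireAfterSeconds, now) {
  const cutoff = now.getTime() - expireAfterSeconds * 1000;
  return docs.filter(
    (doc) => !(doc[field] instanceof Date) || doc[field].getTime() >= cutoff
  );
}

const now = new Date("2014-09-01T12:00:00Z");
const sessions = [
  { _id: 1, createdAt: new Date("2014-09-01T11:59:30Z") }, // 30 seconds old
  { _id: 2, createdAt: new Date("2014-09-01T10:00:00Z") }, // 2 hours old
  { _id: 3 },                                              // no date field
];

// With a one-hour TTL only the two-hour-old document is removed.
const remaining = removeExpired(sessions, "createdAt", 3600, now);
console.log(remaining.map((d) => d._id)); // [ 1, 3 ]
```

In a real deployment the server removes the documents by itself in the background; the application only declares the index.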

Shell
● Administrative tasks
● Full-featured
● JavaScript interpreter
● Standalone MongoDB client
● Allows interaction with a MongoDB instance from the command line

Utilities
MongoDB tools for backup:
● mongoexport: utility that generates a JSON or CSV file of data from a MongoDB instance
● mongoimport: imports content from a JSON, CSV or TSV export
● mongodump: utility for creating a binary export
● mongorestore: writes data to a MongoDB instance from a binary file
MongoDB tools for tracking instances:
● mongostat: provides a quick overview of the status of a running mongod or mongos instance
● mongotop: tracks the amount of time a MongoDB instance spends reading and writing data, with statistics on a per-collection level. By default, mongotop returns values every second

Basic terms to know
MongoDB      SQL
database     database
collection   table
document     row
field        column
embedding    join

Geospatial indexes
MongoDB has two types of indexes for supporting geographical queries:
● 2d indexes: for calculations on a flat surface
● 2dsphere indexes: for calculations on an Earth-like sphere
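The difference between the two kinds of calculation can be sketched in plain JavaScript. This is only an illustration: the haversine formula is used here as a common stand-in for great-circle distance, not as MongoDB's internal algorithm, and the function names and city coordinates are hypothetical examples.

```javascript
// Planar distance between two points on a flat surface (the "2d" case).
function flatDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x2 - x1, y2 - y1);
}

// Great-circle distance in kilometres via the haversine formula (the
// "2dsphere" case). Points are [longitude, latitude] in degrees and
// 6371 km is used as a mean Earth radius.
function sphereDistanceKm([lon1, lat1], [lon2, lat2], radiusKm = 6371) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * radiusKm * Math.asin(Math.sqrt(a));
}

const madrid = [-3.7038, 40.4168];
const barcelona = [2.1734, 41.3851];
console.log(flatDistance([0, 0], [3, 4]));                  // 5
console.log(sphereDistanceKm(madrid, barcelona).toFixed(0)); // roughly 500 km
```

Treating longitude and latitude as flat coordinates would ignore the curvature of the Earth, which is why real-world locations usually call for the spherical variant.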

SQL Schema Design
Tables:
● Customers: Customer key, First name, Last name, Phone number
● Addresses: Address key, Customer key, Street, Number, Location, Postal Code
● Pets: Pet key, Customer key, Type, Breed, Name, Age

MongoDB Schema Design
Customers collection
> db.customers.findOne()
{
    "_id" : ObjectId("54131863041cd2e6181156ba"),
    "first_name" : "Peter",
    "last_name" : "Keil",
    "phone_number" : 619123456,
    "address" : {
        "street" : "C/Alcalá",
        "number" : 123,
        "location" : "Madrid",
        "postal_code" : 12345
    },
    "pets" : [
        {
            "type" : "Dog",
            "breed" : "Airedale Terrier",
            "name" : "Linda",
            "age" : 2
        },
        {
            "type" : "Dog",
            "breed" : "Akita",
            "name" : "Bruto",
            "age" : 10
        }
    ]
}
>
Figure: one document in the customers collection holds the customer info (first name, last name, phone number), the address (street, number, location, postal code) and an array of pets (type, breed, name, age).
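To make the "embedding instead of join" idea concrete, here is a minimal sketch in plain JavaScript that folds rows from the three SQL tables into one embedded document. The row objects, key names and the `embed` helper are hypothetical and only mirror the shape of the schema shown in this deck.

```javascript
// Hypothetical rows, as they might sit in the three SQL tables.
const customers = [{ customerKey: 1, firstName: "Peter", lastName: "Keil" }];
const addresses = [{ addressKey: 10, customerKey: 1, street: "C/Alcalá", number: 123 }];
const pets = [
  { petKey: 100, customerKey: 1, type: "Dog", name: "Linda" },
  { petKey: 101, customerKey: 1, type: "Dog", name: "Bruto" },
];

// Build one document per customer by embedding the related rows.
function embed(customer) {
  const byCustomer = (row) => row.customerKey === customer.customerKey;
  const { customerKey, ...info } = customer;
  const address = addresses.find(byCustomer);
  return {
    _id: customerKey,
    ...info,
    address: address ? { street: address.street, number: address.number } : null,
    pets: pets.filter(byCustomer).map(({ type, name }) => ({ type, name })),
  };
}

const doc = embed(customers[0]);
// Reading the pets needs no join: they travel with the customer document.
console.log(doc.pets.map((p) => p.name)); // [ 'Linda', 'Bruto' ]
```

The foreign keys disappear because the relationship is expressed by nesting, which is what lets a single read return the whole customer.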

Replication
Replica Set: Primary, Secondary 1, Secondary 2
● High availability
● Data safety
● Read preference
● Asynchronous
● Single primary
● Statement based
● Master-slave
● Automatic failover
● Automatic node recovery

Failover scenario
Replica Set: Primary, Secondary 1, Secondary 2
1) Primary goes down
2) New election (majority of the set)
3) Primary comes back (now as secondary)
4) The new primary assumes replication tasks
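Steps 1 and 2 above can be sketched as a toy model in plain JavaScript. It is not the real election protocol: the member objects, the `lastOp` counter and the rule "prefer the member with the most recent operation" are simplifying assumptions made for illustration.

```javascript
// Simplified model: a primary can be elected only if a majority of the
// set's members are reachable.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  const majority = Math.floor(members.length / 2) + 1;
  if (reachable.length < majority) return null; // no majority, no primary
  // Assumption: prefer the reachable member that applied the latest operation.
  return reachable.reduce((best, m) => (m.lastOp > best.lastOp ? m : best)).name;
}

const set = [
  { name: "node0", up: true, lastOp: 42 }, // current primary
  { name: "node1", up: true, lastOp: 42 },
  { name: "node2", up: true, lastOp: 41 },
];

set[0].up = false;              // 1) primary goes down
console.log(electPrimary(set)); // 2) the remaining majority elects node1
set[1].up = false;              // a second failure leaves 1 of 3 members
console.log(electPrimary(set)); // null: no majority, so no primary
```

The second call shows why an odd number of members is the usual choice: a three-member set tolerates one failure, not two.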

Failover scenario with rollback
Replica Set: Primary, Secondary 1, Secondary 2
Rollback: writes that the old primary had not replicated are saved to its hard disk and can be re-applied with mongorestore

Replica Set principles
● A write is truly committed once it has been applied on a majority of the set
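The majority rule can be written down in a few lines. The following is a toy model in plain JavaScript, with a hypothetical `isCommitted` helper and hypothetical member names; it only counts which members have applied a write.

```javascript
// Toy model: a write counts as committed once a majority of members applied it.
function isCommitted(appliedBy, setSize) {
  const majority = Math.floor(setSize / 2) + 1;
  return appliedBy.size >= majority;
}

const appliedBy = new Set(["primary"]); // the primary applies the write first
console.log(isCommitted(appliedBy, 3)); // false: 1 of 3
appliedBy.add("secondary1");            // asynchronous replication catches up
console.log(isCommitted(appliedBy, 3)); // true: 2 of 3
```

A write that only reached the primary is the kind that can be rolled back after a failover. Drivers let an application wait for this condition with the write concern `{ w: "majority" }`.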

Replica Set: read preference
Reasons:
● Geographically dispersed nodes
● Separating workloads
● Availability
Types:
● Primary
● Primary preferred
● Secondary
● Secondary preferred
● Nearest
● Tags
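How these types differ can be sketched with a toy selector in plain JavaScript. The mode strings follow the read preference mode names (`primary`, `primaryPreferred`, `secondary`, `secondaryPreferred`, `nearest`); everything else is an assumption: the member list and `pingMs` values are hypothetical, tags are left out, and picking the lowest ping is a simplification of what real drivers do.

```javascript
// Toy selection of a member for a read operation.
function pickMember(members, mode) {
  const primary = members.find((m) => m.role === "primary" && m.up);
  const secondaries = members.filter((m) => m.role === "secondary" && m.up);
  const closest = (list) =>
    list.length ? list.reduce((a, b) => (b.pingMs < a.pingMs ? b : a)) : undefined;
  switch (mode) {
    case "primary": return primary;
    case "primaryPreferred": return primary || closest(secondaries);
    case "secondary": return closest(secondaries);
    case "secondaryPreferred": return closest(secondaries) || primary;
    case "nearest": return closest(members.filter((m) => m.up));
    default: throw new Error(`unknown mode: ${mode}`);
  }
}

const members = [
  { name: "madrid", role: "primary", up: true, pingMs: 40 },
  { name: "paris", role: "secondary", up: true, pingMs: 15 },
  { name: "tokyo", role: "secondary", up: true, pingMs: 250 },
];
console.log(pickMember(members, "primary").name);            // madrid
console.log(pickMember(members, "secondaryPreferred").name); // paris
console.log(pickMember(members, "nearest").name);            // paris
```

Because replication is asynchronous, any mode that may read from a secondary can return slightly stale data.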

Sharding
Figure: a cluster in which the clients connect to the query routers, which rely on three config servers and route operations to the shards (Shard 0, Shard 1, Shard 2, ..., Shard N-1). Each shard is a replica set with one primary and two secondaries.

Sharding: concepts
● Data are uniformly distributed across the shards using the shard key
● Each shard holds the documents that belong to its own range
● Sharding improves efficiency and, therefore, performance, because queries that include the shard key are routed only to the shards where the data reside

Sharding: metadata
● The config servers hold the config database, which contains the cluster metadata
● The metadata describe what is in the cluster and what each shard contains
● It is a map of the data itself
Range-based partitioning
Shard key: lastname
         Low      High       Shard
Range 0  Martín   Pérez      0
Range 1  Pérez    Rodriguez  1
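The range table above can be turned into a small routing sketch in plain JavaScript. It assumes that each range includes its low bound and excludes its high bound, and that plain JavaScript string comparison is good enough for the example; the `routeToShard` helper and the sample last names are hypothetical.

```javascript
// Ranges copied from the table: low bound inclusive, high bound exclusive.
const ranges = [
  { low: "Martín", high: "Pérez", shard: 0 },
  { low: "Pérez", high: "Rodriguez", shard: 1 },
];

// Return the shard whose range contains the shard key value, or null if
// no range in this (partial) table covers it.
function routeToShard(lastname) {
  const range = ranges.find((r) => lastname >= r.low && lastname < r.high);
  return range ? range.shard : null;
}

console.log(routeToShard("Navarro"));  // 0
console.log(routeToShard("Pérez"));    // 1
console.log(routeToShard("Quintana")); // 1
```

This lookup is what the metadata make possible: knowing the ranges, the router can send a query on `lastname` to a single shard instead of asking all of them.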

Sharding: chunks, split and migrate
Chunk: a subset of the data covering one range; approximately 1 chunk per 64 MB (the default chunk size)
Split: runs in the background; when a chunk grows beyond 64 MB it is split into two equal chunks
Migrate: runs in the background; it moves chunks across the shards in order to achieve balance
● The MongoDB goal is to achieve a uniform data distribution across all the shards
● MongoDB balances the number of chunks per shard (neither documents nor bytes)
● By default all collections belong to shard 0
● An empty collection has only one chunk (on shard 0)
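Splitting and migrating can be pictured with a toy model in plain JavaScript. It is a sketch under simplifying assumptions, not the real balancer: sizes are in arbitrary units, `splitIfNeeded` and `balance` are hypothetical helpers, and the balancing rule (move chunks until the counts differ by at most one) ignores the thresholds a real cluster applies.

```javascript
// Split a chunk into two equal halves once it exceeds maxSize.
function splitIfNeeded(chunk, maxSize) {
  if (chunk.size <= maxSize) return [chunk];
  const half = chunk.size / 2;
  return [{ ...chunk, size: half }, { ...chunk, size: half }];
}

// Move chunks from the shard with the most chunks to the shard with the
// fewest until the counts differ by at most one. Only counts matter here,
// not documents or bytes.
function balance(shards) {
  for (;;) {
    const sorted = [...shards].sort((a, b) => a.chunks.length - b.chunks.length);
    const least = sorted[0];
    const most = sorted[sorted.length - 1];
    if (most.chunks.length - least.chunks.length <= 1) return shards;
    least.chunks.push(most.chunks.pop());
  }
}

// A new collection starts as a single chunk on shard 0.
const shards = [
  { name: "shard0", chunks: [{ size: 16 }] },
  { name: "shard1", chunks: [] },
];
shards[0].chunks = shards[0].chunks.flatMap((c) => splitIfNeeded(c, 10));
balance(shards);
console.log(shards.map((s) => s.chunks.length)); // [ 1, 1 ]
```

Balancing by chunk count is cheap to reason about, but it only yields an even data distribution if the chunks themselves hold similar amounts of data.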

Sharding: chunks, split and migrate (2)
Figure: mongos in front of Shard 0 and Shard 1, with chunk 0 split into chunk 0 and chunk 1.

Pre-splitting
● Used in batch/bulk loads
● Splits and migrations do not have to run during the load
● Metadata are not altered
● Data are stored directly in their shard
Figure: mongos sends the data straight to Shard 0, Shard 1 and Shard 2.
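The idea behind pre-splitting can be sketched in plain JavaScript: the chunks are laid out before the load, so each document of the bulk load already has a home. The helpers `preSplit` and `shardFor`, the split points and the round-robin assignment of chunks to shards are all assumptions made for this illustration.

```javascript
// Pre-create one chunk per range delimited by the split points and assign
// the chunks to shards round-robin, before any data is loaded.
function preSplit(splitPoints, shardCount) {
  const bounds = [null, ...splitPoints, null]; // null stands for an open bound
  return bounds.slice(0, -1).map((low, i) => ({
    low,
    high: bounds[i + 1],
    shard: i % shardCount,
  }));
}

// Each document of the bulk load goes straight to the shard owning its range.
function shardFor(chunks, key) {
  return chunks.find(
    (c) => (c.low === null || key >= c.low) && (c.high === null || key < c.high)
  ).shard;
}

const chunks = preSplit(["H", "P"], 3); // three chunks, one per shard
console.log(["Alonso", "Martín", "Roy"].map((k) => shardFor(chunks, k))); // [ 0, 1, 2 ]
```

Without the pre-created chunks every document would first land on one shard and be moved later, which is the split and migration work a bulk load wants to avoid.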

Summary
Designed to be:
● Fast (no joins, in-memory performance),
● Flexible (schemaless),
● Scalable (horizontal vs vertical),
● Easy to learn
Designed to:
● Reduce administrative tasks (replica set, sharding, disaster recovery)
With powerful:
● Analysis tools (aggregation framework, map reduce, Hadoop connector),
● Characteristics such as geospatial indexes, GridFS, etc.

Questions?
Any questions?

Thank you for your attention!
Juan Antonio Roy Couto
Email: juanroycouto@gmail.com
September 2014
