MongoDB
“Walking on water and developing software from a specification are easy if both are frozen.”
By: NklMish (@nklmish)
Agenda
Why & What?
Features
Aggregation framework overview
Sharding and Replica Set
Summary
NOTE
This talk is not about:
NoSQL in general
Relational vs. non-relational
Comparison of other NoSQL flavors
What is MongoDB?
The “one size fits all” approach no longer applies
A NoSQL database
Schemaless != no schema
Document-based approach
Non-relational databases scale more easily, especially horizontally
In a nutshell: a focus on speed, performance, flexibility, and scalability.
“Nothing endures but change”
Why?
Agility
High scalability: horizontal, across thousands of nodes, clouds, or multiple data centers
Rich indexing
Real-life examples:
Server Density
OTTO
Expedia, Forbes, MetLife, Bosch, etc.
“Luck is not a factor. Hope is not a strategy. Fear is not an option”
Mapping
RDBMS                     MongoDB
Tables                    Collections
Records/rows              Documents/objects
Queries return records    Queries return a cursor
Why a cursor? For performance and efficiency.
“You improvise. You adapt. You overcome.”
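One rough way to picture why a cursor helps is a JavaScript generator: results are produced only as the caller asks for them, instead of being materialized up front. This is a plain-JavaScript sketch, not MongoDB's driver code (a real cursor fetches from the server in batches); `lazyFind`, `examined`, and the sample `books` array are illustrative assumptions.

```javascript
// Hypothetical stand-in for a collection scan; NOT a MongoDB API.
// `examined` counts how many docs were inspected so far.
let examined = 0;

function* lazyFind(docs, predicate) {
  for (const doc of docs) {
    examined += 1;
    if (predicate(doc)) yield doc; // produced only when the consumer asks
  }
}

const books = [
  { name: "Java", price: 250, Type: "ebook" },
  { name: "Php", price: 200, Type: "ebook" },
  { name: "Javascript", price: 150, Type: "hardCopy" },
];

const cursor = lazyFind(books, (doc) => doc.Type === "ebook");

// Pull a single result; the scan stops right after the first match.
const first = cursor.next().value;
const examinedAfterFirst = examined;
```

After the first `next()` call only one document has been inspected; the remaining documents are scanned only if the caller keeps iterating.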
Features
Standard DB features
Docs are stored in BSON, so Mongo understands JSON natively: any valid JSON can be imported and queried (e.g. mongoimport --file foo.json).
MapReduce
Aggregation framework
GridFS (for storing large binary objects efficiently)
GeoNear
“Stable Velocity. Sustainable Pace.”
Aggregation in a Nutshell
$match – filter docs
$project – reshape docs
$group – summarize docs
$unwind – expand docs
$sort – order docs
$limit/$skip – paginate docs
$redact – restrict docs
$geoNear – proximity-sort docs
$let, $map – define variables
“Talk is cheap. Show me the code”
Matching
Input docs:
{ name : "Java", price : 250, Type : "ebook" }
{ name : "Php", price : 200, Type : "ebook" }
{ name : "Javascript", price : 150, Type : "hardCopy" }
Stage:
{ $match : { Type : "ebook" } }
Result:
{ name : "Java", price : 250, Type : "ebook" }
{ name : "Php", price : 200, Type : "ebook" }
Query Operator
Input docs:
{ name : "Java", price : 250, Type : "ebook" }
{ name : "Php", price : 200, Type : "ebook" }
{ name : "Javascript", price : 150, Type : "hardCopy" }
Stage:
{ $match : { price : { $gt : 200 } } }
Result:
{ name : "Java", price : 250, Type : "ebook" }
$gt is strict, so the Php doc (price exactly 200) is filtered out; use $gte to include it.
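The filtering behaviour of $match can be mimicked in plain JavaScript to make the semantics concrete. This is a minimal sketch that supports only equality and `$gt`, not MongoDB's implementation; the `match` helper is a hypothetical name.

```javascript
// Minimal emulation of $match: supports { field: value } and { field: { $gt: n } } only.
function match(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, cond]) => {
      if (cond !== null && typeof cond === "object" && "$gt" in cond) {
        return doc[field] > cond.$gt; // strict comparison, like $gt
      }
      return doc[field] === cond; // plain equality
    })
  );
}

const books = [
  { name: "Java", price: 250, Type: "ebook" },
  { name: "Php", price: 200, Type: "ebook" },
  { name: "Javascript", price: 150, Type: "hardCopy" },
];

const ebooks = match(books, { Type: "ebook" });           // Java, Php
const expensive = match(books, { price: { $gt: 200 } });  // Java only
```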
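Including and Excluding Parts of a Document
Input docs:
{ name : "Java", price : 250, Type : "ebook" }
{ name : "Php", price : 200, Type : "ebook" }
Stage:
{ $project : { _id : 0, name : 1, price : 1 } }
Result:
{ name : "Java", price : 250 }
{ name : "Php", price : 200 }
Fields that are not listed, such as Type, are dropped. An inclusion projection cannot also exclude other fields explicitly; _id is the only exception.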
Custom Field Computation
Input docs:
{ name : "Java", price : 250, Type : "ebook", quantity : 3 }
{ name : "Php", price : 200, Type : "ebook", quantity : 2 }
Stage:
{ $project : {
    fullStock : { $multiply : [ "$price", "$quantity" ] },
    Title : "$name"
} }
Result:
{ fullStock : 750, Title : "Java" }
{ fullStock : 400, Title : "Php" }
Generating Sub-Document
Input docs:
{ name : "Java", price : 250, Type : "ebook", quantity : 3 }
{ name : "Php", price : 200, Type : "ebook", quantity : 2 }
Stage:
{ $project : {
    fullStock : { $multiply : [ "$price", "$quantity" ] },
    details : { Title : "$name", quantity : "$quantity" }
} }
Result:
{ fullStock : 750, details : { Title : "Java", quantity : 3 } }
{ fullStock : 400, details : { Title : "Php", quantity : 2 } }
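The reshaping done by $project with computed fields can be sketched in plain JavaScript. The sketch below handles only `"$field"` references, `$multiply`, and nested sub-documents; it does not handle the 1/0 inclusion flags, and it is not how MongoDB evaluates expressions internally. `evaluate` and `project` are hypothetical helper names.

```javascript
// Evaluate a tiny subset of aggregation expressions against one document.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // "$price" -> doc.price
  }
  if (expr !== null && typeof expr === "object") {
    if ("$multiply" in expr) {
      return expr.$multiply
        .map((operand) => evaluate(operand, doc))
        .reduce((product, value) => product * value, 1);
    }
    // Any other object is treated as a sub-document of expressions.
    return Object.fromEntries(
      Object.entries(expr).map(([key, inner]) => [key, evaluate(inner, doc)])
    );
  }
  return expr; // literal value
}

function project(docs, spec) {
  return docs.map((doc) => evaluate(spec, doc));
}

const books = [
  { name: "Java", price: 250, Type: "ebook", quantity: 3 },
  { name: "Php", price: 200, Type: "ebook", quantity: 2 },
];

const reshaped = project(books, {
  fullStock: { $multiply: ["$price", "$quantity"] },
  details: { Title: "$name", quantity: "$quantity" },
});
```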
$group
Groups docs by value
Processes in memory by default
Helpful accumulators that go with it:
$max, $min, $avg, $sum
$addToSet, $push
$first, $last
Compute Average
Input docs:
{ name : "Java Fun", publisher : "manning", price : 1000 }
{ name : "Oracle Fun", publisher : "manning", price : 1000 }
{ name : "Php Fun", publisher : "O'Reilly", price : 400 }
Stage:
{ $group : {
    _id : "$publisher",
    avgPrice : { $avg : "$price" }
} }
Result:
{ _id : "manning", avgPrice : 1000 }
{ _id : "O'Reilly", avgPrice : 400 }
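Grouping with an average boils down to keeping a running sum and count per key. The following plain-JavaScript sketch illustrates that idea under the assumption of numeric values; `groupAverage` is a hypothetical helper, not a MongoDB function, and real $group output order is not guaranteed.

```javascript
// Emulate { $group: { _id: "$<keyField>", avgPrice: { $avg: "$<valueField>" } } }.
function groupAverage(docs, keyField, valueField) {
  const totals = new Map(); // key -> { sum, count }
  for (const doc of docs) {
    const entry = totals.get(doc[keyField]) || { sum: 0, count: 0 };
    entry.sum += doc[valueField];
    entry.count += 1;
    totals.set(doc[keyField], entry);
  }
  return [...totals].map(([key, { sum, count }]) => ({
    _id: key,
    avgPrice: sum / count,
  }));
}

const books = [
  { name: "Java Fun", publisher: "manning", price: 1000 },
  { name: "Oracle Fun", publisher: "manning", price: 1000 },
  { name: "Php Fun", publisher: "O'Reilly", price: 400 },
];

const averages = groupAverage(books, "publisher", "price");
```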
$unwind
Useful for docs containing array fields:
Creates one doc per array entry
In each new doc, the array field is replaced by a single entry value
Input doc:
{ publisher : "manning", title : [ "java", "Php" ], discount : "50%" }
Stage:
{ $unwind : "$title" }
Result:
{ publisher : "manning", title : "java", discount : "50%" }
{ publisher : "manning", title : "Php", discount : "50%" }
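The expansion performed by $unwind can be sketched with `flatMap`. This is a simplified plain-JavaScript illustration: documents whose field is not an array are simply dropped here, which is an assumption of the sketch rather than MongoDB's exact behaviour. `unwind` is a hypothetical helper name.

```javascript
// Emit one copy of each doc per array entry, replacing the array with that entry.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (Array.isArray(doc[field]) ? doc[field] : []).map((value) => ({
      ...doc,
      [field]: value,
    }))
  );
}

const catalog = [
  { publisher: "manning", title: ["java", "Php"], discount: "50%" },
];

const unwound = unwind(catalog, "title");
```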
$redact
Restricts access to docs based on fields stored in the docs themselves, so privileges can be defined in the data
Useful system variables: $$DESCEND, $$PRUNE, $$KEEP
Input doc:
{
  _id : 123,
  name : "logo",
  security : "ANYONE",
  Profit : {
    security : "MARKETING",
    revenue : "500%",
    ProfitByCountry : {
      security : "BOARD_OF_DIRECTOR",
      PL : "800%",
      NO : "700%",
      DK : "600%",
      SW : "500%"
    }
  }
}
Pipeline:
db.products.aggregate([
  { $match : { name : "logo" } },
  { $redact : {
      $cond : {
        if : { $eq : … },
        then : "$$DESCEND",
        else : "$$PRUNE"
      }
  } }
])
Result:
{...}
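To see how $$DESCEND, $$PRUNE, and $$KEEP interact, the traversal can be emulated in plain JavaScript. The access rule used below (a viewer who holds the ANYONE and MARKETING levels) is purely an assumption for illustration and is not the condition elided on the slide; `redact` and `decide` are hypothetical names, and arrays are left untouched in this sketch.

```javascript
// decide(doc) returns "DESCEND", "PRUNE", or "KEEP" for one document level.
function redact(doc, decide) {
  const decision = decide(doc);
  if (decision === "PRUNE") return undefined; // drop this level entirely
  if (decision === "KEEP") return doc;        // keep as-is, no further checks
  // DESCEND: keep scalar fields, re-check every embedded document.
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      const kept = redact(value, decide);
      if (kept !== undefined) out[key] = kept;
    } else {
      out[key] = value;
    }
  }
  return out;
}

const product = {
  _id: 123,
  name: "logo",
  security: "ANYONE",
  Profit: {
    security: "MARKETING",
    revenue: "500%",
    ProfitByCountry: {
      security: "BOARD_OF_DIRECTOR",
      PL: "800%", NO: "700%", DK: "600%", SW: "500%",
    },
  },
};

// Assumed viewer privileges (illustrative only).
const allowed = new Set(["ANYONE", "MARKETING"]);
const visible = redact(product, (level) =>
  allowed.has(level.security) ? "DESCEND" : "PRUNE"
);
```

With these assumed privileges the per-country breakdown is pruned, while the top level and the Profit summary remain visible.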
Sharding (Horizontal Scaling)
Allows data to be stored across multiple machines.
OK, but what for?
Database systems with large data sets and high-throughput applications can challenge the capacity of a single server.
High query rates can exhaust the CPU capacity of the server.
Larger data sets exceed the storage capacity of a single machine.
Working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.
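The core idea of spreading documents over machines by a shard key can be sketched as a hash-based router. This is a deliberately simplified plain-JavaScript illustration: the hash function, the modulus-based placement, and the names `hashKey` and `shardFor` are all assumptions of the sketch, and a real cluster assigns ranges of key values to shards rather than taking a modulus.

```javascript
// Toy string hash (not the hash MongoDB uses); result is an unsigned 32-bit integer.
function hashKey(key) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return hash;
}

// Pick a shard index in [0, shardCount) for a given shard-key value.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

const shardCount = 3;
const placement = ["Java", "Php", "Javascript"].map((name) => ({
  name,
  shard: shardFor(name, shardCount),
}));
```

The same key always routes to the same shard, so reads and writes for a given document land on one machine while the overall data set and load are split across several.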
Replica Set
A group of mongod processes that maintain the same data set. Provides redundancy and high availability; in a nutshell, the basis for all production deployments.
A minimum of 3 nodes is recommended for production.
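The three-node guideline follows from majority voting: a primary can only be elected while a majority of voting members is reachable. The arithmetic can be sketched as below; `majority` and `faultTolerance` are hypothetical helper names for illustration.

```javascript
// Votes needed to elect a primary among `votingMembers` nodes.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members can fail while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

const twoNodes = faultTolerance(2);   // 0: losing either node blocks elections
const threeNodes = faultTolerance(3); // 1: one node may fail
```

Two nodes tolerate no failures, whereas three nodes survive the loss of one, which is why three is the smallest set that provides high availability.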
Summary
Document-oriented database.
Scales and performs well.
Provides a powerful aggregation framework.
Tested on massive datasets.
Supports map-reduce.
Questions?

Mongo - an intermediate introduction
