MongoDB
How to model and extract your data
whoami
Francesco Lo Franco
Software developer
Francesco Lo Franco - @__kekko | MongoDB Aggregation Framework
@__kekko
it.linkedin.com/in/francescolofranco/
What is MongoDB?
MongoDB
is an open source database
that uses a
document-oriented
data model.
MongoDB Data Model
MongoDB uses a JSON-like
representation of its data
(BSON)
BSON > JSON
● custom types (Date, ObjectId...)
● faster
● lightweight
collections
documents
key-value pairs
MongoDB Data Model example (BLOG POST):
{
  "_id": ObjectId("508d27069cc1ae293b36928d"),
  "title": "This is the title",
  "tags": [
    "chocolate",
    "milk"
  ],
  "created_date": ISODate("2012-10-28T12:41:39.110Z"),
  "author_id": ObjectId("508d280e9cc1ae293b36928e"),
  "comments": [
    {
      "content": "This is the body of comment",
      "author_id": ObjectId("508d34"),
      "tag": "coffee"
    },
    {
      "content": "This is the body of comment",
      "author_id": ObjectId("508d35")
    }
  ]
}
REFERENCING
vs
EMBEDDING
One to few
> db.employee.findOne()
{
  name: 'Kate Monster',
  ssn: '123-456-7890',
  addresses: [
    { street: 'Lombard Street, 26',
      zip_code: '22545'
    },
    { street: 'Abbey Road, 99',
      zip_code: '33568'
    }
  ]
}
Disadvantages:
- It is hard to access the embedded
details as stand-alone entities
example:
“Show all addresses with a certain zip code”
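A minimal plain-JavaScript sketch of why this query is awkward: the addresses exist only inside their parent documents, so every employee has to be opened and its array flattened before filtering. The `employees` array and the `addressesWithZip` helper are hypothetical stand-ins for the collection, and the second employee is made up for illustration. On a real server the same flattening would typically be done with the `$unwind` aggregation stage.

```javascript
// In-memory stand-in for the employee collection (sample data is illustrative).
const employees = [
  {
    name: 'Kate Monster',
    addresses: [
      { street: 'Lombard Street, 26', zip_code: '22545' },
      { street: 'Abbey Road, 99', zip_code: '33568' }
    ]
  },
  {
    name: 'John Doe', // hypothetical second employee
    addresses: [{ street: 'Baker Street, 221', zip_code: '22545' }]
  }
];

// Hypothetical helper: flatten every embedded address, then filter by zip code.
function addressesWithZip(docs, zip) {
  return docs
    .flatMap(doc => doc.addresses.map(a => ({ owner: doc.name, ...a })))
    .filter(a => a.zip_code === zip);
}

const matches = addressesWithZip(employees, '22545');
console.log(matches.map(a => a.street));
```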
Advantages:
- One query to get them all
- embedded + value object =
One to many
> db.parts.findOne()
{
  _id: ObjectId('AAAAF17CD2AAAAAAF17CD2AA'),
  partno: '123-aff-456',
  name: '#4 grommet',
  qty: 94,
  cost: 0.94,
  price: 3.99
}
One to many
> db.products.findOne()
{
  name: 'smoke shifter',
  manufacturer: 'Acme Corp',
  catalog_number: 1234,
  parts: [
    ObjectId('AAAAF17CD2AAAAAAF17CD2AA'),
    ObjectId('F17CD2AAAAAAF17CD2AAAAAA'),
    ObjectId('D2AAAAAAF17CD2AAAAAAF17C'),
    // etc
  ]
}
Disadvantages:
- Needs two queries (an application-level join), e.g.
“find all parts that compose a product”
> product = db.products.findOne({
    catalog_number: 1234
  });
> product_parts = db.parts.find({
    _id: { $in: product.parts }
  }).toArray();
Possible fix: DENORMALIZATION
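The two queries above amount to an application-level join. A rough in-memory sketch of the same logic in plain JavaScript follows; the `parts` and `products` arrays are hypothetical stand-ins for the collections, with shortened ids and made-up part names.

```javascript
// In-memory stand-ins for the two collections (ids shortened for readability).
const parts = [
  { _id: 'AAAA', name: '#4 grommet' },
  { _id: 'F17C', name: 'hypothetical part 2' },
  { _id: 'ZZZZ', name: 'part used by another product' }
];
const products = [
  { name: 'smoke shifter', catalog_number: 1234, parts: ['AAAA', 'F17C'] }
];

// Query 1: fetch the product.
const product = products.find(p => p.catalog_number === 1234);
// Query 2: fetch every part whose _id is listed in product.parts (what $in does).
const productParts = parts.filter(p => product.parts.includes(p._id));

console.log(productParts.map(p => p.name));
```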
Advantages of referencing:
- Easy to search and update an individual
referenced document (a single part)
- N-to-N schema for free, without a join table
parts: [
  ObjectId('AAAAF17CD2AAAAAAF17CD2AA'),
  ObjectId('F17CD2AAAAAAF17CD2AAAAAA'),
  ObjectId('D2AAAAAAF17CD2AAAAAAF17C')
]
One to squillions
(Logging)
- document size limit = 16 MB
- can be reached even if the referencing
array contains only the ObjectId field
(~ 1,300,000 references)
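The order of magnitude can be sanity-checked with a little arithmetic. An ObjectId value is 12 bytes, so dividing the 16 MB limit by 12 gives an upper bound of about 1.4 million ids. BSON also stores a type byte and an index key for each array element, so the practical ceiling is lower. The per-element overhead below is an assumption for a rough estimate, not a measured value.

```javascript
const DOC_LIMIT_BYTES = 16 * 1024 * 1024; // 16 MB document size limit
const OBJECT_ID_BYTES = 12;               // size of one ObjectId value

// Upper bound: pretend the document holds nothing but raw ObjectId values.
const upperBound = Math.floor(DOC_LIMIT_BYTES / OBJECT_ID_BYTES);

// Rough estimate with BSON array overhead: 1 type byte + the element index
// stored as a string key (assumed 6 digits, i.e. indexes 100000 to 999999)
// + 1 terminating null byte.
const ASSUMED_OVERHEAD_BYTES = 1 + 6 + 1;
const roughEstimate = Math.floor(
  DOC_LIMIT_BYTES / (OBJECT_ID_BYTES + ASSUMED_OVERHEAD_BYTES)
);

console.log(upperBound);    // 1398101
console.log(roughEstimate); // 838860
```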
Parent Referencing
> db.hosts.findOne()
{
  _id: ObjectId('AAAB'),
  name: 'goofy.example.com',
  ipaddr: '127.66.66.66'
}
> db.logmsg.findOne()
{
  time: ISODate("2014-03-28T09:42:41.382Z"),
  message: 'cpu is on fire!',
  host: ObjectId('AAAB')
}
Disadvantages:
- Needs two queries (an application-level join), e.g.
“find the most recent 5K messages for a host”
> host = db.hosts.findOne({
    ipaddr: '127.66.66.66'
  });
> last_5k_msg = db.logmsg.find({
    host: host._id
  })
  .sort({ time: -1 })
  .limit(5000)
  .toArray()
Possible fix: DENORMALIZATION
DENORMALIZATION
NORMALIZATION
To be denormalized
> db.products.findOne()
{
  name: 'smoke shifter',
  manufacturer: 'Acme Corp',
  catalog_number: 1234,
  parts: [
    ObjectId('AAAA'),
    ObjectId('F17C'),
    ObjectId('D2AA'),
    // etc
  ]
}
Denormalized (partial + one side)
> db.products.findOne()
{
  name: 'smoke shifter',
  manufacturer: 'Acme Corp',
  catalog_number: 1234,
  parts: [
    { id: ObjectId('AAAA'), name: 'part1' },
    { id: ObjectId('F17C'), name: 'part2' },
    { id: ObjectId('D2AA'), name: 'part3' },
    // etc
  ]
}
Advantages:
- Easy query to get product part name
Disadvantages:
- Updates become more expensive
- Cannot guarantee atomic and isolated
updates
MongoDB
is not
A.C.I.D. compliant
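A small in-memory sketch of why updates get more expensive once the part name is copied into products: renaming one part now touches the part itself plus every product that embeds the name. The `renamePart` helper and the second product are hypothetical, and the arrays only stand in for the collections.

```javascript
// In-memory stand-ins for the two collections (ids shortened as on the slides).
const parts = [{ _id: 'AAAA', name: 'part1' }];
const products = [
  { name: 'smoke shifter', parts: [{ id: 'AAAA', name: 'part1' }] },
  { name: 'left-handed hammer', parts: [{ id: 'AAAA', name: 'part1' }] } // hypothetical
];

// Hypothetical helper: renaming a part means one write to parts plus one write
// for every product that embeds a copy of the name.
function renamePart(partId, newName) {
  let writes = 0;
  for (const part of parts) {
    if (part._id === partId) { part.name = newName; writes++; }
  }
  for (const product of products) {
    let touched = false;
    for (const ref of product.parts) {
      if (ref.id === partId) { ref.name = newName; touched = true; }
    }
    // A crash between two iterations would leave some copies stale.
    if (touched) writes++;
  }
  return writes;
}

const writes = renamePart('AAAA', 'part1-renamed');
console.log(writes);
```

With plain referencing the same rename would be a single write to the parts collection.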
MongoDB supports only
single-document-level
transactions
(multi-document transactions arrived later, in version 4.0)
So, how can I have an
(almost)
ACID Mongo?
1. Two Phase Commit (A+C)
2. $isolated operator (I)
3. Enable journaling (D)
Two Phase Commit (A+C)
If we make a multi-update, a
system failure between the 2
separate updates can lead to an
unrecoverable inconsistency
Create a transaction document
tracking all the needed data
Two Phase Commit Example
Uses a bridge “transaction”
document to retry or roll back
operations left incomplete by a
system failure
Two Phase Commit Example
TODO: transfer $100 from A to B
Account A:
  total: 1000,
  on_going_transactions: [];
Account B:
  total: 500,
  on_going_transactions: [];
Two Phase Commit Example
Transaction document
  from: "A",
  to: "B",
  amount: 100,
  status: "initial",
  datetime: new Date();
Two Phase Commit Example
Step 1: Update the transaction
  _id: "zzzz",
  from: "A",
  to: "B",
  amount: 100,
  status: "pending",
  datetime: new Date();
Two Phase Commit Example
Step 2: Update Account A
  update total: -100;
  push on_going_transactions:
  {transaction where _id = "zzzz"}
Two Phase Commit Example
Step 3: Update Account B
  update total: +100;
  push on_going_transactions:
  {transaction where _id = "zzzz"}
Two Phase Commit Example
Step 4: Update the transaction
  _id: "zzzz",
  from: "A",
  to: "B",
  amount: 100,
  status: "applied",
  datetime: new Date();
Two Phase Commit Example
Step 5: Update Account A
  pull on_going_transactions:
  {transaction where _id = "zzzz"}
Two Phase Commit Example
Step 6: Update Account B
  pull on_going_transactions:
  {transaction where _id = "zzzz"}
Two Phase Commit Example
Step 7: Update the transaction
  _id: "zzzz",
  from: "A",
  to: "B",
  amount: 100,
  status: "done",
  datetime: new Date();
Two Phase Commit
This pattern emulates
SQL transaction
management, achieving
Atomicity + Consistency
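The seven steps can be sketched as an in-memory simulation in plain JavaScript. This is only an illustration under stated assumptions: `accounts` and `transactions` are plain objects standing in for the two collections, and `applyTo`, `release` and `transfer` are hypothetical helper names. On a real server each step would be a separate single-document update, and a recovery job would look for transactions stuck in "pending" or "applied".

```javascript
// In-memory stand-ins for the accounts and transactions collections.
const accounts = {
  A: { total: 1000, on_going_transactions: [] },
  B: { total: 500, on_going_transactions: [] }
};
const transactions = {
  zzzz: { from: 'A', to: 'B', amount: 100, status: 'initial' }
};

// Apply the amount to one account unless this transaction is already recorded,
// so that re-running the step after a crash does not apply it twice.
function applyTo(account, txId, delta) {
  if (account.on_going_transactions.includes(txId)) return;
  account.total += delta;
  account.on_going_transactions.push(txId);
}

// Remove the transaction marker from one account.
function release(account, txId) {
  account.on_going_transactions =
    account.on_going_transactions.filter(id => id !== txId);
}

function transfer(txId) {
  const tx = transactions[txId];
  tx.status = 'pending';                        // step 1
  applyTo(accounts[tx.from], txId, -tx.amount); // step 2
  applyTo(accounts[tx.to], txId, tx.amount);    // step 3
  tx.status = 'applied';                        // step 4
  release(accounts[tx.from], txId);             // step 5
  release(accounts[tx.to], txId);               // step 6
  tx.status = 'done';                           // step 7
}

transfer('zzzz');
console.log(accounts.A.total, accounts.B.total, transactions.zzzz.status);
```

The marker check in `applyTo` is the design point: because each account remembers which transactions it has already applied, steps 2 and 3 can be retried safely after a failure.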
$isolated operator (I)
“You can ensure that no client sees
the changes until the operation
completes or errors out.”
db.car.update(
  { color: "RED", $isolated: 1 },
  { $inc: { count: 1 } },
  { multi: true }
)
Journaling (D)
Journaling logs all writes
(every 100 ms) so they can be
recovered after a system failure
(crash)
After a clean shutdown,
journal files are erased
Aggregation Framework
(finally)
def: “Aggregations are
operations that process data
records and return
computed results.”
Aggregation Framework
1) C.R.U.D.
2) single purpose
aggregation operators
3) pipeline
4) map reduce
Aggregation Framework
CRUD Operators:
- insert()
- find() / findOne()
- update()
- remove()
Single purpose aggregation operators (SPAO)
a) count
b) distinct
c) group
count
{ a: 1, b: 0 }
{ a: 1, b: 1 }
{ a: 1, b: 4 }
{ a: 2, b: 2 }
db.records.count({ a: 1 })  // 3
distinct
{ name: "jim", age: 0 }
{ name: "kim", age: 1 }
{ name: "dim", age: 4 }
{ name: "sim", age: 2 }
db.foe.distinct("age")  // [0, 1, 4, 2]
group
{ age: 12, count: 4 }
{ age: 12, count: 2 }
{ age: 14, count: 3 }
{ age: 14, count: 4 }
{ age: 16, count: 6 }
{ age: 18, count: 8 }
group
db.records.group({
  key: { age: 1 },
  cond: { age: { $lt: 16 } },
  reduce: function(cur, result) {
    result.count += cur.count;
  },
  initial: { count: 0 }
})
group
[
{ age: 12, count: 6 },
{ age: 14, count: 7 }
]
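What the three single purpose operators compute can be reproduced on an in-memory array in plain JavaScript. This is a sketch for intuition only: the `group` helper below is a hypothetical stand-in that buckets by `age`, not the server implementation, and it reuses the sample documents from the group slide.

```javascript
// Sample documents from the group slide.
const records = [
  { age: 12, count: 4 }, { age: 12, count: 2 },
  { age: 14, count: 3 }, { age: 14, count: 4 },
  { age: 16, count: 6 }, { age: 18, count: 8 }
];

// count: how many documents match a condition.
const under16 = records.filter(r => r.age < 16).length;

// distinct: the unique values of one field, in first-seen order.
const ages = [...new Set(records.map(r => r.age))];

// group: filter (cond), bucket by the age key, fold each bucket with reduce.
function group(docs, cond, reduce, initial) {
  const buckets = new Map();
  for (const doc of docs.filter(cond)) {
    if (!buckets.has(doc.age)) {
      buckets.set(doc.age, { age: doc.age, ...initial });
    }
    reduce(doc, buckets.get(doc.age));
  }
  return [...buckets.values()];
}

const grouped = group(
  records,
  r => r.age < 16,
  (cur, result) => { result.count += cur.count; },
  { count: 0 }
);
console.log(under16, ages, grouped);
```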
Pipeline
“Documents enter a multi-stage
pipeline that
transforms the documents
into aggregated results”
Pipeline
initial_doc → $group → result1 → $match → ... → resultN → $project → final
Pipeline Example
> db.logs.findOne()
{
  _id: ObjectId('a23ad345frt4'),
  os: 'android',
  token_id: 'ds2f43s4df',
  at: ISODate("2012-10-28T12:41:39.110Z"),
  event: "something just happened"
}
“We need the logs grouped by os,
counted per single-day interval,
and sorted by time”
Pipeline Example
Expected result:
os: 'android',
date: {
  'year': 2012,
  'month': 10,
  'day': 28
},
count: 125
Pipeline Example
$collection->aggregate(
    array(
        // Stage 1: keep os and split the timestamp into year/month/day
        array('$project' => array(
            'os' => 1,
            'days' => array(
                'year' => array('$year' => '$at'),
                'month' => array('$month' => '$at'),
                'day' => array('$dayOfMonth' => '$at')
            )
        )),
        // Stage 2: count the logs for each (os, date) pair
        array('$group' => array(
            '_id' => array(
                'os' => '$os',
                'date' => '$days'
            ),
            'count' => array('$sum' => 1)
        )),
        // Stage 3: sort by date, ascending
        array('$sort' => array(
            '_id.date' => 1
        ))
    )
);
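To see what the three stages do to the data, here is a rough plain-JavaScript simulation of `$project`, `$group` and `$sort` on an in-memory array. The `logs` sample (four made-up documents) is an assumption for illustration, and the code mimics the stages rather than calling any MongoDB API.

```javascript
// Sample log documents (illustrative; only the fields the pipeline reads).
const logs = [
  { os: 'android', at: new Date('2012-10-28T12:41:39.110Z') },
  { os: 'android', at: new Date('2012-10-28T18:05:00.000Z') },
  { os: 'ios',     at: new Date('2012-10-28T09:00:00.000Z') },
  { os: 'android', at: new Date('2012-10-27T23:59:59.000Z') }
];

// $project: keep os and split the timestamp into year / month / day (UTC).
const projected = logs.map(doc => ({
  os: doc.os,
  days: {
    year: doc.at.getUTCFullYear(),
    month: doc.at.getUTCMonth() + 1, // getUTCMonth() is zero-based
    day: doc.at.getUTCDate()
  }
}));

// $group: one bucket per (os, date) pair, counting the documents in it.
const buckets = new Map();
for (const doc of projected) {
  const key = JSON.stringify([doc.os, doc.days]);
  if (!buckets.has(key)) {
    buckets.set(key, { _id: { os: doc.os, date: doc.days }, count: 0 });
  }
  buckets.get(key).count += 1;
}

// $sort: ascending by date, comparing year, then month, then day.
const result = [...buckets.values()].sort((a, b) =>
  a._id.date.year - b._id.date.year ||
  a._id.date.month - b._id.date.month ||
  a._id.date.day - b._id.date.day
);

console.log(JSON.stringify(result, null, 2));
```

Note that each output document carries the os and date under `_id`, which is the shape `$group` produces.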
Pipeline Optimization
…
{ $limit: 100 },
{ $skip: 5 },
{ $limit: 10 },
{ $skip: 2 }
...
Pipeline Optimization
…
{ $limit: 100 },
{ $limit: 15 },
{ $skip: 5 },
{ $skip: 2 }
...
Pipeline Optimization
…
{ $limit: 15 },
{ $skip: 7 }
...
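The three versions of the pipeline return the same documents. In the first rewrite the `$skip: 5` followed by `$limit: 10` is swapped, with the limit grown by the skip amount (10 + 5 = 15). In the second, adjacent limits collapse to the smaller one and adjacent skips are summed (5 + 2 = 7). A minimal check in plain JavaScript, where the `run` helper is a hypothetical stand-in that applies the stages to an array with `slice`:

```javascript
// Apply a list of { $limit } / { $skip } stages to an in-memory array.
function run(stages, docs) {
  return stages.reduce((acc, stage) => {
    if ('$limit' in stage) return acc.slice(0, stage.$limit);
    if ('$skip' in stage) return acc.slice(stage.$skip);
    throw new Error('unsupported stage');
  }, docs);
}

const docs = Array.from({ length: 200 }, (_, i) => i); // 0..199

const original = [{ $limit: 100 }, { $skip: 5 }, { $limit: 10 }, { $skip: 2 }];
const reordered = [{ $limit: 100 }, { $limit: 15 }, { $skip: 5 }, { $skip: 2 }];
const coalesced = [{ $limit: 15 }, { $skip: 7 }];

const a = run(original, docs);
const b = run(reordered, docs);
const c = run(coalesced, docs);
console.log(a, b, c);
```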
Map Reduce
“Map reduce is a data
processing paradigm for
condensing large volumes of
data into useful aggregated
results”
Map Reduce Example
> db.orders.find()
{ sku: "01A", qty: 8, total: 88 },
{ sku: "01A", qty: 7, total: 79 },
{ sku: "02B", qty: 9, total: 27 },
{ sku: "03C", qty: 8, total: 24 },
{ sku: "03C", qty: 3, total: 12 }
Map Reduce Example
“Calculate the average price at which
we sell our products, grouped by sku
code, with total quantity and
total income, starting from
1/1/2015”
Map Reduce Example
db.orders.mapReduce(
  mapFunction,
  reduceFunction,
  {
    out: { merge: "reduced_orders" },
    query: {
      date: { $gt: new Date('01/01/2015') }
    },
    finalize: finalizeFunction
  }
);
Map Reduce Example
var mapFunction =
  function() {
    var key = this.sku;
    var value = {
      tot: this.total,
      qty: this.qty
    };
    emit(key, value);
  };
Result:
{ "01A": [{tot: 88, qty: 8}, {tot: 79, qty: 7}] },
{ "02B": {tot: 27, qty: 9} },
{ "03C": [{tot: 24, qty: 8}, {tot: 12, qty: 3}] }
Map Reduce Example
var reduceFunction =
  function(key, values) {
    // Start from a fresh accumulator on every call
    var reducedVal = { qty: 0, tot: 0 };
    for (var i = 0; i < values.length; i++) {
      reducedVal.qty += values[i].qty;
      reducedVal.tot += values[i].tot;
    }
    return reducedVal;
  };
Map Reduce Example
Result:
{ "01A": {tot: 167, qty: 15} },
{ "02B": {tot: 27, qty: 9} },
{ "03C": {tot: 36, qty: 11} }
Map Reduce Example
var finalizeFunction =
  function(key, reducedVal) {
    reducedVal.avg =
      reducedVal.tot / reducedVal.qty;
    return reducedVal;
  };
Map Reduce Example
Result:
{ "01A": {tot: 167, qty: 15, avg: 11.13} },
{ "02B": {tot: 27, qty: 9, avg: 3} },
{ "03C": {tot: 36, qty: 11, avg: 3.27} }
Map Reduce Example
> db.reduced_orders.find()
{ "_id": "01A", "value": { "tot": 167, "qty": 15, "avg": 11.13 } }
{ "_id": "02B", "value": { "tot": 27, "qty": 9, "avg": 3 } }
{ "_id": "03C", "value": { "tot": 36, "qty": 11, "avg": 3.27 } }
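The whole map, reduce and finalize flow can be replayed on the sample orders in plain JavaScript. This is a sketch for intuition, not the server's implementation: the `emit` function and the `emitted` map are hypothetical stand-ins for what MongoDB does internally, and the date filter is left out because the sample documents show no date field.

```javascript
// Sample orders from the slides.
const orders = [
  { sku: '01A', qty: 8, total: 88 },
  { sku: '01A', qty: 7, total: 79 },
  { sku: '02B', qty: 9, total: 27 },
  { sku: '03C', qty: 8, total: 24 },
  { sku: '03C', qty: 3, total: 12 }
];

// Map phase: emit one (key, value) pair per order, collected per key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}
function mapFunction() {
  emit(this.sku, { tot: this.total, qty: this.qty });
}
orders.forEach(order => mapFunction.call(order));

// Reduce phase: fold all values of one key into a single value.
function reduceFunction(key, values) {
  const reducedVal = { qty: 0, tot: 0 };
  for (const v of values) {
    reducedVal.qty += v.qty;
    reducedVal.tot += v.tot;
  }
  return reducedVal;
}

// Finalize phase: derive the average price from the reduced totals.
function finalizeFunction(key, reducedVal) {
  reducedVal.avg = reducedVal.tot / reducedVal.qty;
  return reducedVal;
}

const reducedOrders = [...emitted].map(([key, values]) => ({
  _id: key,
  // Reduce is skipped for keys that have a single emitted value.
  value: finalizeFunction(
    key, values.length > 1 ? reduceFunction(key, values) : values[0])
}));
console.log(reducedOrders);
```

Because reduce may be skipped for single-value keys, the emitted value and the reduced value need the same shape, which is why both carry `tot` and `qty`.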
thanks
References:
➔ http://docs.mongodb.org/manual
➔ http://blog.mongodb.org/post/87200945828/
➔ http://thejackalofjavascript.com/mapreduce-in-mongodb/
