Operational Intelligence with MongoDB Webinar

Operational Intelligence with MongoDB
Edouard Servan-Schreiber, Ph.D.
Director for Solution Architecture
October 11th 2012


Real-Time Analytics Engine
[Diagram: several data sources feeding the real-time analytics engine]


High write volume
• Lots of data sources
• Lots of data from each source

Dynamic queries
• Users can drill down into data

Fast queries
• Lots of clients
• High request rate

Minimize delay between collection & query
• How long before an event appears in a report?


[Diagram: data sources writing into MongoDB]
• Upserts avoid unnecessary reads
• Asynchronous writes
• Writes buffered in RAM and flushed to disk in bulk
• Spread writes over multiple shards


Original event data
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

As a BSON document
doc = {
    _id: ObjectId('4f442120eb03305789000000'),
    host: "127.0.0.1",
    time: ISODate("2000-10-10T20:55:36Z"),
    path: "/apache_pb.gif",
    referer: "http://www.example.com/start.html",
    user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}

Insert into MongoDB
db.logs.insert( doc )
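To make the conversion on this slide concrete, here is a minimal Python sketch of how a collector might turn one Apache combined-format line into the document above. The regular expression, the name `parse_log_line`, and the field selection are assumptions for illustration (the status code and response size are dropped because the slide's document omits them). `_id` is left out because the driver generates an ObjectId on insert.

```python
import re
from datetime import datetime, timezone

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Turn one combined-format log line into a dict shaped like the logs document."""
    m = LOG_PATTERN.match(line)
    if m is None:
        raise ValueError("unrecognized log line: %r" % line)
    # Parse the local timestamp with its UTC offset, then normalize to UTC
    local = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": m.group("host"),
        "time": local.astimezone(timezone.utc),
        "path": m.group("path"),
        "referer": m.group("referer"),
        "user_agent": m.group("user_agent"),
    }


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
        '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
doc = parse_log_line(line)
print(doc["time"].isoformat())  # 2000-10-10T20:55:36+00:00
```

The -0700 offset is applied during parsing, which is how 13:55:36 local time becomes the 20:55:36Z shown in the document.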


Find all logs for a URL
db.logs.find( { 'path' : '/index.html' } )

Find all logs for a time range
// JavaScript months are 0-based: this range is January 2012
db.logs.find( { 'time' :
    { '$gte' : new Date(2012,0),
      '$lt' : new Date(2012,1) } } );

Find all logs for a host over a range of dates
db.logs.find( {
    'host' : '127.0.0.1',
    'time' : { '$gte' : new Date(2012,0),
               '$lt' : new Date(2012,1) } } );
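The same January 2012 filter can be built from Python. This sketch assumes PyMongo, which treats naive `datetime` values as UTC, whereas the shell's `new Date(2012,0)` uses the shell's local time zone. `month_range` and `logs_for_host_query` are hypothetical helpers; only the filter document is built, so the code runs without a server.

```python
from datetime import datetime


def month_range(year, month):
    """Hypothetical helper: half-open [start, end) bounds for one calendar month.

    Python months are 1-based, unlike JavaScript's Date(2012, 0), which is January.
    """
    start = datetime(year, month, 1)
    # Roll over into January of the next year after December
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start, end


def logs_for_host_query(host, year, month):
    """Build the filter document used by the host-over-a-date-range query."""
    start, end = month_range(year, month)
    return {"host": host, "time": {"$gte": start, "$lt": end}}


query = logs_for_host_query("127.0.0.1", 2012, 1)
# With PyMongo, this dict would be passed as db.logs.find(query)
```

Using `$gte` on the start and `$lt` on the next month's start keeps consecutive ranges from overlapping or leaving gaps.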


• Aggregation Framework for on-demand rollups

• Map/Reduce Framework for background rollups

• Pre-Aggregation for real-time reporting


Pipeline operators: $project, $match, $limit, $skip, $unwind, $group, $sort

Requests per day by URL
db.logs.aggregate( [
    { '$match': {
        'time': {
            '$gte': new Date(2012,0),
            '$lt': new Date(2012,1) } } },
    { '$project': {
        'path': 1,
        'date': {
            'y': { '$year': '$time' },
            'm': { '$month': '$time' },
            'd': { '$dayOfMonth': '$time' } } } },
    { '$group': {
        '_id': {
            'p': '$path',
            'y': '$date.y',
            'm': '$date.m',
            'd': '$date.d' },
        'hits': { '$sum': 1 } } }
] )

{
    'ok': 1,
    'result': [
        { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 1 }, 'hits': 124 },
        { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 2 }, 'hits': 245 },
        { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 5 }, 'hits': 94 }
    ]
}
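To show what each stage contributes, the sketch below re-implements the pipeline's logic in plain Python over a list of dicts. It is an emulation for illustration only, not how MongoDB executes the pipeline; `requests_per_day_by_url` and the sample documents are hypothetical. Results are sorted here only to make the output deterministic, since `$group` does not guarantee an order.

```python
from collections import Counter
from datetime import datetime


def requests_per_day_by_url(logs, start, end):
    """Plain-Python emulation of the $match / $project / $group pipeline."""
    hits = Counter()
    for doc in logs:
        # $match: keep events in the half-open window [start, end)
        if not (start <= doc["time"] < end):
            continue
        # $project: keep the path and split the timestamp into y/m/d
        t = doc["time"]
        key = (doc["path"], t.year, t.month, t.day)
        # $group: count one hit per document under the composite key ($sum: 1)
        hits[key] += 1
    return [
        {"_id": {"p": p, "y": y, "m": m, "d": d}, "hits": n}
        for (p, y, m, d), n in sorted(hits.items())
    ]


sample = [
    {"path": "/index.html", "time": datetime(2012, 1, 1, 9, 30)},
    {"path": "/index.html", "time": datetime(2012, 1, 1, 17, 5)},
    {"path": "/index.html", "time": datetime(2012, 1, 2, 8, 0)},
    {"path": "/index.html", "time": datetime(2012, 2, 1, 0, 0)},  # outside January
]
result = requests_per_day_by_url(sample, datetime(2012, 1, 1), datetime(2012, 2, 1))
```

As in MongoDB's `$month`, Python's `month` is 1-based, so the grouped keys line up with the `'m': 1` values in the result above.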


Generate hourly rollups from log data
var map = function() {
    // Key: the URL plus the event 'time' truncated to the hour
    var key = {
        p: this.path,
        d: new Date(
            this.time.getFullYear(),
            this.time.getMonth(),
            this.time.getDate(),
            this.time.getHours(),
            0, 0, 0) };
    emit( key, { hits: 1 } );
}
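The map function's key truncates each event to the start of its hour, so every hit for one URL within that hour shares a key. A Python analogue of that key computation, with hypothetical names `hourly_key` and `map_log`:

```python
from datetime import datetime


def hourly_key(doc):
    """Python analogue of the map key: URL plus event time truncated to the hour."""
    t = doc["time"]
    return {"p": doc["path"], "d": t.replace(minute=0, second=0, microsecond=0)}


def map_log(doc):
    """Return one (key, value) pair per log document, as map() does with emit()."""
    return hourly_key(doc), {"hits": 1}


key, value = map_log({"path": "/index.html", "time": datetime(2012, 1, 1, 13, 47, 12)})
```

One difference worth noting: `datetime.replace` keeps the original time zone, while `new Date(y, m, d, h, ...)` in the map function builds the date in the local time zone of the JavaScript engine running it.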


Generate hourly rollups from log data
var reduce = function(key, values) {
    // Sum the hit counts emitted for this key
    var r = { hits: 0 };
    values.forEach(function(v) {
        r.hits += v.hits;
    });
    return r;
}
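MongoDB may call reduce more than once for the same key, feeding earlier reduce output back in as one of the `values`. The function therefore has to return the same shape it receives and give the same answer however the values are batched. The summing reduce satisfies this; a Python port (hypothetical name `reduce_hits`) can check it directly:

```python
def reduce_hits(key, values):
    """Python port of the reduce function: sum the hits of all values for one key."""
    r = {"hits": 0}
    for v in values:
        r["hits"] += v["hits"]
    return r


key = {"p": "/index.html"}
emitted = [{"hits": 1}] * 5

# Reducing in one pass and re-reducing partial results must agree
one_pass = reduce_hits(key, emitted)
two_pass = reduce_hits(key, [reduce_hits(key, emitted[:2]), reduce_hits(key, emitted[2:])])
```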


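Generate hourly rollups from log data
// last_run holds the cutoff of the previous run (initialize it before the first run)
cutoff = new Date(2012,0,1)
query = { 'time': { '$gte': last_run, '$lt': cutoff } }
db.logs.mapReduce( map, reduce, {
    'query': query,
    // Merge into existing stats.hourly documents, re-reducing matching keys
    'out': { 'reduce': 'stats.hourly' } } )
last_run = cutoff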
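Run repeatedly, this job processes only new events and merges them into `stats.hourly`. The sketch below emulates two consecutive runs in plain Python under simplifying assumptions: a dict stands in for the output collection, keys are `(path, hour)` tuples rather than `{p, d}` documents, and `incremental_run` is a hypothetical name. The event at exactly 00:30 shows why the lower bound uses `$gte`: with `$gt` it would fall in neither window.

```python
from collections import defaultdict
from datetime import datetime


def reduce_hits(values):
    """Sum hits, like the reduce function (the key argument is unused there too)."""
    return {"hits": sum(v["hits"] for v in values)}


def incremental_run(logs, hourly, last_run, cutoff):
    """Emulate one mapReduce run with out: {reduce: ...} into the dict `hourly`."""
    emitted = defaultdict(list)
    for doc in logs:
        # query: last_run <= time < cutoff
        if last_run <= doc["time"] < cutoff:
            t = doc["time"].replace(minute=0, second=0, microsecond=0)
            emitted[(doc["path"], t)].append({"hits": 1})
    for key, values in emitted.items():
        new = reduce_hits(values)
        if key in hourly:
            # out: {reduce: ...} merges with the existing document by reducing again
            new = reduce_hits([hourly[key], new])
        hourly[key] = new
    return cutoff  # becomes last_run for the next run


logs = [
    {"path": "/index.html", "time": datetime(2012, 1, 1, 0, 10)},
    {"path": "/index.html", "time": datetime(2012, 1, 1, 0, 30)},  # exactly at first cutoff
    {"path": "/index.html", "time": datetime(2012, 1, 1, 0, 40)},
    {"path": "/index.html", "time": datetime(2012, 1, 1, 0, 50)},
]
hourly = {}
last_run = datetime(2011, 12, 31)
last_run = incremental_run(logs, hourly, last_run, datetime(2012, 1, 1, 0, 30))
last_run = incremental_run(logs, hourly, last_run, datetime(2012, 1, 1, 1, 0))
```

After both runs, the 00:00 bucket holds all four hits: one from the first run, merged with three from the second.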


> db.stats.hourly.find()
{ '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T00:00:00Z") }, 'value': { 'hits': 124 } }
{ '_id': { ... }, 'value': { 'hits': 245 } }
{ '_id': { ... }, 'value': { 'hits': 322 } }
{ '_id': { ... }, 'value': { 'hits': 175 } }
... more ...


[Diagram: chained map/reduce rollups]
Collection 1: Raw Logs
  ↓ map/reduce, runs every hour
Collection 2: Hourly Stats
  ↓ map/reduce, runs every day
Collection 3: Daily Stats
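The second tier rolls hourly stats up into daily stats. A plain-Python sketch of what the daily job computes, assuming input documents shaped like the `stats.hourly` output (`{'_id': {'p', 'd'}, 'value': {'hits'}}`); `daily_from_hourly` and the sample counts are made up for illustration:

```python
from collections import defaultdict
from datetime import datetime


def daily_from_hourly(hourly_docs):
    """Roll hourly stats documents up into daily ones (the second map/reduce tier)."""
    daily = defaultdict(int)
    for doc in hourly_docs:
        # map: re-key by day instead of hour
        day = doc["_id"]["d"].replace(hour=0, minute=0, second=0, microsecond=0)
        # reduce: summing hourly hits gives the daily total
        daily[(doc["_id"]["p"], day)] += doc["value"]["hits"]
    return [
        {"_id": {"p": p, "d": d}, "value": {"hits": n}}
        for (p, d), n in sorted(daily.items())
    ]


hourly_docs = [
    {"_id": {"p": "/index.html", "d": datetime(2012, 1, 1, 0)}, "value": {"hits": 10}},
    {"_id": {"p": "/index.html", "d": datetime(2012, 1, 1, 1)}, "value": {"hits": 20}},
    {"_id": {"p": "/index.html", "d": datetime(2012, 1, 2, 0)}, "value": {"hits": 5}},
]
daily = daily_from_hourly(hourly_docs)
```

Reading the much smaller hourly collection, rather than the raw logs, is what makes the daily job cheap.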


Data for URL / Date
{
    _id: "20001010/site-1/apache_pb.gif",
    metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        site: "site-1",
        page: "/apache_pb.gif" },
    daily: 5468426,
    hourly: {
        "0": 227850,
        "1": 210231,
        ...
        "23": 20457 },
    minute: {
        "0": 3612,
        "1": 3241,
        ...
        "1439": 2819 }
}

WARNING: fields in a BSON document (like array elements) are not randomly accessed in MongoDB. To update minute "1439", the server must first walk past the 1,439 minute fields stored before it.

Data for URL / Date
{
    _id: "20001010/site-1/apache_pb.gif",
    metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        site: "site-1",
        page: "/apache_pb.gif" },
    daily: 5468426,
    hourly: {
        "0": 227850,
        "1": 210231,
        ...
        "23": 20457 },
    minute: {
        "0": {
            "0": 3612,
            "1": 3241,
            ...
            "59": 2130 },
        "1": {
            ... },
        ...
        "23": {
            ... } }
}
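A rough way to see why the nested layout helps is to count how many fields must be walked to reach the last minute of the day. This is a simplified cost model, assuming a linear scan in stored order and counting only the keys inside `minute`; it is an illustration, not a measurement of MongoDB. The function names are hypothetical.

```python
def flat_fields_scanned(hour, minute):
    """Fields walked to reach minute.<hour*60+minute> when minutes form one flat map."""
    minute_of_day = hour * 60 + minute
    # Keys "0" .. minute_of_day are stored in order, so every earlier key is skipped
    return minute_of_day + 1


def nested_fields_scanned(hour, minute):
    """Fields walked to reach minute.<hour>.<minute> with one sub-document per hour."""
    # Skip earlier hour sub-documents, then earlier minutes inside the target hour
    return (hour + 1) + (minute + 1)


print(flat_fields_scanned(23, 59))    # 1440
print(nested_fields_scanned(23, 59))  # 84
```

Under this model the worst case drops from 1,440 fields to 24 + 60 = 84.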


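Data for URL / Date
from datetime import datetime, time

def log_hit(db, dt_utc, site, page):
    # One document per day, site and page: _id is "YYYYMMDD/" + site + page
    id_daily = dt_utc.strftime('%Y%m%d/') + site + page
    hour = dt_utc.hour
    minute = dt_utc.minute

    # Get a datetime that only includes date info
    d = datetime.combine(dt_utc.date(), time.min)
    query = {
        '_id': id_daily,
        'metadata': { 'date': d, 'site': site, 'page': page } }
    update = { '$inc': {
        'daily': 1,
        'hourly.%d' % (hour,): 1,
        'minute.%d.%d' % (hour, minute): 1 } }

    # Upsert: the first hit of the day creates the document, no read needed
    # (update_one replaces the legacy update() in PyMongo 3+)
    db.stats.daily.update_one(query, update, upsert=True)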
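Because the update uses `$inc` with `upsert`, the first hit of the day creates the document and later hits only increment counters, with no read beforehand. The in-memory sketch below mimics those semantics on plain dicts so the resulting document shape can be inspected without a server. `apply_inc` and `upsert_inc` are hypothetical stand-ins, not PyMongo APIs, and `upsert_inc` matches on `_id` alone, a simplification of how MongoDB matches the full query.

```python
from datetime import datetime, time


def apply_inc(doc, inc):
    """Apply a {'$inc': {...}} spec with dotted paths to a plain dict, in place."""
    for path, amount in inc.items():
        *parents, leaf = path.split(".")
        target = doc
        for name in parents:
            # Missing intermediate sub-documents are created, as $inc does
            target = target.setdefault(name, {})
        target[leaf] = target.get(leaf, 0) + amount


def upsert_inc(collection, query, update):
    """Hypothetical stand-in for update_one(query, update, upsert=True) on a dict."""
    doc = collection.get(query["_id"])
    if doc is None:
        # On a miss, the new document is seeded from the query's equality fields
        doc = collection[query["_id"]] = dict(query)
    apply_inc(doc, update["$inc"])


stats = {}
site, page = "site-1", "/apache_pb.gif"
for ts in [datetime(2000, 10, 10, 20, 55, 36), datetime(2000, 10, 10, 20, 57, 1)]:
    query = {"_id": ts.strftime("%Y%m%d/") + site + page,
             "metadata": {"date": datetime.combine(ts.date(), time.min),
                          "site": site, "page": page}}
    update = {"$inc": {"daily": 1,
                       "hourly.%d" % (ts.hour,): 1,
                       "minute.%d.%d" % (ts.hour, ts.minute): 1}}
    upsert_inc(stats, query, update)

doc = stats["20001010/site-1/apache_pb.gif"]
```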


JavaScript Charting
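The slide names JavaScript charting without showing code, and the source does not say which charting library is used. Whatever the library, the pre-aggregated document usually needs a small conversion first: its keys are strings, and `$inc` only creates keys for hours that actually received hits. A hypothetical Python helper that prepares 24 chart points:

```python
def hourly_series(daily_doc):
    """Turn a pre-aggregated doc's 'hourly' map into 24 (hour, hits) chart points.

    Keys are strings ("0".."23"); iterating over range(24) keeps numeric order
    (sorting the keys as text would put "10" before "2"), and hours with no
    key get 0.
    """
    hourly = daily_doc.get("hourly", {})
    return [(h, hourly.get(str(h), 0)) for h in range(24)]


series = hourly_series({"hourly": {"0": 227850, "1": 210231, "23": 20457}})
```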


• Log aggregation with MongoDB as the sink
• More complex aggregations, or integration with tools like Mahout

