SlideShare a Scribd company logo
1 of 21
Operational Intelligence
        with MongoDB
 Edouard Servan-Schreiber, Ph.D.
 Director for Solution Architecture
                 October 11th 2012

                                      1
Real Time
             Analytics Engine




 Data
Source
   Data
  Source
     Data
    Source




                                2
3
• Lots of data sources
 High write volume
                       • Lots of data from each source


 Dynamic queries       • Users can drill down into data


                       • Lots of clients
    Fast queries
                       • High request rate

   Minimize delay
between collection &
                       • How long before an event appears
       query             in a report?

                                                            4
Upserts avoid
                               unnecessary reads

   Asynchronous writes




 Data
  Data
Sources
    Data                         Writes buffered in
 Sources
     Data
  Sources                       RAM and flushed to
    Sources                         disk in bulk




          Spread writes over
           multiple shards



                                                      5
Original     127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif
Event Data   HTTP/1.0" 200 2326 “http://www.example.com/start.html" "Mozilla/4.08
             [en] (Win98; I ;Nav)”



As BSON      doc = {
                  _id: ObjectId('4f442120eb03305789000000'),
                  host: "127.0.0.1",
                  time: ISODate("2000-10-10T20:55:36Z"),
                  path: "/apache_pb.gif",
                  referer: “http://www.example.com/start.html",
                  user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)”
             }



Insert to
             db.logs.insert( doc )
MongoDB


                                                                                    6
Find all     db.logs.find( { ‘path’ : ‘/index.html’ } )
logs for a
URL


Find all     db.logs.find( { ‘time’ :
logs for a     { ‘$gte’ : new Date(2012,0),
time             ‘$lt’ : new Date(2012,1) } } );
range



Find all   db.logs.find( {
logs for a   ‘host’ : ‘127.0.0.1’,
host over    ‘time’ : { ‘$gte’ : new Date(2012,0),
                        ‘$lt’ : new Date(2012, 1) } } );
a range of
dates


                                                           7
• Aggregation Framework for on-demand
  rollups

• Map/Reduce Framework for background
  rollups

• Pre-Aggregation for real-time reporting


                                            8
$project        $match            $limit            $skip
$unwind        $group             $sort
Requests db.logs.aggregate( [
per day by { '$match': {
URL             'time': {
                    '$gte': new Date(2012,0),
                    '$lt': new Date(2012,1) } } },
             { '$project': {
                  'path': 1,
                  'date': {
                     'y': { '$year': '$time' },
                     'm': { '$month': '$time' },
                     'd': { '$dayOfMonth': '$time' } } } },
             { '$group': {
                 '_id': {
                   'p':'$path’,
                   'y': '$date.y',
                   'm': '$date.m',
                   'd': '$date.d' },
                 'hits': { '$sum': 1 } } },
           ])                                                 9
{
    ‘ok’: 1,
    ‘result’: [
      { '_id': {'p':’/index.html’,'y':   2012,'m':   1,'d':   1   },'hits’:   124 } },
      { '_id': {'p':’/index.html’,'y':   2012,'m':   1,'d':   2   },'hits’:   245} },
      { '_id': {'p':’/index.html’,'y':   2012,'m':   1,'d':   3   },'hits’:   322} },
      { '_id': {'p':’/index.html’,'y':   2012,'m':   1,'d':   4   },'hits’:   175} },
      { '_id': {'p':’/index.html’,'y':   2012,'m':   1,'d':   5   },'hits’:   94} }
    ]
}




                                                                                         10
Generate   var map = function() {
hourly       var key = {
rollups         p: this.path,
                d: new Date(
from log
                         this.ts.getFullYear(),
data                     this.ts.getMonth(),
                         this.ts.getDate(),
                         this.ts.getHours(),
                         0, 0, 0) };
             emit( key, { hits: 1 } );
           }




                                                  11
Generate   var reduce = function(key, values) {
hourly            var r = { hits: 0 };
rollups           values.forEach(function(v) {
                      r.hits += v.hits;
from log
                  });
data              return r;
              }
           )




                                                  12
Generate   cutoff = new Date(2012,0,1)
hourly
rollups    query = { 'ts': { '$gt': last_run, '$lt': cutoff } }
from log
           db.logs.mapReduce( map, reduce, {
data              ‘query’: query,
                  ‘out’: { ‘reduce’ : ‘stats.hourly’ } } )

           last_run = cutoff




                                                                  13
> db.stats.hourly.find()
    { '_id': {'p':’/index.html’,’d’:ISODate(‚2012-0-1   00:00:00‛) },
      ’value': { ’hits’: 124 } },
    { '_id': {'p':’/index.html’,’d’:ISODate(‚2012-0-1   01:00:00‛) },
      ’value': { ’hits’: 245} },
    { '_id': {'p':’/index.html’,’d’:ISODate(‚2012-0-1   02:00:00‛) },
      ’value': { ’hits’: 322} },
    { '_id': {'p':’/index.html’,’d’:ISODate(‚2012-0-1   03:00:00‛) },
      ’value': { ’hits’: 175} },
... More ...




                                                                        14
Runs                      Runs
                   every                   every day
                   hour
                  Map                       Map
                 Reduce                    Reduce




Collection 1 :             Collection 2:               Collection 3:
 Raw Logs                  Hourly Stats                Daily Stats




                                                                       15
Data for    {
URL /               _id: "20101010/site-1/apache_pb.gif",
Date                metadata: {
                        date: ISODate("2000-10-10T00:00:00Z"),
                        site: "site-1",
                        page: "/apache_pb.gif" },
                    daily: 5468426,
                    hourly: {
                        "0": 227850,
                        "1": 210231,
                        ...
                        "23": 20457 },
                    minute: {
                        "0": 3612,
                        "1": 3241,
                        ...
                        "1439": 2819 }
                }

       WARNING: arrays are not random accessed in MongoDB….
                                                                 16
Data for   {
URL /          _id: "20101010/site-1/apache_pb.gif",
Date           metadata: {
                   date: ISODate("2000-10-10T00:00:00Z"),
                   site: "site-1",
                   page: "/apache_pb.gif" },
               daily: 5468426,
               hourly: {
                   "0": {
                        ‚0‛ : 3612,
                        ‚1‛ : 3241
                        …
                        ‚59‛ : 2130 }
                   "1": {
                        … }
                    ….
                   ‚23‛: {
                         ….}
           }

                                                            17
Data for   id_daily = dt_utc.strftime('%Y%m%d/') + site + page
URL /      hour = dt_utc.hour
Date       minute = dt_utc.minute

           # Get a datetime that only includes date info
           d = datetime.combine(dt_utc.date(), time.min)
           query = {
             '_id': id_daily,
             'metadata': { 'date': d, 'site': site, 'page': page } }
           update = { '$inc': {
                         'hourly.%d' % (hour,): 1,
                         'minute.%d.%d' % (hour,minute): 1 } }

           db.stats.daily.update(query, update, upsert=True)




                                                                       18
Javascript Charting




                      19
Log Aggregation
with MongoDB as
      sink




          More complex
         aggregations or
         integration with
        tools like Mahout




                            20
Q&A


      21

More Related Content

What's hot

はじめてのMongoDB
はじめてのMongoDBはじめてのMongoDB
はじめてのMongoDBTakahiro Inoue
 
Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014MongoDB
 
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...MongoDB
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance TuningMongoDB
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamskyData Con LA
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkMongoDB
 
MongoDB Performance Debugging
MongoDB Performance DebuggingMongoDB Performance Debugging
MongoDB Performance DebuggingMongoDB
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineJason Terpko
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB
 
MySQL flexible schema and JSON for Internet of Things
MySQL flexible schema and JSON for Internet of ThingsMySQL flexible schema and JSON for Internet of Things
MySQL flexible schema and JSON for Internet of ThingsAlexander Rubin
 
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkConceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkMongoDB
 
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...MongoDB
 
Inside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseInside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseMike Dirolf
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsMongoDB
 
MongoDB dla administratora
MongoDB dla administratora MongoDB dla administratora
MongoDB dla administratora 3camp
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB
 
MongoDBで作るソーシャルデータ新解析基盤
MongoDBで作るソーシャルデータ新解析基盤MongoDBで作るソーシャルデータ新解析基盤
MongoDBで作るソーシャルデータ新解析基盤Takahiro Inoue
 
Mongodb debugging-performance-problems
Mongodb debugging-performance-problemsMongodb debugging-performance-problems
Mongodb debugging-performance-problemsMongoDB
 

What's hot (20)

はじめてのMongoDB
はじめてのMongoDBはじめてのMongoDB
はじめてのMongoDB
 
Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014
 
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
 
Mongo db presentation
Mongo db presentationMongo db presentation
Mongo db presentation
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
 
MongoDB Performance Debugging
MongoDB Performance DebuggingMongoDB Performance Debugging
MongoDB Performance Debugging
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
MySQL flexible schema and JSON for Internet of Things
MySQL flexible schema and JSON for Internet of ThingsMySQL flexible schema and JSON for Internet of Things
MySQL flexible schema and JSON for Internet of Things
 
Indexing
IndexingIndexing
Indexing
 
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkConceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
 
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
 
Inside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseInside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source Database
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev Teams
 
MongoDB dla administratora
MongoDB dla administratora MongoDB dla administratora
MongoDB dla administratora
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
MongoDBで作るソーシャルデータ新解析基盤
MongoDBで作るソーシャルデータ新解析基盤MongoDBで作るソーシャルデータ新解析基盤
MongoDBで作るソーシャルデータ新解析基盤
 
Mongodb debugging-performance-problems
Mongodb debugging-performance-problemsMongodb debugging-performance-problems
Mongodb debugging-performance-problems
 

Viewers also liked

Horizon 2013 P2P Re-imagined: No User Pain, but Real Compliance Gains, A Cas...
Horizon 2013 P2P Re-imagined:  No User Pain, but Real Compliance Gains, A Cas...Horizon 2013 P2P Re-imagined:  No User Pain, but Real Compliance Gains, A Cas...
Horizon 2013 P2P Re-imagined: No User Pain, but Real Compliance Gains, A Cas...Zycus
 
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...Kai Wähner
 
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...Kai Wähner
 
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...Kai Wähner
 
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)Kai Wähner
 

Viewers also liked (6)

Horizon 2013 P2P Re-imagined: No User Pain, but Real Compliance Gains, A Cas...
Horizon 2013 P2P Re-imagined:  No User Pain, but Real Compliance Gains, A Cas...Horizon 2013 P2P Re-imagined:  No User Pain, but Real Compliance Gains, A Cas...
Horizon 2013 P2P Re-imagined: No User Pain, but Real Compliance Gains, A Cas...
 
101 lecture 10
101 lecture 10101 lecture 10
101 lecture 10
 
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
 
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
 
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
 
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
 

Similar to Operational Intelligence with MongoDB Webinar

How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...Gianfranco Palumbo
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBTakahiro Inoue
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework MongoDB
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for AnalyticsMongoDB
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for CassandraEdward Capriolo
 
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"DataStax Academy
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorHenrik Ingo
 
Maintenance for MongoDB Replica Sets
Maintenance for MongoDB Replica SetsMaintenance for MongoDB Replica Sets
Maintenance for MongoDB Replica SetsIgor Donchovski
 
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiMonitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiInfluxData
 
How to leverage what's new in MongoDB 3.6
How to leverage what's new in MongoDB 3.6How to leverage what's new in MongoDB 3.6
How to leverage what's new in MongoDB 3.6Maxime Beugnet
 
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & HadoopMongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & HadoopMongoDB
 
Mongodb intro
Mongodb introMongodb intro
Mongodb introchristkv
 
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...MongoDB
 
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB
 
Map/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDBMap/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDBUwe Printz
 
Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)MongoSF
 
Marc s01 e02-crud-database
Marc s01 e02-crud-databaseMarc s01 e02-crud-database
Marc s01 e02-crud-databaseMongoDB
 
Building and Scaling Node.js Applications
Building and Scaling Node.js ApplicationsBuilding and Scaling Node.js Applications
Building and Scaling Node.js ApplicationsOhad Kravchick
 

Similar to Operational Intelligence with MongoDB Webinar (20)

How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDB
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for Analytics
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for Cassandra
 
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop Connector
 
Maintenance for MongoDB Replica Sets
Maintenance for MongoDB Replica SetsMaintenance for MongoDB Replica Sets
Maintenance for MongoDB Replica Sets
 
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiMonitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
 
How to leverage what's new in MongoDB 3.6
How to leverage what's new in MongoDB 3.6How to leverage what's new in MongoDB 3.6
How to leverage what's new in MongoDB 3.6
 
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & HadoopMongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
 
Mongodb intro
Mongodb introMongodb intro
Mongodb intro
 
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
 
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
 
Mongo db dla administratora
Mongo db dla administratoraMongo db dla administratora
Mongo db dla administratora
 
Map/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDBMap/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDB
 
Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)
 
Marc s01 e02-crud-database
Marc s01 e02-crud-databaseMarc s01 e02-crud-database
Marc s01 e02-crud-database
 
Building and Scaling Node.js Applications
Building and Scaling Node.js ApplicationsBuilding and Scaling Node.js Applications
Building and Scaling Node.js Applications
 
Coding serbia
Coding serbiaCoding serbia
Coding serbia
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Operational Intelligence with MongoDB Webinar

  • 1. Operational Intelligence with MongoDB Edouard Servan-Schreiber, Ph.D. Director for Solution Architecture October 11th 2012 1
  • 2. Real Time Analytics Engine Data Source Data Source Data Source 2
  • 3. 3
  • 4. • Lots of data sources High write volume • Lots of data from each source Dynamic queries • Users can drill down into data • Lots of clients Fast queries • High request rate Minimize delay between collection & • How long before an event appears query in a report? 4
  • 5. Upserts avoid unnecessary reads Asynchronous writes Data Data Sources Data Writes buffered in Sources Data Sources RAM and flushed to Sources disk in bulk Spread writes over multiple shards 5
  • 6. Original 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif Event Data HTTP/1.0" 200 2326 “http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)” As BSON doc = { _id: ObjectId('4f442120eb03305789000000'), host: "127.0.0.1", time: ISODate("2000-10-10T20:55:36Z"), path: "/apache_pb.gif", referer: “http://www.example.com/start.html", user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)” } Insert to db.logs.insert( doc ) MongoDB 6
  • 7. Find all db.logs.find( { ‘path’ : ‘/index.html’ } ) logs for a URL Find all db.logs.find( { ‘time’ : logs for a { ‘$gte’ : new Date(2012,0), time ‘$lt’ : new Date(2012,1) } } ); range Find all db.logs.find( { logs for a ‘host’ : ‘127.0.0.1’, host over ‘time’ : { ‘$gte’ : new Date(2012,0), ‘$lt’ : new Date(2012, 1) } } ); a range of dates 7
  • 8. • Aggregation Framework for on-demand rollups • Map/Reduce Framework for background rollups • Pre-Aggregation for real-time reporting 8
  • 9. $project $match $limit $skip $unwind $group $sort Requests db.logs.aggregate( [ per day by { '$match': { URL 'time': { '$gte': new Date(2012,0), '$lt': new Date(2012,1) } } }, { '$project': { 'path': 1, 'date': { 'y': { '$year': '$time' }, 'm': { '$month': '$time' }, 'd': { '$dayOfMonth': '$time' } } } }, { '$group': { '_id': { 'p':'$path’, 'y': '$date.y', 'm': '$date.m', 'd': '$date.d' }, 'hits': { '$sum': 1 } } }, ]) 9
  • 10. { ‘ok’: 1, ‘result’: [ { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 1 },'hits’: 124 } }, { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 2 },'hits’: 245} }, { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 3 },'hits’: 322} }, { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 4 },'hits’: 175} }, { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 5 },'hits’: 94} } ] } 10
  • 11. Generate var map = function() { hourly var key = { rollups p: this.path, d: new Date( from log this.ts.getFullYear(), data this.ts.getMonth(), this.ts.getDate(), this.ts.getHours(), 0, 0, 0) }; emit( key, { hits: 1 } ); } 11
  • 12. Generate var reduce = function(key, values) { hourly var r = { hits: 0 }; rollups values.forEach(function(v) { r.hits += v.hits; from log }); data return r; } ) 12
  • 13. Generate cutoff = new Date(2012,0,1) hourly rollups query = { 'ts': { '$gt': last_run, '$lt': cutoff } } from log db.logs.mapReduce( map, reduce, { data ‘query’: query, ‘out’: { ‘reduce’ : ‘stats.hourly’ } } ) last_run = cutoff 13
  • 14. > db.stats.hourly.find() { '_id': {'p':’/index.html’,’d’:ISODate(‚2012-0-1 00:00:00‛) }, ’value': { ’hits’: 124 } }, { '_id': {'p':’/index.html’,’d’:ISODate(‚2012-0-1 01:00:00‛) }, ’value': { ’hits’: 245} }, { '_id': {'p':’/index.html’,’d’:ISODate(‚2012-0-1 02:00:00‛) }, ’value': { ’hits’: 322} }, { '_id': {'p':’/index.html’,’d’:ISODate(‚2012-0-1 03:00:00‛) }, ’value': { ’hits’: 175} }, ... More ... 14
  • 15. Runs Runs every every day hour Map Map Reduce Reduce Collection 1 : Collection 2: Collection 3: Raw Logs Hourly Stats Daily Stats 15
  • 16. Data for { URL / _id: "20101010/site-1/apache_pb.gif", Date metadata: { date: ISODate("2000-10-10T00:00:00Z"), site: "site-1", page: "/apache_pb.gif" }, daily: 5468426, hourly: { "0": 227850, "1": 210231, ... "23": 20457 }, minute: { "0": 3612, "1": 3241, ... "1439": 2819 } } WARNING: arrays are not random accessed in MongoDB…. 16
  • 17. Data for { URL / _id: "20101010/site-1/apache_pb.gif", Date metadata: { date: ISODate("2000-10-10T00:00:00Z"), site: "site-1", page: "/apache_pb.gif" }, daily: 5468426, hourly: { "0": { ‚0‛ : 3612, ‚1‛ : 3241 … ‚59‛ : 2130 } "1": { … } …. ‚23‛: { ….} } 17
  • 18. Data for id_daily = dt_utc.strftime('%Y%m%d/') + site + page URL / hour = dt_utc.hour Date minute = dt_utc.minute # Get a datetime that only includes date info d = datetime.combine(dt_utc.date(), time.min) query = { '_id': id_daily, 'metadata': { 'date': d, 'site': site, 'page': page } } update = { '$inc': { 'hourly.%d' % (hour,): 1, 'minute.%d.%d' % (hour,minute): 1 } } db.stats.daily.update(query, update, upsert=True) 18
  • 20. Log Aggregation with MongoDB as sink More complex aggregations or integration with tools like Mahout 20
  • 21. Q&A 21