Schema Design by Chad Tindel, Solution Architect, 10gen

MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the trade-offs of various data modeling strategies in MongoDB using a library as a sample application. You will learn how to work with documents, evolve your schema, and apply common schema design patterns.

Published in: Technology

  1. Real Time Analytics
     Chad Tindel, chad.tindel@10gen.com
  2. The goal
     [Diagram: three Data Source boxes and a Real Time Analytics Engine]
  3. Solution goals
  4. Simple log storage (Design Pattern)
  5. Aggregation - Pipelines
     • Aggregation requests specify a pipeline
     • A pipeline is a series of operations
     • Conceptually, the members of a collection are passed through a pipeline to produce a result
       – Similar to a Unix command-line pipe
  6. Aggregation Pipeline
  7. Aggregation - Pipelines
         db.collection.aggregate([ {$match: … }, {$group: … }, {$limit: …}, etc ])
  8. Pipeline Operations
     • $match
       – Uses a query predicate (like .find({…})) as a filter
         { $match : { author : "dave" } }
         { $match : { score : { $gt : 50, $lte : 90 } } }
  9. Pipeline Operations
     • $project
       – Uses a sample document to determine the shape of the result (similar to .find()'s 2nd optional argument)
         • Include or exclude fields
         • Compute new fields
           – Arithmetic expressions, including built-in functions
           – Pull fields from nested documents to the top
           – Push fields from the top down into new virtual documents
  10. Pipeline Operations
     • $unwind
       – Hands out array elements one at a time
         { $unwind : "$myarray" }
     • $unwind "streams" arrays
       – Array values are doled out one at a time in the context of their surrounding document
       – Makes it possible to filter out elements before returning
  11. Pipeline Operations
     • $group
       – Aggregates items into buckets defined by a key
  12. Grouping
     • $group aggregation expressions
       – Define a grouping key as the _id of the result
       – Total grouped column values: $sum
       – Average grouped column values: $avg
       – Collect grouped column values in an array or set: $push, $addToSet
       – Other functions
         • $min, $max, $first, $last
  13. Pipeline Operations
     • $sort
       – Sort documents
       – Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}
         { $sort : { "total" : -1 } }
  14. Pipeline Operations
     • $limit
       – Only allow the specified number of documents to pass
         { $limit : 20 }
  15. Pipeline Operations
     • $skip
       – Skip over the specified number of documents
         { $skip : 10 }
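
The stages on slides 8 to 15 compose like a Unix pipe: each one takes a stream of documents and hands a new stream to the next. The following is a minimal in-memory sketch of that idea in plain JavaScript, not MongoDB's implementation. The helper names (`match`, `unwind`, `groupCount`, `sortBy`, `skip`, `limit`, `runPipeline`) and the sample `posts` data are assumptions made for illustration only.

```javascript
// In-memory stand-ins for a few pipeline stages. These are illustrative
// helpers with hypothetical names, not MongoDB APIs.
const match = (pred) => (docs) => docs.filter(pred);

// Emit one copy of the document per array element, in the spirit of $unwind.
const unwind = (field) => (docs) =>
  docs.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );

// Bucket documents by a key and count them, like $group with { $sum: 1 }.
const groupCount = (keyOf) => (docs) => {
  const buckets = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    buckets.set(key, (buckets.get(key) || 0) + 1);
  }
  return [...buckets].map(([_id, total]) => ({ _id, total }));
};

// direction is 1 for ascending and -1 for descending, as in $sort.
const sortBy = (field, direction) => (docs) =>
  [...docs].sort(
    (a, b) => direction * (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0)
  );

const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Run the stages in order, feeding each stage's output to the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const posts = [
  { author: "dave", tags: ["mongodb", "analytics"] },
  { author: "dave", tags: ["mongodb"] },
  { author: "amy", tags: ["analytics"] },
];

const tagCounts = runPipeline(posts, [
  match((doc) => doc.author === "dave"),
  unwind("tags"),
  groupCount((doc) => doc.tags),
  sortBy("total", -1),
  limit(20),
]);

console.log(tagCounts);
```

Because every stage has the same shape (documents in, documents out), stages can be reordered or dropped freely, which is the property the pipe analogy on slide 5 points at.
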
  16. Computed Expressions
     • Available in $project operations
     • Prefix expression language
       – Add two fields: $add:["$field1", "$field2"]
       – Provide a value for a missing field: $ifNull:["$field1", "$field2"]
       – Nesting: $add:["$field1", {$ifNull:["$field2", "$field3"]}]
     (continued)
  17. Computed Expressions (continued)
     • String functions
       – toUpper, toLower, substr
     • Date field extraction
       – Get year, month, day, hour, etc., from ISODate
     • Date arithmetic
     • Null value substitution (like MySQL ifnull(), Oracle nvl())
     • Ternary conditional
       – Return one of two values based on a predicate
     • Other functions…
       – And we can easily add more as required
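
To make the "prefix expression language" concrete, here is a small sketch of how such expressions could be evaluated against a single document. It is an illustration only: `evaluate` and the `operators` table are hypothetical names, only a handful of operators are covered, and nested field paths such as "$a.b" are not handled.

```javascript
// Evaluate a small subset of prefix expressions against one document.
// evaluate() and the operators table are illustrative, not MongoDB code.
const operators = {
  $add: (args) => args.reduce((sum, value) => sum + value, 0),
  $ifNull: ([value, fallback]) =>
    value === null || value === undefined ? fallback : value,
  $toUpper: (value) => String(value).toUpperCase(),
  $gt: ([left, right]) => left > right,
  // Ternary conditional: [predicate, valueIfTrue, valueIfFalse].
  $cond: ([test, whenTrue, whenFalse]) => (test ? whenTrue : whenFalse),
};

function evaluate(expr, doc) {
  // A string starting with "$" is a reference to a top-level field.
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)];
  }
  // Operand lists are evaluated element by element.
  if (Array.isArray(expr)) {
    return expr.map((item) => evaluate(item, doc));
  }
  // { $op: operands } applies the operator to its evaluated operands.
  if (expr !== null && typeof expr === "object") {
    const [name] = Object.keys(expr);
    const apply = operators[name];
    if (!apply) throw new Error("unsupported operator: " + name);
    return apply(evaluate(expr[name], doc));
  }
  return expr; // anything else is a literal
}

const doc = { field1: 2, field3: 5, name: "frank" };

// field2 is missing, so $ifNull falls back to field3.
const total = evaluate(
  { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] },
  doc
);
const label = evaluate(
  { $cond: [{ $gt: ["$field1", 1] }, { $toUpper: "$name" }, "n/a"] },
  doc
);

console.log(total, label);
```

The operator comes first and its operands follow, so nesting is just an operand that is itself an expression object.
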
  18. Sample data
     Original event data:
         127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
     As JSON:
         doc = {
           _id: ObjectId("4f442120eb03305789000000"),
           host: "127.0.0.1",
           time: ISODate("2000-10-10T20:55:36Z"),
           path: "/apache_pb.gif",
           referer: "http://www.example.com/start.html",
           user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
         }
     Insert to MongoDB:
         db.logs.insert( doc )
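
The slide shows the raw log line and the resulting document but not the step in between. One way to get from one to the other is sketched below, assuming the Apache combined log format. `parseLogLine`, `parseTimestamp`, and the regular expressions are hypothetical helpers written for this example, and `_id` is left for the database to assign.

```javascript
// Parse one Apache combined-log line into a document shaped like the slide's.
// The helper names and patterns here are illustrative assumptions.
const LOG_PATTERN =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+) [^"]*" \d{3} (?:\d+|-) "([^"]*)" "([^"]*)"$/;

const TIME_PATTERN =
  /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/;

const MONTHS = {
  Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
  Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11,
};

// Convert e.g. "10/Oct/2000:13:55:36 -0700" to a Date holding the UTC instant.
function parseTimestamp(text) {
  const m = TIME_PATTERN.exec(text);
  if (!m || !(m[2] in MONTHS)) return null;
  const [, day, mon, year, hh, mm, ss, sign, offH, offM] = m;
  const offsetMinutes =
    (sign === "-" ? -1 : 1) * (Number(offH) * 60 + Number(offM));
  const localAsUtc = Date.UTC(
    Number(year), MONTHS[mon], Number(day),
    Number(hh), Number(mm), Number(ss)
  );
  // Local time minus its UTC offset gives UTC.
  return new Date(localAsUtc - offsetMinutes * 60 * 1000);
}

// Returns null for lines that do not match the expected format.
function parseLogLine(line) {
  const m = LOG_PATTERN.exec(line);
  if (!m) return null;
  const [, host, time, path, referer, userAgent] = m;
  return {
    host,
    time: parseTimestamp(time),
    path,
    referer,
    user_agent: userAgent,
  };
}

const line =
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] ' +
  '"GET /apache_pb.gif HTTP/1.0" 200 2326 ' +
  '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';

const doc = parseLogLine(line);
console.log(doc);
```

Storing `time` as a real date rather than the original string is what makes the range queries and date extraction on the following slides possible.
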
  19. Dynamic Queries
     Find all logs for a URL:
         db.logs.find( { 'path' : '/index.html' } )
     Find all logs for a time range:
         db.logs.find( { 'time' : { '$gte' : new Date(2012,0),
                                    '$lt' : new Date(2012,1) } } );
     Find all logs for a host over a range of dates:
         db.logs.find( { 'host' : '127.0.0.1',
                         'time' : { '$gte' : new Date(2012,0),
                                    '$lt' : new Date(2012,1) } } );
  20. Aggregation Framework
     Requests per day by URL:
         db.logs.aggregate( [
           { $match: { time: { $gte: new Date(2012,0),
                               $lt: new Date(2012,1) } } },
           { $project: { path: 1,
                         date: { y: { $year: '$time' },
                                 m: { $month: '$time' },
                                 d: { $dayOfMonth: '$time' } } } },
           { $group: { _id: { p: '$path',
                              y: '$date.y',
                              m: '$date.m',
                              d: '$date.d' },
                       hits: { $sum: 1 } } }
         ] )
  21. Aggregation Framework
         { 'ok': 1,
           'result': [
             { _id: { p: '/index.html', y: 2012, m: 1, d: 1 }, 'hits': 124 },
             { _id: { p: '/index.html', y: 2012, m: 1, d: 2 }, 'hits': 245 },
             { _id: { p: '/index.html', y: 2012, m: 1, d: 3 }, 'hits': 322 },
             { _id: { p: '/index.html', y: 2012, m: 1, d: 4 }, 'hits': 175 },
             { _id: { p: '/index.html', y: 2012, m: 1, d: 5 }, 'hits': 94 }
           ] }
  22. Roll-ups with map-reduce (Design Pattern)
  23. Map Reduce – Map Phase
     Generate hourly rollups from log data:
         var map = function() {
           var key = {
             p: this.path,
             d: new Date(this.ts.getFullYear(),
                         this.ts.getMonth(),
                         this.ts.getDate(),
                         this.ts.getHours(),
                         0, 0, 0) };
           emit( key, { hits: 1 } );
         }
  24. Map Reduce – Reduce Phase
     Generate hourly rollups from log data:
         var reduce = function(key, values) {
           var r = { hits: 0 };
           values.forEach(function(v) {
             r.hits += v.hits;
           });
           return r;
         }
  25. Map Reduce
     Generate hourly rollups from log data:
         cutoff = new Date(2012,0,1)
         query = { ts: { $gt: last_run, $lt: cutoff } }
         db.logs.mapReduce( map, reduce, {
           'query': query,
           'out': { 'reduce' : 'stats.hourly' } } )
         last_run = cutoff
  26. Map Reduce Output
         > db.stats.hourly.find()
         { _id: { p: '/index.html', d: ISODate("2012-01-01T00:00:00Z") }, value: { hits: 124 } },
         { _id: { p: '/index.html', d: ISODate("2012-01-01T01:00:00Z") }, value: { hits: 245 } },
         { _id: { p: '/index.html', d: ISODate("2012-01-01T02:00:00Z") }, value: { hits: 322 } },
         { _id: { p: '/index.html', d: ISODate("2012-01-01T03:00:00Z") }, value: { hits: 175 } },
         ... More ...
  27. Chained Map Reduce
     [Diagram: Collection 1: Raw Logs → MapReduce (runs every hour) → Collection 2: Hourly Stats → MapReduce (runs every day) → Collection 3: Daily Stats]
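
The roll-up works incrementally because the reduce function can be re-applied to its own output: each run processes only new log entries and folds them into the totals already stored. The sketch below imitates that flow in memory, as an illustration only. `runMapReduce` is a hypothetical helper, not MongoDB's engine; `map` takes the document and `emit` as arguments instead of using `this`; and hours are truncated in UTC (the slide uses local-time getters) so the result does not depend on the machine's time zone.

```javascript
// In-memory sketch of the hourly roll-up with incremental merging.
// runMapReduce() is a hypothetical helper, not a MongoDB API.
function map(doc, emit) {
  const ts = doc.ts;
  // Truncate the timestamp to the start of its hour (UTC in this sketch).
  const hour = new Date(
    Date.UTC(ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate(), ts.getUTCHours())
  );
  emit({ p: doc.path, d: hour }, { hits: 1 });
}

function reduce(key, values) {
  const r = { hits: 0 };
  values.forEach((v) => {
    r.hits += v.hits;
  });
  return r;
}

// output is a Map from serialized key to { _id, value }. An existing entry
// is passed back through reduce(), imitating the 'reduce' output mode.
function runMapReduce(docs, output) {
  const emitted = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      const id = JSON.stringify(key);
      if (!emitted.has(id)) emitted.set(id, { key, values: [] });
      emitted.get(id).values.push(value);
    });
  }
  for (const [id, { key, values }] of emitted) {
    const previous = output.get(id);
    const all = previous ? [previous.value, ...values] : values;
    output.set(id, { _id: key, value: reduce(key, all) });
  }
  return output;
}

const hourly = new Map();

// First run: two hits in the 00:00 hour.
runMapReduce(
  [
    { path: "/index.html", ts: new Date("2012-01-01T00:05:00Z") },
    { path: "/index.html", ts: new Date("2012-01-01T00:40:00Z") },
  ],
  hourly
);

// A later run sees only newer entries and merges them into the same output.
runMapReduce(
  [
    { path: "/index.html", ts: new Date("2012-01-01T00:59:00Z") },
    { path: "/index.html", ts: new Date("2012-01-01T01:10:00Z") },
  ],
  hourly
);

const rollups = [...hourly.values()];
console.log(rollups);
```

This only works because the reduce output has the same shape as the emitted values (`{ hits: n }`), which is also what lets a second job chain from hourly to daily stats.
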
  28. Pre-aggregated documents (Design Pattern)
  29. Pre-Aggregation
     Data for URL / Date:
         {
           _id: "20101010/site-1/apache_pb.gif",
           metadata: {
             date: ISODate("2000-10-10T00:00:00Z"),
             site: "site-1",
             page: "/apache_pb.gif" },
           daily: 5468426,
           hourly: {
             "0": 227850,
             "1": 210231,
             ...
             "23": 20457 },
           minute: {
             "0": 3612,
             "1": 3241,
             ...
             "1439": 2819 }
         }
  30. Pre-Aggregation
     Data for URL / Date:
         id_daily = dt_utc.strftime('%Y%m%d/') + site + page
         hour = dt_utc.hour
         minute = dt_utc.minute
         # Get a datetime that only includes date info
         d = datetime.combine(dt_utc.date(), time.min)
         query = {
           '_id': id_daily,
           'metadata': { 'date': d, 'site': site, 'page': page } }
         update = { '$inc': {
           'daily': 1,
           'hourly.%d' % (hour,): 1,
           'minute.%d.%d' % (hour, minute): 1 } }
         db.stats.daily.update(query, update, upsert=True)
  31. Pre-Aggregation
     Data for URL / Date:
         db.stats.daily.findOne(
           { metadata: { date: dt, site: 'site-1', page: '/index.html' } },
           { minute: 1 });
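
The pre-aggregation pattern replaces a later roll-up job with one increment per hit: the counter names are computed from the hit's timestamp and bumped with `$inc`. The sketch below restates slide 30's update in JavaScript and applies it to an in-memory object to show the effect. `buildHitUpdate` and `applyInc` are hypothetical helpers; `applyInc` only imitates what `$inc` does with dotted paths. The nested `minute.<hour>.<minute>` key follows slide 30.

```javascript
// Build the query/update pair for one page hit, following slide 30's layout.
// buildHitUpdate() and applyInc() are hypothetical helpers for illustration.
function buildHitUpdate(site, page, when) {
  const pad = (n) => String(n).padStart(2, "0");
  const ymd =
    `${when.getUTCFullYear()}${pad(when.getUTCMonth() + 1)}${pad(when.getUTCDate())}`;
  const hour = when.getUTCHours();
  const minute = when.getUTCMinutes();
  // Midnight UTC: a date that only carries date information.
  const day = new Date(
    Date.UTC(when.getUTCFullYear(), when.getUTCMonth(), when.getUTCDate())
  );
  return {
    query: { _id: `${ymd}/${site}${page}`, metadata: { date: day, site, page } },
    update: {
      $inc: {
        daily: 1,
        [`hourly.${hour}`]: 1,
        [`minute.${hour}.${minute}`]: 1,
      },
    },
  };
}

// Imitate $inc on a plain object: walk each dotted path, creating missing
// sub-documents, and add the amount to the counter (starting from 0).
function applyInc(doc, inc) {
  for (const [path, amount] of Object.entries(inc)) {
    const parts = path.split(".");
    const last = parts.pop();
    let node = doc;
    for (const part of parts) {
      if (node[part] === undefined) node[part] = {};
      node = node[part];
    }
    node[last] = (node[last] || 0) + amount;
  }
  return doc;
}

const when = new Date("2000-10-10T20:55:36Z");
const { query, update } = buildHitUpdate("site-1", "/apache_pb.gif", when);

// Simulate an upsert followed by a second hit in the same minute.
const stored = { ...query };
applyInc(stored, update.$inc);
applyInc(stored, update.$inc);

console.log(query._id, JSON.stringify(update), JSON.stringify(stored.minute));
```

With a real driver, the same `query` and `update` objects would be passed to an upserting update call, so the first hit of the day creates the document and later hits only increment counters.
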
  32. Solution Architect, 10gen
