4. High write volume
• Lots of data sources
• Lots of data from each source
Dynamic queries
• Users can drill down into data
Fast queries
• Lots of clients
• High request rate
Minimize delay between collection & query
• How long before an event appears in a report?
5. Upserts avoid unnecessary reads
Asynchronous writes
Writes buffered in RAM and flushed to disk in bulk
Spread writes over multiple shards
[Diagram: four boxes labeled "Data Sources"]
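A minimal sketch of the upsert point: a counter can be incremented, or created if it does not exist yet, in one write, so the client never reads first. The helper name `buildHitUpsert`, the `stats` collection, and the hourly bucket are assumptions for illustration and are not from the slides.

```javascript
// Hypothetical helper: build the arguments for one counter upsert.
function buildHitUpsert(path, time) {
  // Truncate the event time to the start of its UTC hour.
  const hour = new Date(Date.UTC(
    time.getUTCFullYear(), time.getUTCMonth(), time.getUTCDate(),
    time.getUTCHours()));
  return {
    filter: { path: path, hour: hour },  // which counter document
    update: { $inc: { hits: 1 } },       // increment in place
    options: { upsert: true },           // insert if nothing matches
  };
}

// With a live connection (assumed, not shown here) this could be applied as:
//   const u = buildHitUpsert("/index.html", new Date());
//   db.stats.updateOne(u.filter, u.update, u.options);
```

When no document matches the filter, the upsert inserts one built from the filter's equality fields and then applies `$inc`, so the first hit in an hour creates the counter with `hits: 1`.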
6. Original event data:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
As BSON:
doc = {
    _id: ObjectId('4f442120eb03305789000000'),
    host: "127.0.0.1",
    time: ISODate("2000-10-10T20:55:36Z"),
    path: "/apache_pb.gif",
    referer: "http://www.example.com/start.html",
    user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
Insert to MongoDB:
db.logs.insert( doc )
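The conversion from log line to document could be sketched as below. This assumes the Apache combined log format shown on the slide; `parseLogLine` and the regular expression are illustrative names, not part of MongoDB or of the original deck, and `_id` is left for the server to generate.

```javascript
// Hypothetical helper: parse one Apache combined-format log line into a
// document with the same fields as the slide's example.
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

const LINE_RE =
  /^(\S+) \S+ \S+ \[(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\] "\S+ (\S+) [^"]*" \d{3} \S+ "([^"]*)" "([^"]*)"$/;

function parseLogLine(line) {
  const m = LINE_RE.exec(line);
  if (m === null) return null; // not a combined-format line
  const [, host, day, mon, year, hh, mm, ss,
         sign, offH, offM, path, referer, userAgent] = m;
  // Read the wall-clock fields as if they were UTC, then remove the
  // zone offset to get the true instant.
  const wallClock = Date.UTC(+year, MONTHS[mon], +day, +hh, +mm, +ss);
  const offsetMs = (sign === "-" ? -1 : 1) * (+offH * 60 + +offM) * 60000;
  return {
    host: host,
    time: new Date(wallClock - offsetMs),
    path: path,
    referer: referer,
    user_agent: userAgent,
  };
}
```

The slide's line carries the offset -0700, so 13:55:36 local time becomes 20:55:36 UTC, matching the `ISODate` value in the document above.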
7. Find all logs for a URL:
db.logs.find( { 'path' : '/index.html' } )
Find all logs for a time range:
db.logs.find( { 'time' : { '$gte' : new Date(2012,0),
                           '$lt' : new Date(2012,1) } } );
Find all logs for a host over a range of dates:
db.logs.find( {
    'host' : '127.0.0.1',
    'time' : { '$gte' : new Date(2012,0),
               '$lt' : new Date(2012,1) } } );
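One caveat with the range queries above: `new Date(2012,0)` is evaluated in the local time zone of whoever runs the query, while the stored `time` values are UTC. A possible variant that pins the month boundaries to UTC is sketched here; `monthRangeFilter` is a hypothetical helper name, not something from the slides.

```javascript
// Hypothetical helper: build a filter for one calendar month with UTC
// boundaries, optionally restricted to a single host.
function monthRangeFilter(year, month, host) {
  // month is zero-based, as in the Date constructor. Date.UTC rolls
  // month 12 over into January of the following year.
  const filter = {
    time: {
      $gte: new Date(Date.UTC(year, month, 1)),
      $lt: new Date(Date.UTC(year, month + 1, 1)),
    },
  };
  if (host !== undefined) filter.host = host;
  return filter;
}

// With a live connection (assumed, not shown here):
//   db.logs.find(monthRangeFilter(2012, 0, "127.0.0.1"));
```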
8. • Aggregation Framework for on-demand rollups
• Map/Reduce Framework for background rollups
• Pre-Aggregation for real-time reporting
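As one illustration of the first option, an on-demand rollup of hits per path might look like the pipeline below. This is a sketch under assumptions: it reuses the `logs` collection and the `time` and `path` fields from the earlier slides, and `hitsPerPathPipeline` is a hypothetical helper name.

```javascript
// Hypothetical helper: aggregation pipeline that counts hits per path
// for events with start <= time < end, most-requested paths first.
function hitsPerPathPipeline(start, end) {
  return [
    { $match: { time: { $gte: start, $lt: end } } }, // restrict the window
    { $group: { _id: "$path", hits: { $sum: 1 } } }, // one count per path
    { $sort: { hits: -1 } },                         // busiest first
  ];
}

// With a live connection (assumed, not shown here):
//   db.logs.aggregate(hitsPerPathPipeline(start, end));
```

Putting `$match` first keeps the grouping stage from scanning events outside the requested window.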