MongoDB Real-time Data Collection and Stats Generation
My talk from #mongoseattle on how I've used MongoDB for real-time data collection and stats generation. Includes basic usage of increment modifiers as well as map/reduce example.

Published in: Technology
3 Comments
  • I can't seem to convert your Keynote to PDF / view it on my Keynote program (error displayed is 'too old version')
  • is it well with analytic queries on data about 3 GB and RAM 2 GB? how it perform in this scenario?
  • are you using mongoid orm here?
  • Transcript

    • 1. MongoDB for real-time data collection and stats generation Damon Cortesi (@dacort) damon@rowfeeder.com
    • 2. The Past • Lots of data • 2M rows/day • Post-computation is slow • TweetStats == 1B tweets
    • 3. mysql> select count(*) from gnip_activity_2_1 WHERE created_at BETWEEN '2010-03-01' AND '2010-03-02';
         +----------+
         | count(*) |
         +----------+
         |  1480294 |
         +----------+
         1 row in set (6.32 sec)

         mysql> SELECT app,(count(*)/(select count(*) from gnip_activity_2_1 WHERE created_at BETWEEN '2010-03-01' AND '2010-03-02'))*100 AS percent, count(*) as count
             -> FROM gnip_activity_2_1
             -> WHERE created_at BETWEEN '2010-03-01' AND '2010-03-02'
             -> GROUP BY app
             -> ORDER BY count DESC
             -> LIMIT 10;
         +-------------+---------+--------+
         | app         | percent | count  |
         +-------------+---------+--------+
         | web         | 31.1618 | 461286 |
         | TweetDeck   | 12.4685 | 184570 |
         | UberTwitter |  7.3333 | 108555 |
         | twitterfeed |  7.1350 | 105619 |
         | API         |  5.3761 |  79582 |
         | Echofon     |  4.2635 |  63112 |
         | Tweetie     |  3.4734 |  51416 |
         | Seesmic     |  2.1913 |  32438 |
         | mobile web  |  1.8382 |  27211 |
         | HootSuite   |  1.7951 |  26573 |
         +-------------+---------+--------+
         10 rows in set (1 min 27.07 sec)

         This is NOT real-time
    • 4. So now what? Tokyo Riak CouchDB MongoDB http://www.flickr.com/photos/bob_august/4307291275/
    • 5. • Initially simple - Tweets -> Spreadsheet • “Maybe we should save this data...” • Real-time updates • Minimal stats generation • Writes > Reads (200x)
         > db.serverStatus()
         "opcounters" : { "insert" : 36687455, "query" : 857059, "update" : 189207744, "delete" : 0, "getmore" : 4176334, "command" : 36734580 }
    • 6. Streaming Twitter
         curl http://stream.twitter.com/1/statuses/sample.json -u<user>:<pass> | mongoimport -c twitter_live
         Courtesy @eliothorowitz ...
    • 7. Slightly more complex ;) Twitter/Facebook/etc -> Redis/Resque Queues -> Process, Process, Process -> MongoDB (Save/Update/Stats)
    • 8. Modifier Operations No query/retrieve Just $set or $inc
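
A minimal Python sketch of the modifier pattern on the slide above: the writer sends a filter plus a `$inc` (or `$set`) document and never reads the record first. The layout (one counter document per day and app), the field names `day`, `app`, `count`, and both helper names are assumptions for illustration, not taken from the talk. `apply_upsert` is only an in-memory stand-in so the example runs without a server; with PyMongo the same pair would typically be passed to `update_one(query, update, upsert=True)`.

```python
def build_counter_update(app, day):
    """Build the (filter, update) pair for one observed tweet.

    Field names are hypothetical; the talk does not show its schema.
    """
    query = {"day": day, "app": app}
    update = {"$inc": {"count": 1}}
    return query, update


def apply_upsert(store, query, update):
    """In-memory stand-in for an upsert with $inc/$set (illustration only)."""
    key = tuple(sorted(query.items()))
    # Upsert: create the document from the filter if it does not exist yet.
    doc = store.setdefault(key, dict(query))
    for field, amount in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + amount
    for field, value in update.get("$set", {}).items():
        doc[field] = value
    return doc


store = {}
for app in ["web", "TweetDeck", "web"]:
    query, update = build_counter_update(app, "2010-03-01")
    # With a real server (assumption): collection.update_one(query, update, upsert=True)
    apply_upsert(store, query, update)

counts = {doc["app"]: doc["count"] for doc in store.values()}
print(counts)
```

The point of the pattern is that each incoming event costs one write and no read, which fits the write-heavy workload described earlier in the deck.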
    • 9. Stats
    • 10. Other Benefits • New field? No multi-day ALTER statement. • Auto-sharding in 1.6 • --notablescan • aka, don’t pull a Twitter • “On Monday, our users database, where we store millions of user records, got hung up running a long-running query” -- 7/21
    • 11. So...that earlier SQL? Incrementers Much Better
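
As a rough illustration of why incrementers help here: once per-app counters exist, the report from the earlier SQL becomes a sort over a handful of small documents instead of a scan over 1.48M rows. The sketch below is Python over plain dictionaries; the counter layout and the `top_apps` helper are assumptions, and the counts are copied from the MySQL output earlier in the deck.

```python
def top_apps(counters, total, limit=10):
    """Return (app, percent, count) rows, largest count first."""
    ranked = sorted(counters, key=lambda doc: doc["count"], reverse=True)
    return [
        (doc["app"], round(doc["count"] / total * 100, 4), doc["count"])
        for doc in ranked[:limit]
    ]


# Pre-computed counters for one day (values taken from the MySQL result).
counters = [
    {"app": "TweetDeck", "count": 184570},
    {"app": "web", "count": 461286},
    {"app": "UberTwitter", "count": 108555},
]
# Assumption: the day's total is kept in its own incremented counter.
total = 1480294

rows = top_apps(counters, total)
for app, percent, count in rows:
    print(app, percent, count)
```

Against a live collection, the equivalent read would likely be a `find` on the day with a descending sort on the counter field and a limit of 10.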
    • 12. Simple. Pre-Computed. Wait, can’t I pre-compute in MySQL? Mongo == Fire/forget and async
    • 13. But sometimes... • Aggregation might still be necessary • This works...but what if we need more?
    • 14. lnkby.me ‣ Problem statement: 1. Aggregate stats on shortened links (easy) 2. Top domains based on # clicks (easy) 3. Top users driving traffic to those top domains for the past seven days (slightly more difficult)
    • 15. Solution • Gather stats • Map/Reduce! • Server-side JavaScript • Temporary collection to hold output • Can be written to permanent collection • temp collection renamed atomically
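
The map/reduce flow above can be sketched in plain Python to show its shape. This is not the code from the talk: MongoDB runs map and reduce as server-side JavaScript, whereas the small `map_reduce` driver below only simulates the grouping step in memory. The click fields `url` and `user` are hypothetical, and the seven-day filter from the problem statement is left out.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_click(click):
    """Emit (domain, 1) for each click."""
    yield urlparse(click["url"]).netloc, 1


def map_user_click(click):
    """Emit ((domain, user), 1) for each click."""
    yield (urlparse(click["url"]).netloc, click["user"]), 1


def reduce_counts(key, values):
    """Sum partial counts; safe to re-run on its own output."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """In-memory stand-in for a map/reduce run (illustration only)."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


clicks = [
    {"url": "http://example.com/a", "user": "alice"},
    {"url": "http://example.com/b", "user": "bob"},
    {"url": "http://example.org/", "user": "alice"},
]

# Pass 1: clicks per domain, then keep the ten largest.
domain_counts = map_reduce(clicks, map_click, reduce_counts)
top_domains = sorted(domain_counts, key=domain_counts.get, reverse=True)[:10]

# Pass 2: clicks per (domain, user), restricted to the top domains.
top = set(top_domains)
in_top = [c for c in clicks if urlparse(c["url"]).netloc in top]
user_counts = map_reduce(in_top, map_user_click, reduce_counts)
```

Keeping the reduce step a plain sum matters because a reducer may be applied to already-reduced values, so its output has to be a valid input.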
    • 16. click
    • 17. aggregate Get Top 10 Domains, then...
    • 18. Caveat • Indexes, indexes, indexes • Compound indexes • a,b,c • Query on a; a,b; or a,b,c • Sort on last field
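
The compound-index rule on the slide above (an index on a,b,c serves queries on a; a,b; or a,b,c) can be written as a small prefix check. This Python sketch encodes only that rule of thumb; the function name is made up for illustration, and the real query planner is more nuanced than this.

```python
def matches_index_prefix(index_fields, query_fields):
    """True if the queried fields are exactly a leading prefix of the index."""
    wanted = set(query_fields)
    if not wanted:
        return False
    # Compare against the first len(wanted) fields of the index.
    return wanted == set(index_fields[:len(wanted)])


index = ("a", "b", "c")
print(matches_index_prefix(index, ["a"]))
print(matches_index_prefix(index, ["a", "b"]))
print(matches_index_prefix(index, ["b"]))
```

Field order inside the query does not matter for this check; what matters is that no leading index field is skipped.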
    • 19. Mongo++ • High-volume updates -- win • Stats generation -- win • Mutable schema && json -- win
