Yieldbot Tech Talk, Sept 20, 2012

  • Talk given at Yieldbot Tech Talks meetup group:

    http://www.meetup.com/Yieldbot-Tech-Talks/

  1. Yieldbot Tech Talk – MongoDB to k/v. © 2012 Yieldbot / CONFIDENTIAL
  2. What We Do
     • Yieldbot technology creates marketplaces where advertisers target realtime consumer intent flowing through premium publishers.
     • At a high level: Analytics + Ad Serving
       – Geo-distributed
         • Data collection
         • Realtime ad matching
       – Cascalog batch analytics
       – Rich Analytics Results visualizations
  3. Why MongoDB (Dec 2009)
     • Needed to be manageable by the dev team (1 person!)
     • Flexible
     • Easy to get started, run on laptop or deploy
     • Scale wasn't initially the biggest concern
     • Could focus on other stuff
       – Lucene
       – Analytics
       – Ad serving dynamics
  4. How MongoDB Was Used Initially
     • Configuration
       – Publisher profiles, ad matching rules, etc.
     • Data collection
       – Pageviews, impressions, clicks
     • Analytics results
     • Task state tracking
     • Lookup tables for ad serving
     • Real-time ad stats
  5. Couple Aspects of Note
     • Master/Slave
       – Convenient for simple durability
       – Convenient for geo distribution
       – Not unique to Mongo; now a similar redis topology
     • Indexing
       – Easy to set up, but eventually a RAM scaling issue
       – Initially great for efficient views of data in UI
       – Moved analytics results to key/value in Mongo
     • Durable sharded config (replica sets) is expensive
  6. Data Collection
     • Mongo: collections for pageviews, impressions, clicks
       – Wasn't archived anywhere else
       – Not where you want to infinitely scale
     • Now flows through redis, to files, to S3
  7. Data Collection with redis Assist
     • redis lists populated as events come in
     • Daemons pull off lists and write to files
     • Periodically compress and archive files to S3
     • S3 files used for input later
       – Hadoop (Cascalog) batch analytics
       – Advertising Stats Calculations
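
The redis-to-files step on this slide can be sketched roughly as follows. This is a minimal illustration, not Yieldbot's daemon: a `collections.deque` stands in for the redis list (with a real server, producers would push with a command such as RPUSH and the daemon would pop with LPOP), and the function names, event fields, and file name are all hypothetical. The S3 upload is left out.

```python
import gzip
import json
import shutil
import tempfile
from collections import deque
from pathlib import Path


def drain_events(pop_event, out_path, max_events=1000):
    """Pop up to max_events events and append them to out_path, one JSON object per line."""
    written = 0
    with open(out_path, "a", encoding="utf-8") as out:
        while written < max_events:
            event = pop_event()
            if event is None:  # the queue is empty
                break
            out.write(json.dumps(event) + "\n")
            written += 1
    return written


def compress_for_archive(path):
    """Gzip a finished event file; uploading it to S3 is not part of this sketch."""
    gz_path = Path(str(path) + ".gz")
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


# Stand-in for a redis list: producers append on the right, the daemon pops from the left.
queue = deque()
queue.append({"type": "pageview", "url": "/a"})
queue.append({"type": "click", "ad": "ad-1"})

workdir = Path(tempfile.mkdtemp())
event_file = workdir / "events.log"
count = drain_events(lambda: queue.popleft() if queue else None, event_file)
archive = compress_for_archive(event_file)
```

Passing the pop operation in as a callable keeps the file-writing logic independent of the queue, so the same function could be pointed at a redis client without changes.
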
  8. Matching Lookup Tables
     • Mongo: collections for different lookup types
       – E.g., geo, url
       – Built periodically, updated on config change
       – Lookup in each, correlate results
     • redis
       – Ability to pipeline operations in single server call
       – Set intersection across lookup dimensions and one response back
       – Same master/slave as Mongo for distribution
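
To make the "set intersection across lookup dimensions" idea concrete, here is a small sketch using plain Python sets. The index contents, ad IDs, and the `match_ads` helper are invented for illustration. In redis, the equivalent intersection can be computed server-side with a command such as SINTER, which is what allows a single response instead of one lookup per dimension followed by client-side correlation.

```python
# Hypothetical lookup tables: each dimension maps a value to the set of ad IDs eligible for it.
geo_index = {
    "US-MA": {"ad-1", "ad-2", "ad-3"},
    "US-NY": {"ad-2", "ad-4"},
}
url_index = {
    "/sports": {"ad-2", "ad-3", "ad-5"},
    "/news": {"ad-1", "ad-4"},
}


def match_ads(geo, url):
    """Return the ads eligible on every dimension by intersecting the per-dimension sets."""
    return geo_index.get(geo, set()) & url_index.get(url, set())


matches = match_ads("US-MA", "/sports")
```

An unknown value on any dimension yields an empty set, so the intersection is empty and no ad matches.
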
  9. Configuration
     • Mongo
       – Database per publisher
       – Collections for objects
       – Denormalized where possible
       – Manual Foreign Keys
       – Obviously best candidate for relational model
     • History and versioning were paramount to us
       – Rolled our own: HeroDB
  10. HeroDB
     • History and granular versioning highest goal
     • Database built on top of git
       – Golden database is a bare repo
       – Can clone to anywhere, make changes, push
       – Changes in single commit are atomic
     • How, when, and who changed it
     • Ability to set to specific previous state of DB
     • Much more to do; in production 6+ months
       – Recent change: caching
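
The properties listed on this slide (atomic commits, who/when metadata, and returning to a previous state) can be illustrated with a toy in-memory commit log. This is not HeroDB and does not use git; the `VersionedStore` class, the key names, and the authors are hypothetical, and the sketch only shows the shape of the idea.

```python
import copy
import time


class VersionedStore:
    """Toy commit log: every commit stores a full snapshot plus who/when/why metadata."""

    def __init__(self):
        self.commits = []  # oldest first

    def commit(self, changes, author, message):
        """Apply all changes in one step, so a commit is all-or-nothing."""
        state = copy.deepcopy(self.commits[-1]["state"]) if self.commits else {}
        state.update(changes)
        self.commits.append({
            "state": state,
            "author": author,
            "message": message,
            "time": time.time(),
        })
        return len(self.commits) - 1  # position in the log, used as the commit id

    def state_at(self, commit_id):
        """Return a copy of the database as it was at the given commit."""
        return copy.deepcopy(self.commits[commit_id]["state"])


store = VersionedStore()
first = store.commit({"publisher/acme/floor_price": 1.0}, "alice", "initial config")
second = store.commit({"publisher/acme/floor_price": 1.5}, "bob", "raise floor")
```

Storing full snapshots keeps the sketch short. Git itself stores content-addressed objects, so unchanged data is shared between commits.
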
  11. Analytics Results
     • ARCv1, Mongo: indexed collections
       – Very easy to code to
       – Initially with everything else in same server
       – Moved out to dedicated server
       – Memory became an issue
         • Indexes bigger than data itself
       – Overhead of importing Cascalog results
         • Pull json files from S3 to local disk
         • mongoimport files into DB
  12. Analytics Results Cont'd
     • ARCv2, Mongo: paged data, key/value
       – Migrated app to key/value access pattern
       – Much better memory usage
       – Application sharded, publishers spread around
       – DB per day per publisher, most recent 7 held
       – Still overhead of importing Hadoop results
         • Pull json files from S3 to local disk
         • mongoimport files into DB
  13. Analytics Results - ElephantDB
     • Cascalog support to directly write EDB format
       – Berkeley DB or LevelDB
     • Ring Topology
       – Shards distributed around ring, consistent hashing
       – Configurable replication factor
       – Request to any node, forwards as necessary
       – Incrementally increase ring size
     • Import from S3 efficient
       – Copy shard from S3 to local disk
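
The ring topology bullets describe consistent hashing with a replication factor. A generic sketch of that technique is shown below; it is not ElephantDB's code, and the `HashRing` class, the node names, the shard name, and the choice of MD5 as the hash function are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replication=2):
        self.replication = replication
        self._ring = sorted((_hash(node), node) for node in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def add_node(self, node):
        """Grow the ring by one node; only keys near its position change owner."""
        bisect.insort(self._ring, (_hash(node), node))
        self._positions = [pos for pos, _ in self._ring]

    def nodes_for(self, key):
        """Walk clockwise from the key's position and collect `replication` nodes."""
        start = bisect.bisect(self._positions, _hash(key))
        count = min(self.replication, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c"], replication=2)
owners = ring.nodes_for("shard-7")
```

Because any node can compute `nodes_for` locally, a request that lands on a non-owner can be forwarded to one of the owners, which matches the "request to any node" bullet.
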
  14. Real-time Ad Stats
     • Mongo: DB per day, collection by entity type
       – Document per entity instance
       – stat_type.hour.minute nested values, atomic increment
       – Never a good story around aggregating at larger timeframes
     • Enter redis again
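
The `stat_type.hour.minute` layout can be illustrated with MongoDB's `$inc` update operator, which accepts dotted field paths and creates the nested levels on first use. The sketch below builds such an update document and applies it to a plain dict so that it runs without a server. The helper names and the `clicks` stat are hypothetical, and `apply_inc` only imitates the server-side behaviour for this one operator.

```python
from datetime import datetime


def increment_update(stat_type, when):
    """Build a MongoDB-style update that bumps one per-minute counter."""
    field = f"{stat_type}.{when.hour}.{when.minute}"
    return {"$inc": {field: 1}}


def apply_inc(document, update):
    """Apply the $inc update to a plain dict, creating nested levels as needed."""
    for dotted, amount in update["$inc"].items():
        *parents, leaf = dotted.split(".")
        node = document
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = node.get(leaf, 0) + amount
    return document


def hourly_total(document, stat_type, hour):
    """Aggregating upward means walking the nested minute counters on the client."""
    return sum(document.get(stat_type, {}).get(str(hour), {}).values())


doc = {"_id": "ad-1"}
apply_inc(doc, increment_update("clicks", datetime(2012, 9, 20, 14, 5)))
apply_inc(doc, increment_update("clicks", datetime(2012, 9, 20, 14, 5)))
apply_inc(doc, increment_update("clicks", datetime(2012, 9, 20, 14, 6)))
```

The `hourly_total` helper hints at the aggregation problem noted on the slide: per-minute increments are cheap, but any larger timeframe requires reading and summing many nested values.
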
  15. Real-time Ad Stats Cont'd
     • redis has robust access patterns
       – More pipelining
     • Initially realtime and aggregated kept in redis
     • Issue with redis scaling is DB has to fit in memory
     • Time-period aggregations now kept in HBase
     • Only most recent hours kept in redis
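
One way to picture the split between "most recent hours in redis" and "time-period aggregations in HBase" is a roll-off job that moves old hourly buckets into a longer-term aggregate. The sketch below uses plain dicts as stand-ins for both stores. The retention window of three hours, the daily aggregation granularity, and all names are assumptions, since the slide gives none of these details.

```python
from collections import defaultdict

KEEP_HOURS = 3  # assumed retention window; the slide only says "most recent hours"

realtime = {}               # stand-in for redis: hour index -> {stat: count}
archive = defaultdict(int)  # stand-in for HBase: (day index, stat) -> count


def record(hour_index, stat, amount=1):
    """Count an event in the realtime store under its hour bucket."""
    bucket = realtime.setdefault(hour_index, {})
    bucket[stat] = bucket.get(stat, 0) + amount


def roll_off(current_hour):
    """Move buckets older than the retention window into daily aggregates."""
    for hour_index in sorted(realtime):  # sorted() copies the keys, so popping is safe
        if hour_index > current_hour - KEEP_HOURS:
            continue
        for stat, count in realtime.pop(hour_index).items():
            archive[(hour_index // 24, stat)] += count


record(10, "clicks")
record(10, "clicks")
record(13, "clicks")
roll_off(current_hour=13)
```

After the roll-off, the in-memory store holds only the buckets inside the window, which keeps its size bounded regardless of how much history accumulates.
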
  16. Task State Tracking
     • The last holdout
     • Collection of tasks
       – Each task is a document
       – Indexed as needed
       – Mongo query and update syntax convenient
         • Both in static code and in the Python or Mongo REPL
  17. Honorable Mention
     • redis for the celery backend, used for task messaging infrastructure
     • but was never mongo anyway...
  18. MongoDB Migration Summary
     • Configuration → HeroDB
     • Data Collection → to S3 via redis
     • Analytics Results → ElephantDB
     • Task State Tracking → still Mongo
     • Matcher Lookup Tables → redis
     • Real-time Ad Stats → redis/HBase
  19. Thanks!
     • Site: yieldbot.com
     • Blog: blog.yieldbot.com
     • Twitter: @yieldbot
     • Email: info@yieldbot.com
