Yieldbot Tech Talk – MongoDB to k/v




                        © 2012 Yieldbot
            © 2012 Yieldbot / CONFIDENTIAL
Yieldbot Tech Talk – MongoDB to key/value, Sept 20, 2012


                   What We Do
• Yieldbot technology creates marketplaces where
  advertisers target realtime consumer intent flowing
  through premium publishers.
• At a high level: Analytics + Ad Serving
   – Geo-distributed
      • Data collection
      • Realtime ad matching
   – Cascalog batch analytics
   – Rich Analytics Results visualizations





          Why MongoDB (Dec 2009)
•   Needed to be manageable by dev team (1 person!)
•   Flexible
•   Easy to get started, run on laptop or deploy
•   Scale wasn’t initially biggest concern
•   Could focus on other stuff
     – Lucene
     – Analytics
     – Ad serving dynamics






       How MongoDB Was Used Initially
• Configuration
   – Publisher profiles, ad matching rules, etc.
• Data collection
   – Pageviews, impressions, clicks
• Analytics results
• Task state tracking
• Lookup tables for ad serving
• Real-time ad stats






          A Couple of Aspects of Note
• Master/Slave
   – convenient for simple durability
   – convenient for geo distribution
   – not unique to Mongo; we now use a similar redis topology
• Indexing
   – Easy to set up, but eventually RAM scaling issue
   – initially great for efficient views of data in UI
   – later moved analytics results to key/value access in Mongo
• Durable sharded config (replica sets) expensive





                 Data Collection
• Mongo: collections for pageviews, impressions, clicks
   – Wasn’t archived anywhere else
   – Not where you want to infinitely scale
• Now flows through redis, to files, to S3






     Data Collection with redis Assist
•   redis lists populated as events come in
•   Daemons pull off lists and write to files
•   Periodically compress and archive files to S3
•   S3 files used for input later
     – Hadoop (Cascalog) batch analytics
     – Advertising Stats Calculations
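
The flow above can be sketched roughly as follows. This is a minimal illustration, not Yieldbot's code: a `collections.deque` stands in for the redis list (a real deployment would use the redis commands LPUSH and RPOP), the function names and file names are hypothetical, and the S3 upload step is left out.

```python
import gzip
import json
import os
import shutil
import tempfile
from collections import deque

# Stand-in for a redis list; this is an assumption made so the sketch runs
# without a server. With redis the two ends would be LPUSH and RPOP.
event_queue = deque()


def record_event(event):
    """Producer side: push a serialized event onto the head of the list."""
    event_queue.appendleft(json.dumps(event))


def drain_to_file(path, max_events=1000):
    """Daemon side: pop from the tail and append one JSON line per event."""
    written = 0
    with open(path, "a", encoding="utf-8") as out:
        while event_queue and written < max_events:
            out.write(event_queue.pop() + "\n")
            written += 1
    return written


def compress_for_archive(path):
    """Gzip a finished file and remove the original; upload is not shown."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "pageviews.log")  # hypothetical file name
record_event({"type": "pageview", "url": "/a"})
record_event({"type": "click", "url": "/a"})
drained = drain_to_file(log_path)
archive_path = compress_for_archive(log_path)

# Read the archive back the way a later batch job might.
with gzip.open(archive_path, "rt", encoding="utf-8") as fh:
    archived = [json.loads(line) for line in fh]
```

Pushing at one end and popping at the other keeps events in arrival order, so the archived file can be read as an append-only log.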






          Matching Lookup Tables
• Mongo: collections for different lookup types
   – E.g., geo, url
   – Built periodically, updated on config change
   – Lookup in each, correlate results
• redis
   – Ability to pipeline operations in single server call
   – Set intersection across lookup dimensions and one
     response back
   – Same master/slave as Mongo for distribution
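
To make the set-intersection idea concrete, here is a small sketch. It assumes each lookup dimension is stored as a set of ad ids under a key, and uses a plain dict of Python sets as a stand-in for redis sets; the key names and ad ids are invented for illustration. With redis itself, the intersection would be a single SINTER call over the keys.

```python
# Stand-in for redis sets: key name -> set of ad ids (all values hypothetical).
lookup_sets = {
    "geo:US": {"ad1", "ad2", "ad3"},
    "geo:CA": {"ad2"},
    "url:/sports": {"ad2", "ad3", "ad4"},
}


def match_ads(*keys):
    """Intersect the sets stored under the given keys.

    A missing key is treated as an empty set, which is also how redis
    SINTER treats keys that do not exist.
    """
    sets = [lookup_sets.get(key, set()) for key in keys]
    if not sets:
        return set()
    return set.intersection(*sets)


matched = match_ads("geo:US", "url:/sports")
no_match = match_ads("geo:US", "url:/unknown")
```

The point of doing this server side is that the client sends one request and gets one correlated answer, instead of one lookup per dimension followed by client-side correlation.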





                  Configuration
• Mongo
   – Database per publisher
   – Collections for objects
   – Denormalized where possible
   – Manual Foreign Keys
   – Obviously best candidate for relational model
• History and versioning were paramount to us
   – Roll our own: HeroDB






                        HeroDB
• History and granular versioning highest goal
• Database built on top of git
   – Golden database is a bare repo
   – Can clone to anywhere, make changes, push
   – Changes in single commit are atomic
• How, when, and who changed it
• Ability to set to specific previous state of DB
• Much more to do, in production 6+ months
   – Recent change, caching
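
The properties listed above (atomic commits, a record of who changed what and when, and the ability to return to an earlier state) can be illustrated with a toy in-memory store. This is not HeroDB and involves no git; the class, method names, and keys are all hypothetical and only show the behavior the slide describes.

```python
import copy
import time


class VersionedStore:
    """Toy store in which every commit is an immutable snapshot."""

    def __init__(self):
        # Each commit is a dict with author, message, time, and data.
        self.commits = []

    def head(self):
        """Return a copy of the latest state, or an empty dict."""
        return copy.deepcopy(self.commits[-1]["data"]) if self.commits else {}

    def commit(self, changes, author, message):
        """Apply all changes as one step and return the new commit index."""
        data = self.head()
        data.update(changes)
        self.commits.append(
            {"author": author, "message": message, "time": time.time(), "data": data}
        )
        return len(self.commits) - 1

    def state_at(self, index):
        """Return the full state as of an earlier commit."""
        return copy.deepcopy(self.commits[index]["data"])

    def log(self):
        """Return (index, author, message) for every commit."""
        return [(i, c["author"], c["message"]) for i, c in enumerate(self.commits)]


store = VersionedStore()
first = store.commit({"publisher/acme/floor_price": 0.5}, "alice", "initial config")
second = store.commit(
    {"publisher/acme/floor_price": 0.75, "publisher/acme/enabled": True},
    "bob",
    "raise floor and enable",
)
previous = store.state_at(first)
```

Both changes in the second commit become visible together, and the first state remains available by index.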





                Analytics Results
• ARCv1, Mongo: indexed collections
   – Very easy to code to
   – Initially with everything else in same server
   – Moved out to dedicated server
   – Memory became an issue
       • Indexes bigger than data itself
   – Overhead of importing Cascalog results
       • Pull json files from S3 to local disk
       • mongoimport files into DB





         Analytics Results Cont’d
• ARCv2, Mongo: paged data, key/value
   – Migrated app to key/value access pattern
   – Much better memory usage
   – Application sharded, publishers spread around
   – DB per day per publisher, most recent 7 held
   – Still overhead of importing Hadoop results
      • Pull json files from S3 to local disk
      • mongoimport files into DB
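
A sketch of the "DB per day per publisher, most recent 7 held" bookkeeping might look like the following. The naming scheme (`arc_<publisher>_<yyyymmdd>`) and the function names are assumptions for illustration; the slide does not say how the databases were named or dropped.

```python
from datetime import date, timedelta

RETAINED_DAYS = 7  # from the slide: most recent 7 held


def db_name(publisher, day):
    """Hypothetical naming scheme: one database per publisher per day."""
    return f"arc_{publisher}_{day:%Y%m%d}"


def dbs_to_drop(existing, publisher, today):
    """Return this publisher's database names that fall outside the window."""
    keep = {
        db_name(publisher, today - timedelta(days=n)) for n in range(RETAINED_DAYS)
    }
    prefix = f"arc_{publisher}_"
    return sorted(
        name for name in existing if name.startswith(prefix) and name not in keep
    )


today = date(2012, 9, 20)
existing = [db_name("acme", today - timedelta(days=n)) for n in range(10)]
stale = dbs_to_drop(existing, "acme", today)
```

Dropping a whole database per day is cheap compared with deleting individual documents, which is one plausible reason for this layout.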






    Analytics Results - ElephantDB
• Cascalog support to directly write EDB format
   – Berkeley DB or LevelDB
• Ring Topology
   – Shards distributed around ring, consistent hashing
   – Configurable replication factor
   – Request to any node, forwards as necessary
   – Incrementally increase ring size
• Import from S3 efficient
   – Copy shard from S3 to local disk
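
The ring bullets above describe consistent hashing with a replication factor. The sketch below shows the general technique only; it is not ElephantDB's code, and the class name, node names, shard names, and the use of MD5 with 64 virtual points per node are all assumptions.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a point on the ring (any stable hash would do)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication=2, vnodes=64):
        unique_nodes = set(nodes)
        self.replication = replication
        self.node_count = len(unique_nodes)
        # Each node owns several points so that load spreads more evenly.
        self.points = sorted(
            (_hash(f"{node}#{i}"), node) for node in unique_nodes for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.points]

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        wanted = min(self.replication, self.node_count)
        owners = []
        i = bisect.bisect(self.hashes, _hash(key))
        while len(owners) < wanted:
            node = self.points[i % len(self.points)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


small = Ring(["node-a", "node-b", "node-c"])
large = Ring(["node-a", "node-b", "node-c", "node-d"])
keys = [f"shard-{n}" for n in range(1000)]

# Count shards whose primary owner changes when a fourth node joins.
moved = sum(1 for k in keys if small.nodes_for(k)[0] != large.nodes_for(k)[0])
```

When a node is added, a shard either keeps its primary owner or moves to the new node, which is what makes it practical to grow the ring incrementally.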





              Real-time Ad Stats
• Mongo: DB per day, collection by entity type
   – Document per entity instance
   – stat_type.hour.minute nested values, atomic
     increment
   – Never a good story around aggregating at larger
     timeframes
• Enter redis again
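
The Mongo layout above (one document per entity, with stat_type.hour.minute nested values) maps onto MongoDB's `$inc` update operator with a dotted field path. The sketch below builds such an update document and applies it to a local dict so it runs without a server. The helper names and the sample entity id are hypothetical, and `apply_inc` only imitates what the server would do.

```python
from datetime import datetime


def increment_update(stat_type, when, amount=1):
    """Build a MongoDB-style update that uses $inc on a dotted nested path."""
    return {"$inc": {f"{stat_type}.{when.hour}.{when.minute}": amount}}


def apply_inc(document, update):
    """Local stand-in for the server applying $inc to nested fields."""
    for path, amount in update["$inc"].items():
        *parents, leaf = path.split(".")
        node = document
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = node.get(leaf, 0) + amount
    return document


def hour_total(document, stat_type, hour):
    """Roll minute counters up to an hour; done client side in this sketch."""
    return sum(document.get(stat_type, {}).get(str(hour), {}).values())


doc = {"_id": "ad-123"}
apply_inc(doc, increment_update("clicks", datetime(2012, 9, 20, 14, 37)))
apply_inc(doc, increment_update("clicks", datetime(2012, 9, 20, 14, 37)))
apply_inc(doc, increment_update("clicks", datetime(2012, 9, 20, 14, 38)))
clicks_at_14 = hour_total(doc, "clicks", 14)
```

Increments are cheap in this layout, but any total over an hour, a day, or longer means walking the nested counters, which fits the slide's remark about aggregating at larger timeframes.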






          Real-time Ad Stats Cont’d
• redis has robust access patterns
    – More pipelining
•   Initially realtime and aggregated kept in redis
•   Issue with redis scaling is DB has to fit in memory
•   Time-period aggregations now kept in HBase
•   Only most recent hours kept in redis
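
One way to picture the split between recent hours in redis and time-period aggregations in HBase is the roll-up below. Everything here is an assumption: plain dicts stand in for both stores, the three-hour window is invented (the slide only says "most recent hours"), and hours are counted from an arbitrary epoch.

```python
from collections import defaultdict

RECENT_HOURS = 3  # hypothetical window size

hot = defaultdict(int)      # stand-in for redis: (entity, hour) -> count
archive = defaultdict(int)  # stand-in for HBase: (entity, day) -> count


def record(entity, hour_index, amount=1):
    """Count an event in its hour bucket (hours since an arbitrary epoch)."""
    hot[(entity, hour_index)] += amount


def roll_up(current_hour):
    """Fold buckets older than the window into per-day totals and drop them."""
    cutoff = current_hour - RECENT_HOURS
    expired = [key for key in hot if key[1] <= cutoff]
    for entity, hour_index in expired:
        archive[(entity, hour_index // 24)] += hot.pop((entity, hour_index))


record("ad-1", 100, 5)
record("ad-1", 101, 2)
record("ad-1", 110, 7)
roll_up(current_hour=110)
```

After the roll-up, the in-memory store holds only the recent bucket, so its size is bounded by the window rather than by total history.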






              Task State Tracking
• The last holdout
• Collection of tasks
   – Each task is a document
   – Indexed as needed
   – Mongo query and update syntax convenient
       • Both in static code and interactively in the Python or
         Mongo REPL






              Honorable Mention
• redis for the celery backend, used for task messaging
  infrastructure
• but that was never Mongo anyway...






      MongoDB Migration Summary
•   Configuration                  -> HeroDB
•   Data Collection                -> to S3 via redis
•   Analytics Results              -> ElephantDB
•   Task State Tracking            -> still Mongo
•   Matcher Lookup Tables          -> redis
•   Real-time Ad Stats             -> redis/HBase






                       Thanks!



Site: yieldbot.com
Blog: blog.yieldbot.com
Twitter: @yieldbot
Email: info@yieldbot.com



