StudyBlue




StudyBlue and MongoDB:
Implementation 101


October 18, 2011




StudyBlue, Inc.
Overview


  •      Who am I?

  •      Who is StudyBlue?

  •      Why MongoDB?

  •      How did we leverage MongoDB?

  •      What lessons did we learn?

  •      Q&A




Who am I?


  •      Sean Laurent

  •      sean@studyblue.com

  •      Director of Operations at StudyBlue, Inc.




studyblue.com



About StudyBlue

  •     Bottom-up attempt to improve student
        outcomes


  •     Online service for storing, studying, sharing
        and ultimately mastering course material


  •     Digital backpack for students


  •     Freemium business model




StudyBlue Usage

  •     Many simultaneous users


  •     Rapid growth


  •     Cyclical usage




The Challenge



Flashcard Scoring


  •      Track flashcard scoring

       •      Every single card

       •      Every single user

       •      Forever


  •      Provide aggregate statistics

       •      Flashcard deck

       •      Folder

       •      Overall


  •      Focus on content mastery
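
The roll-up described above (per-card scores aggregated by deck, by folder, and overall) can be sketched roughly as follows. This is a minimal illustration, not StudyBlue's implementation: the `CardScore` fields and all class names are hypothetical, and the scores are assumed to belong to a single user.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-card result for one user; field names are assumptions. */
final class CardScore {
    final long folderId;
    final long deckId;
    final long cardId;
    final int correct;   // times the card was answered correctly
    final int attempts;  // times the card was studied

    CardScore(long folderId, long deckId, long cardId, int correct, int attempts) {
        this.folderId = folderId;
        this.deckId = deckId;
        this.cardId = cardId;
        this.correct = correct;
        this.attempts = attempts;
    }
}

/** Running totals for one aggregation level (deck, folder, or overall). */
final class Totals {
    int correct;
    int attempts;

    void add(CardScore s) {
        correct += s.correct;
        attempts += s.attempts;
    }

    double percentCorrect() {
        return attempts == 0 ? 0.0 : 100.0 * correct / attempts;
    }
}

final class ScoreRollup {
    /** Sums card scores per deck id. */
    static Map<Long, Totals> byDeck(List<CardScore> scores) {
        Map<Long, Totals> out = new LinkedHashMap<>();
        for (CardScore s : scores) {
            out.computeIfAbsent(s.deckId, k -> new Totals()).add(s);
        }
        return out;
    }

    /** Sums card scores per folder id. */
    static Map<Long, Totals> byFolder(List<CardScore> scores) {
        Map<Long, Totals> out = new LinkedHashMap<>();
        for (CardScore s : scores) {
            out.computeIfAbsent(s.folderId, k -> new Totals()).add(s);
        }
        return out;
    }

    /** Sums every card score into a single overall total. */
    static Totals overall(List<CardScore> scores) {
        Totals t = new Totals();
        for (CardScore s : scores) {
            t.add(s);
        }
        return t;
    }
}
```

Each level is the same sum over a different grouping key, so the cost of recomputing grows with the number of stored card scores.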



Scoring Results
The Problem


  •      Existing PostgreSQL database

  •      Reasonably large number of cards

  •      Large number of users

  •      User base increasing rapidly

  •      Shift in usage: increasing faster than user count

       •      Time on site

       •      Decks per user

       •      Average deck size

       •      Study sessions per user



Additional Requirements


  •      Support sustained rapid growth

  •      Highly available

  •      Minimize maintenance costs

  •      Active community

  •      Done yesterday




Why Mongo?



Alternatives


  •      Amazon SimpleDB

       •      Far too simple


  •      Cassandra

       •      Difficult to add nodes and rebalance

       •      Column families cannot be modified without a restart


  •      CouchDB

       •      Difficult to add nodes and rebalance


  •      Redis

       •      No native support for sharding/partitioning

       •      Master/slave only - no automatic failover

MongoDB for the Win


  •      Highly available

       •      Replica sets

       •      Automatic failover


  •      Shards

       •      Works across replica sets

       •      Easy to add additional shards


  •      Node addition

       •      Read performance can degrade while a node is being added

            •     Mitigated with the “hidden” flag

       •      No downtime

More winning


  •      Atomic insert & replace

  •      Read balancing across slaves

  •      BSON/JSON document model

  •      It just works. Seriously.




Implementation



DevOps


  •      Amazon EC2

       •      Separate dev, test and production environments


  •      Operations testing

       •      Replication

       •      Failover


  •      Scripting & automation

       •      Creation

       •      Cloning




Development

  •     100% Java


  •     Existing PostgreSQL
        database

       •     System of record


       •     Synchronization issues




SQL Integration & Synchronization


  •      PostgreSQL considered system of record

  •      Asynchronous event driven

  •      Web servers queue change events

  •      Scoring server processes events

       •      Query PostgreSQL

       •      Update MongoDB
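
The event flow above (web servers queue change events; a scoring server reads PostgreSQL and then updates MongoDB) might look roughly like this. It is a sketch under stated assumptions: `ChangeEvent`, `SystemOfRecord`, `ScoreStore` and `ScoringWorker` are hypothetical names, the queue is an in-process `BlockingQueue` rather than whatever queueing system StudyBlue used, and the two stores are in-memory stand-ins rather than real database clients.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical change event queued by a web server. */
final class ChangeEvent {
    final long deckId;

    ChangeEvent(long deckId) {
        this.deckId = deckId;
    }
}

/** Stand-in for a query against the PostgreSQL system of record. */
interface SystemOfRecord {
    int currentScore(long deckId);
}

/** Stand-in for the MongoDB write; here it is only an in-memory map. */
final class ScoreStore {
    final Map<Long, Integer> scores = new ConcurrentHashMap<>();

    void upsert(long deckId, int score) {
        scores.put(deckId, score);
    }
}

/** Scoring server loop body: take an event, re-read the source, write the result. */
final class ScoringWorker {
    private final BlockingQueue<ChangeEvent> queue;
    private final SystemOfRecord source;
    private final ScoreStore store;

    ScoringWorker(BlockingQueue<ChangeEvent> queue, SystemOfRecord source, ScoreStore store) {
        this.queue = queue;
        this.source = source;
        this.store = store;
    }

    /** Processes everything currently queued; returns the number of events handled. */
    int drain() {
        int handled = 0;
        ChangeEvent event;
        while ((event = queue.poll()) != null) {
            // The event carries only an id; the authoritative data is always re-read.
            store.upsert(event.deckId, source.currentScore(event.deckId));
            handled++;
        }
        return handled;
    }
}
```

Because the worker re-reads the system of record for every event instead of trusting the event payload, a late or duplicated event still produces the current value.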




Architecture
MongoDB Schema


  •      Many shallow collections vs monolithic deep collection

  •      Leverage existing SQL knowledge

  •      Simplify SQL integration




Schema Design


  •      Two collections used together to map relationships

       •      Folder containing Deck

       •      Decks in a Folder

       •      Decks containing a Card

       •      Cards in a Deck


  •      Folders arranged in a tree structure

       •      One row per folder that points to its parent

       •      Multiple queries required to build tree


  •      Postgres primary keys are used instead of object ids
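
To illustrate why a parent-pointer folder layout needs multiple lookups to build the tree, here is a small in-memory sketch. `FolderRow` and `FolderTree` are hypothetical names, not part of the StudyBlue schema. Each `getOrDefault` call on the child index stands in for what would be one query for the children of a folder, and the rows are assumed to form a tree with no cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical folder row: one record per folder, pointing at its parent. */
final class FolderRow {
    final long id;
    final Long parentId; // null for a top-level folder
    final String name;

    FolderRow(long id, Long parentId, String name) {
        this.id = id;
        this.parentId = parentId;
        this.name = name;
    }
}

final class FolderTree {
    /** Indexes child folder ids by parent id; top-level folders have no entry as children. */
    static Map<Long, List<Long>> childrenByParent(List<FolderRow> rows) {
        Map<Long, List<Long>> children = new LinkedHashMap<>();
        for (FolderRow row : rows) {
            if (row.parentId != null) {
                children.computeIfAbsent(row.parentId, k -> new ArrayList<>()).add(row.id);
            }
        }
        return children;
    }

    /** Walks the tree level by level and returns every folder id below rootId. */
    static List<Long> descendants(Map<Long, List<Long>> children, long rootId) {
        List<Long> out = new ArrayList<>();
        Deque<Long> pending = new ArrayDeque<>();
        pending.add(rootId);
        while (!pending.isEmpty()) {
            Long parent = pending.poll();
            // One lookup per visited folder: the analogue of one query per folder.
            for (Long child : children.getOrDefault(parent, Collections.<Long>emptyList())) {
                out.add(child);
                pending.add(child);
            }
        }
        return out;
    }
}
```

The number of lookups equals the number of folders visited, which is the cost the slide is pointing at.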



Document Scores Example
Slave Reads


  •      slaveOk set to true for most data retrieval

  •      Scoring calculations use Primary to ensure correctness
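
The editor's notes mention annotations for slaveOk. One hypothetical way to express the split above (most reads may go to a secondary, scoring reads must go to the primary) is a marker annotation that a data-access layer checks before choosing a read target. Everything in this sketch is an assumption for illustration: `@SlaveOk`, `ReadTarget`, `ScoreQueries` and `ReadRouter` are invented names, and no MongoDB driver calls are shown.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical marker: the annotated query tolerates slightly stale data. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SlaveOk {
}

enum ReadTarget { PRIMARY, SECONDARY_OK }

/** Hypothetical data-access interface. */
interface ScoreQueries {
    @SlaveOk
    String deckTitle(long deckId);

    // No annotation: scoring calculations must read from the primary.
    int computeScore(long deckId);
}

final class ReadRouter {
    /** Chooses a read target from the presence of @SlaveOk on the named method. */
    static ReadTarget targetFor(Class<?> type, String methodName, Class<?>... paramTypes) {
        try {
            Method method = type.getDeclaredMethod(methodName, paramTypes);
            return method.isAnnotationPresent(SlaveOk.class)
                    ? ReadTarget.SECONDARY_OK
                    : ReadTarget.PRIMARY;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("unknown query method: " + methodName, e);
        }
    }
}
```

Defaulting to the primary means a developer who forgets the annotation gets a correct but slower read rather than a stale one.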




Data migration

  •     One-time process


  •     Postgres to MongoDB


  •     Ruby scripts


  •     Separate server




Key Issues



Summary

  •     Amazon EC2/EBS


  •     Java API


  •     MapReduce


  •     Replication


  •     Partitioning / Shards


  •     Performance




Amazon EC2 & EBS

  •     Plan for failure

       •     “When” not “if”


  •     EBS performance

       •     Inconsistent


       •     Limited by bandwidth


       •     60GB minimum


       •     RAID-0




Java API

  •     Not perfect

       •     Verbose

       •     Type safety

  •     Failover requires retry

       •     Up to 1 minute delay

  •     Read-only requests

       •     “slaveOk” works

       •     Burden on developer
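
The "failover requires retry" point can be illustrated with a generic retry helper. This is a sketch, not the driver's behavior and not StudyBlue's code: `FailoverRetry.withRetry` is a hypothetical helper that retries any failing operation with a doubling delay, and the caller is assumed to size the attempts so the total wait covers the failover window (up to one minute, per the slide).

```java
import java.util.concurrent.Callable;

final class FailoverRetry {
    /**
     * Runs op, retrying on any exception with a doubling delay between attempts.
     * Throws a RuntimeException wrapping the last failure once attempts run out.
     */
    static <T> T withRetry(Callable<T> op, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = new RuntimeException("attempt " + attempt + " failed", e);
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("interrupted while waiting to retry", interrupted);
                }
                delayMs *= 2;
            }
        }
        throw last;
    }
}
```

For example, seven attempts starting at 1000 ms wait 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds in total before giving up. A real implementation would likely retry only on connection or not-primary errors instead of on every exception.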




MapReduce

  •     Perfect for aggregation


  •     Not used by StudyBlue

       •     Not needed (yet)


       •     Difficult with multiple collections


       •     Reduce limited to masters


       •     Keep scalability simple


  •     Under consideration



Replication

  •     Automated failover


  •     Read scaling


  •     Maintenance


  •     Easy setup & configuration


  •     “Seed” node(s) for clients




Partitioning in the Cloud


  •      Operations perspective

       •      Dynamic changes in machines

            •     Config servers track machines

            •     Each node in replica set knows other nodes

            •     Avoids restarting applications when Mongo servers change

       •      Easy scaling

            •     Local shard servers

            •     Config servers store redundant copies

                  •   Two-phase commit




Useful EC2 Instance Types

  •     Config servers

       •     t1.micro or m1.small


  •     Mongo replica nodes

       •     Depends on memory needs

       •     m2.xlarge, m2.2xlarge, m2.4xlarge or cc1.4xlarge


       Name           Memory     CU (compute units)        I/O
       m2.xlarge      17.1 GB    6.5 (2 cores x 3.25)      medium
       m2.2xlarge     34.2 GB    13 (4 cores x 3.25)       high
       m2.4xlarge     68.4 GB    26 (8 cores x 3.25)       high
       cc1.4xlarge    23 GB      33.5 (2 x Xeon X5570)     very high


Performance Issues


  •      Missing indexes

       •      Performance terrible without indexes

       •      Index on the fly


  •      Store array sizes in collection

  •      OR vs IN

  •      Redundant updates

       •      Events not consolidated
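
The "events not consolidated" problem is that several queued events for the same item each trigger the same recalculation. A minimal sketch of consolidating a batch before processing it, assuming events can be reduced to the id of the deck they touch (the `EventConsolidator` name and the deck-id representation are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class EventConsolidator {
    /** Collapses a batch of pending deck ids to distinct ids, keeping first-seen order. */
    static List<Long> consolidate(List<Long> pendingDeckIds) {
        return new ArrayList<>(new LinkedHashSet<>(pendingDeckIds));
    }
}
```

With this in place, a study session that touches the same deck many times results in one update for that deck per batch instead of one per event.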




Lessons Learned



Key Lessons


  •      Amazon great, but plan for failure

  •      Leverage test platforms

  •      Use replica sets & partitions early

  •      Indexes critical

  •      Use IN instead of OR

  •      Java API cumbersome, but solid

  •      Design schema carefully
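
For the "use IN instead of OR" lesson, the two query shapes differ as follows. MongoDB's `$or` takes a list of complete sub-queries, while `$in` tests a single field against a list of values. The sketch below only builds the two shapes as plain Java maps so they can be compared. It does not call the MongoDB Java driver, and the `QueryShapes` class and the `deckId` field used in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class QueryShapes {
    /** Builds { "$or": [ { field: v1 }, { field: v2 }, ... ] }. */
    static Map<String, Object> withOr(String field, List<Long> values) {
        List<Map<String, Object>> clauses = new ArrayList<>();
        for (Long value : values) {
            clauses.add(Collections.<String, Object>singletonMap(field, value));
        }
        return Collections.<String, Object>singletonMap("$or", clauses);
    }

    /** Builds { field: { "$in": [ v1, v2, ... ] } }. */
    static Map<String, Object> withIn(String field, List<Long> values) {
        Map<String, Object> in =
                Collections.<String, Object>singletonMap("$in", new ArrayList<>(values));
        return Collections.<String, Object>singletonMap(field, in);
    }
}
```

For ids 1 and 2 on `deckId`, `withOr` yields one clause per id while `withIn` yields a single condition on one field, which is the form the slide recommends.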


Q&A



Contact us
Web: http://www.studyblue.com
Twitter: @StudyBlue
Email: sean@studyblue.com




Leveraging MongoDB: An Introductory Case Study


Editor's Notes

  3. Who am I?
     - Developer at heart
     - 15 years of experience
     - Responsible for selecting Mongo

  5. About StudyBlue
     - Bottom-up attempt to improve student outcomes through disruptive change outside of the education system
     - Allows students to create and store lecture notes and flashcards and access them online and via mobile apps (iOS and Android)

  6. StudyBlue Usage
     - No public numbers
     - 1,000 simultaneous users (peak)

  10. The Problem
     - Over 20 million cards now
     - Approximately 40 million by Christmas, 80-100 million by May 2012, 200+ million by the end of 2012

  15. More winning
     - Read balancing (slaveOk): discuss later
     - No downtime with Mongo since launch

  22. Schema Design
     - Relationship mapping is an example of a problem with NoSQL

  30. Java API
     - Bean serialization
     - Annotations for slaveOk