StudyBlue




Databases at Scale:
A MongoDB Case Study


August 23, 2012




StudyBlue, Inc.
Overview


  •      About Me

  •      About StudyBlue

  •      Why MongoDB?

  •      Leveraging MongoDB

  •      Key Issues

  •      Q&A




Who am I?


  •      Sean Laurent

  •      sean@studyblue.com

  •      Head of Operations at StudyBlue, Inc.




studyblue.com



About StudyBlue

  •     Online service for storing, studying, sharing
        and ultimately mastering course material


  •     Digital backpack for students




StudyBlue Usage

  •     Many simultaneous users


  •     Rapid growth


  •     Cyclical usage




Initial Use Case



Flashcard Scoring


  •      Track flashcard scoring over time

       •      Every single card

       •      Every single user

       •      Forever


  •      Provide aggregate statistics

       •      Flashcard deck

       •      Folder

       •      Overall


  •      Focus on content mastery
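
The aggregate statistics above (per deck, per folder, overall) can be sketched as one grouping function applied with different keys. This is a minimal illustration, not StudyBlue's implementation: the `ScoreEvent` class, its field names, and the "fraction correct" metric are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical score event: one entry per card attempt. Field names are assumptions.
final class ScoreEvent {
    final String userId;
    final String folderId;
    final String deckId;
    final String cardId;
    final boolean correct;

    ScoreEvent(String userId, String folderId, String deckId, String cardId, boolean correct) {
        this.userId = userId;
        this.folderId = folderId;
        this.deckId = deckId;
        this.cardId = cardId;
        this.correct = correct;
    }
}

final class ScoreStats {
    // Fraction of correct answers, grouped by whatever key the caller supplies.
    static Map<String, Double> fractionCorrectBy(List<ScoreEvent> events,
                                                 Function<ScoreEvent, String> key) {
        Map<String, int[]> counts = new LinkedHashMap<>(); // key -> {correct, total}
        for (ScoreEvent e : events) {
            int[] c = counts.computeIfAbsent(key.apply(e), k -> new int[2]);
            if (e.correct) {
                c[0]++;
            }
            c[1]++;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> entry : counts.entrySet()) {
            int[] c = entry.getValue();
            result.put(entry.getKey(), (double) c[0] / c[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ScoreEvent> events = new ArrayList<>();
        events.add(new ScoreEvent("u1", "f1", "d1", "c1", true));
        events.add(new ScoreEvent("u1", "f1", "d1", "c2", false));
        events.add(new ScoreEvent("u1", "f1", "d2", "c3", true));
        System.out.println(fractionCorrectBy(events, e -> e.deckId));    // per deck
        System.out.println(fractionCorrectBy(events, e -> e.folderId));  // per folder
        System.out.println(fractionCorrectBy(events, e -> "overall"));   // overall
    }
}
```

Keeping every event "forever" means these groupings run over a data set that only grows, which is what drives the write and storage pressure described in the following slides.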



Scoring Results

  [Figure]

The Problem


  •      Reasonably large number of cards

  •      Large number of users

  •      User base increasing rapidly

  •      Shift in usage: growing faster than the user base

       •      Time on site

       •      Decks per user

       •      Average deck size

       •      Study sessions per user




StudyBlue Database Problems

  •     Amazon EC2


  •     Large number of simultaneous users


  •     High write volume


  •     Single PostgreSQL database


  •     Large tables




Why MongoDB?



Alternatives


  •      Amazon SimpleDB

       •      Far too simple


  •      Cassandra

       •      Difficult to add nodes and rebalance

       •      Column families cannot be modified without a restart


  •      CouchDB

       •      Difficult to add nodes and rebalance


  •      Redis

       •      No native support for sharding/partitioning

       •      Master/slave only - no automatic failover

MongoDB for the Win


  •      Highly available

       •      Replica sets

       •      Automatic failover


  •     Horizontal scaling across shards

       •     Improved write performance


       •     Improved availability during failures


       •      Easy to add additional shards


  •     Easier maintenance


Implementation:
Phase 1


Development

  •     100% Java


  •     Existing PostgreSQL
        database

       •     System of record


       •     Synchronization issues




SQL Integration & Synchronization


  •      PostgreSQL considered system of record

  •      Asynchronous, event-driven

  •      Web servers queue change events

  •      Scoring servers process events

       •      Query PostgreSQL

       •      Update MongoDB
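
The flow above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the production design: in-memory maps stand in for PostgreSQL and MongoDB, a `LinkedBlockingQueue` stands in for the event queue, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class ScoringPipeline {
    // Stand-in for PostgreSQL, the system of record: cardId -> latest score.
    final Map<String, Integer> systemOfRecord = new ConcurrentHashMap<>();
    // Stand-in for MongoDB, the derived scoring store.
    final Map<String, Integer> scoringStore = new ConcurrentHashMap<>();
    // Stand-in for the queue between web servers and scoring servers.
    final BlockingQueue<String> changeEvents = new LinkedBlockingQueue<>();

    // Web server path: write to the system of record, then queue a change event.
    // The event carries only the key, not the new value.
    void recordScore(String cardId, int score) {
        systemOfRecord.put(cardId, score);
        changeEvents.add(cardId);
    }

    // Scoring server path: for each queued event, re-query the system of record
    // and update the derived store. Returns the number of events processed.
    int processPending() {
        int processed = 0;
        String cardId;
        while ((cardId = changeEvents.poll()) != null) {
            Integer current = systemOfRecord.get(cardId);
            if (current != null) {
                scoringStore.put(cardId, current);
            }
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        ScoringPipeline pipeline = new ScoringPipeline();
        pipeline.recordScore("card-1", 3);
        pipeline.recordScore("card-1", 5);
        System.out.println("before: " + pipeline.scoringStore);
        System.out.println("processed: " + pipeline.processPending());
        System.out.println("after: " + pipeline.scoringStore);
    }
}
```

Because the event holds only a key and the worker re-reads the system of record, replaying or reordering events still converges on the latest value. Between `recordScore` and `processPending` the two stores disagree, which is the synchronization issue the Development slide mentions.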




Architecture v1

  [Figure]

MongoDB Schema


  •      Many shallow collections vs monolithic deep collection

  •      Leverage existing SQL knowledge

  •      Simplify SQL integration




Implementation:
Phase 2


DevOps


  •      Amazon EC2

       •      Separate dev, test and production environments


  •      Scripting & automation

       •      Creation

       •      Cloning

       •      Configuration management with Chef




Even More Data


  •     Moved existing tables from PostgreSQL to MongoDB

       •     Four PostgreSQL tables with millions of rows combined into single collection


  •     New development uses MongoDB:

       •     Analytics data with 300+ million documents




SQL Integration Part 2


  •      MongoDB considered system of record

  •      Web servers interact with MongoDB directly

  •      More complex structures, fewer shallow collections




Key Issues



Summary

  •     NoSQL vs SQL


  •     Design challenges


  •     Amazon EC2/EBS


  •     Partitioning & sharding


  •     Replication Lag




NoSQL vs SQL

  •     NoSQL != SQL


  •     Document database != RDBMS


  •     No joins


  •     Requires new mindset


  •     Store related data together


  •     Duplicate data as necessary




Design Challenges

  •     Mapping multiple tables to a single collection of complex objects


  •     Avoid growing objects

       •     Padding


       •     In-place update vs move


  •     Challenges with array elements
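
The "Padding" and "In-place update vs move" bullets can be illustrated with a toy storage model. This is a sketch under assumptions, not MongoDB's storage engine: each record is allocated its document size times a padding factor, and an update stays in place only while the document still fits in that allocation. The class name and the 1.5 factor are hypothetical.

```java
final class PaddedRecord {
    final int allocatedBytes; // space reserved on disk, including padding
    int usedBytes;            // current document size

    private PaddedRecord(int allocatedBytes, int usedBytes) {
        this.allocatedBytes = allocatedBytes;
        this.usedBytes = usedBytes;
    }

    // Reserve docBytes * paddingFactor so the document has room to grow.
    static PaddedRecord allocate(int docBytes, double paddingFactor) {
        int allocated = (int) Math.ceil(docBytes * paddingFactor);
        return new PaddedRecord(allocated, docBytes);
    }

    // Returns true if the grown document fits and is updated in place.
    // Returns false if it no longer fits, meaning the record would have to
    // be moved to a new location (the expensive case).
    boolean tryUpdateInPlace(int newDocBytes) {
        if (newDocBytes <= allocatedBytes) {
            usedBytes = newDocBytes;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PaddedRecord record = PaddedRecord.allocate(100, 1.5);
        System.out.println("allocated: " + record.allocatedBytes);
        System.out.println("grow to 140 in place: " + record.tryUpdateInPlace(140));
        System.out.println("grow to 160 in place: " + record.tryUpdateInPlace(160));
    }
}
```

Documents that keep growing, for example through ever-longer arrays, eventually exceed any fixed padding, which is one reason to avoid growing objects.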




Amazon EC2 & EBS

  •     Plan for failure

       •     “When” not “if”


  •     EBS performance

       •     Inconsistent


       •     Limited by bandwidth


       •     100 IOPS / volume


       •     RAID-0
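
The arithmetic behind striping can be sketched as follows. It takes the 100 IOPS per volume figure from the slide and assumes ideal RAID-0 scaling, where throughput adds up linearly across volumes. That is an upper bound: as noted above, EBS performance is inconsistent and bandwidth-limited. The class and method names are hypothetical.

```java
final class EbsStripe {
    static final int IOPS_PER_VOLUME = 100; // per-volume figure from the slide

    // Ideal aggregate IOPS of a RAID-0 stripe across the given number of volumes.
    static int idealStripedIops(int volumes) {
        return volumes * IOPS_PER_VOLUME;
    }

    // Smallest number of volumes whose ideal aggregate meets the target (ceiling division).
    static int volumesNeeded(int targetIops) {
        return (targetIops + IOPS_PER_VOLUME - 1) / IOPS_PER_VOLUME;
    }

    public static void main(String[] args) {
        System.out.println(idealStripedIops(8)); // 800
        System.out.println(volumesNeeded(750));  // 8
    }
}
```

RAID-0 has no redundancy, so losing any one volume loses the whole stripe. That ties back to "plan for failure": durability has to come from replica sets rather than from the disk layout.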




Instance Sizing

  •     Memory is king


  •     Keep working set in RAM

       •     Indexes


       •     Working data


  •     Spread horizontally instead of vertically

       •     Increased write performance




Data Routing with Shards




Partitioning in the Cloud


  •      Operations perspective

       •      Dynamic changes in machines

            •     Config servers track machines

            •     Each node in replica set knows other nodes

            •     Avoids restarting applications when Mongo servers change

       •      Easy scaling

            •     Local shard servers

            •     Config servers store redundant copies

                  •   Two-phase commit




Picking a shard key

  •     Shard key selection critical for proper distribution

       •     Spread writes across cluster


  •     Depends on usage

       •     Single document vs aggregation


  •     Examples are all time-series data


  •     Cannot be changed
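
Why the shard key matters for write distribution can be shown with a small simulation. It is a deliberate simplification: a fixed key space is cut into equal ranges, one per shard, whereas a real cluster splits and moves chunks dynamically. The key values, the key space of 1000, and the four shards are assumptions for illustration.

```java
import java.util.Arrays;

final class ShardKeyDemo {
    // Range-based routing: the key space [0, keySpace) is cut into equal ranges, one per shard.
    static int shardFor(long key, long keySpace, int shards) {
        return (int) (key * shards / keySpace);
    }

    // Counts how many of the given writes land on each shard.
    static int[] writesPerShard(long[] keys, long keySpace, int shards) {
        int[] counts = new int[shards];
        for (long key : keys) {
            counts[shardFor(key, keySpace, shards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timeKeys = new long[100];
        long[] userKeys = new long[100];
        for (int i = 0; i < 100; i++) {
            timeKeys[i] = 900 + i;          // monotonically increasing, like a timestamp
            userKeys[i] = (i * 37L) % 1000; // scattered, like a well-distributed user id
        }
        System.out.println("time-based key: " + Arrays.toString(writesPerShard(timeKeys, 1000, 4)));
        System.out.println("user-based key: " + Arrays.toString(writesPerShard(userKeys, 1000, 4)));
    }
}
```

With the increasing key, every new write falls into the highest range and therefore onto a single shard. The scattered key spreads the same number of writes across all four. The trade-off runs the other way for reads: a query that aggregates a time range touches one shard with the time-based key and every shard with the user-based one.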




Sharding - Gritty Details

  •     Chunks

       •     64 MB blocks of data


  •     Splits

       •     1 chunk turns into 2 chunks


  •     Rebalance

       •     Move chunks to different nodes


       •     Maintain even distribution of chunks
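
The chunk, split, and rebalance steps can be modeled in a few lines. This is a toy model under assumptions, not the actual balancer: chunks are represented only by their size in MB, a split cuts an oversized chunk in half by size (the real split looks for a midpoint in the key range), and rebalancing moves one chunk at a time until shard chunk counts differ by at most one. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkBalancer {
    static final int MAX_CHUNK_MB = 64; // chunk size from the slide

    // One split pass: every chunk over the limit turns into two chunks.
    static List<Integer> splitOversized(List<Integer> chunkSizesMb) {
        List<Integer> result = new ArrayList<>();
        for (int size : chunkSizesMb) {
            if (size > MAX_CHUNK_MB) {
                result.add(size / 2);
                result.add(size - size / 2);
            } else {
                result.add(size);
            }
        }
        return result;
    }

    // chunkCounts[i] is the number of chunks on shard i. Moves one chunk at a time
    // from the fullest shard to the emptiest until the counts differ by at most one.
    // Modifies the array in place and returns the number of chunk moves.
    static int rebalance(int[] chunkCounts) {
        if (chunkCounts.length == 0) {
            return 0;
        }
        int moves = 0;
        while (true) {
            int max = 0;
            int min = 0;
            for (int i = 1; i < chunkCounts.length; i++) {
                if (chunkCounts[i] > chunkCounts[max]) max = i;
                if (chunkCounts[i] < chunkCounts[min]) min = i;
            }
            if (chunkCounts[max] - chunkCounts[min] <= 1) {
                return moves;
            }
            chunkCounts[max]--;
            chunkCounts[min]++;
            moves++;
        }
    }

    public static void main(String[] args) {
        System.out.println(splitOversized(Arrays.asList(70, 30)));
        int[] counts = {8, 1, 0};
        System.out.println("moves: " + rebalance(counts) + ", result: " + Arrays.toString(counts));
    }
}
```

Each move in this model stands for copying a whole chunk between nodes, so the move count gives a rough sense of how much extra I/O an uneven cluster generates.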




Rebalancing Challenges

  •     Splits have to find the midpoint of a chunk


  •     Very I/O expensive for collections with small documents

       •     Decreased chunk size


       •     Made documents larger & more complex


  •     Can be a drain on system


  •     Needs to run frequently




Replication Lag

  •     Eventual consistency


  •     No guarantees about lag


  •     Replica safe writes

       •     Data committed to at least 2 nodes


       •     Can cause problems with high replication lag


       •     Security vs time
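
How replica-safe writes interact with replication lag can be sketched with a small model. It assumes the primary holds the write immediately and each secondary receives it after its current lag, so a write that must reach `w` nodes waits for the (w - 1)-th fastest secondary. The method name, the lag values, and the timeout are hypothetical and do not use the MongoDB driver API.

```java
import java.util.Arrays;
import java.util.OptionalLong;

final class ReplicaSafeWrite {
    // Returns the milliseconds until at least w nodes hold the write,
    // or empty if that cannot happen within timeoutMs.
    static OptionalLong ackLatencyMs(long[] secondaryLagMs, int w, long timeoutMs) {
        if (w <= 1) {
            return OptionalLong.of(0); // the primary alone satisfies the write
        }
        if (w - 1 > secondaryLagMs.length) {
            return OptionalLong.empty(); // not enough nodes in the replica set
        }
        long[] sorted = secondaryLagMs.clone();
        Arrays.sort(sorted);
        long wait = sorted[w - 2]; // lag of the (w - 1)-th fastest secondary
        return wait <= timeoutMs ? OptionalLong.of(wait) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // One healthy secondary: a two-node write is acknowledged quickly.
        System.out.println(ackLatencyMs(new long[] {40, 2500}, 2, 1000));
        // Both secondaries lagging: the same write times out.
        System.out.println(ackLatencyMs(new long[] {3000, 2500}, 2, 1000));
    }
}
```

This is the trade-off in the last bullet: requiring more nodes makes the write safer but ties its latency to the slowest node it has to wait for.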




Q&A



Contact us
Web: http://www.studyblue.com
Twitter: @StudyBlue
Email: sean@studyblue.com




MongoDB Case Study at NoSQL Now 2012

  • 1. StudyBlue Databases at Scale: A MongoDB Case Study August 23, 2012 StudyBlue, Inc.
  • 2. Overview • About Me • About StudyBlue • Why MongoDB? • Leveraging MongoDB • Key Issues • Q&A
  • 3. Who am I? • Sean Laurent • sean@studyblue.com • Head of Operations at StudyBlue, Inc.
  • 5. About StudyBlue • Online service for storing, studying, sharing and ultimately mastering course material • Digital backpack for students
  • 6. StudyBlue Usage • Many simultaneous users • Rapid growth • Cyclical usage
  • 8. Flashcard Scoring • Track flashcard scoring over time • Every single card • Every single user • Forever • Provide aggregate statistics • Flashcard deck • Folder • Overall • Focus on content mastery
  • 10. The Problem • Reasonably large number of cards • Large number of users • User base increasing rapidly • Shift in usage - increasing faster than users • Time on site • Decks per user • Average deck size • Study sessions per user
  • 11. StudyBlue Database Problems • Amazon EC2 • Large number of simultaneous users • High write volume • Single PostgreSQL database • Large tables
  • 13. Alternatives • Amazon Simple DB • Far too simple • Cassandra • Difficult to add nodes and rebalance • Column families cannot be modified w/out restart • CouchDB • Difficult to add nodes and rebalance • Redis • No native support for sharding/partitioning • Master/slave only - no automatic failover
  • 14. MongoDB for the Win • Highly available • Replica sets • Automatic failover • Horizontal scaling across shards • Improved write performance • Improved availability during failures • Easy to add additional shards • Easier maintenance
  • 16. Development • 100% Java • Existing PostgreSQL database • System of record • Synchronization issues
  • 17. SQL Integration & Synchronization • PostgreSQL considered system of record • Asynchronous event driven • Web servers queue change events • Scoring servers process events • Query PostgreSQL • Update MongoDB
  • 19. MongoDB Schema • Many shallow collections vs monolithic deep collection • Leverage existing SQL knowledge • Simplify SQL integration
  • 21. DevOps • Amazon EC2 • Separate dev, test and production environments • Scripting & automation • Creation • Cloning • Configuration management with Chef
  • 22. Even More Data • Moved existing tables from PostgreSQL to MongoDB • Four PostgreSQL tables with millions of rows combined into single collection • New development uses MongoDB: • Analytics data with 300+ million documents
  • 23. SQL Integration Part 2 • MongoDB considered system of record • Web servers interact with MongoDB directly • More complex structures, fewer shallow collections
  • 25. Summary • NoSQL vs SQL • Design challenges • Amazon EC2/EBS • Partitioning & sharding • Replication Lag
  • 26. NoSQL vs SQL • NoSQL != SQL • Document database != RDBMS • No joins • Requires new mindset • Store related data together • Duplicate data as necessary
  • 27. Design Challenges • Multiple tables to single collections with complex objects • Avoid growing objects • Padding • In-place update vs move • Challenges with array elements
  • 28. Amazon EC2 & EBS • Plan for failure • “When” not “if” • EBS performance • Inconsistent • Limited by bandwidth • 100 IOPS / volume • RAID-0
  • 29. Instance Sizing • Memory is king • Keep working set in RAM • Indexes • Working data • Spread horizontally instead of vertically • Increased write performance
  • 30. Data Routing with Shards
  • 31. Partitioning in the Cloud • Operations perspective • Dynamic changes in machines • Config servers track machines • Each node in replica set knows other nodes • Avoids restarting applications when Mongo servers change • Easy scaling • Local shard servers • Config servers store redundant copies • Two-phase commit
  • 32. Picking a shard key • Shard key selection critical for proper distribution • Spread writes across cluster • Depends on usage • Single document vs aggregation • Examples all time-series data • Cannot be changed
  • 33. Sharding - Gritty Details • Chunks • 64 MB blocks of data • Splits • 1 chunk turns into 2 chunks • Rebalance • Move chunks to different nodes • Maintain even distribution of chunks
  • 34. Rebalancing Challenges • Splits have to find mid point of chunk • Very I/O expensive for collections with small documents • Decreased chunk size • Made documents larger & more complex • Can be a drain on system • Needs to run frequently
  • 35. Replication Lag • Eventual consistency • No guarantees about lag • Replica safe writes • Data committed to at least 2 nodes • Can cause problems with high replication lag • Security vs time
  • 37. Contact us Web: http://www.studyblue.com Twitter: @StudyBlue Email: sean@studyblue.com
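
Slide 32 says the shard key must spread writes across the cluster and that the deck's examples are all time-series data. The sketch below is a toy simulation of that concern, not MongoDB driver code and not StudyBlue's actual key design. It assumes range partitioning with fixed split points and compares a monotonically increasing key (such as a timestamp) with a key whose values are scattered by a hash-style mix. The class name, method names, and split points are all hypothetical.

```java
import java.util.Arrays;
import java.util.function.LongToIntFunction;

// Hypothetical simulation: it only counts which shard would receive each
// insert under two different shard-key choices. No database is involved.
class ShardKeyDemo {

    // Range partitioning: shard i owns keys below upperBounds[i];
    // the last shard owns everything above the final bound.
    static int rangeShard(long key, long[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length;
    }

    // 64-bit finalizer from MurmurHash3, used here only to scatter keys.
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Scattered placement: consecutive keys land on unrelated shards.
    static int scatteredShard(long key, int shards) {
        return (int) Math.floorMod(mix(key), (long) shards);
    }

    // Routes `writes` consecutive keys and counts inserts per shard.
    static int[] countWrites(long firstKey, int writes,
                             LongToIntFunction router, int shards) {
        int[] counts = new int[shards];
        for (int i = 0; i < writes; i++) {
            counts[router.applyAsInt(firstKey + i)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] bounds = {1_000L, 2_000L, 3_000L}; // split points from older data
        long firstNewKey = 10_000L;               // new keys exceed every bound

        int[] byTime = countWrites(firstNewKey, 100_000,
                k -> rangeShard(k, bounds), 4);
        int[] scattered = countWrites(firstNewKey, 100_000,
                k -> scatteredShard(k, 4), 4);

        System.out.println("increasing key: " + Arrays.toString(byTime));
        System.out.println("scattered key:  " + Arrays.toString(scattered));
    }
}
```

In this model every new write with an increasing key goes to the last shard, while the scattered key divides the same writes roughly evenly. The trade-off the slide hints at ("single document vs aggregation") still applies: a scattered key spreads writes but also spreads range queries across every shard.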

Editor's Notes

  3. - Developer at heart
     - 15 years experience
     - Responsible for selecting Mongo
  5. - 15-person startup
     - Bottom-up attempt to improve student outcomes through disruptive change outside of the education system
     - Allows students to create and store lecture notes and flashcards and access them online and via mobile apps (iOS and Android)
  6. - No public numbers (low millions)
     - 4000 simultaneous users (peak)
     - 120+ countries
     - Daily cycle slowly flattening
  10. - 20 million cards at the time
      - Over 60 million cards now
      - Expect 100 million cards in next 6 months
  11. - EC2 limits vertical scaling
      - Postgres tuning extremely beneficial
      - Tables > 70 million rows
  13. - Cassandra & Redis have since improved
      - Amazon Dynamo didn’t exist
  21. - Launch replacement Mongo server in < 10 mins
      - Clone entire production Mongo cluster in < 60 mins
  22. - Not huge by BigData standards (a couple of terabytes)
      - Big by startup standards
  28. Provisioned IOPS
  29. Working set is ~20% for SB, mostly recently created data
  32. http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
  34. Ran nightly - backlog causes really high load
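
Slides 33 and 34 describe chunks of 64 MB, splits that turn one chunk into two at its midpoint, and a rebalance step that evens out chunk counts across nodes. The sketch below is a toy model of those three ideas, assuming uniform document sizes and a simple one-chunk-at-a-time balancer. It is not MongoDB's actual implementation, and the class name, method names, and sample numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of chunk splitting and rebalancing.
class ChunkDemo {
    static final long MAX_CHUNK_BYTES = 64L * 1024 * 1024; // 64 MB, per slide 33

    // Splits a chunk, given as its sorted shard-key values, at the midpoint.
    static List<List<Long>> split(List<Long> sortedKeys) {
        int mid = sortedKeys.size() / 2;
        List<List<Long>> halves = new ArrayList<>();
        halves.add(new ArrayList<>(sortedKeys.subList(0, mid)));
        halves.add(new ArrayList<>(sortedKeys.subList(mid, sortedKeys.size())));
        return halves;
    }

    // Documents in a full chunk, i.e. how many keys a midpoint search
    // has to consider. Smaller documents mean more keys per chunk.
    static long docsPerFullChunk(long avgDocBytes) {
        return MAX_CHUNK_BYTES / avgDocBytes;
    }

    // Moves one chunk at a time from the fullest shard to the emptiest
    // until the counts differ by at most one. Returns the number of moves.
    static int rebalance(int[] chunksPerShard) {
        int moves = 0;
        while (true) {
            int max = 0;
            int min = 0;
            for (int i = 1; i < chunksPerShard.length; i++) {
                if (chunksPerShard[i] > chunksPerShard[max]) max = i;
                if (chunksPerShard[i] < chunksPerShard[min]) min = i;
            }
            if (chunksPerShard[max] - chunksPerShard[min] <= 1) {
                return moves;
            }
            chunksPerShard[max]--;
            chunksPerShard[min]++;
            moves++;
        }
    }

    public static void main(String[] args) {
        System.out.println(split(Arrays.asList(1L, 2L, 3L, 4L, 5L)));
        System.out.println(docsPerFullChunk(100));    // small documents
        System.out.println(docsPerFullChunk(10_000)); // larger documents
        int[] shards = {9, 1, 2};
        int moves = rebalance(shards);
        System.out.println(moves + " moves -> " + Arrays.toString(shards));
    }
}
```

With 100-byte documents a full chunk in this model holds roughly a hundred times as many keys as it does with 10,000-byte documents, which is one way to read the slide's remark that splits are I/O expensive for collections with small documents. The move count also grows with how uneven the shards have become, in line with the note that a nightly run left a backlog and high load.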