MongoDB Case Study at NoSQL Now 2012

StudyBlue

Databases at Scale: A MongoDB Case Study

August 23, 2012

StudyBlue, Inc.

Overview

• About Me

• About StudyBlue

• Why MongoDB?

• Leveraging MongoDB

• Key Issues

• Q&A

Who am I?

• Sean Laurent

• sean@studyblue.com

• Head of Operations at StudyBlue, Inc.

studyblue.com

About StudyBlue

• Online service for storing, studying, sharing and ultimately mastering course material

• Digital backpack for students

StudyBlue Usage

• Many simultaneous users

• Rapid growth

• Cyclical usage

Initial Use Case

Flashcard Scoring

• Track flashcard scoring over time

• Every single card

• Every single user

• Forever

• Provide aggregate statistics

• Flashcard deck

• Folder

• Overall

• Focus on content mastery

Scoring Results

The Problem

• Reasonably large number of cards

• Large number of users

• User base increasing rapidly

• Shift in usage: growing faster than the user base

• Time on site

• Decks per user

• Average deck size

• Study sessions per user

StudyBlue Database Problems

• Amazon EC2

• Large number of simultaneous users

• High write volume

• Single PostgreSQL database

• Large tables

Alternatives

• Amazon SimpleDB

• Far too simple

• Cassandra

• Difficult to add nodes and rebalance

• Column families cannot be modified without a restart

• CouchDB

• Difficult to add nodes and rebalance

• Redis

• No native support for sharding/partitioning

• Master/slave only - no automatic failover

MongoDB for the Win

• Highly available

• Replica sets

• Automatic failover

• Horizontal scaling across shards

• Improved write performance

• Improved availability during failures

• Easy to add additional shards

• Easier maintenance

Implementation: Phase 1

Development

• 100% Java

• Existing PostgreSQL database

• System of record

• Synchronization issues

SQL Integration & Synchronization

• PostgreSQL considered system of record

• Asynchronous, event-driven

• Web servers queue change events

• Scoring servers process events

• Query PostgreSQL

• Update MongoDB
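
The event flow on this slide can be sketched in a few lines. This is a minimal illustration, not StudyBlue's implementation: plain dictionaries stand in for PostgreSQL and MongoDB, and the names `record_score`, `process_events`, `postgres_scores` and `mongo_stats` are hypothetical.

```python
import queue

# Hypothetical stand-ins: plain dicts play the roles of the two databases.
postgres_scores = {}  # (user_id, card_id) -> list of 0/1 results; system of record
mongo_stats = {}      # user_id -> aggregate document served to readers

change_events = queue.Queue()


def record_score(user_id, card_id, correct):
    """Web server path: write to the system of record, then queue a change event."""
    postgres_scores.setdefault((user_id, card_id), []).append(1 if correct else 0)
    change_events.put({"user_id": user_id})


def process_events():
    """Scoring server path: drain the queue, re-query the source, update the copy."""
    while not change_events.empty():
        user_id = change_events.get()["user_id"]
        results = [
            r
            for (uid, _card), rows in postgres_scores.items()
            if uid == user_id
            for r in rows
        ]
        mongo_stats[user_id] = {
            "_id": user_id,
            "attempts": len(results),
            "correct": sum(results),
        }


record_score("u1", "c1", True)
record_score("u1", "c2", False)
record_score("u2", "c1", True)
process_events()
print(mongo_stats["u1"])  # {'_id': 'u1', 'attempts': 2, 'correct': 1}
```

Because the aggregate is rebuilt from the system of record on every event, a lost or repeated event leaves the copy stale or recomputed rather than wrong, at the cost of an extra query per event.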

Architecture v1

MongoDB Schema

• Many shallow collections vs monolithic deep collection

• Leverage existing SQL knowledge

• Simplify SQL integration

Implementation: Phase 2

DevOps

• Amazon EC2

• Separate dev, test and production environments

• Scripting & automation

• Creation

• Cloning

• Configuration management with Chef

Even More Data

• Moved existing tables from PostgreSQL to MongoDB

• Four PostgreSQL tables with millions of rows combined into single collection

• New development uses MongoDB:

• Analytics data with 300+ million documents

SQL Integration Part 2

• MongoDB considered system of record

• Web servers interact with MongoDB directly

• More complex structures, fewer shallow collections

Summary

• NoSQL vs SQL

• Design challenges

• Amazon EC2/EBS

• Partitioning & sharding

• Replication Lag

NoSQL vs SQL

• NoSQL != SQL

• Document database != RDBMS

• No joins

• Requires new mindset

• Store related data together

• Duplicate data as necessary
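
To make "store related data together" concrete, here is a small sketch that folds rows from several relational tables into one document. The tables, field names and the `to_document` helper are all hypothetical and chosen only for illustration; the deck does not show StudyBlue's real schema.

```python
# Hypothetical relational rows; table and field names are illustrative only.
users = [{"user_id": 1, "name": "Ada"}]
decks = [{"deck_id": 10, "title": "Biology 101", "owner_id": 1}]
scores = [
    {"deck_id": 10, "card_id": 100, "user_id": 1, "correct": True},
    {"deck_id": 10, "card_id": 101, "user_id": 1, "correct": False},
]


def to_document(deck):
    """Fold a deck row and its related rows into one self-contained document."""
    owner = next(u for u in users if u["user_id"] == deck["owner_id"])
    return {
        "_id": deck["deck_id"],
        "title": deck["title"],
        # The owner's name is duplicated here so a read needs no second lookup.
        "owner": {"user_id": owner["user_id"], "name": owner["name"]},
        "scores": [
            {"card_id": s["card_id"], "user_id": s["user_id"], "correct": s["correct"]}
            for s in scores
            if s["deck_id"] == deck["deck_id"]
        ],
    }


deck_doc = to_document(decks[0])
print(deck_doc["owner"]["name"], len(deck_doc["scores"]))  # Ada 2
```

One read now returns everything a deck page needs. The price is that a renamed user must be updated in every document that copied the name, and an embedded list that grows without bound has its own cost, so what to embed depends on the access pattern.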

Design Challenges

• Multiple tables to a single collection with complex objects

• Avoid growing objects

• Padding

• In-place update vs move

• Challenges with array elements
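
The "in-place update vs move" point can be illustrated with a toy model. This sketch assumes a storage engine that reserves a padded slot for each document, as MongoDB did at the time. JSON length is used as a rough stand-in for BSON size, and the padding factor of 1.5 is an assumed value, not a figure from the talk.

```python
import json


def encoded_size(doc):
    """Rough size proxy: JSON length stands in for the real BSON size."""
    return len(json.dumps(doc))


def allocate(doc, padding_factor=1.5):
    """Reserve a slot larger than the document so it has room to grow."""
    return int(encoded_size(doc) * padding_factor)


def update_kind(slot_bytes, new_doc):
    """An update fits in place only if the new version fits in the old slot."""
    return "in-place" if encoded_size(new_doc) <= slot_bytes else "move"


doc = {"_id": 1, "scores": [0, 0, 0, 0]}
slot = allocate(doc)

small_growth = {"_id": 1, "scores": [0] * 5}
large_growth = {"_id": 1, "scores": [0] * 50}
print(update_kind(slot, small_growth))  # in-place
print(update_kind(slot, large_growth))  # move
```

A move means rewriting the whole document and updating its index entries, which is why arrays that keep growing are worth avoiding or pre-sizing.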

Amazon EC2 & EBS

• Plan for failure

• “When” not “if”

• EBS performance

• Inconsistent

• Limited by bandwidth

• 100 IOPS / volume

• RAID-0
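
A back-of-envelope model shows why striping helps and why inconsistent volumes hurt. The 100 IOPS per volume figure comes from the slide; the functions and the "slowest volume bounds the stripe" simplification are assumptions for illustration, not measurements.

```python
def raid0_ideal_iops(volume_count, iops_per_volume=100):
    """Best case: requests spread evenly and every volume delivers its rated IOPS."""
    return volume_count * iops_per_volume


def raid0_bounded_iops(observed_iops):
    """Pessimistic case: full-stripe requests wait for the slowest volume."""
    return len(observed_iops) * min(observed_iops)


print(raid0_ideal_iops(8))  # 800
print(raid0_bounded_iops([100, 100, 40, 100]))  # 160
```

In this model a single degraded volume drags a four-volume stripe from 400 to 160 IOPS, which is one way to read "inconsistent" EBS performance. RAID-0 also adds no redundancy, so it relies on replica sets for durability.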

Instance Sizing

• Memory is king

• Keep working set in RAM

• Indexes

• Working data

• Spread horizontally instead of vertically

• Increased write performance

Data Routing with Shards

Partitioning in the Cloud

• Operations perspective

• Dynamic changes in machines

• Config servers track machines

• Each node in replica set knows other nodes

• Avoids restarting applications when Mongo servers change

• Easy scaling

• Local shard servers

• Config servers store redundant copies

• Two-phase commit

Picking a shard key

• Shard key selection critical for proper distribution

• Spread writes across cluster

• Depends on usage

• Single document vs aggregation

• Examples all time-series data

• Cannot be changed
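
The effect of shard key choice on write distribution can be simulated with range-based routing. This is a simplified sketch: the split points, the `owning_chunk` helper and the synthetic user ids are all made up for illustration.

```python
import bisect
from collections import Counter


def owning_chunk(key, split_points):
    """Index of the range-based chunk whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)


# Key = event timestamp: every new write carries a value past the last split.
time_splits = [1000, 2000, 3000]
by_time = Counter(owning_chunk(ts, time_splits) for ts in range(3000, 3100))

# Key led by a user id: concurrent writers fall into different ranges.
user_splits = [250, 500, 750]
user_ids = [(i * 37) % 1000 for i in range(100)]
by_user = Counter(owning_chunk(uid, user_splits) for uid in user_ids)

print(dict(by_time))  # {3: 100}
print(sorted(by_user))  # [0, 1, 2, 3]
```

With a purely time-based key all 100 writes land in the last chunk, so one shard absorbs the whole write load. Leading the key with a user id spreads the same writes across every chunk, and it also keeps one user's documents together for aggregation.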

Sharding - Gritty Details

• Chunks

• 64 MB blocks of data

• Splits

• 1 chunk turns into 2 chunks

• Rebalance

• Move chunks to different nodes

• Maintain even distribution of chunks
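
Splits and rebalancing can be sketched as two small functions. The 64 MB limit is from the slide; everything else here (the data layout, the helper names, the move-one-chunk-at-a-time policy and its threshold) is a simplified assumption rather than MongoDB's actual balancer logic.

```python
MB = 1024 * 1024
MAX_CHUNK_BYTES = 64 * MB  # chunk size from the slide


def chunk_bytes(chunk):
    """Total size of a chunk given as a list of (key, size_in_bytes) pairs."""
    return sum(size for _key, size in chunk)


def split_if_needed(chunk):
    """Split a key-sorted chunk at its middle document once it is too large."""
    if chunk_bytes(chunk) <= MAX_CHUNK_BYTES:
        return [chunk]
    mid = len(chunk) // 2
    return [chunk[:mid], chunk[mid:]]


def rebalance(shards, threshold=1):
    """Move chunks from the fullest to the emptiest shard; return the move count."""
    moves = 0
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[fullest]) - len(shards[emptiest]) <= threshold:
            return moves
        shards[emptiest].append(shards[fullest].pop())
        moves += 1


big_chunk = [(key, MB) for key in range(100)]  # 100 MB of 1 MB documents
halves = split_if_needed(big_chunk)
print([len(h) for h in halves])  # [50, 50]

shards = {"shard0": [f"chunk{i}" for i in range(6)], "shard1": [], "shard2": []}
print(rebalance(shards))  # 4
print({name: len(chunks) for name, chunks in shards.items()})
# {'shard0': 2, 'shard1': 2, 'shard2': 2}
```

Each move in the real system copies a chunk's documents across the network, so the move count is a rough proxy for the I/O cost of keeping the cluster even.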

Rebalancing Challenges

• Splits have to find the midpoint of the chunk

• Very I/O expensive for collections with small documents

• Decreased chunk size

• Made documents larger & more complex

• Can be a drain on system

• Needs to run frequently

Replication Lag

• Eventual consistency

• No guarantees about lag

• Replica safe writes

• Data committed to at least 2 nodes

• Can cause problems with high replication lag

• Security vs time
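
The cost of a replica-safe write under lag can be estimated with a small model. It assumes a write is acknowledged once `w` nodes hold it, counting the primary as the first; the function name and the millisecond figures are hypothetical.

```python
def ack_latency_ms(primary_ms, secondary_lag_ms, w=2):
    """Time until `w` nodes hold the write: the primary, then the fastest secondaries."""
    node_count = 1 + len(secondary_lag_ms)
    if not 1 <= w <= node_count:
        raise ValueError(f"w must be between 1 and {node_count}")
    commit_times = sorted(
        [primary_ms] + [primary_ms + lag for lag in secondary_lag_ms]
    )
    return commit_times[w - 1]


print(ack_latency_ms(2, [15, 4000], w=1))  # 2
print(ack_latency_ms(2, [15, 4000], w=2))  # 17
print(ack_latency_ms(2, [3000, 4000], w=2))  # 3002
```

With one healthy secondary, requiring two nodes adds only that secondary's lag. When every secondary is seconds behind, each replica-safe write waits seconds as well, which is how high replication lag turns into slow requests.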

Contact us
Web: http://www.studyblue.com
Twitter: @StudyBlue
Email: sean@studyblue.com
