MongoDB Case Study at NoSQL Now 2012

Speaker notes
  • Slide 3 (Who am I?): Developer at heart; 15 years of experience; responsible for selecting Mongo.
  • Slide 5 (About StudyBlue): 15-person startup. Bottom-up attempt to improve student outcomes through disruptive change outside of the education system. Allows students to create and store lecture notes and flashcards and access them online and via mobile apps (iOS and Android).
  • Slide 6 (StudyBlue Usage): No public numbers (low millions); 4,000 simultaneous users (peak); 120+ countries; daily cycle slowly flattening.
  • Slide 10 (The Problem): 20 million cards at the time; over 60 million cards now; expect 100 million cards in the next 6 months.
  • Slide 11 (StudyBlue Database Problems): EC2 limits vertical scaling; Postgres tuning extremely beneficial; tables > 70 million rows.
  • Slide 13 (Alternatives): Cassandra & Redis have since improved; Amazon Dynamo didn’t exist.
  • Slide 21 (DevOps): Launch a replacement Mongo server in < 10 mins; clone the entire production Mongo cluster in < 60 mins.
  • Slide 22 (Even More Data): Not huge by BigData standards (a couple of terabytes), but big by startup standards.
  • Slide 28 (Amazon EC2 & EBS): Provisioned IOPS.
  • Slide 29 (Instance Sizing): Working set is ~20% for SB, mostly recently created data.
  • Slide 32 (Picking a shard key): http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
  • Slide 34 (Rebalancing Challenges): Ran nightly; the backlog causes really high load.

Transcript of "MongoDB Case Study at NoSQL Now 2012"

Slide 1
  StudyBlue Databases at Scale: A MongoDB Case Study
  August 23, 2012
  StudyBlue, Inc.

Slide 2: Overview
  • About Me
  • About StudyBlue
  • Why MongoDB?
  • Leveraging MongoDB
  • Key Issues
  • Q&A

Slide 3: Who am I?
  • Sean Laurent
  • sean@studyblue.com
  • Head of Operations at StudyBlue, Inc.

Slide 4: studyblue.com

Slide 5: About StudyBlue
  • Online service for storing, studying, sharing and ultimately mastering course material
  • Digital backpack for students

Slide 6: StudyBlue Usage
  • Many simultaneous users
  • Rapid growth
  • Cyclical usage

Slide 7: Initial Use Case

Slide 8: Flashcard Scoring
  • Track flashcard scoring over time
      • Every single card
      • Every single user
      • Forever
  • Provide aggregate statistics
      • Flashcard deck
      • Folder
      • Overall
  • Focus on content mastery
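
The scoring requirements on this slide (keep every answer forever, roll results up by deck, folder and overall) can be sketched as a small in-memory data structure. This is a minimal illustration, not StudyBlue's code: the class names `ScoreEvent` and `ScoreRollup`, their fields, and the choice of one rollup per user are hypothetical, and persistence to MongoDB is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of one answer: which user scored which card, and where the card lives.
final class ScoreEvent {
    final String userId, folderId, deckId, cardId;
    final boolean correct;

    ScoreEvent(String userId, String folderId, String deckId, String cardId, boolean correct) {
        this.userId = userId;
        this.folderId = folderId;
        this.deckId = deckId;
        this.cardId = cardId;
        this.correct = correct;
    }
}

// Hypothetical per-user rollup: every event is kept, plus running totals per deck, per folder and overall.
final class ScoreRollup {
    static final class Tally {
        long seen, correct;

        void add(boolean ok) {
            seen++;
            if (ok) correct++;
        }

        double ratio() {
            return seen == 0 ? 0.0 : (double) correct / seen;
        }
    }

    final List<ScoreEvent> history = new ArrayList<>();
    final Map<String, Tally> byDeck = new HashMap<>();
    final Map<String, Tally> byFolder = new HashMap<>();
    final Tally overall = new Tally();

    void record(ScoreEvent e) {
        history.add(e); // raw history, never discarded
        byDeck.computeIfAbsent(e.deckId, k -> new Tally()).add(e.correct);
        byFolder.computeIfAbsent(e.folderId, k -> new Tally()).add(e.correct);
        overall.add(e.correct);
    }

    public static void main(String[] args) {
        ScoreRollup r = new ScoreRollup();
        r.record(new ScoreEvent("u1", "f1", "d1", "c1", true));
        r.record(new ScoreEvent("u1", "f1", "d1", "c2", false));
        System.out.println(r.byDeck.get("d1").ratio()); // prints 0.5
    }
}
```

Keeping the raw `history` next to the running tallies means the aggregates can be rebuilt if the rollup rules change, at the cost of storing every single answer.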

Slide 9: Scoring Results

Slide 10: The Problem
  • Reasonably large number of cards
  • Large number of users
  • User base increasing rapidly
  • Shift in usage: usage increasing faster than users
      • Time on site
      • Decks per user
      • Average deck size
      • Study sessions per user

Slide 11: StudyBlue Database Problems
  • Amazon EC2
  • Large number of simultaneous users
  • High write volume
  • Single PostgreSQL database
  • Large tables

Slide 12: Why Mongo?

Slide 13: Alternatives
  • Amazon SimpleDB
      • Far too simple
  • Cassandra
      • Difficult to add nodes and rebalance
      • Column families cannot be modified without a restart
  • CouchDB
      • Difficult to add nodes and rebalance
  • Redis
      • No native support for sharding/partitioning
      • Master/slave only, no automatic failover

Slide 14: MongoDB for the Win
  • Highly available
      • Replica sets
      • Automatic failover
  • Horizontal scaling across shards
      • Improved write performance
      • Improved availability during failures
      • Easy to add additional shards
  • Easier maintenance

Slide 15: Implementation: Phase 1

Slide 16: Development
  • 100% Java
  • Existing PostgreSQL database
      • System of record
      • Synchronization issues

Slide 17: SQL Integration & Synchronization
  • PostgreSQL considered system of record
  • Asynchronous, event-driven
  • Web servers queue change events
  • Scoring servers process events
      • Query PostgreSQL
      • Update MongoDB
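
A minimal sketch of the asynchronous flow on this slide, assuming in-memory maps stand in for PostgreSQL and MongoDB and a `LinkedBlockingQueue` stands in for whatever queue carried the change events; `ScoreSync`, `webWrite`, `webDelete` and `drain` are hypothetical names. The web tier commits to the system of record and enqueues only a key; the scoring tier re-reads the current row and overwrites the MongoDB copy. In production, `drain` would run on a separate scoring server or worker thread.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: PostgreSQL is the system of record,
// MongoDB holds a derived copy updated asynchronously from change events.
final class ScoreSync {
    final Map<String, Integer> postgres = new ConcurrentHashMap<>();   // stand-in for PostgreSQL (cardId -> score)
    final Map<String, Integer> mongo = new ConcurrentHashMap<>();      // stand-in for the MongoDB collection
    final BlockingQueue<String> changes = new LinkedBlockingQueue<>(); // change events carry only the key

    // Web-server side: commit to the system of record, then queue a change event.
    void webWrite(String cardId, int score) {
        postgres.put(cardId, score);
        changes.add(cardId);
    }

    void webDelete(String cardId) {
        postgres.remove(cardId);
        changes.add(cardId);
    }

    // Scoring-server side: handle every queued event; returns how many were processed.
    int drain() {
        int handled = 0;
        String cardId;
        while ((cardId = changes.poll()) != null) {
            Integer current = postgres.get(cardId); // query the system of record
            if (current == null) {
                mongo.remove(cardId);               // row no longer exists upstream
            } else {
                mongo.put(cardId, current);         // overwrite the derived copy
            }
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        ScoreSync s = new ScoreSync();
        s.webWrite("card-1", 3);
        s.webWrite("card-1", 5); // two events for the same card
        s.drain();
        System.out.println(s.mongo); // prints {card-1=5}
    }
}
```

Carrying only the key rather than the new value is a design choice of this sketch: a duplicate or out-of-order event simply re-reads the latest state, so processing stays idempotent.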

Slide 18: Architecture v1

Slide 19: MongoDB Schema
  • Many shallow collections vs monolithic deep collection
  • Leverage existing SQL knowledge
  • Simplify SQL integration

Slide 20: Implementation: Phase 2

Slide 21: DevOps
  • Amazon EC2
      • Separate dev, test and production environments
  • Scripting & automation
      • Creation
      • Cloning
  • Configuration management with Chef

Slide 22: Even More Data
  • Moved existing tables from PostgreSQL to MongoDB
      • Four PostgreSQL tables with millions of rows combined into single collection
  • New development uses MongoDB:
      • Analytics data with 300+ million documents

Slide 23: SQL Integration Part 2
  • MongoDB considered system of record
  • Web servers interact with MongoDB directly
  • More complex structures, fewer shallow collections

Slide 24: Key Issues

Slide 25: Summary
  • NoSQL vs SQL
  • Design challenges
  • Amazon EC2/EBS
  • Partitioning & sharding
  • Replication Lag

Slide 26: NoSQL vs SQL
  • NoSQL != SQL
  • Document database != RDBMS
  • No joins
  • Requires new mindset
  • Store related data together
  • Duplicate data as necessary
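
To illustrate "store related data together" and "duplicate data as necessary", here is a hedged sketch that folds what would be a deck row, its card rows and an owner row into one nested document. It uses plain Java maps rather than any MongoDB driver type, and the table and field names (`id`, `deckId`, `front`, `back`, `name`, `ownerName`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical denormalization step: one deck row + its card rows + the owner's row -> one document.
final class DeckDocument {
    static Map<String, Object> build(Map<String, Object> deckRow,
                                     List<Map<String, Object>> cardRows,
                                     Map<String, Object> ownerRow) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", deckRow.get("id"));
        doc.put("title", deckRow.get("title"));
        // Duplicated from the owner row so a deck can be rendered without a second lookup.
        doc.put("ownerName", ownerRow.get("name"));

        List<Map<String, Object>> cards = new ArrayList<>();
        for (Map<String, Object> row : cardRows) {
            if (!Objects.equals(deckRow.get("id"), row.get("deckId"))) continue; // keep only this deck's cards
            Map<String, Object> card = new LinkedHashMap<>();
            card.put("front", row.get("front"));
            card.put("back", row.get("back"));
            cards.add(card);
        }
        doc.put("cards", cards); // related data stored together instead of joined at read time
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> deck = Map.of("id", "d1", "title", "Bio 101");
        Map<String, Object> card = Map.of("deckId", "d1", "front", "cell", "back", "unit of life");
        Map<String, Object> owner = Map.of("name", "Ana");
        System.out.println(build(deck, List.of(card), owner));
        // prints {_id=d1, title=Bio 101, ownerName=Ana, cards=[{front=cell, back=unit of life}]}
    }
}
```

Copying `ownerName` into the deck avoids a join at read time; the trade-off is that renaming a user now means updating every deck document that embeds the old name.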

Slide 27: Design Challenges
  • Multiple tables to single collections with complex objects
  • Avoid growing objects
      • Padding
      • In-place update vs move
  • Challenges with array elements
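
The padding bullet refers to how MongoDB's storage engine of that era reserved extra space around each document so growth could be absorbed in place; a document that outgrew its slot had to be moved. The toy model below assumes a fixed padding factor (the real engine adjusted it adaptively) and uses a hypothetical `PaddedRecord` class that only counts in-place updates versus moves.

```java
// Toy model (hypothetical class) of padded record allocation: a document gets
// size * paddingFactor bytes; growth within that slot is an in-place update,
// growth beyond it forces a move to a new, freshly padded slot.
final class PaddedRecord {
    private final double paddingFactor;
    private int allocated; // bytes reserved for the document
    private int moves;     // how many times it outgrew its slot

    PaddedRecord(int initialSize, double paddingFactor) {
        if (paddingFactor < 1.0) throw new IllegalArgumentException("padding factor must be >= 1");
        this.paddingFactor = paddingFactor;
        this.allocated = (int) Math.ceil(initialSize * paddingFactor);
    }

    // Returns true for an in-place update, false if the document had to move.
    boolean update(int newSize) {
        if (newSize <= allocated) return true;
        allocated = (int) Math.ceil(newSize * paddingFactor);
        moves++;
        return false;
    }

    int allocated() { return allocated; }
    int moves() { return moves; }

    public static void main(String[] args) {
        PaddedRecord rec = new PaddedRecord(100, 1.5);
        int size = 100;
        for (int i = 0; i < 10; i++) {
            size += 20;        // e.g. one more entry appended to an embedded array
            rec.update(size);
        }
        System.out.println(rec.moves()); // prints 2
    }
}
```

Appending entries to an embedded array, as `main` does, is exactly the kind of steady growth that turns cheap in-place updates into repeated moves.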

Slide 28: Amazon EC2 & EBS
  • Plan for failure
      • “When” not “if”
  • EBS performance
      • Inconsistent
      • Limited by bandwidth
      • 100 IOPS / volume
      • RAID-0
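
A back-of-the-envelope sketch of the RAID-0 bullet, assuming the slide's figure of roughly 100 IOPS per EBS volume, ideal striping, and a single instance bandwidth cap; the 16 KB I/O size, the bandwidth values and the `EbsRaid0` name are hypothetical. It also shows the flip side of striping: with no redundancy, the array is lost if any one volume fails (assuming independent failures), which ties back to "plan for failure".

```java
// Back-of-the-envelope model (hypothetical helper) for striping EBS volumes with RAID-0.
final class EbsRaid0 {
    // Ideal striping adds volume IOPS together until the instance's bandwidth becomes the limit.
    static double stripedIops(int volumes, double iopsPerVolume, double bandwidthMBps, double ioSizeKB) {
        double diskLimit = volumes * iopsPerVolume;
        double bandwidthLimit = bandwidthMBps * 1024.0 / ioSizeKB; // I/Os per second the link can carry
        return Math.min(diskLimit, bandwidthLimit);
    }

    // RAID-0 has no redundancy: assuming independent failures, losing any one volume loses the array.
    static double arrayFailureProbability(int volumes, double perVolumeFailure) {
        return 1.0 - Math.pow(1.0 - perVolumeFailure, volumes);
    }

    public static void main(String[] args) {
        System.out.println(stripedIops(4, 100, 1000, 16));           // prints 400.0
        System.out.println(arrayFailureProbability(4, 0.01) > 0.01); // prints true
    }
}
```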

Slide 29: Instance Sizing
  • Memory is king
  • Keep working set in RAM
      • Indexes
      • Working data
  • Spread horizontally instead of vertically
      • Increased write performance
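
A rough sizing sketch for "keep working set in RAM", assuming all indexes plus a fixed hot fraction of the data must fit in memory, that the working set divides evenly across shards, and that only part of each instance's RAM is usable by MongoDB. The example figures (2,048 GB of data, a 20% hot fraction echoing the speaker notes, 100 GB of indexes, 68 GB instances, 80% usable) and the `WorkingSetSizing` name are hypothetical.

```java
// Rough capacity arithmetic (hypothetical helper); all sizes in GB.
final class WorkingSetSizing {
    // RAM needed to hold every index plus the "hot" fraction of the data.
    static double workingSetGB(double dataGB, double hotFraction, double indexGB) {
        return dataGB * hotFraction + indexGB;
    }

    // Fewest shards whose combined usable RAM holds the working set, assuming an even
    // spread and leaving part of each instance's memory for the OS and connections.
    static int shardsNeeded(double workingSetGB, double ramPerShardGB, double usableFraction) {
        double usablePerShard = ramPerShardGB * usableFraction;
        return (int) Math.ceil(workingSetGB / usablePerShard);
    }

    public static void main(String[] args) {
        double ws = workingSetGB(2048, 0.20, 100);     // hypothetical: ~2 TB data, 20% hot, 100 GB indexes
        System.out.println(shardsNeeded(ws, 68, 0.8)); // prints 10
    }
}
```

Adding shards of the same size ("spread horizontally") raises the total RAM linearly, which is often easier on EC2 than finding a single larger instance.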

Slide 30: Data Routing with Shards

Slide 31: Partitioning in the Cloud
  • Operations perspective
      • Dynamic changes in machines
          • Config servers track machines
          • Each node in replica set knows other nodes
          • Avoids restarting applications when Mongo servers change
      • Easy scaling
  • Local shard servers
  • Config servers store redundant copies
      • Two-phase commit
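
One way to picture the routing on these slides: a router keeps a sorted map from each chunk's lower shard-key bound to the shard that owns it, and looks up the floor entry for every request. This is a sketch only; `ChunkRouter` is a hypothetical class, shard keys are simplified to `long` values, and the map merely stands in for the chunk metadata that MongoDB's routers read from the config servers.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical router: chunk lower bound (inclusive) -> owning shard, standing in
// for the chunk metadata a MongoDB router caches from the config servers.
final class ChunkRouter {
    private final TreeMap<Long, String> chunks = new TreeMap<>();

    ChunkRouter(String initialShard) {
        chunks.put(Long.MIN_VALUE, initialShard); // one chunk covering the whole key space
    }

    // Record a chunk boundary, or move an existing chunk to another shard.
    void assign(long lowerBound, String shard) {
        chunks.put(lowerBound, shard);
    }

    // Route a shard-key value to the shard owning the chunk that contains it.
    String route(long shardKey) {
        Map.Entry<Long, String> owner = chunks.floorEntry(shardKey); // never null: MIN_VALUE is always present
        return owner.getValue();
    }

    public static void main(String[] args) {
        ChunkRouter router = new ChunkRouter("shard0");
        router.assign(1_000, "shard1"); // split at key 1000; the upper chunk lives on shard1
        System.out.println(router.route(42) + " " + router.route(5_000)); // prints shard0 shard1
    }
}
```

Because applications talk to the router rather than to individual shards, moving a chunk only changes an entry in this map and the application code stays untouched.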

Slide 32: Picking a shard key
  • Shard key selection critical for proper distribution
      • Spread writes across cluster
  • Depends on usage
      • Single document vs aggregation
      • Examples all time-series data
  • Cannot be changed
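
The time-series examples show why the key matters: with range-based chunks, a key that only ever increases (such as a timestamp) sends every new write to the chunk holding the highest values, and therefore to one shard. The sketch below compares that with a hypothetical compound key of (user id, timestamp), packed into a `long` so ordinary numeric order matches the compound order; the chunk boundaries, the sample timestamps and the `ShardKeySpread` name are illustrative assumptions.

```java
import java.util.Arrays;

// Illustrative comparison (hypothetical class) of how two shard keys spread new writes
// over range-based chunks. lowerBounds must be sorted ascending and start at Long.MIN_VALUE.
final class ShardKeySpread {
    // Index of the chunk whose inclusive lower bound is the largest one <= key.
    static int shardFor(long key, long[] lowerBounds) {
        int shard = 0;
        for (int i = 0; i < lowerBounds.length; i++) {
            if (lowerBounds[i] <= key) shard = i;
        }
        return shard;
    }

    // Packs (userId, timestamp) into one long whose numeric order matches the
    // compound order; assumes 0 <= userId < 2^31 and 0 <= timestamp < 2^32.
    static long compoundKey(int userId, long timestampSeconds) {
        return ((long) userId << 32) | (timestampSeconds & 0xFFFFFFFFL);
    }

    // How many of the given keys land in each chunk (one chunk per shard here).
    static int[] histogram(long[] keys, long[] lowerBounds) {
        int[] counts = new int[lowerBounds.length];
        for (long k : keys) counts[shardFor(k, lowerBounds)]++;
        return counts;
    }

    public static void main(String[] args) {
        long[] tsKeys = new long[100], ckKeys = new long[100];
        for (int u = 0; u < 100; u++) {
            long t = 1_345_680_000L + u; // 100 users writing "now"
            tsKeys[u] = t;
            ckKeys[u] = compoundKey(u, t);
        }
        long[] tsBounds = {Long.MIN_VALUE, 1_345_000_000L, 1_345_300_000L, 1_345_600_000L};
        long[] ckBounds = {Long.MIN_VALUE, compoundKey(25, 0), compoundKey(50, 0), compoundKey(75, 0)};
        System.out.println(Arrays.toString(histogram(tsKeys, tsBounds))); // prints [0, 0, 0, 100]
        System.out.println(Arrays.toString(histogram(ckKeys, ckBounds))); // prints [25, 25, 25, 25]
    }
}
```

Since the slide notes that a shard key cannot be changed, running this kind of write-distribution check against realistic data before sharding a collection is cheap insurance.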

Slide 33: Sharding - Gritty Details
  • Chunks
      • 64 MB blocks of data
  • Splits
      • 1 chunk turns into 2 chunks
  • Rebalance
      • Move chunks to different nodes
      • Maintain even distribution of chunks
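
A simplified model of the three mechanisms on this slide, assuming a 64 MB maximum chunk size (as stated) and a balancer that moves one chunk at a time from the shard with the most chunks to the shard with the fewest until the gap is within a threshold. `ChunkBalancer` and the threshold value are hypothetical; MongoDB's actual balancer rules are more involved.

```java
import java.util.Arrays;

// Simplified model (hypothetical class) of chunk splitting and balancing.
final class ChunkBalancer {
    static final long MAX_CHUNK_BYTES = 64L * 1024 * 1024; // 64 MB, as on the slide

    // Chunks needed so that no chunk exceeds the maximum size (at least one).
    static long chunksFor(long dataBytes) {
        return Math.max(1, (dataBytes + MAX_CHUNK_BYTES - 1) / MAX_CHUNK_BYTES);
    }

    // Move one chunk at a time from the fullest shard to the emptiest until the
    // difference is within threshold; mutates the array and returns the number of moves.
    static int balance(int[] chunksPerShard, int threshold) {
        if (threshold < 1) throw new IllegalArgumentException("threshold must be >= 1");
        int moves = 0;
        while (true) {
            int max = 0, min = 0;
            for (int i = 1; i < chunksPerShard.length; i++) {
                if (chunksPerShard[i] > chunksPerShard[max]) max = i;
                if (chunksPerShard[i] < chunksPerShard[min]) min = i;
            }
            if (chunksPerShard[max] - chunksPerShard[min] <= threshold) return moves;
            chunksPerShard[max]--;
            chunksPerShard[min]++;
            moves++;
        }
    }

    public static void main(String[] args) {
        int[] shards = {10, 0, 0, 0}; // everything started on one shard
        int moves = balance(shards, 2);
        System.out.println(moves + " " + Arrays.toString(shards)); // prints 6 [4, 2, 2, 2]
    }
}
```

Each move in this model stands for copying a whole chunk between machines, which is why a large backlog of moves translates directly into heavy I/O.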

Slide 34: Rebalancing Challenges
  • Splits have to find midpoint of chunk
  • Very I/O expensive for collections with small documents
      • Decreased chunk size
      • Made documents larger & more complex
  • Can be a drain on system
      • Needs to run frequently
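
Why small documents make splits expensive can be shown with simple arithmetic. If finding the split point means scanning the chunk to locate its median key (a simplifying assumption in this sketch), the work scales with the number of documents in the chunk, roughly chunk size divided by average document size. `SplitCost` and the document sizes used are hypothetical.

```java
// Hypothetical helper estimating the work behind a chunk split.
final class SplitCost {
    static final long MB = 1024L * 1024;

    // Rough number of documents examined, assuming the split scans the whole chunk to find its median.
    static long docsScanned(long chunkBytes, long avgDocBytes) {
        return chunkBytes / avgDocBytes;
    }

    // Split point: the median shard-key value of the chunk's sorted keys (array must be non-empty).
    static long splitPoint(long[] sortedKeys) {
        if (sortedKeys.length == 0) throw new IllegalArgumentException("empty chunk");
        return sortedKeys[sortedKeys.length / 2];
    }

    public static void main(String[] args) {
        System.out.println(docsScanned(64 * MB, 100));  // prints 671088 (small documents, 64 MB chunk)
        System.out.println(docsScanned(16 * MB, 2048)); // prints 8192 (smaller chunk, larger documents)
    }
}
```

The two printed cases mirror the slide's remedies: a smaller chunk and larger documents together cut the number of keys a split examines from about 671 thousand to 8,192 in this example.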

Slide 35: Replication Lag
  • Eventual consistency
  • No guarantees about lag
  • Replica-safe writes
      • Data committed to at least 2 nodes
      • Can cause problems with high replication lag
      • Security vs time
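
A sketch of why replica-safe writes and replication lag interact: if a write must reach w members (primary included), its acknowledgement waits for the (w-1)-th fastest secondary. This assumes each secondary's current lag is a fair proxy for how long it will take to apply the new write; `WriteConcernWait` and its methods are hypothetical, not driver API.

```java
import java.util.Arrays;
import java.util.OptionalLong;

// Hypothetical model of waiting for a write to reach w replica-set members.
final class WriteConcernWait {
    // Milliseconds until w members (primary included) have the write, assuming each
    // secondary's current lag is how long it needs; empty if there are too few members.
    static OptionalLong ackDelayMillis(int w, long[] secondaryLagMillis) {
        if (w <= 1) return OptionalLong.of(0);   // the primary alone acknowledges at once
        if (w - 1 > secondaryLagMillis.length) return OptionalLong.empty();
        long[] lags = secondaryLagMillis.clone(); // leave the caller's array untouched
        Arrays.sort(lags);
        return OptionalLong.of(lags[w - 2]);      // the (w-1)-th fastest secondary gates the ack
    }

    // True if the acknowledgement would arrive within the caller's timeout.
    static boolean withinTimeout(int w, long[] secondaryLagMillis, long timeoutMillis) {
        OptionalLong delay = ackDelayMillis(w, secondaryLagMillis);
        return delay.isPresent() && delay.getAsLong() <= timeoutMillis;
    }

    public static void main(String[] args) {
        long[] lags = {40, 2_500}; // hypothetical lag of two secondaries, in ms
        System.out.println(ackDelayMillis(2, lags).getAsLong()); // prints 40
        System.out.println(withinTimeout(3, lags, 1_000));       // prints false
    }
}
```

With w = 2 one fast secondary is enough; asking for more copies buys safety at the cost of waiting on slower members, the "security vs time" trade-off on the slide.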

Slide 36: Q&A

Slide 37: Contact us
  Web: http://www.studyblue.com
  Twitter: @StudyBlue
  Email: sean@studyblue.com