MongoDB Case Study at NoSQL Now 2012

Published in: Technology

  1. StudyBlue • Databases at Scale: A MongoDB Case Study • August 23, 2012 • StudyBlue, Inc.
  2. Overview • About Me • About StudyBlue • Why MongoDB? • Leveraging MongoDB • Key Issues • Q&A
  3. Who am I? • Sean Laurent • sean@studyblue.com • Head of Operations at StudyBlue, Inc.
  4. studyblue.com
  5. About StudyBlue • Online service for storing, studying, sharing and ultimately mastering course material • Digital backpack for students
  6. StudyBlue Usage • Many simultaneous users • Rapid growth • Cyclical usage
  7. Initial Use Case
  8. Flashcard Scoring • Track flashcard scoring over time: every single card, every single user, forever • Provide aggregate statistics: flashcard deck, folder, overall • Focus on content mastery
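
The aggregate statistics on this slide (per deck, per folder, overall) could be computed along these lines. This is a minimal sketch, not StudyBlue's implementation: the event fields (`user`, `folder`, `deck`, `card`, `correct`) and the `aggregate` helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def aggregate(events):
    """Roll per-card score events up to deck, folder and overall levels.

    Each event is a dict with hypothetical fields:
    user, folder, deck, card, correct (bool).
    Returns {level_key: fraction_correct}.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [correct, attempts]
    for e in events:
        keys = [
            ("deck", e["user"], e["deck"]),
            ("folder", e["user"], e["folder"]),
            ("overall", e["user"]),
        ]
        for key in keys:
            totals[key][0] += 1 if e["correct"] else 0
            totals[key][1] += 1
    return {key: c / n for key, (c, n) in totals.items()}


# Illustrative data only.
events = [
    {"user": "u1", "folder": "bio", "deck": "cells", "card": 1, "correct": True},
    {"user": "u1", "folder": "bio", "deck": "cells", "card": 2, "correct": False},
    {"user": "u1", "folder": "bio", "deck": "genes", "card": 3, "correct": True},
]
stats = aggregate(events)
```

Because every card, every user and every attempt is kept "forever", the input to a roll-up like this only grows, which is what turns the scoring feature into a write-volume problem.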
  9. Scoring Results
  10. The Problem • Reasonably large number of cards • Large number of users • User base increasing rapidly • Shift in usage, increasing faster than users: time on site, decks per user, average deck size, study sessions per user
  11. StudyBlue Database Problems • Amazon EC2 • Large number of simultaneous users • High write volume • Single PostgreSQL database • Large tables
  12. Why Mongo?
  13. Alternatives • Amazon SimpleDB: far too simple • Cassandra: difficult to add nodes and rebalance; column families cannot be modified without a restart • CouchDB: difficult to add nodes and rebalance • Redis: no native support for sharding/partitioning; master/slave only, no automatic failover
  14. MongoDB for the Win • Highly available • Replica sets • Automatic failover • Horizontal scaling across shards • Improved write performance • Improved availability during failures • Easy to add additional shards • Easier maintenance
  15. Implementation: Phase 1
  16. Development • 100% Java • Existing PostgreSQL database • System of record • Synchronization issues
  17. SQL Integration & Synchronization • PostgreSQL considered system of record • Asynchronous, event-driven • Web servers queue change events • Scoring servers process events: query PostgreSQL, update MongoDB
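
The event flow on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the production design: plain dicts stand in for PostgreSQL and MongoDB, an in-process `queue.Queue` stands in for whatever queueing system was actually used, and the function names are hypothetical.

```python
import queue

postgres = {"card:1": {"deck": "cells", "score": 3}}  # stand-in for the system of record
mongo = {}                                             # stand-in for the scoring store
events = queue.Queue()                                 # stand-in for the event queue


def web_server_record_change(key):
    """Web tier: enqueue a change event instead of writing to MongoDB directly."""
    events.put({"key": key})


def scoring_server_drain():
    """Scoring tier: for each event, re-read PostgreSQL and update MongoDB."""
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        row = postgres.get(event["key"])
        if row is not None:
            mongo[event["key"]] = dict(row)  # copy so the two stores stay independent
        processed += 1


web_server_record_change("card:1")
postgres["card:1"]["score"] = 4      # the record changes again before the queue drains
web_server_record_change("card:1")
count = scoring_server_drain()
```

Because each event carries only a key and the scoring server re-queries the system of record, replayed or duplicated events converge on the latest PostgreSQL state. Until the queue drains, MongoDB lags behind PostgreSQL.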
  18. Architecture v1
  19. MongoDB Schema • Many shallow collections vs. monolithic deep collection • Leverage existing SQL knowledge • Simplify SQL integration
  20. Implementation: Phase 2
  21. DevOps • Amazon EC2 • Separate dev, test and production environments • Scripting & automation: creation, cloning • Configuration management with Chef
  22. Even More Data • Moved existing tables from PostgreSQL to MongoDB • Four PostgreSQL tables with millions of rows combined into a single collection • New development uses MongoDB: analytics data with 300+ million documents
  23. SQL Integration Part 2 • MongoDB considered system of record • Web servers interact with MongoDB directly • More complex structures, fewer shallow collections
  24. Key Issues
  25. Summary • NoSQL vs SQL • Design challenges • Amazon EC2/EBS • Partitioning & sharding • Replication lag
  26. NoSQL vs SQL • NoSQL != SQL • Document database != RDBMS • No joins • Requires new mindset • Store related data together • Duplicate data as necessary
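
To make "store related data together" and "duplicate data as necessary" concrete, the sketch below folds three relational-style tables into one document. The table layout, field names and `to_document` helper are hypothetical; the slides do not show StudyBlue's actual schema.

```python
# Relational shape: three hypothetical tables that would be joined at read time.
users = {1: {"name": "Ada"}}
decks = {10: {"owner_id": 1, "title": "Cell Biology"}}
scores = [
    {"deck_id": 10, "card": 1, "correct": True},
    {"deck_id": 10, "card": 2, "correct": False},
]


def to_document(deck_id):
    """Build one self-contained document so a read needs no join."""
    deck = decks[deck_id]
    owner = users[deck["owner_id"]]
    return {
        "_id": deck_id,
        "title": deck["title"],
        # The owner's name is duplicated here on purpose.
        "owner": {"id": deck["owner_id"], "name": owner["name"]},
        "scores": [
            {"card": s["card"], "correct": s["correct"]}
            for s in scores
            if s["deck_id"] == deck_id
        ],
    }


doc = to_document(10)
```

The cost of the duplication is that a change to the owner's name must be written to every document that embeds it, and an embedded array that keeps growing runs into the document-growth issues listed under Design Challenges.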
  27. Design Challenges • Multiple tables to single collections with complex objects • Avoid growing objects • Padding • In-place update vs move • Challenges with array elements
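
The "padding" and "in-place update vs move" bullets can be illustrated with a toy allocation model. It assumes a storage engine that reserves `size * padding_factor` bytes whenever it places a document and must relocate the document once it outgrows that space; the `simulate` function and the sizes are made up for illustration and do not reproduce MongoDB's actual allocator.

```python
def simulate(sizes, padding_factor):
    """Count in-place updates vs moves for one growing document.

    sizes: successive document sizes in bytes (first entry is the insert).
    padding_factor: space reserved is size * padding_factor at each placement.
    Returns (in_place_updates, moves).
    """
    allocated = int(sizes[0] * padding_factor)
    in_place = moves = 0
    for size in sizes[1:]:
        if size <= allocated:
            in_place += 1          # fits in the reserved space
        else:
            moves += 1             # outgrew its slot: relocate and re-pad
            allocated = int(size * padding_factor)
    return in_place, moves


sizes = [100, 120, 140, 160, 180, 200]
no_padding = simulate(sizes, 1.0)    # (0, 5): every growth step is a move
with_padding = simulate(sizes, 1.5)  # (4, 1): most growth is absorbed in place
```

Padding trades disk and memory for fewer moves. Keeping documents at a stable size avoids the trade-off altogether, which is the point of "avoid growing objects".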
  28. Amazon EC2 & EBS • Plan for failure • “When” not “if” • EBS performance • Inconsistent • Limited by bandwidth • 100 IOPS / volume • RAID-0
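
A back-of-the-envelope calculation shows why RAID-0 appears next to the 100 IOPS figure. The per-volume number comes from the slide; the volume count and the per-volume survival probability below are hypothetical inputs, and the result is a best-case ceiling that ignores the bandwidth limit and inconsistency the slide also mentions.

```python
def raid0_iops_ceiling(volumes, iops_per_volume=100):
    """Best-case IOPS when I/O is striped evenly across all volumes."""
    return volumes * iops_per_volume


def raid0_survival(volumes, p_volume_ok):
    """RAID-0 has no redundancy: the array survives only if every volume does."""
    return p_volume_ok ** volumes


ceiling = raid0_iops_ceiling(4)       # 400
survival = raid0_survival(4, 0.99)    # about 0.961, assuming independent failures
```

Striping raises the ceiling but makes the array less reliable than any single volume, which fits the "plan for failure" advice: redundancy has to come from replica sets rather than from the disk layer.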
  29. Instance Sizing • Memory is king • Keep working set in RAM: indexes, working data • Spread horizontally instead of vertically • Increased write performance
  30. Data Routing with Shards
  31. Partitioning in the Cloud • Operations perspective • Dynamic changes in machines • Config servers track machines • Each node in replica set knows other nodes • Avoids restarting applications when Mongo servers change • Easy scaling • Local shard servers • Config servers store redundant copies • Two-phase commit
  32. Picking a shard key • Shard key selection critical for proper distribution • Spread writes across cluster • Depends on usage • Single document vs aggregation • Examples all time-series data • Cannot be changed
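
Why the shard key decides whether writes spread across the cluster can be seen in a small simulation. It is a sketch under assumptions: four shards, range boundaries fixed from older data, and an MD5-based hash of a user id as the alternative key. None of these details come from the talk, and the routing functions are simplified stand-ins for MongoDB's real routing.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4


def range_shard(timestamp, boundaries):
    """Range partitioning: the first shard whose upper bound exceeds the key, else the last."""
    for shard, upper in enumerate(boundaries):
        if timestamp < upper:
            return shard
    return len(boundaries)


def hashed_shard(user_id):
    """Hash partitioning: a stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# Boundaries were derived from historical keys; new writes carry newer timestamps.
boundaries = [1000, 2000, 3000]
new_writes = [(3000 + i, i % 50) for i in range(1000)]  # (timestamp, user_id)

by_time = Counter(range_shard(ts, boundaries) for ts, _ in new_writes)
by_user = Counter(hashed_shard(uid) for _, uid in new_writes)
```

With a monotonically increasing key such as a timestamp, every new write lands on the shard that owns the top of the range, so one machine absorbs the whole write load. A key with no ordering relative to arrival time spreads the same writes across shards, at the cost of making time-range queries touch every shard. Since the key "cannot be changed", this choice has to be made up front.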
  33. Sharding - Gritty Details • Chunks: 64 MB blocks of data • Splits: 1 chunk turns into 2 chunks • Rebalance: move chunks to different nodes, maintain even distribution of chunks
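
The three mechanisms on this slide (chunks, splits, rebalancing) fit into a short model. The 64 MB limit is from the slide; everything else (the list-of-tuples chunk representation, the median split, the "differ by at most one chunk" stopping rule) is a deliberate simplification and not MongoDB's actual balancer logic.

```python
MAX_CHUNK_BYTES = 64 * 1024 * 1024  # 64 MB chunk size, from the slide


def chunk_bytes(chunk):
    """Total size of a chunk given as a list of (key, size_in_bytes) pairs."""
    return sum(size for _, size in chunk)


def split(chunk):
    """Split a key-sorted chunk at its midpoint: 1 chunk turns into 2."""
    mid = len(chunk) // 2
    return chunk[:mid], chunk[mid:]


def rebalance(shards):
    """Move chunks from the fullest shard to the emptiest until counts differ by at most 1.

    shards: dict mapping shard name -> list of chunks. Returns the number of moves.
    """
    moves = 0
    while True:
        big = max(shards, key=lambda s: len(shards[s]))
        small = min(shards, key=lambda s: len(shards[s]))
        if len(shards[big]) - len(shards[small]) <= 1:
            return moves
        shards[small].append(shards[big].pop())
        moves += 1


# A 70 MB chunk of 1 MB documents exceeds the limit and is split in two.
docs = [(key, 1024 * 1024) for key in range(70)]
left, right = split(docs) if chunk_bytes(docs) > MAX_CHUNK_BYTES else (docs, [])

# Six chunks on one shard and two empty shards: the balancer evens them out.
shards = {"a": [f"chunk{i}" for i in range(6)], "b": [], "c": []}
moves = rebalance(shards)
```

Each move in the model is free. In a real cluster every move copies up to a chunk's worth of data between machines while the cluster is serving traffic.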
  34. Rebalancing Challenges • Splits have to find mid point of chunk • Very I/O expensive for collections with small documents • Decreased chunk size • Made documents larger & more complex • Can be a drain on system • Needs to run frequently
  35. Replication Lag • Eventual consistency • No guarantees about lag • Replica safe writes • Data committed to at least 2 nodes • Can cause problems with high replication lag • Security vs time
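
The interaction between replica-safe writes and replication lag can be modelled in a few lines. This is a toy model with hypothetical lag figures: the primary has the write at time 0, each secondary has it after its current lag, and a write that requires `w` nodes is acknowledged when the `w`-th copy exists. It does not call any MongoDB driver API.

```python
def time_to_acknowledge(secondary_lags_ms, w):
    """Milliseconds until a write exists on `w` nodes (primary included)."""
    nodes = len(secondary_lags_ms) + 1
    if w < 1 or w > nodes:
        raise ValueError("w must be between 1 and the number of nodes")
    arrival = sorted([0] + list(secondary_lags_ms))  # primary has the write at t=0
    return arrival[w - 1]


healthy = time_to_acknowledge([5, 8], w=2)             # 5 ms
lagging = time_to_acknowledge([4000, 9000], w=2)       # 4000 ms
primary_only = time_to_acknowledge([4000, 9000], w=1)  # 0 ms, but only one copy
```

Requiring two nodes makes every write as slow as the fastest secondary, so a lag spike shows up directly as application latency. Acknowledging on the primary alone is fast, but a failover can lose the write. That is the "security vs time" trade-off on the slide.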
  36. Q&A
  37. Contact us • Web: http://www.studyblue.com • Twitter: @StudyBlue • Email: sean@studyblue.com
