Leveraging MongoDB: An Introductory Case Study
Presented at MongoChicago 2011. An overview of why we selected MongoDB, how we integrated it and important lessons we learned in the process.

Usage rights: CC Attribution-NonCommercial License

Speaker notes
  • Slide 3: Developer at heart; 15 years of experience; responsible for selecting Mongo
  • Slide 5: Bottom-up attempt to improve student outcomes through disruptive change outside of the education system; allows students to create and store lecture notes and flashcards and access them online and via mobile apps (iOS and Android)
  • Slide 6: No public numbers; 1,000 simultaneous users (peak)
  • Slide 10: Over 20 million cards now; approximately 40 million by Christmas, 80-100 million by May 2012, 200+ million by the end of 2012
  • Slide 15: Read balancing (slaveOk), discussed later; no downtime with Mongo since launch
  • Slide 22: Relationship mapping is an example of a problem with NoSQL
  • Slide 30: Bean serialization; annotations for slaveOk

Leveraging MongoDB: An Introductory Case Study Presentation Transcript

  • 1. StudyBlue and MongoDB: Implementation 101 • October 18, 2011 • StudyBlue, Inc.
  • 2. Overview • Who am I? • Who is StudyBlue? • Why MongoDB? • How did we leverage MongoDB? • What lessons did we learn? • Q&A
  • 3. Who am I? • Sean Laurent • sean@studyblue.com • Director of Operations at StudyBlue, Inc.
  • 4. studyblue.com
  • 5. About StudyBlue • Bottom-up attempt to improve student outcomes • Online service for storing, studying, sharing and ultimately mastering course material • Digital backpack for students • Freemium business model
  • 6. StudyBlue Usage • Many simultaneous users • Rapid growth • Cyclical usage
  • 7. The Challenge
  • 8. Flashcard Scoring • Track flashcard scoring: every single card, every single user, forever • Provide aggregate statistics: flashcard deck, folder, overall • Focus on content mastery
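Slide 8 asks for every card score to be kept forever and rolled up into deck, folder and overall statistics. The sketch below shows one way to express that roll-up in plain Java. It assumes a hypothetical CardScore record (user, folder, deck, card, correct answers, attempts) and defines mastery as correct answers divided by attempts; the slides show neither StudyBlue's schema nor its scoring formula.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** One user's running score for one card (hypothetical shape, not StudyBlue's schema). */
record CardScore(long userId, long folderId, long deckId, long cardId, int correct, int attempts) {}

final class ScoreAggregates {
    private ScoreAggregates() {}

    /** Share of attempts answered correctly; 0.0 when there are no attempts yet. */
    static double mastery(List<CardScore> scores) {
        long correct = 0;
        long attempts = 0;
        for (CardScore s : scores) {
            correct += s.correct();
            attempts += s.attempts();
        }
        return attempts == 0 ? 0.0 : (double) correct / attempts;
    }

    /** Mastery per group, e.g. per deck (CardScore::deckId) or per folder (CardScore::folderId). */
    static Map<Long, Double> masteryBy(List<CardScore> scores, Function<CardScore, Long> key) {
        Map<Long, Double> result = new TreeMap<>();
        scores.stream()
              .collect(Collectors.groupingBy(key))
              .forEach((group, members) -> result.put(group, mastery(members)));
        return result;
    }
}
```

Passing only one user's scores gives that user's per-deck, per-folder and overall figures; the "overall" statistic is simply mastery over the whole list.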
  • 9. Scoring Results
  • 10. The Problem • Existing PostgreSQL database • Reasonably large number of cards • Large number of users • User base increasing rapidly • Shift in usage, increasing faster than users: time on site, decks per user, average deck size, study sessions per user
  • 11. Additional Requirements • Support sustained rapid growth • Highly available • Minimize maintenance costs • Active community • Done yesterday
  • 12. Why Mongo?
  • 13. Alternatives • Amazon SimpleDB: far too simple • Cassandra: difficult to add nodes and rebalance; column families cannot be modified without a restart • CouchDB: difficult to add nodes and rebalance • Redis: no native support for sharding/partitioning; master/slave only, no automatic failover
  • 14. MongoDB for the Win • Highly available: replica sets, automatic failover • Shards: work across replica sets; easy to add additional shards • Node addition: adding nodes can degrade read performance; the “hidden” flag keeps a new node out of client reads • No downtime
  • 15. More winning • Atomic insert & replace • Read balancing across slaves • BSON/JSON document model • It just works. Seriously.
  • 16. Implementation
  • 17. DevOps • Amazon EC2 • Separate dev, test and production environments • Operations testing: replication, failover • Scripting & automation: creation, cloning
  • 18. Development • 100% Java • Existing PostgreSQL database: system of record; synchronization issues
  • 19. SQL Integration & Synchronization • PostgreSQL considered the system of record • Asynchronous and event driven: web servers queue change events; the scoring server processes events, queries PostgreSQL and updates MongoDB
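Slide 19 describes the synchronization path: web servers queue change events, and a scoring server consumes them, re-reads PostgreSQL and writes MongoDB. A minimal single-consumer sketch of that flow follows. ChangeEvent, SystemOfRecord and ScoreStore are hypothetical stand-ins for the real event format, the PostgreSQL query and the MongoDB update, and an in-process BlockingQueue stands in for whatever queue StudyBlue actually used.

```java
import java.util.concurrent.BlockingQueue;

/** "User U's score for card C changed" (hypothetical event shape). */
record ChangeEvent(long userId, long cardId) {}

/** Stand-in for the PostgreSQL query against the system of record (hypothetical). */
interface SystemOfRecord {
    int loadScore(long userId, long cardId);
}

/** Stand-in for the MongoDB update (hypothetical). */
interface ScoreStore {
    void upsertScore(long userId, long cardId, int score);
}

/** Scoring-server side: consumes events that web servers put on the queue. */
final class ScoringWorker implements Runnable {
    private final BlockingQueue<ChangeEvent> queue;
    private final SystemOfRecord postgres;
    private final ScoreStore mongo;

    ScoringWorker(BlockingQueue<ChangeEvent> queue, SystemOfRecord postgres, ScoreStore mongo) {
        this.queue = queue;
        this.postgres = postgres;
        this.mongo = mongo;
    }

    /** Re-read the authoritative value from PostgreSQL, then copy it into MongoDB. */
    void handle(ChangeEvent event) {
        int score = postgres.loadScore(event.userId(), event.cardId());
        mongo.upsertScore(event.userId(), event.cardId(), score);
    }

    /** Process whatever is queued right now without blocking; returns the number handled. */
    int drain() {
        int handled = 0;
        for (ChangeEvent e = queue.poll(); e != null; e = queue.poll()) {
            handle(e);
            handled++;
        }
        return handled;
    }

    /** Background loop: block until an event arrives; exit when the thread is interrupted. */
    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                handle(queue.take());
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // restore the flag so the owner sees the shutdown
        }
    }
}
```

The web-server side only needs queue.offer(new ChangeEvent(userId, cardId)) and can return to the user immediately. Because an event says what changed rather than carrying the new value, the worker always copies the current PostgreSQL value, which keeps PostgreSQL authoritative even when events arrive late.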
  • 20. Architecture
  • 21. MongoDB Schema • Many shallow collections vs. a monolithic deep collection • Leverage existing SQL knowledge • Simplify SQL integration
  • 22. Schema Design • Two collections used together to map each relationship: Folder containing Decks / Decks in a Folder; Deck containing Cards / Cards in a Deck • Folders arranged in a tree structure: one row per folder that points to its parent; multiple queries required to build the tree • Postgres primary keys are used instead of ObjectIds
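Slide 22 notes that each folder is stored as one record pointing to its parent, so assembling the tree takes several queries. One way to bound that cost is one query per tree level, fetching all children of the current level at once with an IN-style filter (slide 37 recommends IN over OR). In this sketch the Folder record and the findByParentIn function are hypothetical; findByParentIn stands in for a query of the form {parentId: {$in: [...]}}, and the folder data is assumed to be a true tree with no cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** One folder record pointing at its parent; parentId is null for a root (hypothetical shape). */
record Folder(long id, Long parentId, String name) {}

/** In-memory tree node assembled from the flat folder records. */
final class FolderNode {
    final Folder folder;
    final List<FolderNode> children = new ArrayList<>();

    FolderNode(Folder folder) {
        this.folder = folder;
    }
}

final class FolderTreeBuilder {
    private FolderTreeBuilder() {}

    /**
     * Builds the tree under root with one query per level. findByParentIn is a stand-in
     * for a query like {parentId: {$in: [ids]}}; the data is assumed to contain no cycles.
     */
    static FolderNode build(Folder root, Function<Set<Long>, List<Folder>> findByParentIn) {
        FolderNode rootNode = new FolderNode(root);
        Map<Long, FolderNode> level = new LinkedHashMap<>();
        level.put(root.id(), rootNode);
        while (!level.isEmpty()) {
            Map<Long, FolderNode> next = new LinkedHashMap<>();
            for (Folder child : findByParentIn.apply(level.keySet())) {
                FolderNode node = new FolderNode(child);
                level.get(child.parentId()).children.add(node); // attach to its parent
                next.put(child.id(), node);
            }
            level = next; // the children become the next level to expand
        }
        return rootNode;
    }
}
```

The number of queries grows with the depth of the folder tree (plus one final query that finds no children), not with the number of folders.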
  • 23. [Image-only slide]
  • 24. Document Scores Example
  • 25. Slave Reads • slaveOk set to true for most data retrieval • Scoring calculations use the primary to ensure correctness
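Slide 25 sends most reads to secondaries with slaveOk and keeps scoring reads on the primary, and the speaker notes for slide 30 mention annotations for slaveOk. The sketch below shows one way such an annotation could drive the routing decision. The SlaveOk annotation, ScoreDao and ReadRouter are hypothetical names, and the code only decides PRIMARY or SECONDARY; it does not call the MongoDB Java driver.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Marks a read that may be served by a secondary (hypothetical annotation). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SlaveOk {}

enum ReadTarget { PRIMARY, SECONDARY }

/** Hypothetical data-access class; method bodies are placeholders. */
class ScoreDao {
    /** Display data can tolerate replication lag, so secondaries may serve it. */
    @SlaveOk
    public String deckSummaryForDisplay(long deckId) {
        return "deck " + deckId;
    }

    /** Scoring reads must see the latest writes, so there is deliberately no annotation. */
    public int scoreForCalculation(long userId, long cardId) {
        return 0;
    }
}

final class ReadRouter {
    private ReadRouter() {}

    /** Unannotated methods default to the primary. */
    static ReadTarget targetFor(Method method) {
        return method.isAnnotationPresent(SlaveOk.class) ? ReadTarget.SECONDARY : ReadTarget.PRIMARY;
    }

    /** Convenience lookup by public method name. */
    static ReadTarget targetFor(Class<?> dao, String methodName) {
        for (Method m : dao.getMethods()) {
            if (m.getName().equals(methodName)) {
                return targetFor(m);
            }
        }
        throw new IllegalArgumentException("no public method named " + methodName);
    }
}
```

Defaulting unannotated methods to the primary means a forgotten annotation costs some read capacity rather than correctness.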
  • 26. Data Migration • One-time process: Postgres to MongoDB • Ruby scripts • Separate server
  • 27. Key Issues
  • 28. Summary • Amazon EC2/EBS • Java API • MapReduce • Replication • Partitioning / Shards • Performance
  • 29. Amazon EC2 & EBS • Plan for failure: “when”, not “if” • EBS performance: inconsistent; limited by bandwidth • 60 GB minimum • RAID-0
  • 30. Java API • Not perfect: verbose; type safety • Failover requires retry: up to 1 minute delay • Read-only requests: “slaveOk” works • Burden on the developer
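Slide 30 warns that failover can take up to a minute and that the application must retry. A generic retry loop with exponential backoff and an overall time budget is sketched below. Which exceptions count as retryable is left to a caller-supplied predicate because the slides do not say which driver exceptions StudyBlue retried; the one-minute budget comes from the slide, while the 5-second cap on each wait is an arbitrary choice for illustration.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class Retry {
    private Retry() {}

    /** Pluggable sleep so tests do not have to wait in real time; Thread::sleep fits. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Runs op, retrying retryable failures with doubling backoff until the time budget
     * (for example the one-minute failover window) would be exceeded.
     */
    static <T> T withDeadline(Supplier<T> op,
                              Predicate<RuntimeException> retryable,
                              Duration budget,
                              Duration initialBackoff,
                              Sleeper sleeper) {
        long deadline = System.nanoTime() + budget.toNanos();
        long backoffMillis = Math.max(1, initialBackoff.toMillis());
        while (true) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                boolean outOfTime = System.nanoTime() + backoffMillis * 1_000_000L - deadline > 0;
                if (!retryable.test(e) || outOfTime) {
                    throw e; // not a failover symptom, or no time left: surface the error
                }
                try {
                    sleeper.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    e.addSuppressed(ie);
                    throw e;
                }
                backoffMillis = Math.min(backoffMillis * 2, 5_000); // cap each wait at 5 s
            }
        }
    }
}
```

In production the sleeper can simply be Thread::sleep. Retrying like this is only safe for operations that can be repeated without harm, such as reads or a whole-document replace.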
  • 31. MapReduce • Perfect for aggregation • Not used by StudyBlue: not needed (yet); difficult with multiple collections; reduce limited to masters; keep scalability simple • Under consideration
  • 32. Replication • Automated failover • Read scaling • Maintenance • Easy setup & configuration • “Seed” node(s) for clients
  • 33. Partitioning in the Cloud • Operations perspective: dynamic changes in machines; config servers track machines; each node in a replica set knows the other nodes; avoids restarting applications when Mongo servers change • Easy scaling: local shard servers; config servers store redundant copies; two-phase commit
  • 34. Useful EC2 Instance Types • Config servers: t1.micro or m1.small • Mongo replica nodes: depends on memory needs; m2.xlarge, m2.2xlarge, m2.4xlarge or cc1.4xlarge

      Name          Memory    EC2 Compute Units        I/O
      m2.xlarge     17.1 GB   6.5  (2 cores x 3.25)    medium
      m2.2xlarge    34.2 GB   13   (4 cores x 3.25)    high
      m2.4xlarge    68.4 GB   26   (8 cores x 3.25)    high
      cc1.4xlarge   23 GB     33.5 (2 x Xeon X5570)    very high
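Slide 34 says the replica-node instance size depends on memory needs. Using the memory figures from that table, the choice can be expressed as picking the smallest candidate whose RAM covers an estimated working set plus some headroom. The working-set estimate and the headroom factor are assumptions an operator would supply, not figures from the talk, and the instance list reflects the 2011 EC2 offerings on the slide.

```java
import java.util.List;
import java.util.Optional;

/** A row of the slide 34 table (2011 EC2 figures); the I/O rating is omitted. */
record InstanceType(String name, double memoryGb, double computeUnits) {}

final class InstanceSizing {
    private InstanceSizing() {}

    /** Replica-node candidates from the slide, sorted by memory, smallest first. */
    static final List<InstanceType> CANDIDATES = List.of(
            new InstanceType("m2.xlarge", 17.1, 6.5),
            new InstanceType("cc1.4xlarge", 23.0, 33.5),
            new InstanceType("m2.2xlarge", 34.2, 13),
            new InstanceType("m2.4xlarge", 68.4, 26));

    /**
     * Smallest candidate whose RAM covers the estimated working set (hot documents plus
     * indexes) times a headroom factor. Both inputs are operator estimates, not talk figures.
     */
    static Optional<InstanceType> smallestFitting(double workingSetGb, double headroom) {
        double neededGb = workingSetGb * headroom;
        return CANDIDATES.stream().filter(t -> t.memoryGb() >= neededGb).findFirst();
    }
}
```

For example, a 20 GB working set with 25% headroom needs 25 GB, which rules out m2.xlarge and cc1.4xlarge and selects m2.2xlarge. An empty result means no single candidate fits and the data would need to be spread across more shards.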
  • 35. Performance Issues • Missing indexes: performance terrible without indexes; index on the fly • Store array sizes in the collection • OR vs IN • Redundant updates: events not consolidated
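Slide 35 lists redundant updates caused by events that were not consolidated. If, as in the slide 19 design, each event only identifies what changed and the worker re-reads the current value, duplicate events for the same card can be collapsed before processing. A minimal sketch, assuming a hypothetical CardChanged event keyed by user and card:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;

/** Hypothetical event: "this user's score for this card changed". */
record CardChanged(long userId, long cardId) {}

final class EventConsolidator {
    private EventConsolidator() {}

    /**
     * Drains up to maxBatch queued events and drops duplicates, keeping first-seen order.
     * Safe only because each update re-reads the current value from the system of record,
     * so processing a key once covers every change queued for it.
     */
    static List<CardChanged> nextBatch(BlockingQueue<CardChanged> queue, int maxBatch) {
        List<CardChanged> raw = new ArrayList<>();
        queue.drainTo(raw, maxBatch);
        Set<CardChanged> unique = new LinkedHashSet<>(raw); // records compare by value
        return new ArrayList<>(unique);
    }
}
```

A student flipping through a deck can emit many events for the same cards in quick succession; consolidating per batch turns those into one read from PostgreSQL and one write to MongoDB per card.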
  • 36. Lessons Learned
  • 37. Key Lessons • Amazon great, but plan for failure • Leverage test platforms • Use replica sets & partitions early • Indexes critical • Use IN instead of OR • Java API cumbersome, but solid • Design schema carefully
  • 38. Q&A
  • 39. Contact us • Web: http://www.studyblue.com • Twitter: @StudyBlue • Email: sean@studyblue.com