- Developer at heart
- 15 years experience
- Responsible for selecting Mongo

- 15-person startup
- Bottom-up attempt to improve student outcomes through disruptive change outside of the education system.
- Allows students to create and store lecture notes and flashcards and access them online and via mobile apps (iOS and Android)

- No public numbers (low millions)
- 4000 simultaneous users (peak)
- 120+ countries
- Daily cycle slowly flattening

- 20 million cards at the time
- Over 60 million cards now
- Expect 100 million cards in next 6 months
Flashcard Scoring • Track flashcard scoring over time • Every single card • Every single user • Forever • Provide aggregate statistics • Flashcard deck • Folder • Overall • Focus on content mastery
StudyBlue, Inc.
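The roll-up from per-card scores to deck, folder, and overall statistics can be pictured with a minimal sketch. It assumes each score event records a user, folder, deck, card, and whether the answer was correct; the event layout and the `aggregate` helper are hypothetical and not StudyBlue's actual schema.

```python
from collections import defaultdict

# Hypothetical score events: (user, folder, deck, card, correct)
events = [
    ("ann", "bio", "cells", "c1", True),
    ("ann", "bio", "cells", "c2", False),
    ("ann", "bio", "genes", "c3", True),
    ("ann", "chem", "acids", "c4", True),
]

def aggregate(events):
    """Roll per-card results up to deck, folder, and overall levels."""
    totals = defaultdict(lambda: [0, 0])  # key -> [correct, attempts]
    for user, folder, deck, _card, correct in events:
        keys = (
            ("deck", user, folder, deck),
            ("folder", user, folder),
            ("overall", user),
        )
        for key in keys:
            totals[key][0] += int(correct)
            totals[key][1] += 1
    # Convert the counters into a fraction-correct figure per level.
    return {key: right / attempts for key, (right, attempts) in totals.items()}

stats = aggregate(events)
```

Each event updates three counters, so reads at any level are a single lookup at the cost of extra writes per score.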
The Problem • Reasonably large number of cards • Large number of users • User base increasing rapidly • Shift in usage - increasing faster than users • Time on site • Decks per user • Average deck size • Study sessions per user
StudyBlue Database Problems • Amazon EC2 • Large number of simultaneous users • High write volume • Single PostgreSQL database • Large tables
Alternatives • Amazon SimpleDB • Far too simple • Cassandra • Difficult to add nodes and rebalance • Column families cannot be modified without restart • CouchDB • Difficult to add nodes and rebalance • Redis • No native support for sharding/partitioning • Master/slave only - no automatic failover
MongoDB for the Win • Highly available • Replica sets • Automatic failover • Horizontal scaling across shards • Improved write performance • Improved availability during failures • Easy to add additional shards • Easier maintenance
Development • 100% Java • Existing PostgreSQL database • System of record • Synchronization issues
SQL Integration & Synchronization • PostgreSQL considered system of record • Asynchronous event driven • Web servers queue change events • Scoring servers process events • Query PostgreSQL • Update MongoDB
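The event-driven flow above can be sketched in a few lines. This is only an illustration under stated assumptions: plain dictionaries stand in for PostgreSQL and MongoDB, an in-process `queue.Queue` stands in for the real event queue, and every name here is hypothetical.

```python
import queue

# In-memory stand-ins (assumptions, not real database clients).
postgres_scores = {}   # (user_id, card_id) -> latest score row
mongo_stats = {}       # user_id -> aggregate document
change_events = queue.Queue()

def web_server_record_score(user_id, card_id, correct):
    """Web tier: write to the system of record, then queue a change event."""
    postgres_scores[(user_id, card_id)] = {"correct": correct}
    change_events.put({"user_id": user_id})

def scoring_server_drain():
    """Scoring tier: for each event, query PostgreSQL and update MongoDB."""
    while not change_events.empty():
        user_id = change_events.get()["user_id"]
        rows = [row for (uid, _), row in postgres_scores.items() if uid == user_id]
        mongo_stats[user_id] = {
            "cards": len(rows),
            "correct": sum(1 for row in rows if row["correct"]),
        }

web_server_record_score("ann", "c1", True)
web_server_record_score("ann", "c2", False)
scoring_server_drain()
```

In this sketch the event carries only the user key and the worker recomputes from the system of record, so replaying or duplicating an event leaves the MongoDB copy unchanged.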
DevOps • Amazon EC2 • Separate dev, test and production environments • Scripting & automation • Creation • Cloning • Configuration management with Chef
Even More Data • Moved existing tables from PostgreSQL to MongoDB • Four PostgreSQL tables with millions of rows combined into single collection • New development uses MongoDB: • Analytics data with 300+ million documents
SQL Integration Part 2 • MongoDB considered system of record • Web servers interact with MongoDB directly • More complex structures, fewer shallow collections
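"More complex structures, fewer shallow collections" might look something like the following: rows that would sit in separate relational tables are folded into one nested document. The deck and card fields and the `to_document` helper are invented for illustration and are not the real StudyBlue data model.

```python
# Hypothetical relational rows, one shallow record per table.
deck_row = {"deck_id": 7, "title": "Cell Biology"}
card_rows = [
    {"card_id": 1, "deck_id": 7, "front": "Mitochondria", "back": "Powerhouse of the cell"},
    {"card_id": 2, "deck_id": 7, "front": "Ribosome", "back": "Builds proteins"},
]

def to_document(deck, cards):
    """Fold a deck row and its card rows into one nested document."""
    return {
        "_id": deck["deck_id"],
        "title": deck["title"],
        "cards": [
            {"front": card["front"], "back": card["back"]}
            for card in cards
            if card["deck_id"] == deck["deck_id"]
        ],
    }

deck_document = to_document(deck_row, card_rows)
```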
Partitioning in the Cloud • Operations perspective • Dynamic changes in machines • Config servers track machines • Each node in replica set knows other nodes • Avoids restarting applications when Mongo servers change • Easy scaling • Local shard servers • Config servers store redundant copies • Two-phase commit
Picking a shard key • Shard key selection critical for proper distribution • Spread writes across cluster • Depends on usage • Single document vs aggregation • Examples all time-series data • Cannot be changed
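Why the key matters for time-series data can be shown with a toy model of range-based routing. The sketch assumes four shards whose split points were fixed on older data; the split points, the write pattern, and the `shard_for` helper are all hypothetical.

```python
import bisect
from collections import Counter

def shard_for(key, boundaries):
    """Range-based routing: count how many split points are <= key."""
    return bisect.bisect_right(boundaries, key)

# 1000 new writes from 100 users, with ever-increasing timestamps.
writes = [(i % 100, 1000 + i) for i in range(1000)]  # (user_id, timestamp)

# Key on timestamp alone: split points come from older data (timestamps < 1000).
by_time = Counter(shard_for(ts, [250, 500, 750]) for _user, ts in writes)

# Compound key (user_id, timestamp): split points fall between users.
by_user_time = Counter(
    shard_for((user, ts), [(25, 0), (50, 0), (75, 0)]) for user, ts in writes
)
```

With the timestamp key every new write lands on the last shard; with the compound key each shard takes a quarter of the writes, and one user's documents stay together, which suits per-user aggregation.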
Sharding - Gritty Details • Chunks • 64 MB blocks of data • Splits • 1 chunk turns into 2 chunks • Rebalance • Move chunks to different nodes • Maintain even distribution of chunks
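The three ideas on this slide (chunks, splits, rebalancing) can be mimicked with a small simulation. It is a simplified sketch and not MongoDB's actual balancer: the split here divides by document count, the rebalance stops once chunk counts differ by at most one, and the document sizes are made up.

```python
CHUNK_LIMIT_MB = 64  # chunk size stated on the slide

def needs_split(chunk):
    """A chunk is a list of (key, size_mb); it splits once it outgrows the limit."""
    return sum(size for _key, size in chunk) > CHUNK_LIMIT_MB

def split(chunk):
    """Turn 1 chunk into 2 by cutting the key-ordered list at its midpoint."""
    mid = len(chunk) // 2
    return chunk[:mid], chunk[mid:]

def rebalance(shards):
    """Move chunks from the fullest shard to the emptiest; return the move count."""
    moves = 0
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[fullest]) - len(shards[emptiest]) <= 1:
            return moves
        shards[emptiest].append(shards[fullest].pop())
        moves += 1

# One oversized chunk: 100 documents of 1 MB each (hypothetical sizes).
big = [(key, 1) for key in range(100)]
left, right = split(big)
shards = {"shard_a": [left, right, [(200, 1)], [(300, 1)]], "shard_b": []}
moves = rebalance(shards)
```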
Rebalancing Challenges • Splits have to find midpoint of chunk • Very I/O expensive for collections with small documents • Decreased chunk size • Made documents larger & more complex • Can be a drain on system • Needs to run frequently
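A rough back-of-the-envelope calculation suggests why small documents hurt and why both mitigations help. It assumes that the work of finding a midpoint grows with the number of documents in the chunk; the document sizes below are illustrative, not measured figures.

```python
MB = 1024 * 1024

def docs_per_chunk(chunk_mb, avg_doc_bytes):
    """Approximate number of documents held by one full chunk."""
    return (chunk_mb * MB) // avg_doc_bytes

small_docs = docs_per_chunk(64, 100)          # 64 MB chunk, 100-byte documents
smaller_chunk = docs_per_chunk(16, 100)       # decreased chunk size
larger_docs = docs_per_chunk(64, 10 * 1024)   # larger, more complex documents
```

Under these assumptions a full 64 MB chunk of 100-byte documents holds about 670,000 documents, while either shrinking the chunk or growing the documents cuts the number that a split has to walk through.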
Replication Lag • Eventual consistency • No guarantees about lag • Replica-safe writes • Data committed to at least 2 nodes • Can cause problems with high replication lag • Security vs. time
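The trade-off behind replica-safe writes can be modeled without a database. In this sketch a write counts as acknowledged once `w` nodes have applied it, which corresponds to what is commonly expressed as a write concern of `w=2`; the delays, the timeout, and the `replica_safe_write` function are hypothetical.

```python
def replica_safe_write(apply_delays_ms, w=2, timeout_ms=1000):
    """Return how long the client waits until w nodes hold the write.

    apply_delays_ms lists, per node, the time until that node has applied
    the write. Returns None if fewer than w nodes apply it before the timeout.
    """
    if w > len(apply_delays_ms):
        return None
    wait = sorted(apply_delays_ms)[w - 1]  # time at which the w-th node catches up
    return wait if wait <= timeout_ms else None

healthy = replica_safe_write([1, 20, 35])       # secondaries keeping up
lagging = replica_safe_write([1, 5000, 8000])   # high replication lag
```

With healthy secondaries the client waits for the faster of the two; with high replication lag the same write times out, which is the problem the slide points to.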