Here are the slides from the presentation I gave at the tech architecture meetup in Cape Town / Codebridge - https://www.facebook.com/events/429242577096097/433182133368808/?notif_t=plan_mall_activity
2. Structure of my talk
Context / How we are using MongoDB
Terminology in MongoDB world
Deployment / Growing
The things that suck
3. What we do - 3 web apps, 1 MongoDB cluster
‧ Motribe - mobile social network
● ±50k monthly active users
‧ JudgeMe – mxit social app
● 1m signups in 45 days / 200k+ photo uploads
● 0.5m - 3m pageviews/day
‧ MxPix – mxit photo sharing app
● 200k+ signups in a week / 70k+ photo uploads
● 0.5m – 1m page views/day
4. MongoDB (from "humongous")
‧ JSON Document storage
‧ Documents are members of Collections
‧ Collections belong to Databases
‧ Indexes on any key in a document
‧ JavaScript interface for queries
‧ Trivial Sharding of data
5. Some benefits
‧ Implicit creation of databases, collections
‧ Index and query anything inside document
(composite keys, 2d/geo indexing)
‧ Great documentation
‧ Great examples (e.g. randomization)
‧ Emphasis on Just Works™
‧ Drivers in many languages
‧ Active community
6. Grow up or out
‧ Horizontal – read/write throughput
● Shard databases - split collections over shards
● Shard collections - split documents over shards
‧ Vertical – read throughput
● Replica sets
– Auto master / slaves, voting, promotion, demotion
– Time delayed slaves
– Magic... bring up mongo server, inform config server, siphons data
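The voting and promotion behaviour of a replica set can be sketched roughly as follows. This is a simplified model, assuming a new master is only promoted when a strict majority of members is reachable and that the most up-to-date reachable member wins; the real election rules take more inputs (member priorities, for example). `Member`, `last_op_time`, and `elect_primary` are hypothetical names for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Member:
    name: str
    reachable: bool
    last_op_time: int  # hypothetical counter: higher means more up to date


def elect_primary(members: list[Member]) -> str | None:
    """Return the name of the promoted member, or None if there is no majority."""
    reachable = [m for m in members if m.reachable]
    # A strict majority of ALL configured members must be reachable.
    if len(reachable) <= len(members) // 2:
        return None
    # Assumption: the freshest reachable member is promoted.
    return max(reachable, key=lambda m: m.last_op_time).name


# Three-member set where the old master "c" has just gone down.
members = [
    Member("a", reachable=True, last_op_time=100),
    Member("b", reachable=True, last_op_time=120),
    Member("c", reachable=False, last_op_time=130),
]
new_primary = elect_primary(members)

# With two of three members down there is no majority, so nobody is promoted.
partitioned = [
    Member("a", reachable=True, last_op_time=100),
    Member("b", reachable=False, last_op_time=120),
    Member("c", reachable=False, last_op_time=130),
]
no_primary = elect_primary(partitioned)
```

The majority rule is why an odd number of members is the usual choice: a three-member set survives one failure, while a two-member set cannot promote anyone once either member is lost.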
7. Moving parts at scale
‧ Routers – “mongos”
● Abstract the magic, route requests to the right place
● Collate responses from shards, etc.
● Apps connect to “mongos”
‧ Config servers
● Min 3x for redundancy
● Know about dbs, shards, slaves
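To make the division of labour concrete, here is a minimal in-memory sketch of a router consulting config metadata. It assumes range-based partitioning on a single shard key; `ConfigMetadata`, `Router`, the `username` shard key, and the split points are all hypothetical and only model the idea, not the real `mongos` internals.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import Callable


class ConfigMetadata:
    """Toy stand-in for the config servers: which shard owns which key range."""

    def __init__(self, split_points: list[str], shard_names: list[str]) -> None:
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = split_points  # must be sorted
        self.shard_names = shard_names

    def shard_for(self, key: str) -> str:
        # Keys below the first split point go to the first shard, and so on.
        return self.shard_names[bisect_right(self.split_points, key)]


class Router:
    """Toy stand-in for mongos: apps talk to this, never to shards directly."""

    def __init__(self, config: ConfigMetadata) -> None:
        self.config = config
        self.shards: dict[str, list[dict]] = {n: [] for n in config.shard_names}

    def insert(self, doc: dict) -> None:
        self.shards[self.config.shard_for(doc["username"])].append(doc)

    def find_by_shard_key(self, username: str) -> list[dict]:
        # Targeted query: only the shard that owns the key is consulted.
        shard = self.config.shard_for(username)
        return [d for d in self.shards[shard] if d["username"] == username]

    def find_all(self, predicate: Callable[[dict], bool]) -> list[dict]:
        # Scatter-gather: ask every shard, then collate the responses.
        results: list[dict] = []
        for name in self.config.shard_names:
            results.extend(d for d in self.shards[name] if predicate(d))
        return results


config = ConfigMetadata(["m", "t"], ["shard0", "shard1", "shard2"])
router = Router(config)
for name in ["anna", "mike", "sipho", "thandi", "zola"]:
    router.insert({"username": name, "photos": len(name)})

heavy_uploaders = router.find_all(lambda d: d["photos"] >= 5)
```

A lookup by shard key touches one shard, while a query on any other field has to fan out to all of them, which is one reason the choice of shard key matters.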
9. Monitoring
‧ Mongo Monitoring Service (MMS) is lovely
‧ JS console means you can do powerful things on the terminal
‧ JS interface exposes lots of data
● Size of indexes
● Comprehensive info about where db/collections stored
● Location, status and health of nodes
10. The things that suck
‧ Index anything means indexes can get large (e.g. we have 4 GB of indexes on a big collection)
‧ You need to aim to have all indexes+data in RAM
‧ MapReduce is available but meh
● Improving with each release, possible to hack onto Hadoop
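The "indexes+data in RAM" rule of thumb comes down to simple arithmetic, sketched below. Only the 4 GB index figure comes from the slide; the data size, the RAM sizes, and the 20% headroom reserved for the OS and other processes are assumptions made up for the example, and `fits_in_ram` is a hypothetical helper.

```python
GIB = 1024 ** 3


def fits_in_ram(index_bytes: int, data_bytes: int, ram_bytes: int,
                headroom: float = 0.2) -> bool:
    """True if indexes + data fit in RAM after reserving `headroom` of it."""
    usable = ram_bytes * (1 - headroom)
    return index_bytes + data_bytes <= usable


index_size = 4 * GIB   # from the slide: indexes on one big collection
data_size = 10 * GIB   # assumption for illustration only

on_16_gib_box = fits_in_ram(index_size, data_size, 16 * GIB)
on_32_gib_box = fits_in_ram(index_size, data_size, 32 * GIB)
```

Under these assumed numbers the 16 GiB box falls short (14 GiB needed against 12.8 GiB usable) and the 32 GiB box is comfortable.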
11. More of the things that suck
‧ Implicit creation means misspelled dbs, collections == WTF
‧ B-tree indexes mean queries/insertion by _id are O(log n) not O(1) à la pure key/value stores
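To put a rough number on the O(log n) cost, the sketch below counts how many levels a lookup has to descend in an idealised, completely full B-tree. The fan-out of 100 keys per node is an assumption picked for illustration, not a MongoDB figure, and `btree_levels` is a hypothetical helper. A hash-based key/value store would, on average, need a constant number of probes regardless of n.

```python
def btree_levels(n_keys: int, fanout: int) -> int:
    """Levels needed by an idealised full B-tree to address `n_keys` keys."""
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout  # each extra level multiplies capacity by the fan-out
        levels += 1
    return levels


# Assumed fan-out of 100; key counts grow by a factor of 1000 each step.
levels_by_size = {n: btree_levels(n, 100) for n in (10**3, 10**6, 10**9)}
```

Under this assumption the lookup cost rises from 2 levels at a thousand keys to 5 levels at a billion, so the cost does grow with n, just slowly.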