3. HEYZAP
• Largest mobile gaming social network
• MongoDB the main datastore
• Also MySQL & Redis
• High number of reads, fewer writes
4. BUGSNAG
bugsnag.com
• Exception tracking service for mobile and web
• MongoDB only persistent datastore
• Redis caching
• Lots of writes, fewer reads
13. I/O
• When data is not in RAM, MongoDB hits the disk
• Ensure this happens infrequently
• When it does, it should be fast
• EBS throughput sucks
14. HOW TO KEEP I/O FAST
• Fast filesystem - 10gen recommends xfs
• Use RAID - e.g. RAID 10 (stripe of mirrors)
• Increase file descriptor limits
• Turn off atime and diratime
• Tweak read-ahead settings
• http://www.mongodb.org/display/DOCS/Production+Notes
18. REPLICA SETS
• Scales reads well
• One primary, many secondaries
• Read from all members
• Write to primary only
• Inconsistent reads from secondaries
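Reads from secondaries are inconsistent only while a secondary trails the primary, so replication lag is the number to watch. A minimal sketch of measuring it, assuming a status object shaped like the output of `rs.status()` (a `members` array whose entries carry `stateStr` and `optimeDate`). The helper name `replicationLagSeconds`, the host names, and the timestamps are hypothetical:

```javascript
// Hypothetical helper: worst-case lag, in seconds, of any secondary behind the primary.
// Assumes each entry in `status.members` has `stateStr` and `optimeDate`,
// as in rs.status() output.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary in replica set status");
  }
  const lags = status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => (primary.optimeDate - m.optimeDate) / 1000);
  // With no secondaries there is nothing to lag behind.
  return lags.length === 0 ? 0 : Math.max(...lags);
}

// Illustrative sample data, not real output.
const sampleStatus = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2012-11-01T12:00:10Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-11-01T12:00:10Z") },
    { name: "db3:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-11-01T12:00:04Z") },
  ],
};

console.log(replicationLagSeconds(sampleStatus)); // 6
```

A value that stays near zero means reads from secondaries are close to what the primary would return. A growing value is a reason to route sensitive reads to the primary.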
19. SHARDING
• Many primaries, many secondaries
• Scales writes and reads
• Harder to set up well
21. STANDARD RULES
• Standard DB scaling rules apply to MongoDB
• Use skip() and limit()
• Return subsets of fields
• Index all your queries
• Run explain() on new/slow queries
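The first three rules can meet in a single query. The sketch below is an illustration under assumptions, not code from the talk: `pageWindow` is a hypothetical helper, and the `games` collection and its fields are made up.

```javascript
// Hypothetical helper: translate a 1-based page number into skip/limit values.
function pageWindow(page, perPage) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(perPage) || perPage < 1) {
    throw new RangeError("perPage must be an integer >= 1");
  }
  return { skip: (page - 1) * perPage, limit: perPage };
}

// Projection: ask only for the fields the page renders (1 = include).
const projection = { name: 1, platform: 1 };

const { skip, limit } = pageWindow(3, 20);
console.log(skip, limit); // 40 20

// In the mongo shell, against a hypothetical `games` collection:
//   db.games.find({ platform: 3 }, projection).skip(skip).limit(limit)
// Appending .explain() to that cursor reports whether an index was used.
```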
23. SCHEMA DESIGN
• Indexes should be minimized in size and number
{
"name" : "Angry Birds",
"android" : true,
"iphone" : true
}
{
"name" : "Angry Birds",
"platform" : 3
}
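The `platform` value 3 reads as a bitfield that folds the two boolean flags into one integer, so one index can replace two. A minimal sketch, assuming android = 1 and iphone = 2 (the slide shows only that 3 stands for both). `encodePlatform` and `decodePlatform` are hypothetical names:

```javascript
// Assumed bit assignments; the slide only shows that 3 means "android and iphone".
const PLATFORM = { android: 1, iphone: 2 };

// Fold per-platform booleans into one integer.
function encodePlatform(doc) {
  let bits = 0;
  for (const [name, bit] of Object.entries(PLATFORM)) {
    if (doc[name]) bits |= bit;
  }
  return bits;
}

// Recover the booleans from the integer.
function decodePlatform(bits) {
  const flags = {};
  for (const [name, bit] of Object.entries(PLATFORM)) {
    flags[name] = (bits & bit) !== 0;
  }
  return flags;
}

const game = { name: "Angry Birds", android: true, iphone: true };
const compact = { name: game.name, platform: encodePlatform(game) };
console.log(compact.platform); // 3
console.log(decodePlatform(1).iphone); // false
```

With this encoding, "any Android game" becomes a match on every value that has the Android bit set, for example `{ platform: { $in: [1, 3] } }`, served by a single index on `platform`.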
24. SCHEMA DESIGN
• Minimize key lengths on small documents
• Can reduce storage requirements and increase performance
{
"_id":"AHAHSPGPGSAVKLPAPHSVGKSALR",
"game_id":"8122",
"user_id":"1854",
"session_start":"51067007",
"session_end":"51067085"
}
92 bytes
25. SCHEMA DESIGN
• Minimize key lengths on small documents
• Can reduce storage requirements and increase performance
{
"_id":"AHAHSPGPGSAVKLPAPHSVGKSALR",
"game_id":"8122",
"user_id":"1854",
"session_start":"51067007",
"session_end":"51067085"
}
92 bytes
{
"_id":"AHAHSPGPGSAVKLPAPHSVGKSALR",
"g":"8122",
"u":"1854",
"s":"51067007",
"e":"51067085"
}
58 bytes
About 1/3 of memory saved!
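Short stored keys do not have to leak into application code. The sketch below assumes a translation step at the storage boundary. The mapping `SHORT_KEYS` and the helpers `shorten` and `keyChars` are hypothetical. It reproduces only the difference between the slide's two byte counts (92 - 58 = 34), which equals the number of key characters removed, since each ASCII key character takes one byte.

```javascript
// Hypothetical mapping from descriptive field names to short stored keys.
const SHORT_KEYS = new Map([
  ["game_id", "g"],
  ["user_id", "u"],
  ["session_start", "s"],
  ["session_end", "e"],
]);

// Rename keys for storage; keys without a mapping (such as _id) are kept.
function shorten(doc) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    out[SHORT_KEYS.has(key) ? SHORT_KEYS.get(key) : key] = value;
  }
  return out;
}

// Total number of characters spent on key names in a flat document.
function keyChars(doc) {
  return Object.keys(doc).reduce((sum, key) => sum + key.length, 0);
}

const session = {
  _id: "AHAHSPGPGSAVKLPAPHSVGKSALR",
  game_id: "8122",
  user_id: "1854",
  session_start: "51067007",
  session_end: "51067085",
};

const stored = shorten(session);
console.log(Object.keys(stored).join(",")); // _id,g,u,s,e
console.log(keyChars(session) - keyChars(stored)); // 34
```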
26. PROFILER
• MongoDB has a built in profiler
• Use the profiler all the time
• db.setProfilingLevel(1, 100)
• ‘show profile’ shows recent profiles
• Stored in db.system.profile
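Profile entries are ordinary documents, so they can be triaged in code. A minimal sketch, assuming entries carry the `millis` and `scanAndOrder` fields. The helper `flagProfileEntry` is hypothetical, and the sample entries are made up for illustration rather than taken from a real profiler:

```javascript
// Hypothetical triage helper for documents shaped like system.profile entries.
// It only inspects `millis` and `scanAndOrder`.
function flagProfileEntry(entry, slowMs = 100) {
  const reasons = [];
  if (entry.millis >= slowMs) reasons.push("slow");
  if (entry.scanAndOrder) reasons.push("sort without index");
  return reasons;
}

// Made-up entries for illustration only.
const entries = [
  { ns: "app.games", millis: 240, nscanned: 51000, scanAndOrder: true },
  { ns: "app.users", millis: 130, nscanned: 12 },
  { ns: "app.games", millis: 3, nscanned: 1 },
];

for (const entry of entries) {
  const reasons = flagProfileEntry(entry);
  if (reasons.length > 0) {
    console.log(entry.ns, entry.millis + "ms", reasons.join(", "));
  }
}
```

In the shell, `db.system.profile.find().sort({ ts: -1 }).limit(10)` returns the ten most recent entries, which could be passed through a helper like this one.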
30. MONITORS
• Chart the index size
• Chart the number of current ops
• Monitor index misses
• Monitor replication lag
• Monitor I/O performance (iostat)
• Monitor disk space
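Charting index size pays off when it is compared against available RAM. A minimal sketch of that comparison, assuming a stats object with an `indexSize` field in bytes, as `db.stats()` reports. The helper names, the RAM figure, and the growth rate are hypothetical:

```javascript
// Hypothetical helpers for tracking index size against RAM.
const GiB = 1024 ** 3;

// Fraction of RAM that the indexes alone would occupy.
function indexRamUsage(stats, ramBytes) {
  return stats.indexSize / ramBytes;
}

// Linear projection: days until the indexes no longer fit in RAM,
// given an observed growth rate in bytes per day.
function daysUntilIndexesExceedRam(stats, ramBytes, growthBytesPerDay) {
  if (growthBytesPerDay <= 0) return Infinity;
  return Math.max(0, (ramBytes - stats.indexSize) / growthBytesPerDay);
}

// Made-up numbers for illustration only.
const stats = { indexSize: 6 * GiB };
const ram = 8 * GiB;

console.log(indexRamUsage(stats, ram)); // 0.75
console.log(daysUntilIndexesExceedRam(stats, ram, 0.25 * GiB)); // 8
```

A straight-line projection is the simplest possible model. A real chart of index size over time would show whether growth is actually linear.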
37. MONGO MONITORING SERVICE
• MMS is 10gen hosted Mongo monitoring
• Available as web app (https://mms.10gen.com)
• Android client also available from Google Play
38. KIBANA & LOGSTASH
• Logstash is open-source log parser - http://logstash.net/
• Kibana is an alternative UI for Logstash - http://kibana.org/
• Cool trend analysis for mongo logs
39. • Questions?
• Check out www.bugsnag.com
• Follow me on twitter @snmaynard
Editor's Notes
All user activity is stored in Mongo: check-ins, game usernames, etc. The Heyzap SDK is in many top-tier titles, so there are lots of events. Analytics for the millions of game sessions involving the Heyzap SDK. Geospatial queries to find where people checked in. Supplement Mongo with MySQL (allows you to do joins etc.). Also Redis as a caching layer.
High burst writes. People deploy bad code and we get all their exceptions. Bugsnag uses Mongo and Redis alone, with a Redis caching layer on top of Mongo.
Schemaless: no migrations. Migrating SQL caused a lot of downtime for Heyzap. Fire & Forget: by default Mongo doesn't wait for the write to complete before returning to the app.
Many pros are also cons. Know what you are getting into. Schemaless means the app has to cope with bad data, migrations, bad states, etc. Fire & Forget: you can use the safe keyword, but that affects speed. No joins: you can only pull data from one collection at a time. Single write lock across a database. Not great for a high proportion of writes, but writes yield. Mitigate with a db per collection in 2.2. 2.4 will have collection locks.
You should design with performance in mind. Think future-proof. Work out where your pain points will be. Begin to scale before you hit 95% capacity. You need spare capacity to scale.
Working set = often-used data. In a logging app it would be the last n days of logs; 99% of queries would be on that. Indexes and documents should be in RAM for best results. Bare minimum is indexes!
When RAM gets full! This is no exaggeration. Mongo’s performance drops massively.
For Heyzap, I/O is the single biggest headache on EC2. EBS has random spikes. Heyzap moved to provisioned IOPS when it was released, to smooth the spikes rather than to get better throughput.
xfs supports I/O suspend and write-cache flushing, which is essential for AWS snapshots. Increase file descriptors to allow more open files. atime updates access times for files; that turns reads into writes = bad. Read-ahead means the system will read extra blocks from disk when doing a read. Good for sequential access, bad for random (Mongo) access.
Bigger machine. Hard to get more on 1 machine, especially in the cloud. Can be viable in the short term. You can do this with no downtime. Heyzap & Bugsnag do.
If you use replica sets, monitor the replication lag. This should be close to zero. Otherwise users can write something but can't read it back. You can send a “Write Concern” to say replicate to slaves. Can screw you if slaves are behind. The whole working set is still in memory on each member; this just scales the volume of reads, not data size.
Can automatically shard; Mongo supports that. Carefully pick your shard key to correctly distribute the load across shards. Distributes the working set across all shards for big working sets. Also distributes writes. Heyzap did manual sharding by collection.
Only returning what you need will be faster. I advise ensuring (on large datasets) that pretty much every query is indexed. Cron jobs running unindexed queries have caused Heyzap downtime. Smaller datasets are fine. Run explain on a new query you are about to deploy. Saves a lot of downtime! Verify it uses an index.
Means we don't have to read as many documents, which means we don't need to seek as much on disk. Not always applicable. Sometimes the same doc will be in too many different places. Would make updates too hard.
If we wanted to index here on android and iphone separately, that would be 2 indexes. We can combine them into one “bitfield”, halving our index size. Heyzap had a very similar issue with schema. Means we can use less RAM. #1 rule in Mongo: use less RAM.
Whether it's worth it depends on how small your values/documents are. Can reduce your working set: commonly accessed documents are smaller. No effect on indexes.
The small performance hit from using the profiler is worth it. You need to know how fast your db is running. In mongo (command line) run db.setProfilingLevel(1,100). Logs all queries that took more than 100ms. The profile is a capped collection. May need resizing depending on your throughput.
Sample output of profiler.
ts = when it ran (tie that to your other logs); nscanned = number of index entries or documents scanned; scanAndOrder = when Mongo can't use the index to sort; numYield = how many times it yielded, an indication of page faults etc.; millis = total duration.
Graphing index size will allow you to predict scaling needs. Heyzap could accurately predict to within about a day. Current-ops spikes show you when to look at the profiler. Indexes should rarely miss. Replication lag leads to a bunk user experience on reads, and harder app code (read from primary).
opid = operation ID; pass this to db.killOp() to stop it. ns = namespace = database.collection. Can show you why everything has suddenly gone slow, but you can miss the guilty query; the profiler is better.
Locks shows the time, in microseconds, spent holding locks and waiting for locks. Index counters say how many index hits we had. A miss means the index was not in RAM = bad.
Useful stats. Index size: keep it in RAM. Graph index size. These metrics can help you predict the need for scaling. Can also call db.collection.stats() to get something similar.
Can use --locks to show you lock statistics if you prefer that view. Good to check if you aren’t sure which collections are heavily used.