Optimizing MongoDB: Lessons Learned at Localytics Benjamin Darfler MongoBoston - September 2011
Introduction Benjamin Darfler @bdarfler http://bdarfler.com Senior Software Engineer at Localytics Localytics Real time analytics for mobile applications 100M+ datapoints a day More than 2x growth over the past 4 months Heavy users of Scala, MongoDB and AWS This Talk Revised and updated from MongoNYC 2011
MongoDB at Localytics Use cases Anonymous loyalty information De-duplication of incoming data Scale today Hundreds of GBs of data per shard Thousands of ops per second per shard History In production for ~8 months Increased load 10x in that time Reduced shard count by more than half
Disclaimer These steps worked for us and our data We verified them by testing early and often  You should too
Quick Poll Who is using MongoDB in production? Who is deployed on AWS? Who has a sharded deployment? More than 2 shards? More than 4 shards? More than 8 shards?
Optimizing Our Data Documents and Indexes
Shorten Names Before {super_happy_fun_awesome_name:"yay!"} After {s:"yay!"} Significantly reduced document size
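A quick way to see the savings is the shell's Object.bsonsize() helper; a minimal sketch using the slide's example document (byte counts are approximate):
  Object.bsonsize({super_happy_fun_awesome_name:"yay!"})  // roughly 44 bytes
  Object.bsonsize({s:"yay!"})                             // roughly 17 bytes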
Use BinData for uuids/hashes Before {u:"21EC2020-3AEA-1069-A2DD-08002B30309D"} After {u:BinData(0, "...")} Used BinData type 0, least overhead Reduced data size by more than 2x over UUID strings Reduced index size on the field
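A minimal sketch of the conversion in the shell, assuming a hypothetical events collection: strip the dashes from the UUID string and hand the hex to the HexData() helper, which builds a BinData of the given subtype.
  var uuid = "21EC2020-3AEA-1069-A2DD-08002B30309D";
  var bin = HexData(0, uuid.replace(/-/g, "").toLowerCase());  // 16-byte BinData, subtype 0
  db.events.insert({u: bin});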
Override _id Before {_id:ObjectId("..."), u:BinData(0, "...")} After  {_id:BinData(0, "...")} Reduced data size Eliminated an index Warning: Locality - more on that later
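A sketch of the same idea, again with a hypothetical events collection: store the uuid directly in _id so the automatic _id index is the only index on that field.
  var bin = HexData(0, "21ec20203aea1069a2dd08002b30309d");  // illustrative 16-byte uuid
  db.events.insert({_id: bin});
  db.events.find({_id: bin});  // served by the built-in _id index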
Pre-aggregate Before {u:BinData(0, "..."), k:BinData(0, "abc")} {u:BinData(0, "..."), k:BinData(0, "abc")} {u:BinData(0, "..."), k:BinData(0, "def")} After {k:BinData(0, "abc"), c:2} {k:BinData(0, "def"), c:1} Actually kept data in both forms Fewer records meant smaller indexes
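Maintaining the aggregated form is typically one upsert per incoming datapoint; a sketch with a hypothetical counts collection and an illustrative key value:
  // upsert: creates {k: ..., c: 1} the first time the key is seen, increments c afterwards
  db.counts.update({k: HexData(0, "abc0")}, {$inc: {c: 1}}, true);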
Prefix Indexes Before {k:BinData(0, "...")}  // indexed After { p:BinData(0, "...")  // prefix of k, indexed s:BinData(0, "...")  // suffix of k, not indexed } Reduced index size Warning: Prefix must be sufficiently unique Would be nice to have it built in - SERVER-3260
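A sketch of the manual version, assuming a hypothetical events collection and a 16-byte key split into a 4-byte prefix and a 12-byte suffix; queries include both fields so the index narrows the scan and the unindexed suffix finishes the match.
  db.events.ensureIndex({p: 1});  // index only the prefix
  db.events.insert({p: HexData(0, "21ec2020"), s: HexData(0, "3aea1069a2dd08002b30309d")});
  db.events.find({p: HexData(0, "21ec2020"), s: HexData(0, "3aea1069a2dd08002b30309d")});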
Sparse Indexes Create a sparse index db.collection.ensureIndex({middle:1}, {sparse:true}); Only indexes documents that contain the field {u:BinData(0, "abc"), first:"Ben", last:"Darfler"} {u:BinData(0, "abc"), first:"Mike", last:"Smith"} {u:BinData(0, "abc"), first:"John", middle:"F", last:"Kennedy"} Fewer records meant smaller indexes New in 1.8
Upgrade to {v:1} indexes Up to 25% smaller Up to 25% faster New in 2.0 Must reindex after upgrade
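One way to rebuild is reIndex() per collection on the upgraded 2.0 mongod (collection name below is illustrative); it drops and recreates every index on the collection, picking up the new format.
  db.events.reIndex();  // rebuilds all indexes on the collection in the {v:1} format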
Optimizing Our Queries Reading and Writing
You are using an index right? Create an index db.collection.ensureIndex({user:1}); Ensure you are using it db.collection.find(query).explain(); Hint that it should be used if it's not db.collection.find({user:u, foo:d}).hint({user:1}); I've seen the wrong index used before; open a bug if you see this happen
Only as much as you need Before db.collection.find(); After db.collection.find().limit(10); db.collection.findOne(); Reduced bytes on the wire Reduced bytes read from disk Result cursor streams data but in large chunks
Only what you need Before db.collection.find({u:BinData(0, "...")}); After db.collection.find({u:BinData(0, "...")}, {field:1}); Reduced bytes on the wire Necessary to exploit covering indexes
Covering Indexes Create an index db.collection.ensureIndex({first:1, last:1}); Query for data only in the index db.collection.find({last:"Darfler"}, {_id:0, first:1, last:1}); Can service the query entirely from the index Eliminates having to read the data extent Explicitly exclude _id if it's not in the index New in 1.8
Prefetch Before db.collection.update({u:BinData(0, "...")}, {$inc:{c:1}}); After db.collection.find({u:BinData(0, "...")}); db.collection.update({u:BinData(0, "...")}, {$inc:{c:1}}); Prevents holding a write lock while paging in data Most updates fit this pattern anyhow Less necessary with yield improvements in 2.0
Optimizing Our Disk Fragmentation
Inserts [diagram: doc1 through doc5 appended one after another in the data file]
Deletes [diagram: before/after views of the data file; deleted documents leave holes between the remaining records]
Updates [diagram: doc3 outgrows its slot and is rewritten at the end of the file, leaving a hole behind] Updates can be in place if the document doesn't grow
Reclaiming Freespace [diagram: a new doc6 is written into the hole left by doc3]
Memory Mapped Files [diagram: doc1 through doc6 spanning two pages] Data is mapped into memory a full page at a time
Fragmentation RAM used to be filled with useful data Now it contains useless space or useless data Inserts used to cause sequential writes Now inserts cause random writes
Fragmentation Mitigation Automatic Padding MongoDB auto-pads records; manual tuning of the padding factor is scheduled for 2.2 Manual Padding Pad documents whose arrays are known to grow: pad with a BinData field, then remove it (see the sketch below) Free list improvements in 2.0, with more scheduled for 2.2
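A minimal sketch of the manual-padding trick, with hypothetical collection and field names: insert the document with a throw-away BinData field sized for the expected growth, then $unset it so the record keeps its larger allocation.
  db.events.insert({_id: 1, hits: [], pad: BinData(0, "AAAAAAAAAAAAAAAAAAAAAA==")});  // 16 bytes of zero padding
  db.events.update({_id: 1}, {$unset: {pad: 1}});  // record keeps its allocated size, leaving room for hits to grow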
Fragmentation Fixes Repair db.repairDatabase();  Run on secondary, swap with primary Requires 2x disk space Compact db.collection.runCommand( "compact" ); Run on secondary, swap with primary Faster than repair Requires minimal extra disk space New in 2.0 Repair, compact and import remove padding
Optimizing Our Keys Index and Shard
B-Tree Indexes - hash/uuid key Hashes/UUIDs randomly distribute across the whole b-tree
B-Tree Indexes - temporal key Keys with a temporal prefix (e.g. ObjectId) are right-aligned
Migrations - hash/uuid shard key [diagram: Shard 1 holds Chunk 1 (k: 1 to 5) and Chunk 2 (k: 6 to 9); documents sit in insertion order (k: 4, 8, 3, 7, 5, 6), so migrating Chunk 1 to Shard 2 copies documents scattered across the data file]
Hash/uuid shard key Distributes read/write load evenly across nodes Migrations cause random I/O and fragmentation Makes it harder to add new shards Pre-split db.adminCommand({split:"db.collection", middle:{_id:99}}); Pre-move db.adminCommand({moveChunk:"db.collection", find:{_id:5}, to:"s2"}); Turn off balancer (on the config database) db.settings.update({_id:"balancer"}, {$set:{stopped:true}}, true);
Migrations - temporal shard key [diagram: Shard 1 holds Chunk 1 (k: 1 to 5) and Chunk 2 (k: 6 to 9); documents sit in key order (k: 3, 4, 5, 6, 7, 8), so migrating Chunk 1 to Shard 2 copies a contiguous range]
Temporal shard key Can cause hot chunks Migrations are less destructive Makes it easier to add new shards Include a temporal prefix in your shard key {day: ..., id: ...} Choose prefix granularity based on insert rate; aim for low 100s of chunks (64MB each) per "unit" of prefix e.g. 10 GB per day => ~150 chunks per day
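Declaring such a compound key is just the usual shardCollection command; the namespace and field names below are illustrative.
  db.adminCommand({enableSharding: "analytics"});
  db.adminCommand({shardCollection: "analytics.events", key: {day: 1, id: 1}});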
Optimizing Our Deployment Hardware and Configuration
Elastic Compute Cloud Noisy Neighbor Used largest instance in a family (m1 or m2) Used m2 family for mongods Best RAM to dollar ratio Used micros for arbiters and config servers 
Elastic Block Storage Noisy Neighbor Netflix claims to only use 1TB disks RAID'ed our disks Minimum of 4-8 disks Recommended 8-16 disks RAID0 for write heavy workload RAID10 for read heavy workload
Pathological Test What happens when data far exceeds RAM? 10:1 read/write ratio Reads evenly distributed over entire key space
One Mongod [graph: throughput with the index in RAM vs. out of RAM] One mongod on the host; throughput drops more than 10x once the index no longer fits in RAM
Many Mongods [graph: the same test with the index in RAM vs. out of RAM] 16 mongods on the host; throughput drops less than 3x Graph is for one shard, multiply by 16x for the total
Sharding within a node One read/write lock per mongod Ticket for lock per collection - SERVER-1240 Ticket for lock per extent - SERVER-1241 For an in-memory workload Shard per core For an out-of-memory workload Shard per disk Warning: Must have shard key in every query Otherwise scatter-gather across all shards Requires manually managing secondary keys Less necessary in 2.0 with yield improvements
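The shard-key warning in practice, assuming a hypothetical events collection sharded on {u: 1} and queried through mongos:
  db.events.find({u: HexData(0, "21ec2020"), d: 20110901});  // shard key present: routed to a single shard
  db.events.find({d: 20110901});                             // no shard key: scatter-gather across all shards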
Reminder These steps worked for us and our data We verified them by testing early and often  You should too
Questions? @bdarfler http://bdarfler.com
