Optimizing MongoDB: Lessons Learned at Localytics
Andrew Rollins, June 2011, MongoNYC

Tips, tricks, and gotchas learned at Localytics for optimizing MongoDB installs. Includes information about document design, indexes, fragmentation, migration, AWS EC2/EBS, and more.

  1. Optimizing MongoDB: Lessons Learned at Localytics
     Andrew Rollins, June 2011, MongoNYC
  2. Me
     • Email: my first name @ localytics.com
     • twitter.com/andrew311
     • andrewrollins.com
     • Founder, Chief Software Architect at Localytics
  3. Localytics
     • Real time analytics for mobile applications
     • Built on:
       – Scala
       – MongoDB
       – Amazon Web Services
       – Ruby on Rails
       – and more…
  4. Why I'm here: brain dump!
     • To share tips, tricks, and gotchas about:
       – Documents
       – Indexes
       – Fragmentation
       – Migrations
       – Hardware
       – MongoDB on AWS
     • Basic to more advanced, a complement to MongoDB Perf Tuning at MongoSF 2011
  5. MongoDB at Localytics
     • Use cases:
       – Anonymous loyalty information
       – De-duplication of incoming data
     • Requirements:
       – High throughput
       – Add capacity without long downtime
     • Scale today:
       – Over 1 billion events tracked in May
       – Thousands of MongoDB operations a second
  6. Why MongoDB?
     • Stability
     • Community
     • Support
     • Drivers
     • Ease of use
     • Feature rich
     • Scale out
  7. OPTIMIZE YOUR DATA
     Documents and indexes
  8. Shorten names
     Bad:
       { super_happy_fun_awesome_name: "yay!" }
     Good:
       { s: "yay!" }
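
A rough sense of the payoff, as a sketch: BSON stores the field name inside every document, so each character trimmed from a name is saved once per document. The document count below is a hypothetical figure chosen for illustration, not a number from the talk.

```python
# Field names are stored inside every BSON document, so a long name
# costs its full length again in each document.
long_name = "super_happy_fun_awesome_name"
short_name = "s"

# Both names are ASCII, so character count equals byte count.
bytes_saved_per_doc = len(long_name) - len(short_name)

# Hypothetical collection size, for illustration only.
doc_count = 1_000_000_000

total_saved_gb = bytes_saved_per_doc * doc_count / 1e9
print(bytes_saved_per_doc, "bytes per document,", total_saved_gb, "GB in total")
```

The saving applies per field, so documents with many long names benefit proportionally more.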
  9. Use BinData for UUIDs/hashes
     Bad:
       { u: "21EC2020-3AEA-1069-A2DD-08002B30309D" }  // 36 bytes plus field overhead
     Good:
       { u: BinData(0, "…") }  // 16 bytes plus field overhead
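
The 36 versus 16 byte difference can be checked with the Python standard library alone. This sketch only converts the UUID; wrapping the bytes in a driver's binary type before storing them (for example `bson.Binary` in PyMongo) is left out and assumed.

```python
import uuid

# The UUID from the slide, in its 36-character text form.
text_form = "21EC2020-3AEA-1069-A2DD-08002B30309D"

# The same value as raw bytes: 128 bits, i.e. 16 bytes.
binary_form = uuid.UUID(text_form).bytes

print(len(text_form), "bytes as text,", len(binary_form), "bytes as binary")
```

The conversion is lossless: `uuid.UUID(bytes=binary_form)` rebuilds the original value when the text form is needed for display.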
  10. Override _id
      Turn this
        { _id: ObjectId("47cc67093475061e3d95369d"),
          u: BinData(0, "…") }  // <- u is uniquely indexed
      into
        { _id: BinData(0, "…") }  // was the u field
      Eliminated an extra index, but be careful about locality... (more later, see Further Reading at end)
  11. Pack 'em in
      • Look for cases where you can squish multiple "records" into a single document.
      • Why?
        – Decreases number of index entries
        – Brings documents closer to the size of a page, alleviating potential fragmentation
      • Example: comments for a blog post.
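
The blog comment example might look like the sketch below. The field names (`post_id`, `author`, `text`, `comments`) and the sample records are hypothetical; the point is only that four flat records collapse into two documents, and so into two `_id` index entries instead of four.

```python
from collections import defaultdict

# Hypothetical flat "records": one comment per record.
comments = [
    {"post_id": 1, "author": "ann", "text": "first"},
    {"post_id": 1, "author": "bob", "text": "nice post"},
    {"post_id": 2, "author": "cat", "text": "thanks"},
    {"post_id": 1, "author": "dan", "text": "agreed"},
]

def pack_comments(records):
    """Group comment records into one document per blog post."""
    by_post = defaultdict(list)
    for record in records:
        by_post[record["post_id"]].append(
            {"author": record["author"], "text": record["text"]}
        )
    # One document per post, with the post id as _id.
    return [{"_id": post_id, "comments": items} for post_id, items in by_post.items()]

packed = pack_comments(comments)
print(len(comments), "records ->", len(packed), "documents")
```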
  12. Prefix Indexes
      Suppose you have an index on a large field, but that field doesn't have many possible values. You can use a "prefix index" to greatly decrease index size.
      Turn this
        find({k: <kval>})
        { k: BinData(0, "…") }  // 32 byte SHA256, indexed
      into
        find({p: <prefix>, k: <kval>})
        { k: BinData(0, "…"),   // 28 byte SHA256 suffix, not indexed
          p: <32-bit integer> } // first 4 bytes of k packed in integer, indexed
      Example: git commits
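
One way to split a hash as the slide describes, as a minimal sketch. The helper names `split_hash` and `join_hash` are hypothetical, and reading the prefix as a big-endian signed integer is an assumption; any fixed convention works as long as writers and readers agree.

```python
import hashlib
import struct

def split_hash(digest: bytes) -> dict:
    """Split a 32-byte SHA-256 digest into an indexed prefix and a suffix."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    # First 4 bytes packed into a signed 32-bit integer (big-endian is an assumption).
    (prefix,) = struct.unpack(">i", digest[:4])
    return {"p": prefix, "k": digest[4:]}  # p is indexed, k is not

def join_hash(doc: dict) -> bytes:
    """Rebuild the full digest from the stored prefix and suffix."""
    return struct.pack(">i", doc["p"]) + doc["k"]

digest = hashlib.sha256(b"example payload").digest()
doc = split_hash(digest)
print(doc["p"], len(doc["k"]))
```

A lookup then passes both parts, as in the slide's `find({p: <prefix>, k: <kval>})`: the small index on `p` narrows the candidates and the unindexed suffix `k` confirms the match.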
  13. FRAGMENTATION AND MIGRATION
      Hidden evils
  14. Fragmentation
      • Data on disk is memory mapped into RAM.
      • Mapped in pages (4KB usually).
      • Deletes/updates will cause memory fragmentation.
      [Diagram: find(doc1) maps a whole page from disk into RAM; besides doc1, the page holds deleted records.]
  15. New writes mingle with old data
      [Diagram: pages holding doc1, docX, doc3, doc4, and doc5; the write of docX lands in the page that holds doc1.]
      find(docX) also pulls in old doc1, wasting RAM
  16. Dealing with fragmentation
      • "mongod --repair" on a secondary, swap with primary.
      • 1.9 has in-place compaction, but this still holds a write-lock.
      • MongoDB will auto-pad records.
      • Pad records yourself by including and then removing extra bytes on first insert.
        – Alternative offered in SERVER-1810.
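
The manual padding bullet could be sketched as below. The function accepts any collection object exposing PyMongo-style `insert_one` and `update_one`; the helper name, the `_pad` field name, and the default of 256 bytes are all hypothetical. Whether the freed bytes actually stay reserved for the record depends on the storage engine, so treat this as an illustration of the slide's idea rather than a guaranteed effect.

```python
def insert_with_padding(collection, doc, pad_bytes=256):
    """Insert doc with a throwaway padding field, then remove the field.

    The intent is that the record is allocated with spare room so that
    later growth can happen in place instead of moving the document.
    """
    padded = dict(doc, _pad="x" * pad_bytes)  # copy; the caller's doc is untouched
    result = collection.insert_one(padded)
    # Drop the padding again; only the allocation is meant to remain.
    collection.update_one({"_id": result.inserted_id}, {"$unset": {"_pad": ""}})
    return result.inserted_id
```

Choosing `pad_bytes` means estimating how much a document is expected to grow over its lifetime.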
  17. The Dark Side of Migrations
      • Chunks are a logical construct, not physical.
      • Shard keys have serious implications.
      • What could go wrong?
        – Let's run through an example.
  18. Suppose the following
      • K is the shard key
      • K is random
      Shard 1 holds Chunk 1 (k: 1 to 5) and Chunk 2 (k: 6 to 9). Writes arrive on Shard 1 in this order:
        {k: 3, …}  1st write
        {k: 9, …}  2nd write
        {k: 1, …}  and so on
        {k: 7, …}
        {k: 2, …}
        {k: 8, …}
  19. Migrate
      [Diagram: Chunk 1 (k: 1 to 5) is copied from Shard 1 to Shard 2. Its documents ({k: 3, …}, {k: 1, …}, {k: 2, …}) are interleaved on Shard 1 with Chunk 2's documents ({k: 9, …}, {k: 7, …}, {k: 8, …}), so reading them is random IO.]
  20. Shard 1 is now heavily fragmented
      [Diagram: Shard 2 now holds Chunk 1's documents ({k: 3, …}, {k: 1, …}, {k: 2, …}). On Shard 1, the space they occupied between Chunk 2's documents ({k: 9, …}, {k: 7, …}, {k: 8, …}) is marked "Fragmented".]
  21. Why is this scenario bad?
      • Random reads
      • Massive fragmentation
      • New writes mingle with old data
  22. How can we avoid bad migrations?
      • Pre-split, pre-chunk
      • Better shard keys for better locality
        – Ideally where data in the same chunk tends to be in the same region of disk
  23. Pre-split and move
      • If you know your key distribution, then pre-create your chunks and assign them.
      • See this:
        – http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
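
A sketch of what pre-creating and assigning chunks can look like. It only builds the command documents for MongoDB's `split` (with `middle`) and `moveChunk` (with `find` and `to`) admin commands; sending them through a driver is left out. The namespace, shard names, split points, and the round-robin assignment are hypothetical choices for illustration.

```python
def presplit_commands(namespace, key, split_points, shards):
    """Build admin command documents that pre-split a collection and spread
    the resulting chunks across shards. Nothing is sent to a server here."""
    commands = []
    # One split per boundary: N split points produce N + 1 chunks.
    for point in split_points:
        commands.append({"split": namespace, "middle": {key: point}})
    # Move the chunk that starts at each split point, round-robin over shards.
    # The lowest chunk (below the first split point) stays where it is.
    for i, point in enumerate(split_points):
        commands.append(
            {"moveChunk": namespace, "find": {key: point}, "to": shards[i % len(shards)]}
        )
    return commands

# Hypothetical namespace, key range, and shard names.
for command in presplit_commands("app.events", "k", [3, 6], ["shard0000", "shard0001"]):
    print(command)
```

Doing this while the collection is still empty means the chunks exist before any data does, so nothing has to be read off disk and moved later.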
  24. Better shard keys
      • Usually means including a time prefix in your shard key (e.g., {day: 100, id: X})
      • Beware of write hotspots
      • How to Choose a Shard Key
        – http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
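
A minimal sketch of building a key shaped like `{day: 100, id: X}`. Counting days from the Unix epoch is an assumption; the slides do not say what the day number is relative to, and the helper name `shard_key` is hypothetical.

```python
SECONDS_PER_DAY = 86_400

def shard_key(epoch_seconds, record_id):
    """Build a time-prefixed shard key: coarse day number first, then the id."""
    day = int(epoch_seconds // SECONDS_PER_DAY)  # days since the Unix epoch (assumption)
    return {"day": day, "id": record_id}

print(shard_key(100 * SECONDS_PER_DAY + 5, "X"))
```

Documents written on the same day share a prefix, so they tend to fall into the same chunks and the same region of disk. The cost is the hotspot the slide warns about: all writes for the current day target the same key range.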
  25. OPTIMIZING HARDWARE/CLOUD
  26. Working Set in RAM
      • EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes.
      • Workers hammering MongoDB with this loop, growing data:
        – Loop { insert 500 byte record; find random record }
      • Thousands of ops per second when in RAM
      • Much less throughput when working set (in this case, all data and index) grows beyond RAM.
      [Chart: ops per second over time, with an "In RAM" region followed by a "Not In RAM" region.]
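
The worker loop could be sketched roughly as follows. It takes any collection object with PyMongo-style `insert_one` and `find_one`; the function name, the sequential integer `_id`, and the random payload are assumptions, since the slides give only the pseudocode above.

```python
import os
import random

def hammer(collection, iterations):
    """Grow the data set while reading random existing records."""
    for i in range(iterations):
        # Roughly 500 bytes of payload per record (plus _id and BSON overhead).
        collection.insert_one({"_id": i, "payload": os.urandom(500)})
        # Uniformly random reads touch the whole data set, so the working
        # set is effectively all data plus the index.
        collection.find_one({"_id": random.randrange(i + 1)})
```

Because reads are spread evenly over everything ever written, throughput falls once data and index no longer fit in RAM, which is the drop the chart shows.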
  27. Pre-fetch
      • Updates hold a lock while they fetch the original from disk.
      • Instead do a read to warm the doc in RAM under a shared read lock, then update.
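
The read-then-update pattern can be sketched as below, for any collection object with PyMongo-style `find_one` and `update_one`. The helper name is hypothetical.

```python
def prefetch_then_update(collection, doc_id, update):
    """Warm the document with a read, then apply the update."""
    # The read may page the document in from disk, but only under the
    # shared read lock; its result is deliberately ignored.
    collection.find_one({"_id": doc_id})
    # The document is now likely in RAM, so the write lock is held briefly.
    return collection.update_one({"_id": doc_id}, update)
```

This trades one extra round trip per update for less time spent under the write lock.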
  28. Shard per core
      • Instead of a shard per server, try a shard per core.
      • Use this strategy to overcome write locks when writes per second matter.
      • Why? Because MongoDB has one big write lock.
  29. Amazon EC2
      • High throughput / small working set
        – RAM matters, go with high memory instances.
      • Low throughput / large working set
        – Ephemeral storage might be OK.
        – Remember that EBS IO goes over Ethernet.
        – Pay attention to IO wait time (iostat).
        – Your only shot at consistent perf: use the biggest instances in a family.
      • Read this:
        – http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html
  30. Amazon EBS
      • ~200 seeks per second per EBS on a good day
      • EBS has *much* better random IO perf than ephemeral, but adds a dependency
      • Use RAID0
      • Check out this benchmark:
        – http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/
      • To understand how to monitor EBS:
        – https://forums.aws.amazon.com/thread.jspa?messageID=124044
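
As back-of-the-envelope arithmetic, combining the ~200 seeks per second figure with the 16-volume RAID0 setup from the Working Set slide gives an idealized ceiling. It assumes seeks stripe perfectly across volumes and ignores the network hop, so real numbers will be lower.

```python
seeks_per_volume = 200  # "on a good day", per the slide
volumes = 16            # RAID0 width from the Working Set in RAM slide

# Idealized upper bound: assumes seeks spread evenly over all volumes.
ideal_seeks_per_second = seeks_per_volume * volumes
print(ideal_seeks_per_second)
```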
  31. Further Reading
      • MongoDB Performance Tuning
        – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
      • Monitoring Tips
        – http://blog.boxedice.com/mongodb-monitoring/
      • Markus' manual
        – http://www.markus-gattol.name/ws/mongodb.html
      • Helpful/interesting blog posts
        – http://nosql.mypopescu.com/tagged/mongodb/
      • MongoDB on EC2
        – http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs
      • EC2 and Ephemeral Storage
        – http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-ec2.html
      • MongoDB Strategies for the Disk Averse
        – http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/
      • MongoDB Perf Tuning at MongoSF 2011
        – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
  32. Thank you.
      • Check out Localytics for mobile analytics!
      • Reach me at:
        – Email: my first name @ localytics.com
        – twitter.com/andrew311
        – andrewrollins.com