Shortcuts around the mistakes I've made scaling MongoDB

Presentation held at MongoUK, September 2012

  1. SHORTCUTS AROUND THE MISTAKES I’VE MADE SCALING MONGODB
      Theo, Chief Architect at
      Wednesday 21 September 2011
  2. What we do
      We want to revolutionize the digital advertising industry by showing that there is more to ad analytics than click-through rates.
  3. Ads
  4. Data
  5. Assembling sessions
      [Diagram: exposure, ping and event fragments ➔ session]
  6. Crunching
      [Diagram: many sessions ➔ 42]
  7. Reports
  8. What we do
      Track ads, make pretty reports.
  12. That doesn’t sound so hard
      We don’t know when sessions end
      There’s a lot of data
      It’s all done in (close to) real time
  16. Numbers
      40 GB of data per day
      50 million documents per day
  20. How we use MongoDB
      “Virtual memory” to offload data while we wait for sessions to finish
      Short-term storage (<48 hours) for batch jobs
      Metrics storage
  25. Why we use MongoDB
      Schemalessness makes things so much easier: the data we collect changes as we come up with new ideas
      Sharding makes it possible to scale writes
      Secondary indexes and a rich query language are great features (for the metrics store)
      It’s just… nice
  27. Btw.
      We use JRuby; it’s awesome
  28. A story in 7 iterations
  30. 1st iteration: secondary indexes and updates
      One document per session, updated as new data comes along
      Outcome: 1000% write lock
  31. #1 Everything is about working around the GLOBAL WRITE LOCK
  32. MongoDB 2.0.0
      db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)
      db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)
  33. MongoDB 1.8.1
      db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)
      db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)
  34. 2nd iteration: using scans for two-step assembling
      Instead of updating, save each fragment, then scan over _id to assemble sessions
  35. 2nd iteration (continued)
      Outcome: less locking, but still not great performance. We also realised we couldn’t remove data fast enough
  36. #2 Everything is about working around the GLOBAL WRITE LOCK
  37. #3 Give a lot of thought to your PRIMARY KEY
  40. 3rd iteration: partitioning
      We came up with the idea of partitioning the data by writing to a new collection every hour
      Outcome: lots of complicated code, lots of bugs, but we didn’t have to care about removing data
  41. #4 Make sure you can REMOVE OLD DATA
  44. 4th iteration: sharding
      To get around the global write lock and get higher write performance, we moved to a sharded cluster.
      Outcome: higher write performance, lots of problems, lots of ops time spent debugging
  45. #5 Everything is about working around the GLOBAL WRITE LOCK
  46. #6 SHARDING IS NOT A SILVER BULLET
      and it’s buggy; if you can, avoid it
  48. #7 IT WILL FAIL
      design for it
  53. 5th iteration: moving things to separate clusters
      We saw very different loads on the shards and realised we had databases with very different usage patterns, some of which kept autosharding from working. We moved these off the cluster.
      Outcome: a more balanced and stable cluster
  54. #8 Everything is about working around the GLOBAL WRITE LOCK
  55. #9 ONE DATABASE with one usage pattern PER CLUSTER
  56. #10 MONITOR EVERYTHING
      look at your health graphs daily
  60. 6th iteration: monster machines
      We got new problems removing data and needed some room to breathe and think
      Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).
      I ♥
  61. #11 Don’t try to scale up, SCALE OUT
  62. #12 When you’re out of ideas, CALL THE EXPERTS
  65. 7th iteration: partitioning (again) and pre-chunking
      We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also considerably reduced the size of our documents.
      Outcome: no more problems removing data.
  66. #13 Smaller objects mean a smaller database, and a smaller database means LESS RAM NEEDED
  67. #14 Give a lot of thought to your PRIMARY KEY
  68. #15 Everything is about working around the GLOBAL WRITE LOCK
  69. #16 Everything is about working around the GLOBAL WRITE LOCK
  70. KTHXBAI
      @iconara
      architecturalatrocities.com
      burtcorp.com
  71. Since we’ve got time…
  74. Tips: Safe mode
      Run every Nth insert in safe mode
      This will give you warnings when bad things happen, such as failovers
  76. Tips: Avoid bulk inserts
      Very dangerous if there’s a possibility of duplicate key errors
  80. Tips: EC2
      You have three copies of your data; do you really need EBS?
      Instance store disks are included in the price and they have predictable performance.
      m1.xlarge comes with 1.7 TB of storage.
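Slides 5 and 6 show fragments (exposures, pings and events) being assembled into sessions, and sessions being crunched into report numbers. A minimal in-memory sketch of both steps follows. The fragment shape `{ sessionId, type, t }` and the metric (average session length) are hypothetical choices for illustration, not the format or reports used in the talk.

```javascript
// Hypothetical fragment shape: { sessionId, type, t, name? }, with t in milliseconds.

// Assemble: group fragments by session and order each group by time.
function assembleSessions(fragments) {
  const bySession = new Map();
  for (const fragment of fragments) {
    if (!bySession.has(fragment.sessionId)) bySession.set(fragment.sessionId, []);
    bySession.get(fragment.sessionId).push(fragment);
  }
  const sessions = [];
  for (const [sessionId, parts] of bySession) {
    parts.sort((a, b) => a.t - b.t);
    sessions.push({
      sessionId,
      start: parts[0].t,
      end: parts[parts.length - 1].t,
      pings: parts.filter((p) => p.type === "ping").length,
      events: parts.filter((p) => p.type === "event").map((p) => p.name),
    });
  }
  return sessions;
}

// Crunch: reduce many sessions to one number for a report
// (here, hypothetically, the average session length in seconds).
function averageLengthSeconds(sessions) {
  if (sessions.length === 0) return 0;
  const totalMs = sessions.reduce((sum, s) => sum + (s.end - s.start), 0);
  return totalMs / sessions.length / 1000;
}

const fragments = [
  { sessionId: "a", type: "exposure", t: 0 },
  { sessionId: "a", type: "ping", t: 10000 },
  { sessionId: "b", type: "exposure", t: 5000 },
  { sessionId: "a", type: "event", name: "click", t: 12000 },
  { sessionId: "b", type: "ping", t: 25000 },
  { sessionId: "a", type: "ping", t: 20000 },
];

const sessions = assembleSessions(fragments);
console.log(sessions.length, averageLengthSeconds(sessions)); // 2 20
```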
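Slide 12 notes that nothing signals when a session ends, which is why slide 20 uses MongoDB as “virtual memory” for data waiting on unfinished sessions. One common way to decide that a session is over is an inactivity timeout. In this sketch a `Map` stands in for that parking store, and the 30-minute timeout is an assumption, not a value from the talk.

```javascript
// Assumption: a session is finished once it has been idle for 30 minutes.
const IDLE_TIMEOUT_MS = 30 * 60 * 1000;

class SessionBuffer {
  constructor(idleTimeoutMs = IDLE_TIMEOUT_MS) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.open = new Map(); // sessionId -> { fragments, lastSeen }
  }

  // Park a fragment and record when its session was last active.
  add(fragment) {
    let session = this.open.get(fragment.sessionId);
    if (!session) {
      session = { fragments: [], lastSeen: fragment.t };
      this.open.set(fragment.sessionId, session);
    }
    session.fragments.push(fragment);
    session.lastSeen = Math.max(session.lastSeen, fragment.t);
  }

  // Remove and return every session that has been idle for at least the timeout.
  flushFinished(now) {
    const finished = [];
    for (const [sessionId, session] of this.open) {
      if (now - session.lastSeen >= this.idleTimeoutMs) {
        finished.push({ sessionId, fragments: session.fragments });
        this.open.delete(sessionId); // deleting the current entry is safe during Map iteration
      }
    }
    return finished;
  }
}

const MINUTE = 60 * 1000;
const buffer = new SessionBuffer();
buffer.add({ sessionId: "a", t: 0 });
buffer.add({ sessionId: "b", t: 20 * MINUTE });
buffer.add({ sessionId: "a", t: 5 * MINUTE });

// At minute 40, "a" has been idle for 35 minutes and "b" for 20.
const done = buffer.flushFinished(40 * MINUTE);
console.log(done.map((s) => s.sessionId), buffer.open.size);
```

In this sketch a fragment that arrives after its session has been flushed simply starts a new session; handling such late arrivals is left out.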
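The figures on slide 16 can be turned into average rates. This assumes that “40 Gb” means 40 gigabytes (10^9 bytes) and that the load is spread evenly over the day; real traffic has peaks well above the average. Slide 20’s retention of under 48 hours means roughly two days of data is live at any time.

```javascript
// Assumption: "40 Gb" on slide 16 means 40 gigabytes (10^9 bytes).
const BYTES_PER_DAY = 40e9;
const DOCS_PER_DAY = 50e6;
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400
const RETENTION_DAYS = 2; // "<48 hours" from slide 20, rounded up

const avgDocBytes = BYTES_PER_DAY / DOCS_PER_DAY; // 800 bytes per document
const avgInsertsPerSecond = DOCS_PER_DAY / SECONDS_PER_DAY; // about 579
const liveBytes = BYTES_PER_DAY * RETENTION_DAYS; // before replication and indexes

console.log(`average document size: ${avgDocBytes} bytes`);
console.log(`average insert rate: ${Math.round(avgInsertsPerSecond)} docs/s`);
console.log(`live data at steady state: ${liveBytes / 1e9} GB`);
```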
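Slides 32 and 33 show two upserts: the third argument, `true`, turns `update` into an upsert. The model below mimics what they do to a document, to make the cost of the 1st iteration (slide 30) visible: every incoming fragment is a separate write, and in the MongoDB versions named on the slides each write held the process-wide write lock. `$push` also makes the document grow, which in the storage engine of that era could force the document to be moved on disk. This is an in-memory illustration covering only `{_id: ...}` queries with `$inc` and `$push`; it is not the MongoDB API.

```javascript
// In-memory model of update({_id: id}, modifier, true) for $inc and $push only.
// `collection` is a Map from _id to document; this is not the MongoDB API.
function upsert(collection, id, modifier) {
  let doc = collection.get(id);
  if (!doc) {
    doc = { _id: id }; // upsert: create the document from the query if missing
    collection.set(id, doc);
  }
  for (const [field, amount] of Object.entries(modifier.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing field counts as 0
  }
  for (const [field, value] of Object.entries(modifier.$push || {})) {
    if (doc[field] === undefined) doc[field] = [];
    if (!Array.isArray(doc[field])) throw new Error(`${field} is not an array`);
    doc[field].push(value); // the document grows with every push
  }
  return doc;
}

// One write per incoming fragment, as in the 1st iteration.
const coll = new Map();
upsert(coll, "xyz", { $inc: { x: 1 } });
upsert(coll, "xyz", { $inc: { x: 1 } });
upsert(coll, "abc", { $push: { x: "ping" } });
upsert(coll, "abc", { $push: { x: "event" } });
console.log(coll.get("xyz"), coll.get("abc"));
```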
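Slides 34 and 35 replace updates with one insert per fragment followed by a scan over `_id`, and slide 37 stresses choosing the primary key carefully. A scan only assembles sessions cheaply if the key sorts each session’s fragments next to each other. The sketch assumes a hypothetical `_id` layout of `<sessionId>:<zero-padded timestamp>`, with session ids that never contain `:`. Against MongoDB the ordered input would come from something like `db.coll.find().sort({_id: 1})`; here a plain array sort stands in for it.

```javascript
// Hypothetical _id layout: "<sessionId>:<timestamp zero-padded to 13 digits>".
// Session ids are assumed not to contain ":".
function fragmentId(sessionId, t) {
  return `${sessionId}:${String(t).padStart(13, "0")}`;
}

// Walk documents in _id order and start a new session whenever the prefix changes.
function assembleByScan(docs) {
  // Stand-in for an ordered scan such as db.coll.find().sort({_id: 1}).
  const sorted = [...docs].sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
  const sessions = [];
  let current = null;
  for (const doc of sorted) {
    const sessionId = doc._id.slice(0, doc._id.lastIndexOf(":"));
    if (current === null || current.sessionId !== sessionId) {
      current = { sessionId, fragments: [] };
      sessions.push(current);
    }
    current.fragments.push(doc.data);
  }
  return sessions;
}

const stored = [
  { _id: fragmentId("s2", 1500), data: "ping" },
  { _id: fragmentId("s1", 2000), data: "ping" },
  { _id: fragmentId("s1", 1000), data: "exposure" },
  { _id: fragmentId("s2", 1000), data: "exposure" },
];
const sessions = assembleByScan(stored);
console.log(JSON.stringify(sessions));
```

Zero-padding matters: without it, string order would put timestamp `10000` before `9000`. A key that started with the timestamp instead would scatter each session’s fragments across the scan.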
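Slides 39 and 40 partition data into a new collection every hour, and slide 41 makes the point that old data must be removable. With hourly collections, expiring data means dropping whole collections (in the shell, `db.getCollection(name).drop()`) instead of deleting documents one at a time. The `<prefix>_YYYYMMDDHH` naming scheme in UTC is a hypothetical choice for this sketch, and the 48-hour retention is taken from slide 20.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const pad2 = (n) => String(n).padStart(2, "0");

// Hypothetical naming scheme: <prefix>_YYYYMMDDHH in UTC, e.g. "fragments_2011092112".
function hourlyCollectionName(prefix, date) {
  return (
    `${prefix}_${date.getUTCFullYear()}${pad2(date.getUTCMonth() + 1)}` +
    `${pad2(date.getUTCDate())}${pad2(date.getUTCHours())}`
  );
}

// Names of hourly collections whose whole hour lies before the retention cutoff.
// These can be dropped as a unit instead of deleting documents one by one.
function expiredCollections(prefix, names, now, retentionHours) {
  const cutoff = now.getTime() - retentionHours * HOUR_MS;
  const tag = `${prefix}_`;
  return names.filter((name) => {
    if (!name.startsWith(tag)) return false; // not an hourly collection
    const digits = name.slice(tag.length);
    if (!/^\d{10}$/.test(digits)) return false;
    const hourStart = Date.UTC(
      Number(digits.slice(0, 4)),
      Number(digits.slice(4, 6)) - 1,
      Number(digits.slice(6, 8)),
      Number(digits.slice(8, 10))
    );
    return hourStart + HOUR_MS <= cutoff;
  });
}

const now = new Date(Date.UTC(2011, 8, 21, 12)); // 21 September 2011, 12:00 UTC
const current = hourlyCollectionName("fragments", now);
const toDrop = expiredCollections(
  "fragments",
  ["fragments_2011091911", "fragments_2011091912", "fragments_2011092111", "system.indexes"],
  now,
  48
);
console.log(current, toDrop);
```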
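Slide 48 says that things will fail and that you should design for it. One common building block is retrying an operation with exponential backoff, for example while a replica set elects a new primary. This is a generic sketch, not code from the talk: `sleep` is injected so the example stays synchronous (a real client would actually wait), and the error message and delay values are hypothetical.

```javascript
// Retry `operation` up to `attempts` times, doubling the delay after each failure.
// `sleep` is injected so this stays synchronous; a real client would actually wait.
function withRetries(operation, { attempts = 5, baseDelayMs = 100, sleep = () => {} } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) sleep(baseDelayMs * 2 ** attempt); // 100, 200, 400, ...
    }
  }
  throw lastError; // every attempt failed: let the caller decide what to do
}

// Usage: an operation that fails twice (say, during a primary election) and then succeeds.
const delays = [];
let calls = 0;
const result = withRetries(
  () => {
    calls++;
    if (calls < 3) throw new Error("not master");
    return "ok";
  },
  { sleep: (ms) => delays.push(ms) }
);
console.log(result, calls, delays);
```

Retrying is only safe for idempotent writes. An insert with a client-generated `_id` is one example: if a retry hits a duplicate key error, the earlier attempt already succeeded.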
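Slides 64 and 65 combine a new database per day with chunks created in advance. Presumably this means chunks do not have to be split while the day’s writes are arriving, and expired data can go with a single `db.dropDatabase()`. The sketch below only computes names and split points. It assumes the collection is sharded on `_id`, that `_id` values start with uniformly distributed lowercase hex digits, and a hypothetical `<prefix>_YYYYMMDD` naming scheme. The generated documents have the shape of MongoDB’s `split` admin command (`{split: <namespace>, middle: <key>}`), which would be run with `db.adminCommand(...)` against a mongos; spreading the resulting chunks over the shards is left out.

```javascript
// Hypothetical naming scheme: one database per UTC day, e.g. "fragments_20110921".
function dailyDatabaseName(prefix, date) {
  return `${prefix}_${date.toISOString().slice(0, 10).replace(/-/g, "")}`;
}

// chunkCount - 1 evenly spaced split points over four-hex-digit prefixes
// (0000..ffff). Assumes _id values start with uniformly distributed lowercase hex.
function splitPoints(chunkCount) {
  const points = [];
  for (let i = 1; i < chunkCount; i++) {
    const value = Math.floor((i * 0x10000) / chunkCount);
    points.push(value.toString(16).padStart(4, "0"));
  }
  return points;
}

// Command documents shaped like MongoDB's `split` admin command; in the shell
// each would be passed to db.adminCommand(...) against a mongos.
function preSplitCommands(namespace, chunkCount) {
  return splitPoints(chunkCount).map((point) => ({ split: namespace, middle: { _id: point } }));
}

const dbName = dailyDatabaseName("fragments", new Date(Date.UTC(2011, 8, 21)));
const commands = preSplitCommands(`${dbName}.sessions`, 4);
console.log(JSON.stringify(commands));
```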
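Slide 66 ties document size to RAM, and slide 64 says the documents were made much smaller without saying how. One common technique, assumed here and not necessarily what was done, is using short field names, since BSON stores every field name inside every document. The mapping below is hypothetical, and JSON length is used as a stand-in for BSON size.

```javascript
// Hypothetical mapping from descriptive field names to short keys.
const SHORT_NAMES = { sessionId: "s", timestamp: "t", exposureType: "e", userAgent: "ua" };
const LONG_NAMES = Object.fromEntries(
  Object.entries(SHORT_NAMES).map(([long, short]) => [short, long])
);

// Rename keys through a mapping; keys without a mapping are kept unchanged.
function renameKeys(doc, mapping) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    const renamed = Object.prototype.hasOwnProperty.call(mapping, key) ? mapping[key] : key;
    out[renamed] = value;
  }
  return out;
}

const shorten = (doc) => renameKeys(doc, SHORT_NAMES); // before writing
const expand = (doc) => renameKeys(doc, LONG_NAMES); // after reading

// Size proxy: JSON, like BSON, repeats every field name in every document.
const sizeOf = (doc) => Buffer.byteLength(JSON.stringify(doc), "utf8");

const original = {
  sessionId: "a1b2c3",
  timestamp: 1316606400000,
  exposureType: "banner",
  userAgent: "Mozilla/5.0",
};
const compact = shorten(original);

console.log(sizeOf(original), sizeOf(compact)); // 34 bytes apart
```

The saving from the names is the same 34 bytes in BSON, which also stores each name as a plain string. At the 50 million documents a day from slide 16, 34 bytes per document adds up to about 1.7 GB a day.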
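Slide 74 suggests running every Nth insert in safe mode. In drivers of that era, safe mode meant following a write with `getLastError` so that the client learns whether the write failed. The sketch wraps a hypothetical `insertFn(doc, { safe })` callback; the callback, the simulated failover and N = 3 are illustrative assumptions, not a real driver API.

```javascript
// Wraps a hypothetical insert callback: insertFn(doc, { safe }).
// safe = true stands for "wait for the server's answer and throw on failure";
// safe = false stands for fire-and-forget.
class SampledSafeWriter {
  constructor(insertFn, every = 100) {
    this.insertFn = insertFn;
    this.every = every; // N: every Nth insert is checked
    this.count = 0;
  }

  insert(doc) {
    this.count++;
    const safe = this.count % this.every === 0;
    this.insertFn(doc, { safe }); // throws only when a checked write fails
    return safe;
  }
}

// Usage with a fake collection that starts failing during a simulated failover.
const checks = [];
let failingOver = false;
const fakeInsert = (doc, { safe }) => {
  checks.push(safe);
  if (safe && failingOver) throw new Error("simulated failover");
};

const writer = new SampledSafeWriter(fakeInsert, 3);
for (let i = 0; i < 6; i++) writer.insert({ i });

failingOver = true;
let warning = null;
try {
  for (let i = 6; i < 9; i++) writer.insert({ i });
} catch (err) {
  warning = err.message; // surfaces at the 9th insert, the next checked one
}
console.log(checks, warning);
```

The trade-off is latency of detection: a problem is noticed at most N writes late, and the unchecked writes in between may have failed silently.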
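Slide 76 warns against bulk inserts when duplicate keys are possible. The sketch assumes that a batch insert stops at the first error, so every document after the duplicate is left out, and unless the batch runs in safe mode nothing reports it. It contrasts that with inserting documents one at a time. This is an in-memory model with a `Map` as the collection, not driver code.

```javascript
// In-memory model; a Map keyed by _id plays the collection.
// Assumption: a batch insert stops at the first error.
function batchInsert(collection, docs) {
  let inserted = 0;
  for (const doc of docs) {
    if (collection.has(doc._id)) {
      const err = new Error(`duplicate key: ${doc._id}`);
      err.inserted = inserted; // how far the batch got before stopping
      throw err;
    }
    collection.set(doc._id, doc);
    inserted++;
  }
  return inserted;
}

// Alternative: one insert per document, so a duplicate only affects itself.
function insertOneByOne(collection, docs) {
  const duplicates = [];
  for (const doc of docs) {
    if (collection.has(doc._id)) duplicates.push(doc._id);
    else collection.set(doc._id, doc);
  }
  return duplicates;
}

const docs = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }];

// _id 2 already exists in both collections.
const batched = new Map([[2, { _id: 2 }]]);
let stoppedAfter = null;
try {
  batchInsert(batched, docs);
} catch (err) {
  stoppedAfter = err.inserted; // 1: documents 3 and 4 were never written
}

const single = new Map([[2, { _id: 2 }]]);
const duplicates = insertOneByOne(single, docs); // documents 1, 3 and 4 are stored

console.log(stoppedAfter, batched.size, duplicates, single.size);
```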