MongoSF 2012: MongoDB Deployment Preparedness


Published in: Technology, Business
  • This is a vast topic, because it encompasses every part of the development and administration process.

    “And much of what I’ll say falls under the ‘yeah yeah, we know’ category of advice. Maybe I should rename this talk to ‘Remember to floss after meals.’ That said, I hope to put all these tips into proper perspective.”
  • Just state the signposts...

    Many of the topics I’m touching on will be covered in much greater detail in other presentations today.
  • "I've been working commercial support at 10gen for over a year now. 75% of the people I come in contact with call AFTER the emergency has happened. After demand has outstripped capacity, after the second replica set member went down or after they launch some minor feature change..."

    So, what is deployment preparedness? It boils down to 2 broad problem areas:
    * A forecasting problem
    * Disaster recovery
  • Many things are unknowable when you start a project:
    1. traffic patterns
    2. hardware fire drills
    3. changing business reqs -- let's face it, they _will_ change

    But that’s what makes it FUN!
  • We know what this is. We’ve all been through it. How do you truly prepare though?

    This is more than a checklist; it’s something you need to continually re-address as your deployment evolves.
  • 5 features of a successfully deployed application

    * Prototype: early and then often for each sub-system
    * Test, especially load test: realistically, holistically
    * Monitor: analysis, logging, etc. Learn how to use your tools
    * Capacity planning: aside from a good product, the most crucial determining factor in a successful MongoDB deployment
    * Operations playbook: last but not least

    “We’ll talk about each and how MongoDB fits into each step”
  • MongoDB at the end of the day is just a data store -- you need to approach development and deployment as you would any other database. However, taking advantage of its ease of use and agile

    “The recommendations given today could be slotted into any development process, but the key is to figure out how they feed into each other, either by changing your deployment topology or by providing the numbers you use to make forecasts and decisions in the next step.”

    They are all interleaved, and it is really difficult to talk about one without the others. So really it’s better to look at the steps as a continuum.
  • I’ll keep coming back to this slide to emphasize how each step flows back into the next
  • * Chicken and egg problem - nothing else can proceed until this occurs
  • * Development process is full of unknowables and requirements WILL change
    * MongoDB doesn’t walk on water! Make sure you’ve chosen the right tool for the job.
    * Start with a simple design

    You can easily spend weeks of developer time designing and optimizing your wombat tracking device only to discover that your neglected wombat aggregation system is the bottleneck once you go live.
  • And especially LOAD, PERFORMANCE and SOAK tests
  • I’ve certainly worked in an environment in which it’s hard to get support for good testing
  • You may think you’re memory bound, when in fact you’re CPU bound
  • Example:
    Imagine you have both a users collection and an events collection (1000x bigger)
    -- They need to be tested in isolation AND together

    Maybe you’ll see that they don’t work great together and you’ll split your deployment
  • “As mentioned, all of these process steps are interleaved, but I believe proper monitoring and knowing how to use it is the lynchpin to a successful deployment”

    “Often when I speak with users about their monitoring strategy I get, ‘yeah we’ve been meaning to set that up...’ and then they trail off.”
  • The most crucial piece to get right.
    WHY? Because it gives you vision

    There are probably a thousand metaphors I could use here, but I like the idea of looking at your monitoring setup as both a microscope and a telescope.

    Microscope == fine-grained statistics and realtime examination
    Telescope == a way to look into the past and extrapolate into the future (NEED TO FIX THIS METAPHOR)
  • Instrumentation: counters and metrics need to be built into the app
    --> ability to test (turn off/turn on)
    --> for business and planning

    Alerts feed into your Ops playbook
  • Learn what to look for and understand what the numbers mean in steady state and under stress test

    Of course these numbers can’t be analyzed in a vacuum. High lock% is appropriate for a write-heavy app but can kill a read-heavy app. Lots of page faults or index misses on a read-heavy app are going to be a problem.
  • For example, you can see you have high lock%, but high write throughput
  • Say what this chart is
  • Graph of memory usage over 6 months
  • Once you’re in production, you are essentially “married” to your stack

    Learn to read the numbers both in health and in stress
  • Talking to users, it seems many of them approach planning by feel, rather than using numbers.

    "It may seem like an art, but sometimes it seems people feel the rules of physics don't apply to them -- in fact, there is a knowable number of IOPS you can squeeze out of your 15k spinning disk, your EBS volume, or even an array of SSDs -- we'll talk about some of the simple math and models you can use to make some provisioning decisions"

    Forecasting and Analysis

    Examples:
    Shutterfly, ad serving
  • working set = indexes and active documents

    Every data set is different, but the functionality to inspect it is built in and readily accessible

    If someone were to ask you how much space your indexes take, you need to be able to access this data
  • There are many variables, but

    Random Access - 4sq
    Long Tail - Shutterfly, Twitter stream
    Logging events - click tracker
  • Random access or locality of data?

    High vs. low latency?
    4sq or Viber need 100% of data in memory because of random access. How? Shard!

    Ad service networks usually have extremely low latency SLAs, and if that’s the biz req, then you basically can never hit disk

    Shutterfly only needs to keep the most recent data in memory
  • Last thing I’m going to talk about, but something that has to be considered from the start.

    “It’s hard to choose one of the segments, but this may be the most important”
  • First practice on your dev box.
    Then practice under load
  • It’s not practical to have it all fleshed out in advance

    If I ask you which is more important, feature X or realistic testing...
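
The capacity planning notes above mention that there is a knowable number of IOPS you can get out of a 15k spinning disk. As a rough illustration of that kind of simple math (a sketch, not a figure from the talk), the snippet below estimates random-read IOPS for one disk from its rotation speed and an average seek time. The 3.5 ms seek time and the 1500 IOPS target are assumptions chosen for the example, and the helper names are hypothetical; real numbers should come from the drive's datasheet and from your own load tests.

```javascript
// Rough random-read IOPS estimate for a single spinning disk.
// avgSeekMs is an ASSUMED value; take the real one from the drive's datasheet.
function estimateDiskIops(rpm, avgSeekMs) {
  const msPerRevolution = 60000 / rpm;
  // On average the platter has to turn half a revolution to reach the data.
  const avgRotationalLatencyMs = msPerRevolution / 2;
  return 1000 / (avgSeekMs + avgRotationalLatencyMs);
}

// Hypothetical helper: number of such disks needed to serve a target IOPS,
// ignoring RAID overhead, caching, and controller limits.
function disksNeeded(targetIops, iopsPerDisk) {
  return Math.ceil(targetIops / iopsPerDisk);
}

const iops15k = estimateDiskIops(15000, 3.5); // 15k RPM, assumed 3.5 ms seek
console.log(Math.round(iops15k));        // 182
console.log(disksNeeded(1500, iops15k)); // 9
```

The same arithmetic applies to an EBS volume or an SSD array if you substitute the measured or published IOPS per device for the estimate.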

    1. Deployment Preparedness
       Dan Pasette @pasette
    2. This Talk
       • What is deployment preparedness?
       • Recommended process and tips
       • Q&A
    3. About me
       • Engineering Manager in NYC
       • Server side developer for 15 years
       • Everyone works with the community
    4. Forecasting
    5. Disaster Recovery
    6. Elements of Success
       • Prototype
       • Test
       • Monitor
       • Capacity Planning
       • Ops Playbook
    7. A word about process
       • Iterative seems to work best
       • Requirements are required
    8. Reinventing the Wheel (diagram labels: Prototype, Ops Playbook, Test, Capacity Planning, Monitor)
    9. Prototype
    10. Design Matters
        • Development spikes should be easy
        • Schema free does not mean free-form
        • Index design is crucial
        • Pre-optimization is the root of all evil!
    11. (Diagram labels: Prototype, Ops Playbook, Test, Capacity Planning, Monitor)
    12. Why hard?
        • PITA
        • No obvious biz case (not a “feature”)
        • Time - last task
        • $$$
    13. Why do it?
        • Reduce the set of unknowables
        • Prove out theories
        • Provide input data for planning
    14. What should I do?
        • Part of the process from the start
        • Isolate (start small)
        • Get real (holistic and realistic)
    15. (Diagram labels: Prototype, Ops Playbook, Test, Capacity Planning, Monitor)
    16. Kinds of Monitoring
        • Instrumentation and logging
        • Trending
        • Alerts
    17. Microscope
        • Fine-grained stats to tell you what’s happening now
        • explain(), db profiling system
        • mongostat, iostat, mongotop...
        • MMS - Free, hosted monitoring from 10gen
    18. mongostat
    19. MMS Microscope
    20. Telescope
        • Trending stats to help you analyze the past and predict the future
        • munin, cacti, nagios, zabbix, graphite...
        • MMS -
    21. MMS Telescope
    22. (Diagram labels: Prototype, Ops Playbook, Test, Capacity Planning, Monitor)
    23. Ways to Scale
        • Vertical
          • RAM, SSDs
        • Horizontal
          • Sharding and replica sets
        • Iterate your schema design!
    24. Sizing Considerations
        • Access Pattern
        • Working Set
        • Latency Requirements
        • Data Security
    25. Working Set
        > db.blogs.stats()
        {
          "ns" : "test.blogs",
          "count" : 1338330,
          "size" : 46915928,                  // Size of data
          "avgObjSize" : 35.05557523181876,   // Average document size
          "storageSize" : 86092032,           // Size on disk (and in memory!)
          "numExtents" : 12,
          "nindexes" : 2,
          "lastExtentSize" : 20872960,
          "paddingFactor" : 1,
          "flags" : 0,
          "totalIndexSize" : 99860480,        // Size of all indexes
          "indexSizes" : {                    // Size of each index
            "_id_" : 55877632,
            "name_1" : 43982848
          },
    26. Access Pattern & Working Set
        • Random Access
        • Long Tail, Right Balanced
        • Event Stream
    27. (Diagram labels: Prototype, Ops Playbook, Test, Capacity Planning, Monitor)
    28. Plays in the book
        • Backups
        • Restores (backups are not enough)
        • Upgrades
        • Replica Set Ops
          • understand the oplog
        • Sharding Ops
    29. Wrap Up
        • Eat your veggies
        • Humans are irrational animals
        • Just do what you can
    30. Download MongoDB and let us know what you think: @pasette, @mongodb. 10gen is hiring!
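
The Working Set slide shows db.blogs.stats() output, and the speaker notes define the working set as indexes plus active documents. The sketch below turns that definition into arithmetic using the numbers from the slide. It is only an illustration under stated assumptions: the function names are hypothetical, hotFraction (the share of documents that are actively accessed) is something you have to estimate or measure for your own access pattern, and the 25% value and the 80% RAM headroom are made up for the example.

```javascript
// Values copied from the db.blogs.stats() output on the Working Set slide.
const stats = {
  count: 1338330,
  size: 46915928,           // size of data, in bytes
  storageSize: 86092032,    // size on disk (and in memory), in bytes
  totalIndexSize: 99860480, // size of all indexes, in bytes
  indexSizes: { _id_: 55877632, name_1: 43982848 },
};

// Working set = all indexes + the actively used share of the documents.
// hotFraction is an ASSUMPTION (0..1); it is not reported by stats().
function estimateWorkingSetBytes(s, hotFraction) {
  return s.totalIndexSize + s.storageSize * hotFraction;
}

// Hypothetical check: does the estimate fit in RAM, leaving some headroom
// for the OS and everything else running on the box?
function fitsInRam(workingSetBytes, ramBytes, usableShare) {
  return workingSetBytes <= ramBytes * usableShare;
}

const workingSet = estimateWorkingSetBytes(stats, 0.25); // assume 25% is hot
console.log(workingSet); // 121383488
console.log(fitsInRam(workingSet, 1024 * 1024 * 1024, 0.8)); // true for 1 GiB
```

With a random access pattern hotFraction approaches 1, while a long-tail or event-stream workload may only keep the most recent slice of data hot, which is why the access pattern and the working set are sized together.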