Performance Tuning and Monitoring Using MMS

  • 5-10 minutes for What, Why, and How, then the rest of the time for performance and monitoring and the wrap-up. Talk a little bit about why it’s helpful for 10gen support.
  • Talk about how many current users MMS has. Give some idea of the size/scale of customers. Ask current users to raise their hands.
  • Understand the components (i.e. potential bottlenecks). Test and measure each one. Watch performance before, during, and after the tests. Watch trends over time.
  • Know your environment – a critical piece of understanding what changed is to know the way things were before. The great thing about MMS is that not only does it provide you with what’s happening right now, but it also provides you with history – the sort of context you need to be able to identify changes, which is a critical piece of finding and fixing bottlenecks.
  • Memory: get back to this. Opcounters: commands, queries, etc. per time unit. Lock %: time spent in a write-lock state; global time == global lock + hottest database. Queues: operations waiting for the read lock, the write lock, or the global lock (total). Background flush: time it takes to flush data to disk (via fsync); by default this happens once per minute, so the closer the flush time gets to 60s, the bigger the problem. Repl lag: number of seconds the secondary is behind the primary in writing each oplog entry. Replica: number of hours of oplog available on the primary.
  • We had a customer report replication lag: almost 150,000 seconds of it. We examined their systems (checked CPU, checked IO capacity, checked network utilization) and had them do an initial sync via data file copy. Nothing worked, even though the systems seemed fine.
  • Background index creation on secondaries: fixed in 2.6
  • NOTE: That’s the minimum required assuming no overhead, no competing traffic, nothing else… and that’s just to keep up! In the customer’s case, they had huge updates. Because the oplog is idempotent, those meant huge oplog entries, and it turned out the bandwidth required was 3x their available bandwidth (30 Mbps vs. 10 Mbps).
  • Moral of the story: pay attention to these things, get alerted when they first start to go south, and you can resolve them before things blow up at 3 am.
  • Memory: resident vs. virtual vs. mapped vs. non-mapped (connections). Page faults: accessing a page that is in virtual memory but not resident in physical memory. A page fault on a normal spinning disk is roughly 40,000 times slower than direct memory access. However, the size of page faults also matters: 100 small page faults/sec might be better than 10 large ones! Check readahead! Record stats: number of accesses not in memory, and the page faults required to get them into memory. Btree: misses/missRatio indicates that indexes can’t be held in memory (see above re: page faulting).
  • Background flush: average time it takes to flush data to disk (fsync). IO time: amount of time (in ms) spent waiting on disk for a read or write operation.
  • Also worth noting: the exposed DB check in settings, to tell you if you messed up your firewall settings.
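The bandwidth arithmetic behind the replication-lag example (about 1500 ops per minute at 0.1 MB per oplog entry, and 150,000 seconds of lag) can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and it assumes, as the notes do, no protocol overhead and no competing traffic.

```python
SECONDS_PER_DAY = 86_400


def required_bandwidth_mbps(ops_per_minute, avg_entry_mb):
    """Minimum megabits/s needed just to keep up with oplog writes.

    Hypothetical helper: ignores protocol overhead and competing traffic.
    """
    ops_per_second = ops_per_minute / 60
    megabytes_per_second = ops_per_second * avg_entry_mb
    return megabytes_per_second * 8  # 8 bits per byte


def lag_in_days(lag_seconds):
    """Convert replication lag from seconds to days."""
    return lag_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Figures taken from the replication-lag slides.
    print(f"required bandwidth: {required_bandwidth_mbps(1500, 0.1):.0f} Mbps")
    print(f"150,000 s of lag is {lag_in_days(150_000):.2f} days")
```

Comparing the result against the link's measured capacity gives a quick check on whether a secondary can keep up at all, before looking at CPU or disk.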

    1. Performance Tuning and Monitoring Using MMS. Gregor Macadam (@gregormacadam), #MongoDBDays
    2. Performance Tuning and Monitoring Using MMS, Nicholas Tang. Agenda • What is MMS? • Why use it? • Setting it up and getting around • Performance and monitoring (the fun stuff) • Wrap up
    3. What is MMS?
    4. What is MMS? • The MongoDB Monitoring Service: a free service (or software) for monitoring and management
    5. Metric collection and reporting
    6. Alerting
    7. Event tracking
    8. Logs and profile data
    9. Hardware stats (CPU, disk)
    10. DB stats
    11. Basic user management
    12. What’s in it for me?
    13. Why? • Great high-level view + detailed metrics • Low effort, high return • Makes it easier for us to help you! • Makes you more attractive, promotes bone strength and muscle tone* (* these last points still under review)
    14. How do I use this crazy thing?
    15. Setting it up: http://mms.10gen.com/help/monitoring/tutorial/ • Set up an account • Install the agent • Add your hosts • Optional: hardware stats through munin-node • Optional: enable logging and profiling • More info: http://mms.10gen.com/help/monitoring/install/
    16. Notes • Agent written in Python (moving to Go) • Failover: run multiple agents (1 primary) • Hosts: use CNAMEs, especially on AWS! • You can use a group per environment (each needs an agent) • Connections are over SSL • On-Premise solution for Enterprise customers who don’t want to use the hosted service
    17. Performance tuning and monitoring
    18. Finding the bottleneck. Source: http://www.flickr.com/photos/laenulfean/462715479/
    19. What is performance tuning? 1. Assess the problem and establish acceptable behavior 2. Measure the current performance 3. Find the bottleneck* 4. Remove the bottleneck 5. Re-test to confirm 6. Wash, rinse, repeat (* This is the hard part) (Adapted from http://en.wikipedia.org/wiki/Performance_tuning )
    20. Pro-Tip: know thyself. You have to recognize normal to know when it isn’t. Source: http://www.flickr.com/photos/skippy/6853920/
    21. Some handy metrics to watch • Memory usage • Opcounters • Lock % • Queues • Background flush average • Replication stats
    22. Example: replication lag. 150,000s of lag == almost 2 days of lag!
    23. Example: replication lag. Some common causes of replication lag: • Secondaries underspecced vs. primaries • Access patterns between primary/secondaries • Insufficient bandwidth • Foreground index builds on secondaries
    24. Fun fact: oplog idempotency. Operations in the oplog only affect the value once, so they can be run multiple times safely. Example: if you increment n from 2 to 3, an oplog entry of n = 3 is fine; n + 1 is not. Frequent, large updates mean a big oplog to sync.
    25. Example: replication lag • Secondaries underspecced vs. primaries • Access patterns between primary/secondaries • Insufficient bandwidth • Foreground index builds on secondaries. “…when you have eliminated the impossible, whatever remains, however improbable, must be the truth…” -- Sherlock Holmes, in Sir Arthur Conan Doyle, The Sign of the Four
    26. Example: replication lag. Example: • ~1500 ops per minute (opcounters) • 0.1 MB per object (average object size, local db). (~1500 ops/min ÷ 60 s/min) × 0.1 MB/op × 8 bits/byte ≈ 20 Mbps required bandwidth
    27. Use alerts! Don’t wait until your secondaries fall off your oplog!
    28. Examining memory and disk. Memory: resident vs. virtual vs. (non-)mapped
    29. Examining memory and disk. Page faults and record stats
    30. Examining memory and disk. Background flush and disk IO (check out http://www.wmarrow.com/strcalc/ )
    31. Monitoring: watch for warnings. MMS warns you if your systems have startup warnings or if they are running outdated versions. Don’t ignore these!
    32. Wrapping up
    33. What’s next? • Visual update • Backup service (join the queue!) • More UI/UX improvements: – Enhanced dashboards – Improved cluster view
    34. Summary • MMS is a great, free service • Setup is easy • Metrics are awesome; preventing failures is even more awesome • There’s more functionality coming soon!
    35. Questions?
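The oplog idempotency point from the replication-lag slides can be illustrated with a toy model. This is a sketch only: the dictionary-based entries and the `apply_entry` and `replay` helpers are invented for illustration and are not MongoDB's actual oplog format or API. It shows why an entry that records the final value (n = 3) is safe to replay, while one that records the increment (n + 1) is not.

```python
def apply_entry(doc, entry):
    """Return a copy of doc with one toy oplog entry applied (hypothetical format)."""
    updated = dict(doc)
    field = entry["field"]
    if entry["op"] == "set":
        # Records the resulting value, so replaying it changes nothing further.
        updated[field] = entry["value"]
    elif entry["op"] == "inc":
        # Records the delta, so every replay changes the value again.
        updated[field] = updated.get(field, 0) + entry["amount"]
    else:
        raise ValueError(f"unknown op: {entry['op']}")
    return updated


def replay(doc, entry, times):
    """Apply the same entry several times, as a resync or retry might."""
    for _ in range(times):
        doc = apply_entry(doc, entry)
    return doc


if __name__ == "__main__":
    start = {"n": 2}
    print(replay(start, {"op": "set", "field": "n", "value": 3}, 2))   # idempotent
    print(replay(start, {"op": "inc", "field": "n", "amount": 1}, 2))  # not idempotent
```

The cost of this safety is the one the talk describes: recording results instead of deltas makes each entry as large as the data it touches, so frequent, large updates produce a large oplog to ship.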
