MongoDB performance tuning and monitoring with MMS
Using the MongoDB Monitoring Service to monitor your MongoDB instance(s) and track down performance issues - including two real-world examples of how we tracked down problems using MMS to understand the environment, figure out what changed, and help us rapidly drill into a successful diagnosis.

Published in Technology, Business
  • Show of hands: who is responsible in some way for monitoring? Who has used Nagios, Cacti, Zenoss, Graphite, or similar tools?
  • 5-10 minutes for What, Why, and How, then the rest of the time on performance and monitoring and the wrap-up. Talk a little bit about why it’s helpful for 10gen support.
  • Understand the components (i.e. potential bottlenecks). Test and measure each one. Watch performance before, during, and after the tests. Watch trends over time.
  • Know your environment – a critical piece of understanding what changed is to know the way things were before. The great thing about MMS is that not only does it provide you with what’s happening right now, but it also provides you with history – the sort of context you need to be able to identify changes, which is a critical piece of finding and fixing bottlenecks.
  • Memory: get back to this. Opcounters: commands, queries, etc. per time unit. Lock %: time spent in a write-lock state; the global figure is the global lock plus the hottest database. Queues: operations waiting for a read lock, a write lock, or either (total). Background flush: time it takes to flush pending writes to the data files on disk (via fsync); this happens once per minute by default, so the closer the flush time gets to 60 s, the bigger the problem. Repl lag: number of seconds the secondary is behind the primary in applying each oplog entry. Replica: number of hours of oplog on the primary.
  • We had a customer report replication lag: almost 150,000 seconds of it. We examined their systems (CPU, IO capacity, network utilization) and had them do an initial sync via data file copy. Nothing worked, even though the systems seemed fine.
  • Background index creation on secondaries: fixed in 2.6
  • NOTE: That’s the minimum required assuming no overhead, no competing traffic, nothing else… and that’s just to keep up! In the customer’s case, they had huge updates, which, since the oplog is idempotent, meant huge oplog entries. It turned out the bandwidth required was 3x their available bandwidth (30 Mbps vs 10 Mbps).
  • Moral of the story: pay attention to these things, get alerted when they first start to go south, and you can resolve them before things blow up at 3 am.
  • Blue: commands, purple: queries, green: updates, orange: deletes, red: getmores, yellow: inserts
  • Memory: resident vs. virtual vs. mapped vs. non-mapped (connections). Page faults: accessing a page of memory that is in virtual memory but not resident in physical memory. A page fault on a normal spinning disk is roughly 40,000 times slower than direct memory access. However, the size of page faults also matters: 100 small page faults/sec might be better than 10 large ones! Check readahead! Record stats: number of accesses not in memory, and page faults required to get them into memory. Btree: a high misses/missRatio indicates the indexes can’t be held in memory (see above re: page faulting).
  • Background flush: average time it takes to flush pending writes to the data files on disk (fsync). IO time: amount of time (in ms) spent waiting on disk for a read or write operation.
  • Also worth noting: the exposed DB check in settings, to tell you if you messed up your firewall settings.

Transcript

  • 1. #MongoDB Performance Tuning and Monitoring Using MMS. Nicholas Tang, Technical Support Manager, North America @ 10gen
  • 2. Agenda • What is MMS? • Why use it? • Setting it up and getting around • Performance and monitoring (the fun stuff) • Wrap up
  • 3. What is MMS?
  • 4. What is MMS? The MongoDB Monitoring Service: a free service (or software) for monitoring and management
  • 5. Metric collection and reporting
  • 6. Alerting
  • 7. Event tracking
  • 8. Logs and profile data
  • 9. Hardware stats (CPU, disk)
  • 10. DB stats
  • 11. Basic user management
  • 12. What’s in it for me?
  • 13. Why? • Great high-level view + detailed metrics • Low effort, high return • Makes it easier for us to help you! • Makes you more attractive, promotes bone strength and muscle tone* (* these last points are still under review)
  • 14. How do I use this crazy thing?
  • 15. Setting it up http://mms.10gen.com/help/monitoring/tutorial/ • Set up an account • Install the agent • Add your hosts • Optional: hardware stats through munin-node • Optional: enable logging and profiling • More info: http://mms.10gen.com/help/monitoring/install/
  • 16. Notes • Agent written in Python (moving to Go) • Failover: run multiple agents (1 primary) • Hosts: use CNAMEs, especially on AWS! • You can use a group per environment (each needs an agent) • Connections are over SSL • On-Premise solution for Enterprise customers who don’t want to use the hosted service
  • 17. Performance tuning and monitoring
  • 18. Finding the bottleneck Source: http://www.flickr.com/photos/laenulfean/462715479/
  • 19. What is performance tuning? 1. Assess the problem and establish acceptable behavior 2. Measure the current performance 3. Find the bottleneck* 4. Remove the bottleneck 5. Re-test to confirm 6. Lather, rinse, repeat (* this is often the hard part) (Adapted from http://en.wikipedia.org/wiki/Performance_tuning )
  • 20. Pro-Tip: know thyself. You have to recognize normal to know when it isn’t. Source: http://www.flickr.com/photos/skippy/6853920/
  • 21. Some handy metrics to watch • Memory usage • Opcounters • Lock % • Queues • Background flush average • Replication stats
  • 22. Example: replication lag Scenario: Customer reports 150,000 s of replication lag == almost 2 days of lag!
  • 23. Example: replication lag Some common causes of replication lag: • Secondaries underspecced vs primaries • Access patterns between primary/secondaries • Insufficient bandwidth • Foreground index builds on secondaries
  • 24. Fun fact: oplog idempotency Operations in the oplog only affect the value once, so they can be run multiple times safely. Example: if you increment n from 2 to 3, the oplog records n = 3 (safe to replay), not n + 1 (not safe to replay). Frequent, large updates mean a big oplog to sync. Updates that change sets mean writing the entire new version of the set to the oplog.
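The idempotency point on slide 24 can be sketched in a few lines of Python. This is a simplified model, not MongoDB's actual oplog format: the function names and the shape of the "entry" dictionary are hypothetical and exist only to show why replaying `n = 3` is safe while replaying `n + 1` is not, and why a set update that rewrites the whole set grows with the set.

```python
# Simplified, hypothetical model of oplog-style replay (not MongoDB's real format).

def apply_set(doc, field, value):
    """Idempotent form: record the resulting value ("n = 3")."""
    new_doc = dict(doc)
    new_doc[field] = value
    return new_doc


def apply_increment(doc, field, delta):
    """Non-idempotent form: every replay adds delta again ("n + 1")."""
    new_doc = dict(doc)
    new_doc[field] = new_doc.get(field, 0) + delta
    return new_doc


def entry_for_set_change(field, current_values, new_value):
    """Assumed entry shape: the whole updated set is written, as the slide describes."""
    updated = list(current_values)
    if new_value not in updated:
        updated.append(new_value)
    return {"set": {field: updated}}


original = {"n": 2}

# Replaying the idempotent entry twice leaves the same document.
set_once = apply_set(original, "n", 3)
set_twice = apply_set(set_once, "n", 3)

# Replaying the increment twice does not.
inc_once = apply_increment(original, "n", 1)
inc_twice = apply_increment(inc_once, "n", 1)

# The entry for a set change carries every element, so it grows with the set.
small_entry = entry_for_set_change("fr", ["a"], "b")
large_entry = entry_for_set_change("fr", [str(i) for i in range(10000)], "b")

print(set_once == set_twice)   # replay is safe
print(inc_once == inc_twice)   # replay changes the result
print(len(small_entry["set"]["fr"]), len(large_entry["set"]["fr"]))
```

The last line illustrates why a small logical change to a large set can still produce a large entry to ship to the secondaries.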
  • 25. Example: replication lag • Secondaries underspecced vs primaries • Access patterns between primary/secondaries • Insufficient bandwidth • Foreground index builds on secondaries “…when you have eliminated the impossible, whatever remains, however improbable, must be the truth…” -- Sherlock Holmes, in Sir Arthur Conan Doyle, The Sign of the Four
  • 26. Example: replication lag Example: • ~1500 ops per minute (opcounters) • 0.1 MB per object (average object size, local db) ~1500 ops/min / 60 s/min * 0.1 MB/op * 8 bits/byte =~ 20 Mbps required bandwidth
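The arithmetic on slide 26 can be written out as a small Python helper. The function name and parameters are illustrative only; the inputs are the example figures from the slide, and the result ignores protocol overhead and competing traffic, so it is a lower bound rather than a sizing recommendation.

```python
def required_bandwidth_mbps(ops_per_minute, avg_entry_mb):
    """Estimate the minimum replication bandwidth in megabits per second.

    ops_per_minute: write operations per minute (from opcounters).
    avg_entry_mb:   average oplog entry size in megabytes
                    (average object size of the local db).
    """
    ops_per_second = ops_per_minute / 60          # ops/min -> ops/s
    megabytes_per_second = ops_per_second * avg_entry_mb
    return megabytes_per_second * 8               # bytes -> bits


# Figures from the slide: ~1500 ops/min at ~0.1 MB per entry.
estimate = required_bandwidth_mbps(1500, 0.1)
print(f"approximately {estimate:.0f} Mbps just to keep up")
```

Comparing this estimate with the link capacity between primary and secondaries shows quickly whether the secondaries can ever catch up.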
  • 27. Remember to use alerts! Don’t wait until your secondaries fall off your oplog!
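As a rough illustration of the kind of check an alert encodes, the sketch below compares replication lag with the oplog window (the number of hours of oplog on the primary). It is not how MMS alerts are configured; the function name, the return values, and the 50% warning threshold are all assumptions chosen for the example.

```python
def replication_alert(lag_seconds, oplog_window_hours, warn_fraction=0.5):
    """Classify replication lag relative to the oplog window.

    Once lag reaches the oplog window, the secondary has "fallen off"
    the oplog. warn_fraction is an arbitrary example threshold.
    """
    window_seconds = oplog_window_hours * 3600
    if lag_seconds >= window_seconds:
        return "critical"
    if lag_seconds >= warn_fraction * window_seconds:
        return "warning"
    return "ok"


# The customer scenario: 150,000 s of lag against a hypothetical 24-hour oplog.
print(replication_alert(150000, 24))
print(replication_alert(60, 24))
```

The point of the warning level is to hear about the problem while there is still oplog left to catch up from.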
  • 28. Example: slow performance Scenario: user-facing web application. Customer was seeing significant performance degradation after adding and removing an index from their replica set. Their replica set had 2 visible data-bearing nodes, each on real hardware, with dedicated 15K RPM disks and a significant amount of RAM. Why were things slow?
  • 29. Example: slow performance Opcounters: queries rose a bit but writes were flat…
  • 30. Example: slow performance Background flush average: went up considerably!
  • 31. Example: slow performance Queues: also went up considerably!
  • 32. Example: slow performance Journal stats: went up much higher than the ops…
  • 33. Example: slow performance Connections: also went up…
  • 34. Example: slow performance Background flush average: consistent until then
  • 35. Example: slow performance Opcounters: interesting… around July 9th
  • 36. Example: slow performance Page faults: something’s going on!
  • 37. Example: slow performance Local DB average object size: growing!
  • 38. Example: slow performance Now what? Time to analyze the logs: what query or queries were going crazy? And what sort of query would grow in size without growing significantly in volume? Remember: growing disk latency (maybe caused by page faults?) and journal/oplog entries growing even though inserts/updates were flat.
  • 39. Example: slow performance Log analysis The best tools for analyzing MongoDB logs are included in mtools*: • mlogfilter (filter logs for slow queries, table scans, etc…) • mplotqueries (graph query response times and volumes) * https://github.com/rueckstiess/mtools
  • 40. Example: slow performance Log analysis (example syntax) Show me queries that took more than 1000 ms from 6 am to 6 pm: mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log Now, graph those queries: mplotqueries --logscale mongodb-filtered.log
  • 41. Example: slow performance
  • 42. Example: slow performance Filter more! --operation Logarithmic! --logscale
  • 43. Example: slow performance Sample query Wed Jul 17 14:16:44 [conn60560] update x.y query: { e: "[id1]" } update: { $addToSet: { fr: "[id2]" } } nscanned:1 nupdated:1 keyUpdates:1 locks(micros) w:889 6504ms 6.5 seconds to add a single value to a set!
  • 44. Example: slow performance http://docs.mongodb.org/manual/reference/operator/addToSet/ The $addToSet operator adds a value to an array only if the value is not in the array already. If the value is in the array, $addToSet returns without modifying the array. Consider the following example: db.collection.update( { field: value }, { $addToSet: { field: value1 } } ); Here, $addToSet appends value1 to the array stored in field, only if value1 is not already a member of this array.
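The semantics quoted on slide 44 can be mimicked in plain Python to make the behavior concrete. This sketch operates on an ordinary dictionary, not on a MongoDB collection, and `add_to_set` is a hypothetical helper name; it only models the "append if absent" rule and says nothing about how the server implements it.

```python
def add_to_set(document, field, value):
    """Append value to document[field] only if it is not already present.

    Returns True if the document was modified, False otherwise.
    """
    array = document.setdefault(field, [])
    if value in array:
        return False          # already a member: nothing changes
    array.append(value)
    return True


doc = {"e": "[id1]", "fr": ["[id2]"]}

print(add_to_set(doc, "fr", "[id2]"))  # value already present
print(add_to_set(doc, "fr", "[id3]"))  # value is new
print(doc["fr"])
```

In this sketch the membership test walks the whole list, so each call costs more as the array grows.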
  • 45. Example: slow performance https://jira.mongodb.org/browse/SERVER-8192 “IndexSpec::getKeys() finds the set of index keys for a given document and index key spec. It's used when inserting / updating / deleting a document to update the index entries, and also for performing in memory sorts, deduping $or clauses and for other purposes. Right now extracting 10k elements from a nested object field within an array takes on the order of seconds on a decently fast machine. We could see how much we can optimize the implementation.”
  • 46. Example: slow performance What else?! Wed Jul 17 14:11:59 [conn56541] update x.y query: { e: "[id1]" } update: { $addToSet: { fr: "[id2]" } } nscanned:1 nmoved:1 nupdated:1 keyUpdates:0 locks(micros) w:85145 11768ms Almost 12 seconds! This time, there’s “nmoved:1”, too. This means a document was moved on disk because it outgrew the space allocated for it.
  • 47. Example: slow performance But wait, there’s more! Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms 38 seconds! Scanned 17k documents, returned 16.
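To show what the fields in these log lines mean, here is a minimal Python sketch that pulls the duration and the numeric counters out of the three sample lines from slides 43, 46, and 47. It is not a substitute for mtools and does not claim to handle every log format: the regular expressions, function names, and the 1000 ms default threshold are assumptions made for this example.

```python
import re

# Trailing "<digits>ms" is the operation's duration.
DURATION_RE = re.compile(r"(\d+)ms\s*$")
# "name:123" or "name: 123" counters such as nscanned, nreturned, nmoved.
COUNTER_RE = re.compile(r"\b([A-Za-z]\w*):\s*(\d+)")

SAMPLE_LINES = [
    'Wed Jul 17 14:16:44 [conn60560] update x.y query: { e: "[id1]" } '
    'update: { $addToSet: { fr: "[id2]" } } nscanned:1 nupdated:1 '
    'keyUpdates:1 locks(micros) w:889 6504ms',
    'Wed Jul 17 14:11:59 [conn56541] update x.y query: { e: "[id1]" } '
    'update: { $addToSet: { fr: "[id2]" } } nscanned:1 nmoved:1 nupdated:1 '
    'keyUpdates:0 locks(micros) w:85145 11768ms',
    'Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 '
    'nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 '
    'locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms',
]


def parse_line(line):
    """Return a dict with 'millis' plus any numeric counters, or None."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    parsed = {name: int(value) for name, value in COUNTER_RE.findall(line)}
    parsed["millis"] = int(match.group(1))
    return parsed


def slow_operations(lines, threshold_ms=1000):
    """Keep only parsed operations slower than threshold_ms."""
    parsed = (parse_line(line) for line in lines)
    return [op for op in parsed if op and op["millis"] > threshold_ms]


for op in slow_operations(SAMPLE_LINES):
    scanned = op.get("nscanned", 0)
    returned = op.get("nreturned")
    ratio = f"{scanned / returned:.0f} scanned per returned" if returned else "n/a"
    print(op["millis"], "ms", "moved" if op.get("nmoved") else "", ratio)
```

A high scanned-to-returned ratio, as in the third line, is the usual hint that a query is missing a suitable index.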
  • 48. Example: slow performance What next? Short-term fix: disable the new feature for the heaviest users! After that: • rework the code to avoid $addToSet • add indexes for queries scanning collections • use usePowerOf2Sizes* to reduce fragmentation/document moves * http://docs.mongodb.org/manual/reference/command/collMod/
  • 49. Example: slow performance Did it work? (Yes.) (So far. ;) )
  • 50. Examining memory and disk Memory: resident vs virtual vs (non-)mapped
  • 51. Examining memory and disk Page faults and record stats
  • 52. Examining memory and disk Background flush and disk IO (Check out http://www.wmarrow.com/strcalc/ )
  • 53. Monitoring: watch for warnings MMS warns you if your systems have startup warnings or if they are running outdated versions. Don’t ignore these!
  • 54. Wrapping up
  • 55. What’s next? • Visual update (June 3rd) • Backup service (join the queue!) • More UI/UX improvements: – Enhanced dashboards – Improved cluster view
  • 56. Summary • MMS is a great, free service • Setup is easy • Metrics are awesome, preventing failures even more awesome • There’s more functionality coming soon!
  • 57. Questions?