Technical Support Manager, North America @ 10gen
Nicholas Tang
#MongoDB
Performance Tuning and
Monitoring Using MMS

Presented by Nicholas Tang. MongoDB Monitoring Service (MMS) is a free cloud-based service provided by 10gen for monitoring MongoDB deployments in real time. MMS ensures that you have visibility into the right metrics to manage and optimize applications during development and in production. In this talk, you'll learn how to leverage MMS charts, custom dashboards, alerting and other features to track the performance of your system.

Published in: Technology, Business

  • Show of hands: who is responsible in some ways for monitoring? Who has used Nagios, Cacti, Zenoss, Graphite, or some other similar tools?
  • 5-10 minutes for What, Why, and How, and then the rest of the time to Performance and monitoring and the wrap-up. Talk a little bit about why it’s helpful for 10gen support.
  • Understand the components (i.e. potential bottlenecks). Test and measure each one. Watch performance before, during, and after the tests. Watch trends over time.
  • Know your environment – a critical piece of understanding what changed is to know the way things were before. The great thing about MMS is that not only does it provide you with what’s happening right now, but it also provides you with history – the sort of context you need to be able to identify changes, which is a critical piece of finding and fixing bottlenecks.
  • Memory: get back to this. Opcounters: commands, queries, etc. per time unit. Lock %: time spent in a write-lock state; global time == global lock + hottest database. Queues: operations waiting for read lock, write lock, or global lock (total). Background flush: time it takes to flush pending writes to the data files on disk (via fsync); by default this happens once per minute, so the closer to 60s, the bigger the problem. Repl Lag: number of seconds the secondary is behind the primary in writing each oplog entry. Replica: number of hours of oplog on the primary.
  • We had a customer report replication lag: almost 150,000 seconds of it. We examined their systems (checked CPU, checked IO capacity, checked network utilization) and had them do an initial sync via data file copy, and nothing worked, even though the systems seemed fine.
  • Background index creation on secondaries: fixed in 2.6
  • NOTE: That’s the minimum required assuming no overhead, no competing traffic, nothing else… and that’s just to keep up! In the customer’s case, they had huge updates, which, since the oplog is idempotent, meant huge oplog entries, and it turns out the bandwidth required was 3x their available bandwidth (30 Mbps vs 10 Mbps).
  • Moral of the story: pay attention to these things, get alerted when they first start to go south, and you can resolve them before things blow up at 3 am.
  • Blue: commands, purple: queries, green: updates, orange: deletes, red: getmores, yellow: inserts
  • Memory: resident vs. virtual vs. mapped vs. non-mapped (connections). Page faults: accessing a page of memory that is in virtual memory but not resident in physical memory. A page fault on a normal spinning disk is ~40,000x slower than direct memory access. However, the size of page faults also matters: 100 small page faults/sec might be better than 10 large ones! Check readahead! Record stats: number of accesses not in memory, and page faults required to get them into memory. Btree: misses/missRatio indicates indexes can’t be stored in memory (see above re: page faulting).
  • Background flush: average time it takes to flush pending writes to the data files on disk (fsync). IO time: amount of time (in ms) spent waiting on disk for a read or write operation.
  • Also worth noting: the exposed DB check in settings, to tell you if you messed up your firewall settings.
  • Performance Tuning and Monitoring Using MMS

    1. Performance Tuning and Monitoring Using MMS. Nicholas Tang, Technical Support Manager, North America @ 10gen. #MongoDB
    2. Agenda
        • What is MMS?
        • Why use it?
        • Setting it up and getting around
        • Performance and monitoring (the fun stuff)
        • Wrap up
    3. What is MMS?
    4. What is MMS? The MongoDB Monitoring Service: a free service (or software) for monitoring and management
    5. Metric collection and reporting
    6. Alerting
    7. Event Tracking
    8. Logs and Profile data
    9. Hardware stats (CPU, disk)
    10. DB stats
    11. Basic user management
    12. What’s in it for me?
    13. Why?
        • Great high-level view + detailed metrics
        • Low effort, high return
        • Makes it easier for us to help you!
        • Makes you more attractive, promotes bone strength and muscle tone *
        * - these last points still under review
    14. How do I use this crazy thing?
    15. Setting it up: http://mms.10gen.com/help/monitoring/tutorial/
        • Set up an account
        • Install the agent
        • Add your hosts
        • Optional: hardware stats through munin-node
        • Optional: enable logging and profiling
        • More info: http://mms.10gen.com/help/monitoring/install/
    16. Notes
        • Agent written in Python (moving to Go)
        • Failover: run multiple agents (1 primary)
        • Hosts: use CNAMEs, especially on AWS!
        • You can use a group per env (each needs an agent)
        • Connections are over SSL
        • On-Premise solution for Enterprise customers that don’t want to use the hosted service
    17. Performance tuning and monitoring
    18. Finding the bottleneck (Source: http://www.flickr.com/photos/laenulfean/462715479/)
    19. What is performance tuning?
        1. Assess the problem and establish acceptable behavior
        2. Measure the current performance
        3. Find the bottleneck*
        4. Remove the bottleneck
        5. Re-test to confirm
        6. Lather, rinse, repeat
        * - (This is often the hard part)
        (Adapted from http://en.wikipedia.org/wiki/Performance_tuning)
    20. Pro-Tip: know thyself. You have to recognize normal to know when it isn’t. (Source: http://www.flickr.com/photos/skippy/6853920/)
    21. Some handy metrics to watch
        • Memory usage
        • Opcounters
        • Lock %
        • Queues
        • Background flush average
        • Replication stats
    22. Example: replication lag. Scenario: Customer reports 150,000s of replication lag == almost 2 days of lag!
    23. Example: replication lag. Some common causes of replication lag:
        • Secondaries underspecced vs primaries
        • Access patterns between primary/secondaries
        • Insufficient bandwidth
        • Foreground index builds on secondaries
    24. Fun fact: oplog idempotency. Operations in the oplog only affect the value once, so they can be run multiple times safely. Example: if you increment n from 2 to 3, n = 3 is fine; n + 1 is not. Frequent, large updates mean a big oplog to sync. Updates that change sets mean writing the entire new version of the set to the oplog.
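
The idempotency point on slide 24 can be sketched in a few lines of Python. This is only an illustration: the entry layout (`op`, `field`, `value`) and the helper names are hypothetical and are not MongoDB's actual oplog format. The sketch records the resulting value rather than the increment, so replaying the entry is harmless.

```python
import copy

def to_oplog_entry(doc, field, increment):
    """Record the resulting value (n = 3), not the operation (n + 1).

    The entry layout is hypothetical, chosen only for illustration.
    """
    return {"op": "set", "field": field, "value": doc[field] + increment}

def apply_entry(doc, entry):
    """Apply a 'set' entry to a copy of the document."""
    updated = copy.deepcopy(doc)
    updated[entry["field"]] = entry["value"]
    return updated

primary_doc = {"_id": 1, "n": 2}
entry = to_oplog_entry(primary_doc, "n", 1)

once = apply_entry(primary_doc, entry)   # first application on a secondary
twice = apply_entry(once, entry)         # accidental replay of the same entry
print(once["n"], twice["n"])
```

Because the entry carries the full new value, the same reasoning explains why large values (such as a whole set) make for large entries.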
    25. Example: replication lag
        • Secondaries underspecced vs primaries
        • Access patterns between primary/secondaries
        • Insufficient bandwidth
        • Foreground index builds on secondaries
        “…when you have eliminated the impossible, whatever remains, however improbable, must be the truth…” -- Sherlock Holmes (Sir Arthur Conan Doyle, The Sign of the Four)
    26. Example: replication lag. Example:
        • ~1500 ops per minute (opcounters)
        • 0.1 MB per object (average object size, local db)
        ~1500 ops/min / 60 s/min * 0.1 MB/op * 8 bits/byte =~ 20 Mbps required bandwidth
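
The bandwidth estimate on slide 26 is simple enough to script. The sketch below assumes, as the slide does, that every operation produces one oplog entry of the average object size and that there is no protocol overhead; the function name is made up for this example.

```python
def required_mbps(ops_per_minute, avg_object_mb):
    """Estimate the minimum replication bandwidth in megabits per second.

    Assumes one oplog entry of average size per operation and no overhead.
    """
    ops_per_second = ops_per_minute / 60
    megabytes_per_second = ops_per_second * avg_object_mb
    return megabytes_per_second * 8  # 8 bits per byte

# Figures from the slide: ~1500 ops/min at 0.1 MB per object.
print(f"{required_mbps(1500, 0.1):.1f} Mbps")
```

Feeding in values read from the opcounters chart and the local DB average object size gives a floor, not a target: real traffic needs headroom above it.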
    27. Remember to use alerts! Don’t wait until your secondaries fall off your oplog!
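
One way to think about such an alert is to compare replication lag with the oplog window (the hours of oplog available on the primary). The sketch below is not the MMS alerting API; the function, the status strings, and the 50% warning threshold are all assumptions made for illustration.

```python
def lag_status(lag_seconds, oplog_window_hours, warn_fraction=0.5):
    """Classify replication lag relative to the primary's oplog window.

    warn_fraction is a hypothetical threshold: alert once lag has used up
    that share of the window, before the secondary falls off the oplog.
    """
    window_seconds = oplog_window_hours * 3600
    if lag_seconds >= window_seconds:
        return "resync needed"  # the secondary can no longer catch up
    if lag_seconds >= warn_fraction * window_seconds:
        return "alert"
    return "ok"

# 150,000 s of lag (the customer case) against an assumed 24-hour window.
print(lag_status(150_000, 24))
print(lag_status(60, 24))
```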
    28. Example: slow performance. Scenario: user-facing web application. Customer was seeing significant performance degradation after adding and removing an index from their replica set. Their replica set had 2 visible data-bearing nodes, each on real hardware, with dedicated 15K RPM disks and a significant amount of RAM. Why were things slow?
    29. Example: slow performance. Opcounters: queries rose a bit but writes were flat…
    30. Example: slow performance. Background flush average: went up considerably!
    31. Example: slow performance. Queues: also went up considerably!
    32. Example: slow performance. Journal stats: went up much higher than the ops…
    33. Example: slow performance. Connections: also went up…
    34. Example: slow performance. Background flush average: consistent until then
    35. Example: slow performance. Opcounters: interesting… around July 9th
    36. Example: slow performance. Page faults: something’s going on!
    37. Example: slow performance. Local DB average object size: growing!
    38. Example: slow performance. Now what? Time to analyze the logs: what query or queries were going crazy? And what sort of query would grow in size without growing significantly in volume? Remember: growing disk latency (maybe caused by page faults?) and journal/oplog entries growing even though inserts/updates were flat.
    39. Example: slow performance. Log analysis. The best tools for analyzing MongoDB logs are included in mtools*:
        • mlogfilter (filter logs for slow queries, table scans, etc…)
        • mplotqueries (graph query response times and volumes)
        * https://github.com/rueckstiess/mtools
    40. Example: slow performance. Log analysis (example syntax)
        Show me queries that took more than 1000 ms from 6 am to 6 pm:
            mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log
        Now, graph those queries:
            mplotqueries --logscale mongodb-filtered.log
    41. Example: slow performance
    42. Example: slow performance. Filter more! --operation. Logarithmic! --logscale
    43. Example: slow performance. Sample query:
            Wed Jul 17 14:16:44 [conn60560] update x.y query: { e: "[id1]" } update: { $addToSet: { fr: "[id2]" } } nscanned:1 nupdated:1 keyUpdates:1 locks(micros) w:889 6504ms
        6.5 seconds to add a single value to a set!
    44. Example: slow performance. http://docs.mongodb.org/manual/reference/operator/addToSet/
        The $addToSet operator adds a value to an array only if the value is not in the array already. If the value is in the array, $addToSet returns without modifying the array. Consider the following example:
            db.collection.update( { field: value }, { $addToSet: { field: value1 } } );
        Here, $addToSet appends value1 to the array stored in field, only if value1 is not already a member of this array.
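
The $addToSet behaviour quoted on slide 44 can be mimicked on a plain Python dict to make the semantics concrete. This is a toy model, not the server implementation: `add_to_set` is a hypothetical helper, and it says nothing about how MongoDB maintains indexes on the array.

```python
def add_to_set(document, field, value):
    """Append value to document[field] only if it is not already present.

    Returns True when the array was modified, False otherwise.
    """
    array = document.setdefault(field, [])
    if value in array:  # linear scan of the existing array
        return False
    array.append(value)
    return True

doc = {"e": "[id1]", "fr": ["[id0]"]}
first = add_to_set(doc, "fr", "[id2]")   # new member: array grows
second = add_to_set(doc, "fr", "[id2]")  # duplicate: array unchanged
print(first, second, doc["fr"])
```

Even in this toy version the membership check walks the whole array, which hints at why very large sets make each update more expensive.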
    45. Example: slow performance. https://jira.mongodb.org/browse/SERVER-8192
        “IndexSpec::getKeys() finds the set of index keys for a given document and index key spec. It's used when inserting / updating / deleting a document to update the index entries, and also for performing in memory sorts, deduping $or clauses and for other purposes. Right now extracting 10k elements from a nested object field within an array takes on the order of seconds on a decently fast machine. We could see how much we can optimize the implementation.”
    46. Example: slow performance. What else?!
            Wed Jul 17 14:11:59 [conn56541] update x.y query: { e: "[id1]" } update: { $addToSet: { fr: "[id2]" } } nscanned:1 nmoved:1 nupdated:1 keyUpdates:0 locks(micros) w:85145 11768ms
        Almost 12 seconds! This time, there’s “nmoved:1”, too. This means a document was moved on disk: it outgrew the space allocated for it.
    47. Example: slow performance. But wait, there’s more!
            Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms
        38 seconds! Scanned 17k documents, returned 16.
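
The counters in log lines like the one on slide 47 can also be pulled out with a short script, for example to compare documents scanned with documents returned. The sketch below assumes the log format shown on the slides (counters written as `name:value` and the duration as a trailing `NNNms`); `parse_slow_line` is a hypothetical helper and is not part of mtools.

```python
import re

COUNTER_PATTERN = re.compile(r"\b(nscanned|nreturned|nmoved|nupdated):(\d+)")
DURATION_PATTERN = re.compile(r"(\d+)ms\s*$")

def parse_slow_line(line):
    """Extract selected counters and the duration from one log line."""
    stats = {name: int(value) for name, value in COUNTER_PATTERN.findall(line)}
    duration = DURATION_PATTERN.search(line)
    stats["millis"] = int(duration.group(1)) if duration else None
    return stats

# Log line from the slide, with the elided query left as "[snip]".
line = (
    "Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 "
    "nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 "
    "locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms"
)
stats = parse_slow_line(line)
scan_ratio = stats["nscanned"] / stats["nreturned"]
print(stats, f"scanned per returned: {scan_ratio:.0f}")
```

A high scanned-to-returned ratio, as here, is the usual sign that a query would benefit from an index.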
    48. Example: slow performance. What next? Short term fix: disable the new feature for the heaviest users! After that:
        • rework the code to avoid $addToSet
        • add indexes for queries scanning collections
        • use powerOf2Sizes* to reduce fragmentation/document moves
        * http://docs.mongodb.org/manual/reference/command/collMod/
    49. Example: slow performance. Did it work? (Yes.) (So far. ;) )
    50. Examining memory and disk. Memory: resident vs virtual vs (non-)mapped
    51. Examining memory and disk. Page faults and Record Stats
    52. Examining memory and disk. Background flush and Disk IO (Check out http://www.wmarrow.com/strcalc/ )
    53. Monitoring: watch for warnings. MMS warns you if your systems have startup warnings or if they are running outdated versions. Don’t ignore these!
    54. Wrapping up
    55. What’s next?
        • Visual update (June 3rd)
        • Backup service (join the queue!)
        • More UI/UX improvements:
            – Enhanced dashboards
            – Improved cluster view
    56. Summary
        • MMS is a great, free service
        • Setup is easy
        • Metrics are awesome, preventing failures even more awesome
        • There’s more functionality coming soon!
    57. Questions?