Nagios Conference 2011 - Nicholas Scott - Nagios Performance Tuning

Nicholas Scott's presentation on tuning Nagios performance. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

  1. 1. Nagios Performance Tuning Nick Scott [email_address]
  2. 2. Abstract
       • Topics To Be Covered
           • Baseline System / Testing Method
           • Implementation and Impacts of:
               • RAM Disks
               • Offloading MySQL with NDOUtils
               • Memcached Systems
               • Passive Checks
           • Final Q&A
  7. 7. Baseline System Used and Method of Analysis
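  8. 8. Specs

       | VM Specs         |                                      |
       |------------------|--------------------------------------|
       | Operating System | CentOS 6 2.6.32 i686                 |
       | Processors       | 2 CPUs                               |
       | RAM              | 1GB RAM                              |
       | Hard Disk        | 20GB                                 |
       | Host Version     | VMware Workstation 7.1.4 build-385536 |

       | Nagios Details        |                   |
       |-----------------------|-------------------|
       | Nagios Version        | Nagios Core 3.2.3 |
       | Active Host Checks    | 1002 Hosts        |
       | Active Service Checks | 4012 Services     |
       | Check Frequency       | 5min              |
       | Event Broker*         | NDOUtils          |

       * Unless otherwise noted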
  9. 9. Method of Data Gathering
       • iostat
           • Harvesting Method
               • iostat -x 60 60 > file.log
               • Repeat twice
           • After test was performed
               • Reset virtual machine to clean state
               • Chose log file with least extrema
               • Allowed for two hours of normal operation
           • Log files parsed and plotted
  13. 13. Description of Metrics
       • Metrics Used For Comparison
           • w/s
           • r/s
           • await
           • avgqu-sz
           • avgrq-sz
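
The slides say the iostat logs were "parsed and plotted" but do not show how. The following is a minimal sketch of one way to pull the five comparison metrics out of an `iostat -x` log, assuming the classic sysstat layout in which each report starts with a `Device:` header row. The function names and the default device `sda` are illustrative choices, not taken from the talk.

```python
from collections import defaultdict
from statistics import mean

# Metrics named on the slide: reads/s, writes/s, average wait (ms),
# average queue length, and average request size (sectors).
METRICS = ("r/s", "w/s", "await", "avgqu-sz", "avgrq-sz")


def parse_iostat(text, device="sda", metrics=METRICS):
    """Return {metric: [samples]} for one device from `iostat -x` output."""
    samples = defaultdict(list)
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            # Header row: remember the column order for the rows that follow.
            columns = fields
        elif columns and fields[0] == device and len(fields) == len(columns):
            row = dict(zip(columns[1:], fields[1:]))
            for name in metrics:
                if name in row:
                    samples[name].append(float(row[name]))
    return dict(samples)


def summarize(samples):
    """Collapse each metric's samples to its arithmetic mean."""
    return {name: mean(values) for name, values in samples.items()}
```

Reading the column names from the header keeps the parser independent of column order. Note that the first report iostat prints covers the time since boot, so it may be worth dropping the first sample before averaging.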
  18. 18. RAM Disks
  19. 19. RAM Disk - Description
       • Folder mounted in RAM
           • Pros
               • Very fast I/O
               • Can act as a separate volume to alleviate I/O issues on a particular volume
           • Cons
               • Does not hold state on restart
               • Uses actual RAM, not ideal for large files
               • Miscalculating the size will do more harm than good
  23. 23. RAM Disk - Applications
       • Storing often accessed, small files
           • objects.cache
           • status.dat
           • service-perfdata
       • Beware of scaling issues
           • Size can often become very large
           • Computer treats it like any other RAM mounted volume, for better or for worse
  27. 27. RAM Disk - Implementation
       • Implementation is very easy
           • Near zero downtime required
           • Calculate size
           • Mount tmpfs / edit fstab
           • Edit nagios.cfg
           • Restart nagios service
       • Does not increase points of failure
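
The "Calculate size" and "Mount tmpfs / edit fstab" steps can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the talk: the headroom factor of 2, the 16 MB floor, and every path and mount point are hypothetical and should be replaced with values measured on the actual install.

```python
import math
import os


def tmpfs_size_mb(paths, headroom=2.0, minimum_mb=16):
    """Suggest a tmpfs size in MB: current file sizes times a headroom factor.

    `headroom` and `minimum_mb` are assumed safety margins, not Nagios defaults.
    Paths that do not exist yet are skipped.
    """
    total_bytes = sum(os.path.getsize(p) for p in paths if os.path.exists(p))
    return max(minimum_mb, math.ceil(total_bytes * headroom / (1024 * 1024)))


def fstab_line(mount_point, size_mb):
    """Format an /etc/fstab entry for a tmpfs mount of the given size."""
    return f"tmpfs {mount_point} tmpfs defaults,size={size_mb}m 0 0"


if __name__ == "__main__":
    # Hypothetical locations; check nagios.cfg for the real ones.
    candidates = [
        "/usr/local/nagios/var/objects.cache",
        "/usr/local/nagios/var/status.dat",
    ]
    print(fstab_line("/var/nagios/ramdisk", tmpfs_size_mb(candidates)))
```

After mounting, the `object_cache_file` and `status_file` directives in nagios.cfg would be pointed at the new mount before restarting the service. Because the files grow with the number of hosts and services, the size is worth recomputing whenever the configuration grows.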
  32. 32. RAM Disk – Request Reads / Sec: R = ~0.852 of Vanilla [chart: Read Requests Per Second for Device sda vs. Time]
  33. 33. RAM Disk – Request Writes / Sec: R = ~0.98 of Vanilla [chart: Write Requests Per Second for Device sda vs. Time]
  34. 34. RAM Disk – Device Queue: R = ~0.556 of Vanilla [chart: Number of Items In Write Queue for Device sda vs. Time]
  35. 35. RAM Disk – Wait Queue: R = ~0.869 of Vanilla [chart: Average Wait (ms) vs. Time]
  36. 36. RAM Disk - Conclusion

       | Metric   | Vanilla | RAM Disk |
       |----------|---------|----------|
       | r/s      | 1       | ~0.85    |
       | w/s      | 1       | ~0.98    |
       | avgqu-sz | 1       | ~0.556   |
       | await    | 1       | ~0.87    |

       • Improvement
       • Will help overtaxed hard drives
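
The slides report each result as "R = ~x of Vanilla" without defining R. A plausible reading, assumed here rather than confirmed by the talk, is the mean of a metric in the tuned run divided by its mean in the vanilla run, so that values below 1 indicate less load on the disk. A minimal sketch with a hypothetical function name:

```python
from statistics import mean


def ratio_to_vanilla(vanilla_samples, tuned_samples):
    """Mean of the tuned run divided by the mean of the vanilla run."""
    baseline = mean(vanilla_samples)
    if baseline == 0:
        raise ValueError("vanilla baseline mean is zero; ratio is undefined")
    return mean(tuned_samples) / baseline
```

With made-up samples, `ratio_to_vanilla([10, 10], [8, 9])` gives 0.85, which would be read as the tuned system issuing about 85% as many requests as the baseline.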
  38. 38. Offloading MySQL with NDOUtils
  39. 39. Offloading MySQL - Description
       • Moves writing of database to separate server
       • The server it is moved to does not have to be powerful
           • Pros
               • Alleviates I/O and CPU time that MySQL uses
               • Creates a path for redundancy/high availability
           • Cons
               • Lights out
               • Increased wait time on results
               • Bandwidth utilization
               • Requires additional hardware/virtualization
  45. 45. Offloading MySQL - Applications
       • Alleviating CPU, I/O on central Nagios
           • MySQL typically eats ~50% of CPU time
           • MySQL and performance graphing on one machine causes incredible Disk I/O with larger installations
       • Allows for scalability
           • Clusterable
           • Abstraction
  48. 48. Offloading MySQL - Implementation
       • Implementation is invasive
           • Requires downtime
           • Requires additional server, additional software
           • Adds possible point of failure
           • Sends traffic over network
       • Editing the config files
           • ndo2db.cfg
           • config.inc.php
           • config.php
  54. 54. Offloading MySQL – Request Reads / Sec: R = ~0.76 of Vanilla [chart: Read Requests Per Second for Device sda vs. Time]
  55. 55. Offloading MySQL – Request Writes / Sec: R = ~2.05 of Vanilla [chart: Write Requests Per Second for Device sda vs. Time]
  56. 56. Offloading MySQL – Average Size In Queue: R = ~0.707 of Vanilla [chart: Write Requests Per Second for Device sda vs. Time]
  57. 57. Offloading MySQL – Device Queue: R = ~0.55 of Vanilla [chart: Number of Items In Write Queue for Device sda vs. Time]
  58. 58. Offloading MySQL – Wait Queue: R = ~0.529 of Vanilla [chart: Average Wait (ms) vs. Time]
  59. 59. Offloading MySQL - Conclusion

       | Metric   | Vanilla | Offloaded MySQL |
       |----------|---------|-----------------|
       | r/s      | 1       | ~0.76           |
       | w/s      | 1       | ~2.05           |
       | iowait%  | 1       | ~0.45           |
       | await    | 1       | ~0.59           |
       | avgqu-sz | 1       | ~0.55           |

       • Definite improvement
       • Nearly 50% reduction in hardware demand
       • Smaller files in queue, but more writes
  62. 62. Memcache
  63. 63. Memcache - Description
       • Creates a collective of caches
       • Can be one, or many memcache servers
       • Avoids database hits
           • Checks cache
           • Hits database
           • Updates database
           • Removes from cache on update
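
The four steps listed above (check cache, hit database, update database, remove from cache on update) resemble the common cache-aside pattern. The sketch below illustrates that pattern with plain dictionaries standing in for both memcached and MySQL; the class and attribute names are hypothetical, and no real memcached client is involved.

```python
class CacheAsideStore:
    """Cache-aside reads and invalidate-on-write, using in-memory stand-ins."""

    def __init__(self, database):
        self.database = database  # stand-in for the MySQL backend
        self.cache = {}           # stand-in for one or more memcache servers
        self.db_reads = 0         # counts how often the database is actually hit

    def read(self, key):
        if key in self.cache:            # 1. check the cache first
            return self.cache[key]
        self.db_reads += 1               # 2. cache miss: hit the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value      # keep it for the next reader
        return value

    def write(self, key, value):
        self.database[key] = value       # 3. update the database
        self.cache.pop(key, None)        # 4. remove the stale entry from the cache
```

Reading the same key twice touches the database only once, which is the source of the large drop in reads reported later. Removing the entry on write, rather than leaving it, avoids the "incorrect returns" risk the slides attribute to improper coding.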
  69. 69. Memcache - Description
       • Pros
           • Can run on most hardware
           • Hardware can be most anything
           • Allows for massive scalability
       • Cons
           • Can increase latency of requests
           • Improper coding can cause incorrect returns
           • Creates lots of network traffic
  74. 74. Memcache - Implementation
       • Installing on server hardware
           • Generally in repositories
       • Noninvasive, specialized
           • Currently supported with Nagios XI
  75. 75. Memcache – Request Reads / Sec: R = ~0.15 of Vanilla [chart: Read Requests Per Second for Device sda vs. Time]
  76. 76. Memcache – Request Writes / Sec: R = ~11.47 of Vanilla [chart: Write Requests Per Second for Device sda vs. Time]
  77. 77. Memcache – Average Size In Queue: R = ~0.27 of Vanilla [chart: Write Requests Per Second for Device sda vs. Time]
  78. 78. Memcache – Device Queue: R = ~1.55 of Vanilla [chart: Number of Items In Write Queue for Device sda vs. Time]
  79. 79. Memcache – Wait Queue: R = ~0.259 of Vanilla [chart: Average Wait (ms) vs. Time]
  80. 80. Memcache - Conclusion

       | Metric   | Vanilla | Memcache |
       |----------|---------|----------|
       | r/s      | 1       | ~0.15    |
       | w/s      | 1       | ~11.47   |
       | iowait%  | 1       | ~0.27    |
       | await    | 1       | ~0.259   |
       | avgqu-sz | 1       | ~1.55    |

       • Some tradeoffs
       • Large reduction in reads
       • Massive increase in writes
       • Test did not accurately portray scalability
  84. 84. Passive Checks
  85. 85. Passive Checks - Discussion
       • Creates separation of labor
       • Nagios Execution vs Remote Execution
           • Brings some complexity
           • Eases some cross-platform issues
           • Allows for ridiculous checks
       • Consumes more CPU cycles net
       • Freshness checks are necessary, but counter-productive
           • Forcing active checks anyway
  90. 90. Passive Checks - Discussion
       • Common types and relative expense
           • NSCA
           • NRDP
           • SNMP
       • Ideal design
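
Whichever transport carries it, a passive result for a service ultimately tells Nagios the host, the service, a return code, and the plugin output. As a rough illustration, the sketch below formats the `PROCESS_SERVICE_CHECK_RESULT` external command that Nagios Core accepts through its command file. The function name is hypothetical, and the transports listed above (NSCA, NRDP, SNMP traps) are separate mechanisms that this sketch does not implement.

```python
import time


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    when = int(time.time() if timestamp is None else timestamp)
    return (
        f"[{when}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )
```

The resulting line would be appended to the external command file configured by `command_file` in nagios.cfg; the host and service names must match objects that already exist in the Nagios configuration and have passive checks enabled.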
  93. 93. Final Thoughts
       • Is the service very dynamic?
           • Slow, semi-predictable, low priority
       • Make it all from a little
       • Have a way to profile your install
