Trending with Purpose

3,930 views

Published on

This talk evalutes some easy ways to extract useful trending and capacity planning out of your existing monitoring investment. Using Nagios performance data, we examine simple behaviors with PNP4Nagios and graduate on to more insightful analytics with Graphite. With metrics in hand we look at the questions that IT /should/ be asking, such as:

* What sort of data should I trend?
* Why do I need to trend it?
* How do Operational or Engineering trends relate to Business or Transactional monitoring?
* How does this data impact our customer relationship and/or their bottom-line?

Finally, we look at creative ways to get profiling data out of your production systems with a minimum amount of effort from your development team.

Published in: Technology
0 Comments
7 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,930
On SlideShare
0
From Embeds
0
Number of Embeds
12
Actions
Shares
0
Downloads
0
Comments
0
Likes
7
Embeds 0
No embeds

No notes for slide
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • Trending with Purpose

    1. 1. Trending with Purpose Jason Dixon
    2. 2. Trending
    3. 3. A general direction in which something is developing or changing.
    4. 4. Why do we trend?
    5. 5. Planning for growth.
    6. 6. Predicting ordiagnosing failure.
    7. 7. ... instead of finding out from your customer.
    8. 8. Operational Questions
    9. 9. Just because a host or service reponds, how do you know it’s working?
    10. 10. If you haven’t measured good, how will you recognize bad?
    11. 11. You don’t know what might break, so collect everything now.
    12. 12. "Those who cannot remember the past are condemned to repeat it" George Santayana
    13. 13. "I dont care if my servers are on fire as long as theyre making me money" Business Owner
    14. 14. How can we addBusiness Value?
    15. 15. Start simple.
    16. 16. Dance with the one who brought you.
    17. 17. Nagios
    18. 18. Nagios• Fault Detection
    19. 19. Nagios• Fault Detection• Notifications
    20. 20. Nagios• Fault Detection• Notifications• Escalations
    21. 21. Nagios• Fault Detection• Notifications• Escalations• Acknowledgements/Downtime
    22. 22. Nagios• Fault Detection• Notifications• Escalations• Acknowledgements/Downtime• http://www.nagios.org/
    23. 23. Nagios
    24. 24. Nagios• What it does well:
    25. 25. Nagios• What it does well: • Free
    26. 26. Nagios• What it does well: • Free • Extensible
    27. 27. Nagios• What it does well: • Free • Extensible • Plugins
    28. 28. Nagios• What it does well: • Free • Extensible • Plugins • Configuration templates
    29. 29. Nagios• What it does well: • Free • Extensible • Plugins • Configuration templates • Popular (lesser of all free evils)
    30. 30. Nagios• What it does well: • Free • Extensible • Plugins • Configuration templates • Popular (lesser of all free evils) • Log metrics (“performance data”)
    31. 31. Nagios
    32. 32. Nagios• Where it sucks:
    33. 33. Nagios• Where it sucks: • Interface
    34. 34. Nagios• Where it sucks: • Interface • (Lack of) Scalability
    35. 35. Nagios• Where it sucks: • Interface • (Lack of) Scalability • Promotes bad habits
    36. 36. Nagios• Where it sucks: • Interface • (Lack of) Scalability • Promotes bad habits • Acknowledgements never expire
    37. 37. Nagios• Where it sucks: • Interface • (Lack of) Scalability • Promotes bad habits • Acknowledgements never expire • Configuration (over-)flexibility
    38. 38. Nagios• Where it sucks: • Interface • (Lack of) Scalability • Promotes bad habits • Acknowledgements never expire • Configuration (over-)flexibility • Flapping
    39. 39. Nagios
    40. 40. PNP4Nagios
    41. 41. PNP4Nagios• Basic graphing & dashboard capabilities
    42. 42. PNP4Nagios• Basic graphing & dashboard capabilities• Retrieves Nagios performance data
    43. 43. PNP4Nagios• Basic graphing & dashboard capabilities• Retrieves Nagios performance data• Creates graphs with RRD
    44. 44. PNP4Nagios• Basic graphing & dashboard capabilities• Retrieves Nagios performance data• Creates graphs with RRD• Limited introspection/correlation
    45. 45. PNP4Nagios• Basic graphing & dashboard capabilities• Retrieves Nagios performance data• Creates graphs with RRD• Limited introspection/correlation• http://www.pnp4nagios.org/
    46. 46. PNP4Nagios
    47. 47. Graphite
    48. 48. Graphite• Metric storage
    49. 49. Graphite• Metric storage• Complex graph creation
    50. 50. Graphite• Metric storage• Complex graph creation• Web and “CLI” interfaces
    51. 51. Graphite• Metric storage• Complex graph creation• Web and “CLI” interfaces• Created and released by Orbitz.com
    52. 52. Graphite• Metric storage• Complex graph creation• Web and “CLI” interfaces• Created and released by Orbitz.com• http://graphite.wikidot.com/
    53. 53. Graphite
    54. 54. Graphite
    55. 55. Graphite• The good stuff:
    56. 56. Graphite• The good stuff: • Horizontally scalable
    57. 57. Graphite• The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI)
    58. 58. Graphite• The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points
    59. 59. Graphite• The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points • Numerous formulas available
    60. 60. Graphite• The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points • Numerous formulas available • derive, transform, average, sum, etc...
    61. 61. Graphite• The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points • Numerous formulas available • derive, transform, average, sum, etc... • Share graphs with other users
    62. 62. Graphite• The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points • Numerous formulas available • derive, transform, average, sum, etc... • Share graphs with other users • Supports existing RRD databases
    63. 63. Graphite
    64. 64. Graphite• The not-so-good stuff:
    65. 65. Graphite• The not-so-good stuff: • Not a dashboard (well, sorta)
    66. 66. Graphite• The not-so-good stuff: • Not a dashboard (well, sorta) • No hover details
    67. 67. Graphite• The not-so-good stuff: • Not a dashboard (well, sorta) • No hover details • Single y-axis
    68. 68. GraphiteWeb Carbon Metrics Whisper
    69. 69. Carbon
    70. 70. Carbon• agent - starts other daemons, receives metrics and pipelines them to cache
    71. 71. Carbon• agent - starts other daemons, receives metrics and pipelines them to cache• aggregator - aggregate and transform your data before storage
    72. 72. Carbon• agent - starts other daemons, receives metrics and pipelines them to cache• aggregator - aggregate and transform your data before storage• cache - caches metrics for real-time graphing, pipelines them to persister
    73. 73. Carbon• agent - starts other daemons, receives metrics and pipelines them to cache• aggregator - aggregate and transform your data before storage• cache - caches metrics for real-time graphing, pipelines them to persister• persister - writes persistent data to disk
    74. 74. Whisper
    75. 75. Whisper• Metrics database format
    76. 76. Whisper• Metrics database format• Supplanted RRDtool
    77. 77. Whisper• Metrics database format• Supplanted RRDtool• Accepts out-of-order data
    78. 78. Whisper• Metrics database format• Supplanted RRDtool• Accepts out-of-order data• Supports pipelining of data in a single operation (multiplexing)
    79. 79. Coming Soon - Ceres
    80. 80. Coming Soon - Ceres• Smaller files - doesn’t pad missing datapoints
    81. 81. Coming Soon - Ceres• Smaller files - doesn’t pad missing datapoints• Doesn’t store timestamps, calculates them
    82. 82. Coming Soon - Ceres• Smaller files - doesn’t pad missing datapoints• Doesn’t store timestamps, calculates them• Federates individual datapoints
    83. 83. Graphite Web
    84. 84. Graphite Web• Traditional web interface
    85. 85. Graphite Web• Traditional web interface• Javascript CLI
    86. 86. Graphite Web• Traditional web interface• Javascript CLI• Rudimentary dashboard
    87. 87. Graphite Web• Traditional web interface• Javascript CLI• Rudimentary dashboard• Django application
    88. 88. Sending metrics to Graphite
    89. 89. Sending metrics to Graphite• Connect to Carbon socket (tcp/2003)
    90. 90. Sending metrics to Graphite• Connect to Carbon socket (tcp/2003)• Send your data
    91. 91. Sending metrics to Graphite• Connect to Carbon socket (tcp/2003)• Send your data my $sock = IO::Socket::INET->new(‘127.0.0.1:2003’); $sock->send(“endpoint.app.metric $value $epochn");
    92. 92. Sending metrics to Graphite• Connect to Carbon socket (tcp/2003)• Send your data my $sock = IO::Socket::INET->new(‘127.0.0.1:2003’); $sock->send(“endpoint.app.metric $value $epochn");• ???
    93. 93. Sending metrics to Graphite• Connect to Carbon socket (tcp/2003)• Send your data my $sock = IO::Socket::INET->new(‘127.0.0.1:2003’); $sock->send(“endpoint.app.metric $value $epochn");• ???• Profit!
    94. 94. What should we collect?
    95. 95. App/DB Profiling
    96. 96. App/DB Profiling• How fast is our:
    97. 97. App/DB Profiling• How fast is our: • function foo() for each iteration
    98. 98. App/DB Profiling• How fast is our: • function foo() for each iteration • SQL query
    99. 99. App/DB Profiling• How fast is our: • function foo() for each iteration • SQL query • 3rd party API service (e.g. payment gateway, social media)
    100. 100. App/DB Profiling
    101. 101. App/DB Profiling • How many times do we:
    102. 102. App/DB Profiling • How many times do we: • call function foo()
    103. 103. App/DB Profiling • How many times do we: • call function foo() • register a new user
    104. 104. App/DB Profiling • How many times do we: • call function foo() • register a new user • chargeback a sale
    105. 105. IT exists to support Business.
    106. 106. Not the other way around.
    107. 107. Business Profiling
    108. 108. Business Profiling• How many sales did we generate this hour:
    109. 109. Business Profiling• How many sales did we generate this hour: • per sku
    110. 110. Business Profiling• How many sales did we generate this hour: • per sku • from the recent ad campaign
    111. 111. Business Profiling• How many sales did we generate this hour: • per sku • from the recent ad campaign • from users in West Bumble, Arkansas
    112. 112. Last week?
    113. 113. Last month?
    114. 114. Last year?
    115. 115. Be Creative.
    116. 116. More Data > Less Data
    117. 117. You probably won’t know what metrics you might need until it’s too late.
    118. 118. Don’t be that guy.
    119. 119. Storage is Cheap.
    120. 120. Data Sources• Nagios • SQL• Munin • Logs• SNMP • sar• Ganglia • /proc• collectd • REST APIs
    121. 121. You can’t be serious!
    122. 122. You can’t be serious! • You’re thinking to yourself...
    123. 123. You can’t be serious! • You’re thinking to yourself... • This sounds like a lot of work.
    124. 124. You can’t be serious! • You’re thinking to yourself... • This sounds like a lot of work. • Our developers will never buy in.
    125. 125. You can’t be serious! • You’re thinking to yourself... • This sounds like a lot of work. • Our developers will never buy in. • Is there an easier way?
    126. 126. Duh.
    127. 127. StatsD
    128. 128. StatsD• Created and released by Etsy
    129. 129. StatsD• Created and released by Etsy• "Measure Anything, Measure Everything"
    130. 130. StatsD• Created and released by Etsy• "Measure Anything, Measure Everything"• Aggregate counters and timers
    131. 131. StatsD• Created and released by Etsy• "Measure Anything, Measure Everything"• Aggregate counters and timers• Pipeline to Graphite
    132. 132. StatsD• Created and released by Etsy• "Measure Anything, Measure Everything"• Aggregate counters and timers• Pipeline to Graphite• Fire-and-forget (UDP)
    133. 133. StatsD• Created and released by Etsy• "Measure Anything, Measure Everything"• Aggregate counters and timers• Pipeline to Graphite• Fire-and-forget (UDP)• Perl, PHP, Python and Java clients
    134. 134. StatsD• Created and released by Etsy• "Measure Anything, Measure Everything"• Aggregate counters and timers• Pipeline to Graphite• Fire-and-forget (UDP)• Perl, PHP, Python and Java clients• https://github.com/etsy/statsd
    135. 135. StatsD
    136. 136. StatsD• https://github.com/sivy/statsd-client
    137. 137. StatsD• https://github.com/sivy/statsd-client use Net::StatsD::Client; my $c = Net::StatsD::Client->new(); $c->increment(endpoint.customer.app.metric); $c->timing(endpoint.customer.app.foo, 200);
    138. 138. StatsD• https://github.com/sivy/statsd-client use Net::StatsD::Client; my $c = Net::StatsD::Client->new(); $c->increment(endpoint.customer.app.metric); $c->timing(endpoint.customer.app.foo, 200);• Too much activity? Sample it!
    139. 139. StatsD• https://github.com/sivy/statsd-client use Net::StatsD::Client; my $c = Net::StatsD::Client->new(); $c->increment(endpoint.customer.app.metric); $c->timing(endpoint.customer.app.foo, 200);• Too much activity? Sample it! # sample 10%, StatsD will multiply it up $c->increment(endpoint.customer.app.metric, 0.1)
    140. 140. That’s enough?
    141. 141. Too much awesome?
    142. 142. Too Bad!
    143. 143. Logster
    144. 144. Logster• Yet Another Etsy Project (YAEP)
    145. 145. Logster• Yet Another Etsy Project (YAEP)• Rips metrics from log files
    146. 146. Logster• Yet Another Etsy Project (YAEP)• Rips metrics from log files• https://github.com/etsy/logster
    147. 147. Logster• Yet Another Etsy Project (YAEP)• Rips metrics from log files• https://github.com/etsy/logster # /usr/sbin/logster --dry-run --output=graphite --graphite-host=graphite.example.com:2003 SampleLogster /var/log/httpd/access_log
    148. 148. Questions?
    149. 149. Thank you

    ×