The State of Open Source Monitoring Tools
Michael Richardson (@m_richo)
Energized Work
What tools are we currently using to
monitor and troubleshoot our systems?

• Nagios
• ssh + grep <something_bad> /so...
Nagios
Nagios – The lovers
Nagios Love-meter
Where are you on the Scale?
0: Nagios shits me to tears
10: Sign me up to Nagios World Conference 2013!!...
Alternatives ?
Yep, there’s lots
some are better and
some are worse
Today let’s check out
• Graphite
• Statsd
• Logstash
• Sensu
Graphite
Graphite
• Metric storage
• Complex graph creation
• http://graphite.wikidot.com
• Apache 2.0 license
• Send time-series ...
Graphite
Components
1. Web
2. Whisper
3. Carbon
Graphite
• Everything stored in Graphite has a path with
components delimited by dots, e.g.

servers.HOSTNAME.METRIC
applic...
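
A minimal sketch of how one of these dot-delimited paths might be sent to Graphite. It assumes Carbon's plaintext listener (one `path value timestamp` line per datapoint, usually on TCP port 2003). The host, port and function names are illustrative assumptions, not part of the talk.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, Unix timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single datapoint to a Carbon listener over TCP.

    host and port are assumptions; adjust them to your own Carbon setup.
    """
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
    return line
```

For example, `send_metric("servers.database01.memfree", 2048)` would record one datapoint under the `servers.database01.memfree` path used on the slide.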
Graphite
• No need to pre-define metric end-points
• Determine granularity of data upfront.

/opt/graphite/conf/storage-s...
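
To make "granularity upfront" concrete, here is a small sketch that works out how long each archive keeps data. It takes the `retentions = 10:2160,60:10080,600:262974` value from this slide's `[stats]` section (shown in full in the transcript) and assumes the older `secondsPerPoint:numberOfPoints` notation. The function name is hypothetical.

```python
def parse_retentions(spec):
    """Parse 'secondsPerPoint:points,...' into (step, points, total_seconds) tuples."""
    archives = []
    for part in spec.split(","):
        seconds_per_point, points = (int(field) for field in part.split(":"))
        archives.append((seconds_per_point, points, seconds_per_point * points))
    return archives


# Print the time span covered by each archive in the [stats] schema.
for step, points, span in parse_retentions("10:2160,60:10080,600:262974"):
    print(f"{step}s resolution, {points} points, {span / 86400:.2f} days of history")
```

Under that assumption the three archives cover 6 hours at 10-second resolution, 7 days at 1-minute resolution, and roughly 5 years at 10-minute resolution.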
Graphite
What should I graph/trend?
1. Application Profiling Data
2. Operational Profiling Data
3. Regression Testing (rel...
Graphite
Demo

Image source - http://joemiller.me/2011/11/05/correlating-puppet-changes-to-events-in-your-infrastructure/
StatsD
StatsD
• Measure Anything, Measure Everything
• Created and released by Etsy
• Aggregate counters and timers
• http://git...
StatsD
• Written in Node.js
• ~400 lines of JavaScript
• Listens for statistics (counters & timers),
and sends aggregates t...
StatsD
Don’t like Javascript or Node.js??
Google “statsd alternatives”…..

20+ rewrites/clones for you including..
Ruby, p...
StatsD
Concepts
• Buckets (a name that translates to graphite end-point)
• Values
• Flush (default 10 seconds)
Counter met...
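
A minimal client sketch for the two metric types on this slide. It builds the same lines the slide shows (`successfullogins:1|c|@0.1` and `apitimer:320|ms`) and sends them over UDP. Port 8125 is StatsD's usual default; the host and all function names are assumptions for illustration.

```python
import random
import socket


def counter_line(bucket, count=1, sample_rate=1.0):
    """Format a counter, appending |@rate only when the counter is sampled."""
    line = f"{bucket}:{count}|c"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def timer_line(bucket, millis):
    """Format a timing value in milliseconds."""
    return f"{bucket}:{millis}|ms"


def send(line, host="localhost", port=8125):
    """Fire-and-forget one metric line over UDP (host and port are assumptions)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


def count_sampled(bucket, sample_rate, host="localhost", port=8125):
    """Send the counter only a fraction of the time; the rate lets StatsD scale it back up."""
    if random.random() < sample_rate:
        send(counter_line(bucket, 1, sample_rate), host, port)
```

Because UDP is connectionless, a client like this does not block the application if no StatsD daemon is listening.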
StatsD
Counter examples
• Successful customer login attempts
• Failed customer login attempts
• Register a new customer
• ...
StatsD
Timer examples
• How fast is our function blah()
• How fast is a database query
• How fast is our 3rd party API ser...
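
The speaker notes list what StatsD reports for timers at each flush: mean, upper, lower, stddev, count and 90th-percentile variants. The sketch below shows one simple way such a summary could be computed. It is an illustration only, not StatsD's own code, and its percentile rule (keep the fastest 90% of samples) is an assumption.

```python
import statistics


def summarize_timers(values, pct=90):
    """Summarize the timer samples collected during one flush interval."""
    if not values:
        raise ValueError("no timer samples in this interval")
    ordered = sorted(values)
    count = len(ordered)
    # Keep the fastest pct percent of samples (at least one) for the percentile stats.
    keep = max(1, (count * pct) // 100)
    kept = ordered[:keep]
    return {
        "count": count,
        "lower": ordered[0],
        "upper": ordered[-1],
        "sum": sum(ordered),
        "mean": sum(ordered) / count,
        "stddev": statistics.pstdev(ordered),
        f"upper_{pct}": kept[-1],
        f"mean_{pct}": sum(kept) / len(kept),
    }
```

The percentile figures are useful because a single slow outlier can dominate `upper` and `mean` while leaving `upper_90` almost unchanged.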
StatsD

demo
LogStash
LogStash
• Tool for managing Events and logs
• http://logstash.net
• https://github.com/logstash/logstash
• Apache 2.0 ...
LogStash
• Written in Ruby.
• Built with JRuby and ships as a jar file.
LogStash
LogStash agent is an Event pipeline with 3
parts.
1. Inputs
2. Filters
3. Outputs
LogStash
1. Inputs – generate events
2. Filters – modify them
3. Outputs – ship them somewhere
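
The three stages can be sketched as a toy pipeline. This is not Logstash configuration or its plugin API, only a hypothetical Python illustration of the idea: an input generates events, filters modify or drop them, and outputs ship whatever is left.

```python
import re

# Matches the HTTP status code that follows the quoted request in an Apache access log line.
STATUS_RE = re.compile(r'" (\d{3}) ')


def read_lines(lines):
    """Input: turn raw log lines into events (plain dicts)."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def extract_status(events):
    """Filter: add a numeric 'status' field when one can be parsed."""
    for event in events:
        match = STATUS_RE.search(event["message"])
        if match:
            event["status"] = int(match.group(1))
        yield event


def only_errors(events):
    """Filter: drop events whose status is below 400 (or missing)."""
    for event in events:
        if event.get("status", 0) >= 400:
            yield event


def run_pipeline(source, filters, outputs):
    """Push every event from the input through each filter, then to each output."""
    events = source
    for apply_filter in filters:
        events = apply_filter(events)
    for event in events:
        for emit in outputs:
            emit(event)
```

A call such as `run_pipeline(read_lines(open("access.log")), [extract_status, only_errors], [print])` would print only the 4xx and 5xx requests; the file name is an assumption.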
LogStash
Inputs include :
amqp, drupal_dblog, eventlog, exec, file,
ganglia, gelf, gemfire, generator, heroku,
irc, log4j,...
LogStash
Filters include :
alter, anonymize, checksum, csv, date, dns,
environment, gelfify, geoip, grep, grok,
grokdiscov...
LogStash
Outputs include :
amqp, boundary, circonus, cloudwatch,
datadog, elasticsearch, elasticsearch_http,
elasticsearch...
LogStash
Typical setup
LogStash
Shipper alternatives?
• Syslog (rsyslog, syslog-ng)
• Lumberjack
https://github.com/jordansissel/lumberjack

• B...
LogStash
Kibana
• Web interface for viewing Logstash
records stored in Elasticsearch
• http://kibana.org/
• http://github...
LogStash
Kibana – search data

Image source - http://kibana.org/
LogStash
Kibana – trend data

Image source - http://kibana.org/
LogStash
Demo
(Syslog & Apache access logs)
LogStash
TIP – Go buy the Logstash Book –
http://logstashbook.com/
James Turnbull (@kartar)
It’s a great introduction to h...
Sensu
• https://github.com/sensu/sensu
• Creator – Sean Porter (@portertech)
• Ruby, RabbitMQ, Redis
• <1200 lines of c...
Sensu
Components
• Sensu-server
• Sensu-client
• Sensu-api
• Sensu-dashboard
Sensu
• Message oriented architecture
(messages are JSON objects)
• Described as a monitoring router
• Connects “check” sc...
Sensu
Checks can
• Determine whether a service like Apache is up
and running (check exit code)
• Collect metrics like page views ...
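
A sketch of what a status check could look like, assuming the Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which Sensu can reuse according to the speaker notes. The function names and the idea of probing a TCP port are illustrative assumptions.

```python
import socket

# Exit codes following the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def tcp_port_open(host, port, timeout=3.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_service(name, is_up):
    """Map an up/down observation to (exit_code, one-line message)."""
    if is_up:
        return OK, f"{name} OK: accepting connections"
    return CRITICAL, f"{name} CRITICAL: not accepting connections"
```

A check script would call `check_service("apache", tcp_port_open("localhost", 80))`, print the message, and exit with the returned code so the monitoring system can read the result.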
Sensu
Output of checks is routed to 1 or more
handlers, which determine what to do.
• Send alerts via email, pagerduty, IRC,...
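
The "monitoring router" idea can be sketched in-process: a check result arrives as a JSON message and is handed to every handler named in it. This is a toy illustration, not Sensu's implementation. The event field names (`client`, `check`, `handlers`, `status`, `output`) and the handler functions are assumptions.

```python
import json


def email_alert(event):
    """Hypothetical alerting handler: format an alert line instead of sending mail."""
    check = event["check"]
    return f"ALERT {event['client']}/{check['name']}: {check['output']}"


def graphite_metric(event):
    """Hypothetical metric handler: format a line that could be fed to Graphite."""
    check = event["check"]
    return f"sensu.{event['client']}.{check['name']}.status {check['status']}"


def route(event_json, handlers):
    """Decode one event and pass it to each handler listed on the check."""
    event = json.loads(event_json)
    results = []
    for name in event["check"].get("handlers", ["default"]):
        handler = handlers.get(name)
        if handler is not None:
            results.append(handler(event))
    return results
```

Keeping handlers as plain callables keyed by name means a new destination (IRC, pagerduty, a metrics backend) is one more dictionary entry rather than a change to the routing code.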
Sensu
demo
Questions??
Thank you
Open Source Monitoring Tools
Published in: Technology
  • Anyone want a quick rundown of how it works? Fault detection, notifications, escalations, acknowledgements, adding new nodes, no AJAX.
  • Graphite is a highly scalable real-time graphing system, written in Python, Apache 2.0 license.
  • Web – Django. Whisper – metrics database format (similar to RRDTool); accepts out-of-order data and supports pipelining of data in a single operation. Carbon – storage engine (agent + cache + persister).
  • Web – Django. Whisper – database for storing time series data. Carbon – listening service for capturing data.
  • Why graphing and trending: application profiling data, operational profiling data.
  • Counter example: add 1 to the particular bucket. The count is sent at the flush interval and reset to 0. The @0.1 suffix tells StatsD that the counter is sampled 1/10th of the time. Timing example: the API service took 320 ms to complete. StatsD determines percentiles, average (mean), standard deviation, sum, and lower and upper bounds for the flush interval. It can also store a histogram of values (not by default).
  • Mean, upper, lower, stddev, upper 90, lower 90, count
  • Embedded web server and embedded Elasticsearch. Lead in to shipper alternatives.
  • Designed with CM (configuration management) in mind. Describe how the client registers with the server.
  • Reuse Nagios plugins
  • Open Source Monitoring Tools

    1. The State of Open Source Monitoring Tools Michael Richardson (@m_richo) Energized Work
    2. What tools are we currently using to monitor and troubleshoot our systems?
    3. What tools are we currently using to monitor and troubleshoot our systems? • Nagios • ssh + grep <something_bad> /some/random/log/file.log • tail -f /some/random/log/file.log • Others?
    4. Nagios
    5. Nagios – The lovers
    6. Nagios – The lovers
    7. Nagios – The lovers
    8. Nagios – The lovers
    9. Nagios Love-meter 0 10
    10. Nagios Love-meter Where are you on the Scale? 0 10
    11. Nagios Love-meter Where are you on the Scale? 0: Nagios shits me to tears. 10: Sign me up to Nagios World Conference 2013!!!!
    12. Alternatives ?
    13. Alternatives ? Yep, there’s lots
    14. Alternatives ? Yep, there’s lots some are better and some are worse
    15. Today let’s check out • Graphite • Statsd • Logstash • Sensu
    16. Graphite
    17. Graphite • Metric storage • Complex graph creation • http://graphite.wikidot.com • Apache 2.0 license • Send time-series data that you are interested in graphing
    18. Graphite Components 1. Web 2. Whisper 3. Carbon
    19. Graphite • Everything stored in Graphite has a path with components delimited by dots, e.g. servers.HOSTNAME.METRIC, applications.APPNAME.METRIC, servers.database01.memfree, applications.trading.loginattempts
    20. Graphite • No need to pre-define metric end-points • Determine granularity of data upfront. /opt/graphite/conf/storage-schemas.conf
        [stats]
        pattern = ^stats.*
        retentions = 10:2160,60:10080,600:262974
        [catchall]
        priority = 0
        pattern = ^.*
        retentions = 30:86400,300:525600
    21. Graphite What should I graph/trend? 1. Application Profiling Data 2. Operational Profiling Data 3. Regression Testing (releases) Why should I graph/trend? 1. Trends can tell you when something is about to break. 2. …instead of hearing from your customers that it’s broken 3. Data can tell you when something is already broken but you don’t yet know it (regression). Source: Jason Dixon (@obfuscurity)
    22. Graphite Demo Image source - http://joemiller.me/2011/11/05/correlating-puppet-changes-to-events-in-your-infrastructure/
    23. StatsD
    24. StatsD • Measure Anything, Measure Everything • Created and released by Etsy • Aggregate counters and timers • http://github.com/etsy/statsd
    25. StatsD • Written in node.js • ~400 lines of javascript • Listens to statistics (counters & timers), and sends aggregates to backend services (like graphite). • simple
    26. StatsD Don’t like Javascript or Node.js??
    27. StatsD Don’t like Javascript or Node.js?? Google “statsd alternatives”…..
    28. StatsD Don’t like Javascript or Node.js?? Google “statsd alternatives”….. 20+ rewrites/clones for you including.. Ruby, python, scala, python+twisted, erlang, clojure, C, groovy
    29. StatsD Concepts • Buckets (a name that translates to graphite end-point) • Values • Flush (default 10 seconds) Counter metrics: successfullogins:1|c|@0.1 Timing metrics: apitimer:320|ms
    30. StatsD Counter examples • Successful customer login attempts • Failed customer login attempts • Register a new customer • Hit 3rd party API
    31. StatsD Timer examples • How fast is our function blah() • How fast is a database query • How fast is our 3rd party API service • How fast is our internet access • How fast are our page response times.
    32. StatsD demo
    33. LogStash
    34. LogStash • Tool for managing Events and logs • http://logstash.net • https://github.com/logstash/logstash • Apache 2.0 license • Created by Jordan Sissel (@jordansissel)
    35. LogStash • Written in ruby. • Built with jruby and ships as a jar file.
    36. LogStash LogStash agent is an Event pipeline with 3 parts. 1. Inputs 2. Filters 3. Outputs
    37. LogStash 1. Inputs – generate events 2. Filters – modify them 3. Outputs – ship them somewhere
    38. LogStash Inputs include : amqp, drupal_dblog, eventlog, exec, file, ganglia, gelf, gemfire, generator, heroku, irc, log4j, lumberjack, pipe, redis, relp, sqs, stdin, stomp, syslog, tcp, twitter, udp, xmpp, zenoss, zeromq
    39. LogStash Filters include : alter, anonymize, checksum, csv, date, dns, environment, gelfify, geoip, grep, grok, grokdiscovery, json, kv, metrics, multiline, mutate, noop, split, syslog_pri, urldecode, xml, zeromq
    40. LogStash Outputs include : amqp, boundary, circonus, cloudwatch, datadog, elasticsearch, elasticsearch_http, elasticsearch_river, email, exec, file, ganglia, gelf, gemfire, graphite, graphtastic, http, internal, irc, juggernaut, librato, loggly, lumberjack, metriccatcher, mongodb, nagios, nagios_nsca, null, opentsdb, pagerduty, pipe, redis, riak, riemann, sns, sqs, statsd, stdout, stomp, syslog, tcp, websocket, xmpp, zabbix, zeromq
    41. LogStash Typical setup
    42. LogStash Shipper alternatives?
    43. LogStash Shipper alternatives? • Syslog (rsyslog, syslog-ng) • Lumberjack https://github.com/jordansissel/lumberjack • Beaver https://github.com/josegonzalez/beaver • Woodchuck https://github.com/danryan/woodchuck
    44. LogStash Kibana • Web interface for viewing Logstash records stored in Elasticsearch • http://kibana.org/ • http://github.com/rashidkpc/Kibana • Search for records • Stream records (near realtime) • Create RSS feeds based on search results • Score, trend data
    45. LogStash Kibana – search data Image source - http://kibana.org/
    46. LogStash Kibana – trend data Image source - http://kibana.org/
    47. LogStash Demo (Syslog & Apache access logs)
    48. LogStash TIP – Go buy the Logstash Book – http://logstashbook.com/ James Turnbull (@kartar) It’s a great introduction to how to use Logstash.
    49. Sensu • https://github.com/sensu/sensu • Creator – Sean Porter (@portertech) • Ruby, RabbitMQ, Redis • <1200 lines of code • Omnibus installation packages
    50. Sensu Components • Sensu-server • Sensu-client • Sensu-api • Sensu-dashboard
    51. Sensu • Message oriented architecture (messages are JSON objects) • Described as a monitoring router • Connects “check” scripts on Sensu Clients to “handler” scripts on Sensu Servers
    52. Sensu Checks can • Determine whether a service like Apache is up and running (check exit code) • Collect metrics like page views or database cache usage.
    53. Sensu Output of checks is routed to 1 or more handlers, which determine what to do.
    54. Sensu Output of checks is routed to 1 or more handlers, which determine what to do. • Send alerts via email, pagerduty, IRC, twitter, basecamp, xmpp, hipchat, campfire, etc.
    55. Sensu Output of checks is routed to 1 or more handlers, which determine what to do. • Send alerts via email, pagerduty, IRC, twitter, basecamp, xmpp, hipchat, campfire, etc. • Feed metrics to backend services like graphite, librato, opentsdb, etc.
    56. Sensu demo
    57. Questions??
    58. Thank you