Monitoring with Sensu
First of all...
Why don’t we (I) like Nagios?
• No active-active cluster (nor an easy active-passive one)
• Needs a restart after adding a host (auto-scaling, anyone?)
• (Really) difficult to hack on
Why don’t we (I) like Nagios?
• The master executes the checks (all of them!)
• Painful config reloads if using slaves
• (Quite) hard to configure
Why don’t we (I) like Nagios?
• First release: March 14, 1999
What’s Sensu?
• A monitoring framework
• They say “monitoring router”
• It takes results of checks and passes them to
handlers
Sensu facts
• Active-active cluster
• Hosts automagically registered when agent
starts
• Written in Ruby
Sensu facts
• Clients execute all the checks (all of them!)
• No need to have slaves (but you can if you
want to)
• Configured with small JSON files
• Compatible with Nagios plugins!
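To give an idea of what one of those small JSON files might contain, here is a minimal sketch in Python that builds two check definitions and prints them as JSON. The check names, plugin paths, the `webservers` subscription, and the `default` handler are made up for illustration. The keys (`command`, `interval`, `subscribers`, `standalone`, `handlers`) are assumed to match the 0.12-era check definition format, so verify them against the docs for your version.

```python
import json

# Hypothetical check definitions: names, paths, and thresholds are illustrative.
checks = {
    "checks": {
        # Subscription mode: requests are published to the "webservers" subscription.
        "check_http": {
            "command": "/usr/lib/nagios/plugins/check_http -H localhost",
            "subscribers": ["webservers"],
            "interval": 60,
            "handlers": ["default"],
        },
        # Standalone mode: the client schedules this check by itself.
        "check_disk": {
            "command": "/usr/lib/nagios/plugins/check_disk -w 20% -c 10%",
            "standalone": True,
            "interval": 300,
            "handlers": ["default"],
        },
    }
}


def render_config(definition):
    """Serialize a definition as it could be saved in a config directory."""
    return json.dumps(definition, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_config(checks))
```

Both commands point at ordinary Nagios plugins, which is the point of the compatibility bullet: the plugin stays the same and only the scheduling around it changes.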
Sensu facts
• IT’S 2014
What does Sensu provide?
• Agent (client) to gather check results
• Server to take actions (or not) on these results
• REST API to query the state
• Optional dashboard to query the API
What does Sensu NOT provide?
• Storage for the results history
• Notifications
• Graphing
Architecture overview
Two modes of operation
• Subscription: subscription messages, check requests, check results
• Standalone: check results
• Modes can be mixed
• The client also listens for results on a socket
– Throw JSON at it from your application
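Throwing JSON at the client socket could look roughly like the sketch below. It assumes the client listens on TCP port 3030 on localhost and accepts a result with `name`, `output`, and `status` fields; both the port and the check name `app_queue_depth` are assumptions for illustration, so check your client configuration first.

```python
import json
import socket


def build_result(name, output, status):
    """Build a check result; status follows the Nagios convention (0 = OK)."""
    return json.dumps({"name": name, "output": output, "status": status})


def send_result(payload, host="127.0.0.1", port=3030):
    """Send one JSON result to the local client socket over TCP."""
    # Port 3030 is an assumed default; adjust it to match your client.
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload.encode("utf-8"))
```

From application code this would be a single call, for example `send_result(build_result("app_queue_depth", "queue depth is 3", 0))`.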
Handlers
• Each check can have its own handler
• Sensu only handles results flagged as
“metrics” and results that are not OK
Handler types
• Pipe: the server executes a script and passes
the event as JSON on standard input
• TCP and UDP: the server connects to a host
and port and delivers the event
• AMQP: the event can be re-injected into the
broker for further routing
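A pipe handler can be sketched as a small script that parses the event it receives on standard input. The event layout used here (a `client` object with a `name`, and a `check` object with `name`, `status`, and `output`) is an assumption about the event data, and the message format is invented for the example.

```python
import json
import sys

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def summarize(event):
    """Turn an event into a one-line message suitable for a notification."""
    client = event["client"]["name"]
    check = event["check"]
    state = STATUS_NAMES.get(check["status"], "UNKNOWN")
    return "{0}/{1} is {2}: {3}".format(
        client, check["name"], state, check["output"].strip()
    )


def handle(stream=sys.stdin):
    """Read one event from the stream (stdin for a pipe handler) and report it."""
    event = json.load(stream)
    # A real handler would mail, page, or post this instead of printing it.
    print(summarize(event))
```

As a handler script, the file would end with a call to `handle()`, and the server would run it once per event.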
Handling events selectively
• What if you don’t want to handle some
events?
• REST API to the rescue
Stashes
• A stash is associated with an ID (a “path”)
• You can store here as much shit as you want
• Put a “silence” in the stash, and make your
handlers check for it.
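The "make your handlers check for it" part might look like the sketch below. It assumes the API answers on `http://localhost:4567`, that `GET /stashes/<path>` returns 404 when the stash does not exist, and that silences are stored under `silence/<client>` and `silence/<client>/<check>`. All three are assumptions, and the path scheme in particular is just a convention you would pick yourself.

```python
import urllib.error
import urllib.request

API_URL = "http://localhost:4567"  # assumed API address


def stash_exists(path, api_url=API_URL):
    """Return True if the API serves /stashes/<path>, False on a 404."""
    url = "{0}/stashes/{1}".format(api_url, path)
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def is_silenced(event, exists=stash_exists):
    """Check the per-client and per-check silence paths for an event."""
    client = event["client"]["name"]
    check = event["check"]["name"]
    paths = [
        "silence/{0}".format(client),
        "silence/{0}/{1}".format(client, check),
    ]
    return any(exists(path) for path in paths)
```

A handler would call `is_silenced(event)` first and exit quietly when it returns true. Passing the lookup in as `exists` keeps the decision testable without a running API.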
Mutators
• A filter to modify event data before passing it
to a handler.
• Example: manipulate events to make them
suitable for Graphite before passing them to a
TCP handler
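The Graphite example can be sketched as a function that turns an event into one line of Graphite's plaintext protocol (`path value timestamp`). The event layout, the `issued` timestamp field, and the `sensu.` prefix are assumptions made for this illustration.

```python
import json
import sys
import time


def to_graphite(event, prefix="sensu"):
    """Render the check status as one line of Graphite's plaintext protocol."""
    # Dots separate path components in Graphite, so flatten them in host names.
    client = event["client"]["name"].replace(".", "_")
    check = event["check"]
    # "issued" is assumed to hold the check's epoch timestamp; fall back to now.
    timestamp = int(check.get("issued", time.time()))
    path = "{0}.{1}.{2}.status".format(prefix, client, check["name"])
    return "{0} {1} {2}\n".format(path, check["status"], timestamp)


def mutate(in_stream=sys.stdin, out_stream=sys.stdout):
    """Read an event from the input stream and write the mutated data out."""
    out_stream.write(to_graphite(json.load(in_stream)))
```

Used as a mutator script, the file would end with a call to `mutate()`, and the TCP handler would then deliver the resulting line to Graphite unchanged.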
Extensions
• Mutators and handlers running in the same
process as Sensu
• They must also be written in Ruby
• They run in the same event loop
Things missing / rant
• You can’t monitor something if you can’t
install an agent on it.
– A ticket has been open for more than a
year...
– Some patches, but still not a solution
• As with Nagios, the state is decided by the
plugins.
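"Decided by the plugins" means the plugin's exit status is the state: 0 is OK, 1 is WARNING, 2 is CRITICAL, and 3 is UNKNOWN in the Nagios plugin convention. A minimal sketch of such a plugin follows; the script name `check_usage.py` and the thresholds are hypothetical.

```python
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a measurement to (exit status, output line); thresholds are illustrative."""
    if used_percent >= crit:
        return CRITICAL, "CRITICAL - {0:.1f}% used".format(used_percent)
    if used_percent >= warn:
        return WARNING, "WARNING - {0:.1f}% used".format(used_percent)
    return OK, "OK - {0:.1f}% used".format(used_percent)


def main(argv):
    """Parse the measurement from argv, print one line, return the exit status."""
    try:
        used = float(argv[1])
    except (IndexError, ValueError):
        print("UNKNOWN - usage: check_usage.py <used_percent>")
        return UNKNOWN
    status, line = evaluate(used)
    print(line)
    return status
```

As a script it would end with `sys.exit(main(sys.argv))`. Neither the server nor the client second-guesses the status that call returns.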
Further research
• http://sensuapp.org/docs/0.12
• https://github.com/sensu
• http://sensuapp.org/support/
• https://metacpan.org/pod/Sensu::API::Client

Editor's Notes

  • #10 NRPE checks are also executed by the master. That also means having “something” listening for outside connections on the monitored machines, or being able to reach them over a VPN
  • #13 More than 15 years ago
  • #14 More than 15 years ago. It has served for a long time and has done it well. But don’t get me wrong. Wrong tool.
  • #26 Sensu is a framework. This is all it provides
  • #30 Subscription to one or more sets of checks, done once when the client starts
  • #31 The server periodically publishes check requests
  • #32 Response
  • #34 The client schedules its own checks and keeps sending the results
  • #37 For each event, Sensu will send the event data to its handler. If you want to take actions on OKs, mark them as metrics