Metrics and Monitoring in the Cloud
David Lutz @dlutzy
The objective of metrics is to make pretty graphs… in order to understand the performance and capacity of your systems and how they vary over time.
The objective of monitoring is to… make the on-call Operations guy’s life hell.
The objective of monitoring is to check that the system is working as expected and to take action if some component isn’t.
“Those who cannot remember the past are condemned to repeat it.” - George Santayana
So here’s a case study…
A long time ago in a data centre far, far away…
The complete system includes the humans who run it!
Human Factors Engineering: http://en.wikipedia.org/wiki/Human_factors
2 x Linux Engineers
1 x Network Engineer
1 x Do Anything Guy
1 x Developer
No Monitoring or Metrics. Black Box. Completely blind.
Large development team
External consultants
ITIL process people
5 x Linux Engineers
1 x Network Engineer
2 x Database Administrators
and
Part of an infrastructure team that included:
Virtualization specialists
Storage specialists
Hardware specialists
WTF happened? It grew…
Virtualization / Cloud
Cloud / Virtualization
Approximately 400 servers
Still using Nagios and Cacti
15 minutes to add a server manually.
1 hour or more to add a new check.
And Ganglia.
And external SaaS tools: New Relic, Gomez, Omniture.
Getting it right
Getting it wrong
What’s different about the cloud?
• Servers come and go
• Sometimes automatically with auto-scaling
• Topologies and architectures change rapidly
• Driven from configuration management systems
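One way to take the "15 minutes to add a server manually" step out of the loop is to generate the monitoring configuration from the same inventory the configuration management system already holds. A minimal sketch, assuming a hypothetical inventory of (hostname, address, role) records; the `generic-host` template name is an assumption borrowed from the stock Nagios sample configuration, and a real inventory would come from your CM tool rather than a literal list.

```python
"""Render Nagios host definitions from an inventory instead of editing them by hand."""
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Host:
    # Hypothetical inventory record; in practice this comes from the CM system.
    name: str
    address: str
    role: str


# Nagios object syntax; doubled braces are literal braces for str.format.
# "generic-host" is an assumed template name.
HOST_TEMPLATE = """define host {{
    use        generic-host
    host_name  {name}
    address    {address}
    hostgroups {role}
}}
"""


def render_hosts(hosts: Iterable[Host]) -> str:
    """Return one 'define host' block per host, sorted by name so the output is stable."""
    return "\n".join(
        HOST_TEMPLATE.format(name=h.name, address=h.address, role=h.role)
        for h in sorted(hosts, key=lambda h: h.name)
    )


if __name__ == "__main__":
    inventory = [
        Host("web02", "10.0.0.12", "web"),
        Host("web01", "10.0.0.11", "web"),
        Host("db01", "10.0.1.21", "database"),
    ]
    print(render_hosts(inventory))
```

Regenerating this file whenever the inventory changes keeps the monitored set in step with servers that come and go. Nagios still has to be reloaded for each change, though, so this only softens the problems with a static, monolithic design rather than removing them.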
The problems with Nagios
• Clunky UI.
• Monolithic design.
• Hard to scale.
• Hard to add nodes dynamically.
Sensu… Is it the Nagios killer?
• JSON everywhere
• Can re-use Nagios checks
• Messaging-oriented architecture
• Designed to be driven from config management tools
• Supports dynamic topologies
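Re-using Nagios checks works because of the Nagios plugin convention: a check prints one status line and reports its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch in Python, assuming a disk-usage check with illustrative 80%/90% thresholds. The check name, command path and `linux` subscription are hypothetical. The JSON is modelled on the Sensu Core (1.x) check definition layout (`checks` → name → `command`, `subscribers`, `interval`), so confirm the field names against the docs for the Sensu version you run.

```python
"""A Nagios-plugin-style check plus the JSON that could schedule it in Sensu."""
import json
import shutil

# Nagios plugin exit codes, which Sensu also understands.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used: float, warn: float, crit: float) -> int:
    """Map a usage percentage onto a Nagios status code."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> tuple[int, str]:
    """Return (status code, one-line message) for disk usage at `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    percent = 100.0 * usage.used / usage.total
    status = classify(percent, warn, crit)
    return status, f"DISK {LABELS[status]} - {path} is {percent:.1f}% used"


def main(path: str = "/") -> int:
    """Print the one-line status and return the exit code a plugin should use."""
    status, message = check_disk(path)
    print(message)
    return status


def sensu_check_definition(name: str, command: str, subscribers: list[str], interval: int) -> str:
    """Build a Sensu Core style check definition (layout assumed, values illustrative)."""
    check = {"command": command, "subscribers": subscribers, "interval": interval}
    return json.dumps({"checks": {name: check}}, indent=2)


if __name__ == "__main__":
    # Demo only: a real plugin script would end with sys.exit(main()) so that
    # Nagios or Sensu reads the status from the process exit code.
    main()
    # Hypothetical check name, script path and subscription.
    print(sensu_check_definition("check_disk_root", "check_disk.py", ["linux"], 60))
```

Because the contract is just "one line of output plus an exit code", the same script can run under Nagios today and under Sensu later. Only the scheduling definition changes, and in Sensu that definition is JSON that a configuration management tool can template per role.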
David Lutz
99designs
@dlutzy
meetup.com/Infrastructure-Coders/