1.
From #MonitoringSucks to
#MonitoringLove
(and back)
@KrisBuytaert
T-Dose 2015, Eindhoven, .nl
2.
Kris Buytaert
● I used to be a Dev,
● Then became an Op
● Chief Trolling Officer and Open Source Consultant @inuits.eu
● Everything is an effing DNS Problem
● Building Clouds since before the bookstore
● Organising Conferences
● Evangelizing devops
3.
An opinionated talk about the Open Source
Monitoring tooling landscape
In which I hope to learn from YOU
4.
#devops =~ C(L)AMS
● Culture
● (Lean)
● Automation
● Monitoring and Measurement
● Sharing
Damon Edwards and John Willis
Gene Kim
5.
Monitoring is usually an
afterthought
ENOBUDGET, ENOTIME
6.
A 2008 OLS Paper
● We have bloated Java tools
● Some open core stuff
● DIY folks want traditional Nagios
● DBA required
7.
#monitoringsucks
● John Vincent (@lusis), June 2011
● A sub #devops movement
● https://github.com/monitoringsucks/
8.
Why #monitoringsucks
● Manual config (GUI)
● Not in sync with reality
● Hosts only
● Services sometimes
● Application never
● Chaos or out of sync with reality
● Alert Fatigue
9.
Let's forget about
● Tools with no (stable) API
● Tools with strong focus on GUI
● Unless you are an SME with < 100 nodes
● Zenoss, Hyperic, GroundWork, ...
● P.S.: don't even mention proprietary software to me
10.
What we want
● Small, well suited components
• Collect
• Transport / Mangle
• Store
• Analyse
• Act / Alert
• Visualize
11.
#monitoringlove
• Ulf Mansson #devopsdays Rome 2011
• A new era of tooling
• #monitoringlove hacksessions @inuits
• #monitorama
12.
Icinga
• 2009 Fork
• I consider Nagios dead
• Vibrant Community (or they stalk me)
• Throw great parties in Nürnberg
• Nobody can pronounce it anyhow
• https://github.com/Inuits/puppet-icinga/
14.
#monitoringlove
But the love was about:
15.
Sensu
● Awesome for non-static environments
● Scaling a clustered RabbitMQ?
● This is Europe, U no do cloud
16.
Automation of
#monitoring
brought back
the #love
17.
Monitoring a service
vs
Monitoring a Service
18.
Definition of done:
monitored and in production
19.
A software project is not done
until your last end user is dead
20.
Culture,
Automation,
Measurement:
measure all the things
Sharing
21.
Deploy Statistics
● Time To Deploy
● Deploy Frequency
● Lifecycle frequency
● Map to other metrics
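The first two statistics on this slide can be sketched in a few lines. This is a minimal illustration, not the tooling used in the talk: the `DEPLOYS` log, its (commit time, go-live time) layout, and both function names are assumptions made up for the example.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical deploy log: (commit time, time the change went live).
DEPLOYS = [
    (datetime(2015, 11, 2, 9, 0), datetime(2015, 11, 2, 9, 30)),
    (datetime(2015, 11, 3, 14, 0), datetime(2015, 11, 3, 15, 0)),
    (datetime(2015, 11, 5, 10, 0), datetime(2015, 11, 5, 10, 15)),
]


def time_to_deploy(deploys):
    """Mean delay between commit and go-live, as a timedelta."""
    seconds = mean((live - commit).total_seconds() for commit, live in deploys)
    return timedelta(seconds=seconds)


def deploy_frequency(deploys, start, end):
    """Deploys per day that went live inside the window [start, end)."""
    count = sum(1 for _, live in deploys if start <= live < end)
    days = (end - start) / timedelta(days=1)
    return count / days


if __name__ == "__main__":
    window = (datetime(2015, 11, 2), datetime(2015, 11, 6))
    print("time to deploy:", time_to_deploy(DEPLOYS))
    print("deploys per day:", deploy_frequency(DEPLOYS, *window))
```

Once both numbers are emitted as ordinary metrics, they can be graphed next to everything else, which is one way to read "map to other metrics".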
22.
CollectD all the metrics,
at high intervals
28.
Triggers on Graphs
● Export Java Metrics
● JMXTrans
● Export JMXConfigs
● Configure NRPE Check
● Export NagiosCheck
● Collect JMX Exports on JMXTransNode
● Graph Em
● Collect Icinga Configs on Icinga
29.
Aggregation
● Alert on streams
● Alert on aggregated metrics
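To make "alert on aggregated metrics" concrete, here is a small sketch that alerts on a cluster-wide average over a sliding window instead of on any single host. The class name, the threshold, and the host names are all hypothetical; a real setup would do this in a stream processor rather than in a script.

```python
from collections import deque
from statistics import mean


class AggregateAlert:
    """Fire when the mean of the last `window` cluster-wide averages
    exceeds `threshold` (hypothetical helper, for illustration only)."""

    def __init__(self, threshold, window=3):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def observe(self, per_host_values):
        """Feed one round of per-host readings; return True if the alert fires."""
        # Aggregate across hosts first, so one noisy node cannot page anyone.
        self.samples.append(mean(per_host_values.values()))
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and mean(self.samples) > self.threshold


if __name__ == "__main__":
    alert = AggregateAlert(threshold=0.8, window=2)
    rounds = [
        {"web1": 0.95, "web2": 0.20},  # one hot host, cluster is fine
        {"web1": 0.90, "web2": 0.90},
        {"web1": 0.90, "web2": 0.95},  # whole cluster hot for two rounds
    ]
    for readings in rounds:
        print(readings, "->", alert.observe(readings))
```

Waiting for a full window before firing trades a little latency for fewer false positives, which is the point of aggregating in the first place.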
30.
Riemann
● I still don't get it?
● Distributed Top
● Do you like Clojure?
● Riemann Health plugin?
● s/riemann-health/collectd/g;
● Output to graphite
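Whatever sits in the middle, "output to graphite" usually ends in Carbon's plaintext protocol: one `path value timestamp` line per data point, sent over TCP, by default to port 2003. The sketch below shows that wire format only. The metric path, host name, and function names are assumptions for illustration, and this is not how Riemann or collectd implement their Graphite outputs.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one data point in Carbon's plaintext format."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_to_graphite(lines, host="localhost", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener.

    host and port are assumptions; 2003 is Carbon's default plaintext port.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric path; printing instead of sending keeps this runnable
    # without a Graphite server.
    print(graphite_line("servers.web1.load.shortterm", 0.42), end="")
```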
31.
Graphs to Knowledge
• Skyline
• Oculus
• Creating Information out of this data
• Big data
• Machine Learning
38.
So your DC fails
Whom to alert when?
39.
'New' kids on the block
● Flapjack
flapjack.io
monitoring notification routing + event processing system
● OpenDuty
github.com/szechuen/OpenDuty
Duty management
40.
My Alerting Strategy
Is still in beta
41.
And back :(
In 2014 I'm still running the same check for
- service registration (consul)
- high availability (pacemaker/corosync)
- monitoring (icinga)
42.
But I love where Monitoring is heading
We have far fewer false positives
And we have a Maintainable Monitoring Infra
Kinda