1. From #MonitoringSucks to #MonitoringLove
Open Source Monitoring in 2018-2019
@KrisBuytaert
Devops Meetup, Brno
2. Kris Buytaert
● I used to be a Dev,
● Then became an Op
● Chief Twitter Officer and Open Source Consultant @inuits.eu
● Everything is an effing DNS Problem
● Building Clouds since before the bookstore
● Organising Conferences
● Evangelizing devops
3. An opinionated talk about the Open Source Monitoring tooling landscape
In which I hope to learn from YOU
5. Monitoring is usually an afterthought
ENOBUDGET, ENOTIME
6. A 2008 OLS Paper
● We have bloated Java tools
● Some open core stuff
● DIY folks want traditional Nagios
● DBA Required
7. #monitoringsucks
● John Vincent (@lusis), June 2011
● A sub #devops movement
● https://github.com/monitoringsucks/
8. Why #monitoringsucks
● Manual config (GUI)
● Not in sync with reality
● Hosts only
● Services sometimes
● Application never
● Chaos or out of sync with reality
● Alert Fatigue
9. #monitoringlove
• Ulf Mansson #devopsdays Rome 2011
• A new era of tooling
• #monitoringlove hacksessions @inuits
• #monitorama
10. What we want
● Small, well suited components
• Collect
• Transport / Mangle
• Store
• Analyse
• Act / Alert
• Visualize
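The component list above can be read as a chain of small functions, one per stage. The following is only a minimal sketch under that reading: the function names, the `Sample` tuple and the `web01` prefix are all hypothetical, and the Visualize stage is left out.

```python
import statistics
from typing import Dict, List, Tuple

Sample = Tuple[str, float, float]  # (metric name, value, unix timestamp)
Database = Dict[str, List[Tuple[float, float]]]


def collect(readings: Dict[str, float], now: float) -> List[Sample]:
    """Collect: turn raw readings into timestamped samples."""
    return [(name, value, now) for name, value in readings.items()]


def mangle(samples: List[Sample], prefix: str) -> List[Sample]:
    """Transport / Mangle: rename metrics on their way to storage."""
    return [(f"{prefix}.{name}", value, ts) for name, value, ts in samples]


def store(db: Database, samples: List[Sample]) -> None:
    """Store: append (timestamp, value) points per metric."""
    for name, value, ts in samples:
        db.setdefault(name, []).append((ts, value))


def analyse(db: Database, name: str) -> float:
    """Analyse: mean of all stored values for one metric."""
    return statistics.mean(value for _, value in db[name])


def alert(name: str, mean: float, threshold: float) -> List[str]:
    """Act / Alert: emit a message when the mean crosses the threshold."""
    if mean > threshold:
        return [f"ALERT {name} mean={mean:.1f} > {threshold}"]
    return []


# Example run with three made-up load readings, one per minute.
db: Database = {}
for minute, load in enumerate((0.5, 3.5, 5.0)):
    store(db, mangle(collect({"load1": load}, minute * 60.0), "web01"))
mean = analyse(db, "web01.load1")
messages = alert("web01.load1", mean, threshold=2.0)
```

Each stage only knows the shape of a sample, so any one of them can be swapped for another tool without touching the rest.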
13. The love was: Sensu
● Awesome for non-static environments
● Scaling a clustered RabbitMQ?
● Looking more and more like Prometheus
● This is Europe, U no do cloud
25. Store:
● TSDB: Time Series DB
● Optimized DB for Time Series
● Graphite / Influx / OpenTSDB / ...
● Elastic
● Long Term vs Short Term Storage
27. Prometheus
● Started 2012
● SoundCloud
● Metrics Based
● Scrapes Endpoints
• Existing endpoints for limited tools
● Graphite Exporter
● Push Gateway
● Great Alerting
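"Scrapes Endpoints" means Prometheus pulls a plain-text page of metrics over HTTP, conventionally from a `/metrics` path. As a rough illustration of what such a page contains, the sketch below renders a few metrics in the text exposition format using only the standard library. The `render_metrics` helper and the `queue_depth` metric are hypothetical, and label values are not escaped here; a real service would normally use an official client library instead.

```python
from typing import Dict, Iterable, Tuple

# (name, type, help text, labels, value); all names here are illustrative.
Metric = Tuple[str, str, str, Dict[str, str], float]


def render_metrics(metrics: Iterable[Metric]) -> str:
    """Render metrics as text in the Prometheus exposition format."""
    lines = []
    for name, metric_type, help_text, labels, value in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        # Sort labels so the output is stable between scrapes.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


page = render_metrics(
    [("queue_depth", "gauge", "Jobs waiting in the queue", {"queue": "mail"}, 7)]
)
```

Serving `page` from an HTTP handler is all an application needs to become scrapeable, which is why exporters for existing tools tend to be small.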
28. Prometheus
● Mostly for Short Term
● Still ship long-term metrics to other TSDB
● Nginx gw's all over the place
• (ssl fun)
29. Infinite Diskspace?
● Logstash output
• Statsd => Graphite
• Keep patterns around,
• Selectively purge data
● Prometheus for Short Term
• Graphite for Long term
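For the Statsd => Graphite path and the short-term versus long-term split, a small sketch may help. It formats a StatsD datagram (`name:value|type`) and a Graphite plaintext line (`path value timestamp`), then averages fine-grained points into coarser buckets, which is one way to keep the pattern while discarding detail. The function names and sample metrics are hypothetical, and nothing here is sent over the network.

```python
from typing import Dict, List, Tuple


def statsd_line(name: str, value: float, metric_type: str = "c") -> str:
    """Format one StatsD datagram, e.g. a counter increment."""
    return f"{name}:{value}|{metric_type}"


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one line of Graphite's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


def downsample(
    points: List[Tuple[int, float]], bucket_seconds: int
) -> List[Tuple[int, float]]:
    """Average points into fixed-width time buckets for long-term storage."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in points:
        start = ts - ts % bucket_seconds  # align to the start of the bucket
        buckets.setdefault(start, []).append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


counter = statsd_line("logins", 1)
line = graphite_line("web01.load1", 0.5, 1546300800)
coarse = downsample([(0, 1.0), (10, 3.0), (60, 5.0)], bucket_seconds=60)
```

Here three 10-second points collapse into two one-minute averages; the same idea at larger bucket sizes is what keeps long-term storage from growing without bound.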
31. Prometheus?
● Only for containers?
● Also for other setups!
● Is this sufficient?
34. Waking you up at night
● Flapjack
flapjack.io
monitoring notification routing + event processing system
● OpenDuty
github.com/szechuen/OpenDuty
Duty management
35. Waking you up at night
● Anag
● Custom written stuff
38. Graphs to Knowledge
• Skyline
• Oculus
• Creating Information out of this data
• Big data
• Machine Learning
• Hastic.io
39. Hastic.io
● Open Source Pattern Detection
● Label patterns → Wait for learning to complete → Get detections
● Hastic Server + Grafana App
44. Challenge
● *ana as code
● Template your ...
● e.g. grafonnet-lib
• A jsonnet lib to generate Grafana dashboards ...
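The "as code" idea is that dashboards are generated from a template rather than clicked together. grafonnet-lib does this in jsonnet; the sketch below shows the same idea in Python purely for illustration and is not grafonnet's API. The helper names, the `http_requests_total` metric and the service list are assumptions, and the JSON fields only approximate the general shape of a Grafana dashboard, so check them against your Grafana version before importing anything.

```python
import json
from typing import Dict, List


def graph_panel(title: str, expr: str, panel_id: int) -> Dict:
    """Build one panel from a template (field names are illustrative)."""
    return {
        "id": panel_id,
        "type": "graph",
        "title": title,
        "targets": [{"expr": expr}],
    }


def dashboard(title: str, services: List[str]) -> Dict:
    """Generate one request-rate panel per service."""
    panels = [
        graph_panel(
            f"{svc} request rate",
            f'rate(http_requests_total{{job="{svc}"}}[5m])',
            index + 1,
        )
        for index, svc in enumerate(services)
    ]
    return {"title": title, "panels": panels}


doc = dashboard("Services", ["web", "api"])
rendered = json.dumps(doc, indent=2)  # the text you would keep in version control
```

Adding a service then becomes a one-line change to the list, and the generated JSON can be reviewed and versioned like any other code.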
48. APM
Application Performance Monitoring
But what about my apps?
● Agent required that ties to code
● Code modifications
52. OpenTracing 101
● The problem: It was not reasonable to ask all OSS services and all OSS packages and all application-specific code to use a single tracing vendor => OpenTracing
● Distributed Tracing Standard
● CNCF
● Dapper inside Google
● “OpenTracing is not a download or a program. Distributed tracing requires that software developers add instrumentation to the code of an application, or to the frameworks used in the application”
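To make "add instrumentation to the code" concrete, here is a toy in-memory tracer. It is explicitly not the OpenTracing API: `ToyTracer`, its `span` method and the record fields are hypothetical and only illustrate the core idea that nested operations share a trace ID and point at their parent span.

```python
import time
import uuid
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ToyTracer:
    """Hypothetical in-memory tracer; a stand-in for a real tracing library."""

    def __init__(self) -> None:
        self.finished: List[Dict] = []  # completed spans, in finish order
        self._stack: List[Dict] = []    # currently open spans

    @contextmanager
    def span(self, operation: str) -> Iterator[Dict]:
        parent = self._stack[-1] if self._stack else None
        record = {
            # A child inherits the trace ID; a root span starts a new trace.
            "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
            "span_id": uuid.uuid4().hex,
            "parent_id": parent["span_id"] if parent else None,
            "operation": operation,
            "start": time.time(),
        }
        self._stack.append(record)
        try:
            yield record
        finally:
            record["duration"] = time.time() - record["start"]
            self._stack.pop()
            self.finished.append(record)


# Instrumented application code: the inner call becomes a child span.
tracer = ToyTracer()
with tracer.span("handle_request"):
    with tracer.span("query_db"):
        pass
```

The instrumented code only names its operations; where the spans end up is the tracer's concern, which is the separation the vendor-neutral standard is after.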
53. Complexity is the Enemy of Reliability
54. I love where Monitoring is heading
“Wait, was I on call last week?”
True words said by one of our on-call engineers