Open Source Monitoring in 2015

Slides from my 2015 T-Dose talk

  1. From #MonitoringSucks to #MonitoringLove (and back) @KrisBuytaert T-Dose 2015, Eindhoven, .nl
  2. Kris Buytaert
     ● I used to be a Dev,
     ● Then Became an Op
     ● Chief Trolling Officer and Open Source Consultant @inuits.eu
     ● Everything is an effing DNS Problem
     ● Building Clouds since before the bookstore
     ● Organising Conferences
     ● Evangelizing devops
  3. An opinionated talk about the Open Source Monitoring tooling landscape, in which I hope to learn from YOU
  4. #devops =~ C(L)AMS
     ● Culture
     ● (Lean)
     ● Automation
     ● Monitoring and Measurement
     ● Sharing
     Damon Edwards and John Willis; Gene Kim
  5. Monitoring is usually an afterthought: ENOBUDGET, ENOTIME
  6. A 2008 OLS Paper
     ● We have bloated Java tools
     ● Some open core stuff
     ● DIY folks want traditional Nagios
     ● DBA Required
  7. #monitoringsucks
     ● John Vincent (@lusis), June 2011
     ● A sub #devops movement
     ● https://github.com/monitoringsucks/
  8. Why #monitoringsucks
     ● Manual config (gui)
     ● Not in sync with reality
     ● Hosts only
     ● Services sometimes
     ● Application never
     ● Chaos or out of sync with reality
     ● Alert Fatigue
  9. Let's forget about
     ● Tools with no (stable) API
     ● Tools with strong focus on GUI
     ● Unless you are an SME with < 100 nodes
     ● Zenoss, Hyperic, GroundWork, ....
     ● P.S.: don't even mention proprietary software to me
  10. What we want
      ● Small, well suited components
        • Collect
        • Transport / Mangle
        • Store
        • Analyse
        • Act / Alert
        • Visualize
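
The "small, well suited components" idea can be sketched as a chain of tiny functions, one per stage. This is only an illustration: the function names, the `prod.` prefix, and the sample values are made up for the example and do not come from the talk or from any particular tool.

```python
from collections import defaultdict


def collect():
    # Stand-in for a real collector; the samples are invented for the example.
    return [("web01.load", 0.4), ("web02.load", 3.2), ("web01.load", 0.6)]


def mangle(samples, prefix="prod."):
    # Transport / mangle: rename metrics on their way to storage.
    return [(prefix + name, value) for name, value in samples]


def store(samples, db):
    # Store: append every value to its metric's series.
    for name, value in samples:
        db[name].append(value)
    return db


def analyse(db):
    # Analyse: reduce each series to its peak value.
    return {name: max(values) for name, values in db.items()}


def alert(peaks, threshold):
    # Act / alert: names of the metrics whose peak exceeds the threshold.
    return sorted(name for name, peak in peaks.items() if peak > threshold)


def visualize(peaks, width=20, scale=4.0):
    # Visualize: one text bar per metric, clipped at `scale`.
    rows = []
    for name in sorted(peaks):
        bar = "#" * round(min(peaks[name], scale) / scale * width)
        rows.append(f"{name:<16} {bar}")
    return "\n".join(rows)


db = store(mangle(collect()), defaultdict(list))
peaks = analyse(db)
print(alert(peaks, threshold=2.0))
print(visualize(peaks))
```

Each stage only knows the shape of the data it receives, so any one of them could be swapped for a real tool without touching the others.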
  11. #monitoringlove
      • Ulf Mansson #devopsdays Rome 2011
      • A new era of tooling
      • #monitoringlove hacksessions @inuits
      • #monitorama
  12. Icinga
      • 2009 Fork
      • I consider Nagios dead
      • Vibrant Community (or they stalk me)
      • Throw great parties in Nürnberg
      • Nobody can pronounce it anyhow
      • https://github.com/Inuits/puppet-icinga/
  13. Automation
  14. #monitoringlove But the love was about:
  15. Sensu
      ● Awesome for non static environments
      ● Scaling a clustered RabbitMQ ?
      ● This is Europe, U no do cloud
  16. Automation of #monitoring brought back the #love
  17. Monitoring a service vs Monitoring a Service
  18. Definition of done: monitored and in production
  19. A software project is not done until your last end user is dead
  20. Culture, Automation, Measurement (measure all the things), Sharing
  21. Deploy Statistics
      ● Time To Deploy
      ● Deploy Frequency
      ● Lifecycle frequency
      ● Map to other metrics
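
As a rough sketch of the first two deploy statistics, assuming deploys are recorded as (start, finish) timestamp pairs: the `DEPLOYS` data and both function names are hypothetical, and counting deploys per week is just one possible reading of "deploy frequency".

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical deploy log: (start, finish) for each deploy.
DEPLOYS = [
    (datetime(2015, 11, 2, 10, 0), datetime(2015, 11, 2, 10, 12)),
    (datetime(2015, 11, 4, 15, 30), datetime(2015, 11, 4, 15, 38)),
    (datetime(2015, 11, 9, 9, 0), datetime(2015, 11, 9, 9, 10)),
]


def time_to_deploy(deploys):
    """Mean duration of a deploy, in seconds."""
    return mean((end - start).total_seconds() for start, end in deploys)


def deploy_frequency(deploys, period=timedelta(days=7)):
    """Average number of deploys per period over the observed window."""
    starts = sorted(start for start, _ in deploys)
    window = starts[-1] - starts[0]
    if window <= timedelta(0):
        # A single deploy (or identical timestamps): no window to divide by.
        return float(len(starts))
    return len(starts) / (window / period)


print(time_to_deploy(DEPLOYS))    # mean deploy duration in seconds
print(deploy_frequency(DEPLOYS))  # deploys per week
```

Emitting both numbers as ordinary metrics is what makes the last bullet possible: once they sit next to load, latency, and error rates, they can be graphed against them.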
  22. CollectD all the metrics, at high intervals
  23. Oldschool graphite
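
For readers who have not fed Graphite by hand: Carbon's plaintext listener accepts one `path value timestamp` line per metric, by default on TCP port 2003. The sketch below is a minimal client under that assumption; the host, the metric path, and the helper names are placeholders, not something taken from the slides.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Carbon's plaintext format."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(lines, host="localhost", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener (host is a placeholder)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Example metric; the path is made up for the illustration.
line = graphite_line("servers.web01.load.shortterm", 0.42, 1446900000)
print(line, end="")
# send_metrics([line])  # uncomment when a Carbon listener is reachable
```

In practice collectd would do this through its own Graphite output rather than a hand-written client; the point is only how little the wire format asks for.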
  24. Self Service: Gdash based pipelines, Puppetized Templates (wip)
  25. Gdash
  26. Grafana
  27. Graphite++
      ● Dashboards
        • Grafana
      ● Engines:
        • InfluxDB
        • Cyanite
  28. Triggers on Graphs
      ● Export Java Metrics
      ● JMXTrans
      ● Export JMXConfigs
      ● Configure NRPE Check
      ● Export NagiosCheck
      ● Collect JMX Exports on JMXTransNode
      ● Graph Em
      ● Collect Icinga Configs on Icinga
  29. Aggregation
      ● Alert on streams
      ● Alert on aggregated metrics
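
A small sketch of what "alert on aggregated metrics" can mean in practice: average a metric across hosts for each interval, then alert only when the mean over a sliding window of intervals crosses a threshold. The class name, the window size, and the latency numbers are all assumptions made for this example.

```python
from collections import deque
from statistics import mean


class AggregateAlert:
    """Alert on the windowed mean of a cross-host average, not on single samples."""

    def __init__(self, threshold, window=3):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def observe(self, per_host_values):
        """Record one interval of per-host values; return True when alerting."""
        self.samples.append(mean(per_host_values.values()))
        full = len(self.samples) == self.samples.maxlen
        return full and mean(self.samples) > self.threshold


# Hypothetical latency (ms) per host: one short spike, then a sustained rise.
intervals = [
    {"web01": 100, "web02": 900},
    {"web01": 100, "web02": 100},
    {"web01": 100, "web02": 100},
    {"web01": 400, "web02": 400},
    {"web01": 400, "web02": 400},
    {"web01": 400, "web02": 400},
]
check = AggregateAlert(threshold=350, window=3)
results = [check.observe(values) for values in intervals]
print(results)
```

The single spike on `web02` never fires, while the sustained rise across both hosts does once it fills the window, which is one way to cut down on alert fatigue.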
  30. Riemann
      ● I still don't get it ?
      ● Distributed Top
      ● Do you like Clojure ?
      ● Riemann Health plugin ?
      ● s/riemann-health/collectd/g;
      ● Output to graphite
  31. Graphs to Knowledge
      • Skyline
      • Oculus
      • Creating Information out of this data
      • Big data
      • Machine Learning
  32. But I have log files..
  33. Logs and Metrics
      ● Graylog2
      ● ELSA (Enterprise Log Search and Archive)
      ● ELK Stack
  34. ● Collect from anywhere
      ● Filter
      ● Send anywhere
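
The collect, filter, send shape can be sketched with three generators. The log layout (`<timestamp> <LEVEL> <message>`), the level names, and the function names are assumptions for the illustration; a real shipper would read from files or sockets and write to a search backend instead of a Python list.

```python
import json
import re

# Assumed log layout for this example: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def collect(lines):
    """Collect: yield non-empty lines from any iterable source."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line


def filter_events(lines, levels=("WARN", "ERROR")):
    """Filter: parse each line and keep only the wanted levels."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") in levels:
            yield match.groupdict()


def send(events, emit):
    """Send: hand each event to `emit` as one JSON document."""
    for event in events:
        emit(json.dumps(event, sort_keys=True))


raw = [
    "2015-11-07T10:00:00Z INFO service started\n",
    "\n",
    "2015-11-07T10:00:05Z ERROR database connection refused\n",
]
out = []
send(filter_events(collect(raw)), out.append)
print(out)
```

Because every stage is lazy, the same chain works on a finite list or on an endless stream of lines.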
  35. APM: But what about my apps ? Half the world cheers about SAAS tools :(
  36. Packetbeat
      ● Traffic Flow through network
      ● Transactions causing errors
      ● SQL per HTTP
      ● API call usage
  37. Packetbeat
  38. So your DC fails: Whom to alert when ?
  39. 'New' kids on the block
      ● Flapjack (flapjack.io): monitoring notification routing + event processing system
      ● OpenDuty (github.com/szechuen/OpenDuty): Duty management
  40. My Alerting Strategy is still in beta
  41. And back :( In 2014 I'm still running the same check for
      - service registration (consul)
      - high availability (pacemaker/corosync)
      - monitoring (icinga)
  42. But I love where Monitoring is heading. We have far fewer false positives, and we have a Maintainable Monitoring Infra. Kinda.
  43. Contact: Kris.Buytaert@inuits.eu
      Further Reading: @krisbuytaert, http://www.krisbuytaert.be/blog/, http://www.inuits.eu/
      Inuits, Duboistraat 50, 2060 Antwerpen, Belgium, 891.514.231, +32 475 961221