Monitoring in an Infrastructure as Code Age

My PuppetConf 2013 Talk
August 23, 2013
San Francisco

Published in: Technology
1 Comment
  • There should be a requirements template for each production service that specifies the SLA, required hardware performance and resources, data backup schedules, monitoring parameters that make business sense, and so on, so that Puppet designers can automate deployment, monitoring, and alerts. These cost money and resources, so there must be business justification and enforcement. IT operations teams tend to passively accept dev teams' configurations and clean them up for them. With recent new frameworks and aggregation of modules, no one may really understand each new service without a team effort to specify all these production requirements. I like your Puppet automation for the whole life cycle (excluding bare metal, for which I don't think Foreman is good enough). Network guys also need help, and they will not get it from Cisco. So expand your work to cover the whole ecosystem to ensure good Monday mornings after hectic weekends.

Monitoring in an Infrastructure as Code Age

  1. Monitoring in an IAC Age
     PuppetConf 2013
     Kris Buytaert
  2. Kris Buytaert
     ● I used to be a Dev,
     ● Then became an Op
     ● Chief Trolling Officer and Open Source Consultant @inuits.eu
     ● Everything is an effing DNS Problem
     ● Building Clouds since before the bookstore
     ● Some books, some papers, some blogs
     ● Evangelizing devops
  3. devops = clams
     ● Culture
     ● (Lean)
     ● Automate all the things ...
        • Build Automation
        • Test Automation
        • IAC
     ● Monitoring, Metrics ...
     ● Sharing
  4. Monitoring is usually an afterthought
     ENOBUDGET, ENOTIME
  5. #monitoringsucks
     ● John Vincent (@lusis)
     ● A sub movement
     ● https://github.com/monitoringsucks/
  6. #monitoringlove
     • #monitoringlove hacksessions
     • #monitorama
  7. Infrastructure as Code
     ● Model our infrastructure
     ● A fast reproducible platform
     ● Disaster discovery for “free”
  8. For years we've tolerated humans making structural manual changes to the infrastructure our critical applications are running on, whilst at the same time demanding that those critical applications go through rigid test scenarios. Who let this happen?
  9. Infrastructure as Code
     ● Code = Code
     ● Version Control
     ● Quality Checks
     ● Testing
     ● Continuous Integration
     ● Continuous Delivery
  10. Infrastructure as Code
      ● Core Infrastructure
      ● Middleware deployment and integration
      ● Automated continuous application deployment
      ● Integrated Security enforcement
      ● Host, Service and Application Monitoring configured
  11. Why #monitoringsucks
      ● Manual config (GUI)
      ● Not in sync with reality
      ● Hosts only
      ● Services sometimes
      ● Application never
      ● Chaos
  12. Let's forget about
      ● Tools with no (stable) API
      ● Tools with strong focus on GUI
      ● Unless you are an SME with < 100 nodes
      ● Zabbix, Zenoss, Hyperic, GroundWork, ...
  13. Where to monitor?
      ● Dev
      ● Acceptance
      ● Prod
  14. What we want
      ● Small, well-suited components
         • Collect
         • Transport / Mangle
         • Analyse / Act
         • Visualize
  15. Monitoring Baseline
      ● Deploy a host,
      ● Add it to the monitoring
      ● Add collection tools
      ● Add check definitions
      ● Update the monitoring tool config
  16. Apache Example:
  17. Icinga?
      • Isn't Nagios dead?
      • Vibrant Community
      • Throw great parties in Nürnberg
      • Nobody can pronounce it anyhow
      • https://github.com/Inuits/puppet-icinga/
  18. Stored Configs
  19. Collection and Export
      Export: @@resource { ... }
      Collect: Resource <<| query |>>
      Clean out nodes that disappear: puppet node clean
  20. Exporting and Collecting
  21. Monitoring a Vhost
  22. I love CheckMK
      ● Autodetection
      ● Multiplexing
      ● Trend Forecasting
  23. I hate CheckMK
      • Autodetection?
      • Service,
      • Functionalities
      • e.g. vhosts etc.
      • Single Source of Truth
  24. Monitoring a service vs Monitoring a service
  25. Definition of Done: monitored and in production
  26. A software project is not done until your last end user is dead
  27. Exit DOD
      Measure Application Usage
  28. But, err, how do I?
  29. Culture, Automation, Measurement: measure all the things, Sharing
  30. Deploy Statistics
      ● Time To Deploy
      ● Deploy Frequency
      ● Lifecycle frequency
      ● Map to
  31. Application Metrics
      ● Number of current users
      ● Number of sign-ups
      ● Response times
      ● Throughput
      ● XYZ Usage
      ● # restarts
      ● Insert your specific valuable stuff
  32. Graphite API
  33. Triggers on Graphs
      ● Export Java Metrics
      ● JMXTrans
      ● Export JMXConfigs
      ● Configure NRPE Check
      ● Export NagiosCheck
      ● Collect JMX Exports on JMXTransNode
      ● Graph Em
      ● Collect Nagios Configs on Nagios Server
  34. Triggers on Graphs
  35. Triggers on Graphs
  36. Self Service
      Gdash based pipelines
      Puppetized Templates (wip)
  37. Up Next:
      • Creating Information out of this data
      • Big data
      • Machine Learning
  38. Homework
      Skyline
      Oculus
      Dusk
      Riemann
      Esper
      Puppetdb external Naginator
  39. Contact
      Kris.Buytaert@inuits.eu
      Further Reading
      @krisbuytaert
      http://www.krisbuytaert.be/blog/
      http://www.inuits.eu/
      Inuits
      Duboistraat 50
      2060 Antwerpen
      Belgium
      891.514.231
      +32 475 961221
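
The export and collect pattern from slides 18 to 20 can be sketched roughly as follows. This is a minimal illustration, not the code shown in the talk. It assumes stored configs (or PuppetDB) are enabled and uses the nagios_host type, which was built into Puppet at the time and later moved to a separate module. The class names, the 'generic-host' template, the target path, the 'monitoring' tag and the 'icinga' service name are all hypothetical.

```puppet
# On every monitored node: export a host definition.
# Class name, template name, tag and target path are assumptions.
class monitoring::client {
  @@nagios_host { $::fqdn:
    ensure  => present,
    address => $::ipaddress,
    use     => 'generic-host',
    target  => "/etc/icinga/objects/host_${::fqdn}.cfg",
    tag     => 'monitoring',
  }
}

# On the monitoring server: collect every host exported with that tag
# and reload the daemon whenever the collected set changes.
class monitoring::server {
  service { 'icinga':
    ensure => running,
    enable => true,
  }

  Nagios_host <<| tag == 'monitoring' |>> {
    notify => Service['icinga'],
  }
}
```

Filtering on a tag lets each monitoring server collect only the hosts meant for it, for example one tag per environment (dev, acceptance, prod).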

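Slide 21 (Monitoring a Vhost) might look something like this: a defined type exports one service check per virtual host, so declaring the vhost is what creates its monitoring. This is a sketch under assumptions rather than the code from the slide. The define name monitoring::vhost, the 'generic-service' template, the target path, the 'monitoring' tag and the check_http_vhost command (which would have to be defined on the monitoring server) are hypothetical.

```puppet
# Hypothetical define: one exported HTTP check per Apache vhost.
define monitoring::vhost (
  $vhost_name = $title,
) {
  @@nagios_service { "check_http_${vhost_name}_${::fqdn}":
    ensure              => present,
    host_name           => $::fqdn,
    service_description => "HTTP ${vhost_name}",
    # Assumed command, defined on the server; the vhost name is its argument.
    check_command       => "check_http_vhost!${vhost_name}",
    use                 => 'generic-service',
    target              => "/etc/icinga/objects/service_${::fqdn}_${vhost_name}.cfg",
    tag                 => 'monitoring',
  }
}

# Usage, declared next to the vhost itself:
monitoring::vhost { 'www.example.com': }
```

The monitoring server would pick these up with a collector such as `Nagios_service <<| tag == 'monitoring' |>>`, which keeps the vhost declaration as the single source of truth for what gets checked.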