SlideShare a Scribd company logo
1 of 44
Barry Laffoy – Senior DevOps Engineer
Scaling a Monitoring Strategy
For a Microservices Architecture
Thanks to Our Sponsors
http://community.kloia.co.ukJoin Our Community Slack Channel
Monitoring In A Microservices
Environment
Or how to scale your alerting strategy with your team and application
Who AmI?
 Why Should You Listen to me?
 Physics
 Actuarial Science
 Build Engineering
 Experience building and maintaining Excel and
Jenkins
 DevOps at ClearScore
Who Are
ClearScore
 Aim to Solve Money for the World
 Present people their data in a beautiful way, to
empower financial decision making
 Committed to best-in-class technical solutions
 Committed to having fun while we do it
1. What’s the point?
WhatAre
Microservices
Whywouldwe
wantthem?
 12 factor app
 Scalable
 Releasable
 Loggable
 Discoverable
 Monitorable
 And several more “–ables”
 We handle
 Scaling
 Releasing
 Logging
 Discovering
 Monitoring
 Released independently
 Empower ownership
 Distribute risk
MoreReleases
MoreStability
CALMS
 Culture
 Automation
 Lean
 Monitoring
 Sharing
Business
Objectives
 Autonomous Cross Functional Teams
 Increased releases
 Deliver feature more quickly with less risk
 Uptime
ThePlatform
 Drive developer ownership
 Not just a scheduler (Nomad/k8s/etc)
 CI/CD
 Local dev tools
 Cloud native resources
 Logging
 Metrics
 Monitoring
 Alerting
ThePlatform
 Amazon Web Services
 Immutable infrastructure (Packer)
 Infrastructure as code (Terraform)
 Service discovery (Consul)
 Scheduling (Nomad)
 CI/CD (Jenkins)
2. How Hard Is
Monitoring?
Application Performance Monitoring
TraditionalAPM
 Worked great for bare-metal deployment of single Java
app
 Tracing
 Alerts
 Health dashboards
Notsogreatfor
microservices
 Instrumented inside container (not 12 factor)
 Paying for license per process (not scalable)
 Manual configuration of alerting rules
 Limited Language support
 Tracing from service to service very difficult
 Alerting on ”abnormal traffic” limited by simple
statistical model
3. How to move
forward?
Tools,tools,
tools
 Pingdom
 Liveness probes
 CloudWatch
 ElasticSearch
 StatsD, influxdb, grafana
 Next Gen APM, Instana
ThirdParty
Services
 Partner integrations
 Flaky
 Knock-on effects
OffTheShelf
 External Synthetics with PingDom
 Container security scanning with quay.io
 Dependency security scanning with maven/npm
 AMI security scanning with Inspector
 Performance monitoring as part of CI pipeline
 Internal Synthetics with consul-alerts/liveness-readiness probes
Highly
customizable
 Cloud Native with CloudWatch
 Annotating releases in Grafana
 Self managed with statsd
 Infrastructure metrics
 Custom Application Metrics
 Third party integration monitoring
 Alerting rules are “all or nothing”
4. What about APM?
Whatdowe
need?
 Light touch configuration
 Scalable deployment model
 Auto service discovery
 Sensible default alerting rules
 Flexible configuration
 Tracing between asynchronous services
Rollourown
 Already collecting statsd
 Means writing and supporting a lot of logic
 Using some sort of ML?
Traditional
Vendors
 Poor support for distributed microservices
 Poor language support (Scala/akka)
 Mixed results on configurability
EnterInstana
 Discovered quite by accident
 Beautiful UI
 Extremely easy to set up
 Covered most of our desired features out of the box
 Infrastructure monitoring
 Microservice APM
 End-User-Monitoring
5. Culture of Ownership
(It’s not just about tools)
Youbuildityour
runit!
 Delivery teams own their microservices
 Responsible for performance and monitoring in
dev/ci/stg environments
 Ideally, incidents alert to dev team responsible
 Unfortunately, we don’t quite do that
 Sophisticated routing system <picture of me>
Peoplecause
problems
 Things go wrong, when people change things
 Luckily, this means things go wrong during business
hours (mostly)
 Everyone empowered to inspect monitoring tools
 On-call teams supports problem resolution, doesn’t fix
everything
 Understanding teams and services drives platform
improvement
AlertGrooming
 Lots of noise on alert channels
 Alert Fatigue
 ”Boy who cried wolf” syndrome
 Requires proactive maintenance of alerts
 Fix ALL annoying alerts, even if that means fixing the
the alert, not the underlying service
 Investment takes time, but pays dividends in
productivity
MajorIncidents
 Zero blame retros
 Involve stake-holders
 Generate action points with owners (and follow up)
 Detailed incident report with business-friendly
summaries and cost estimates
6. Our Platform Future
Replatforming
 Hashicorp platform
 Great choice to get us to the cloud
 Focused on supporting zillions of containers in HPC
environment
 Limiting our scalability and speed of delivery
 Encouraged anti-pattern of integrating platform details
into services
 Kubernetes migration
 Solves many of our problems
 Natively supports blue-green
 Instana support for cluster health monitoring
 Prometheus on-cluster monitoring
 What to do with our statsd?
Continuous
Deployment2.0
 Investigating CD platforms
Spinnaker/Concourse/Drone
 Routing non-prod alerts to development teams
 Performance, Tracing, Vulnerability issues should be
flagged
GoingGlobal
 Support across timezones
 More and more services
 More and more teams
Serverless
 Functions as a service (on AWS lambda)
 Horizontal auto-scaling
 “No Ops”
 Cheap
 Unsupported by traditional monitoring/tracing
solutions
 X-Ray tracing features with Instana
Thank you
We’re hiring

More Related Content

What's hot

APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , KongAPIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kongapidays
 
Microsoft: Enterprise search for cloud native applications
Microsoft: Enterprise search for cloud native applicationsMicrosoft: Enterprise search for cloud native applications
Microsoft: Enterprise search for cloud native applicationsElasticsearch
 
apidays LIVE Australia - Building a scalable API platform for an IoT ecosyste...
apidays LIVE Australia - Building a scalable API platform for an IoT ecosyste...apidays LIVE Australia - Building a scalable API platform for an IoT ecosyste...
apidays LIVE Australia - Building a scalable API platform for an IoT ecosyste...apidays
 
apidays LIVE Singapore - Green APIs by Alex-Adrien Auger, Sipios
apidays LIVE Singapore - Green APIs by Alex-Adrien Auger, Sipiosapidays LIVE Singapore - Green APIs by Alex-Adrien Auger, Sipios
apidays LIVE Singapore - Green APIs by Alex-Adrien Auger, Sipiosapidays
 
Observability, Distributed Tracing, and Open Source: The Missing Primer
Observability, Distributed Tracing, and Open Source: The Missing PrimerObservability, Distributed Tracing, and Open Source: The Missing Primer
Observability, Distributed Tracing, and Open Source: The Missing PrimerVMware Tanzu
 
Enterprise DevOps Series: Using VS Code & Zowe
Enterprise DevOps Series: Using VS Code & ZoweEnterprise DevOps Series: Using VS Code & Zowe
Enterprise DevOps Series: Using VS Code & ZoweDevOps.com
 
Addressing the 8 Key Pain Points of Kubernetes Cluster Management
Addressing the 8 Key Pain Points of Kubernetes Cluster ManagementAddressing the 8 Key Pain Points of Kubernetes Cluster Management
Addressing the 8 Key Pain Points of Kubernetes Cluster ManagementEnterprise Management Associates
 
apidays LIVE Singapore 2021 - Protecting the API ecosystem by Omaru Maruatona...
apidays LIVE Singapore 2021 - Protecting the API ecosystem by Omaru Maruatona...apidays LIVE Singapore 2021 - Protecting the API ecosystem by Omaru Maruatona...
apidays LIVE Singapore 2021 - Protecting the API ecosystem by Omaru Maruatona...apidays
 
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...Nicolas Brousse
 
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...Legacy Typesafe (now Lightbend)
 
Agile Tour Pune 2015: Agility with Microservices and Devops: Archana Joshi an...
Agile Tour Pune 2015: Agility with Microservices and Devops: Archana Joshi an...Agile Tour Pune 2015: Agility with Microservices and Devops: Archana Joshi an...
Agile Tour Pune 2015: Agility with Microservices and Devops: Archana Joshi an...India Scrum Enthusiasts Community
 
Henrique Dantas - API fuzzing using Swagger
Henrique Dantas - API fuzzing using SwaggerHenrique Dantas - API fuzzing using Swagger
Henrique Dantas - API fuzzing using SwaggerDevSecCon
 
apidays LIVE Paris - Creating a scalable ecosystem of Microservices by Archan...
apidays LIVE Paris - Creating a scalable ecosystem of Microservices by Archan...apidays LIVE Paris - Creating a scalable ecosystem of Microservices by Archan...
apidays LIVE Paris - Creating a scalable ecosystem of Microservices by Archan...apidays
 
Enable DevSecOps using Jira Software
 Enable DevSecOps using Jira Software Enable DevSecOps using Jira Software
Enable DevSecOps using Jira SoftwareAtlassian
 
Agile Tour Pune 2015: Dev-ops- niche or mainstream: Bhaskar Venugopalan
Agile Tour Pune 2015: Dev-ops- niche or mainstream: Bhaskar VenugopalanAgile Tour Pune 2015: Dev-ops- niche or mainstream: Bhaskar Venugopalan
Agile Tour Pune 2015: Dev-ops- niche or mainstream: Bhaskar VenugopalanIndia Scrum Enthusiasts Community
 
Security architecture best practices for saas applications
Security architecture best practices for saas applicationsSecurity architecture best practices for saas applications
Security architecture best practices for saas applicationskanimozhin
 
Aliaksei Bahachuk - JavaScript and Solution Architecture
Aliaksei Bahachuk - JavaScript and Solution ArchitectureAliaksei Bahachuk - JavaScript and Solution Architecture
Aliaksei Bahachuk - JavaScript and Solution ArchitectureAliaksei Bahachuk
 

What's hot (20)

APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , KongAPIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
 
Microsoft: Enterprise search for cloud native applications
Microsoft: Enterprise search for cloud native applicationsMicrosoft: Enterprise search for cloud native applications
Microsoft: Enterprise search for cloud native applications
 
apidays LIVE Australia - Building a scalable API platform for an IoT ecosyste...
apidays LIVE Australia - Building a scalable API platform for an IoT ecosyste...apidays LIVE Australia - Building a scalable API platform for an IoT ecosyste...
apidays LIVE Australia - Building a scalable API platform for an IoT ecosyste...
 
apidays LIVE Singapore - Green APIs by Alex-Adrien Auger, Sipios
apidays LIVE Singapore - Green APIs by Alex-Adrien Auger, Sipiosapidays LIVE Singapore - Green APIs by Alex-Adrien Auger, Sipios
apidays LIVE Singapore - Green APIs by Alex-Adrien Auger, Sipios
 
Soluciones Dynatrace
Soluciones DynatraceSoluciones Dynatrace
Soluciones Dynatrace
 
Observability, Distributed Tracing, and Open Source: The Missing Primer
Observability, Distributed Tracing, and Open Source: The Missing PrimerObservability, Distributed Tracing, and Open Source: The Missing Primer
Observability, Distributed Tracing, and Open Source: The Missing Primer
 
Enterprise DevOps Series: Using VS Code & Zowe
Enterprise DevOps Series: Using VS Code & ZoweEnterprise DevOps Series: Using VS Code & Zowe
Enterprise DevOps Series: Using VS Code & Zowe
 
Addressing the 8 Key Pain Points of Kubernetes Cluster Management
Addressing the 8 Key Pain Points of Kubernetes Cluster ManagementAddressing the 8 Key Pain Points of Kubernetes Cluster Management
Addressing the 8 Key Pain Points of Kubernetes Cluster Management
 
apidays LIVE Singapore 2021 - Protecting the API ecosystem by Omaru Maruatona...
apidays LIVE Singapore 2021 - Protecting the API ecosystem by Omaru Maruatona...apidays LIVE Singapore 2021 - Protecting the API ecosystem by Omaru Maruatona...
apidays LIVE Singapore 2021 - Protecting the API ecosystem by Omaru Maruatona...
 
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...
 
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
 
Agile Tour Pune 2015: Agility with Microservices and Devops: Archana Joshi an...
Agile Tour Pune 2015: Agility with Microservices and Devops: Archana Joshi an...Agile Tour Pune 2015: Agility with Microservices and Devops: Archana Joshi an...
Agile Tour Pune 2015: Agility with Microservices and Devops: Archana Joshi an...
 
Henrique Dantas - API fuzzing using Swagger
Henrique Dantas - API fuzzing using SwaggerHenrique Dantas - API fuzzing using Swagger
Henrique Dantas - API fuzzing using Swagger
 
apidays LIVE Paris - Creating a scalable ecosystem of Microservices by Archan...
apidays LIVE Paris - Creating a scalable ecosystem of Microservices by Archan...apidays LIVE Paris - Creating a scalable ecosystem of Microservices by Archan...
apidays LIVE Paris - Creating a scalable ecosystem of Microservices by Archan...
 
Enable DevSecOps using Jira Software
 Enable DevSecOps using Jira Software Enable DevSecOps using Jira Software
Enable DevSecOps using Jira Software
 
Microevent
MicroeventMicroevent
Microevent
 
Architecting SaaS
Architecting SaaSArchitecting SaaS
Architecting SaaS
 
Agile Tour Pune 2015: Dev-ops- niche or mainstream: Bhaskar Venugopalan
Agile Tour Pune 2015: Dev-ops- niche or mainstream: Bhaskar VenugopalanAgile Tour Pune 2015: Dev-ops- niche or mainstream: Bhaskar Venugopalan
Agile Tour Pune 2015: Dev-ops- niche or mainstream: Bhaskar Venugopalan
 
Security architecture best practices for saas applications
Security architecture best practices for saas applicationsSecurity architecture best practices for saas applications
Security architecture best practices for saas applications
 
Aliaksei Bahachuk - JavaScript and Solution Architecture
Aliaksei Bahachuk - JavaScript and Solution ArchitectureAliaksei Bahachuk - JavaScript and Solution Architecture
Aliaksei Bahachuk - JavaScript and Solution Architecture
 

Similar to DevOps Underground - Microservices Monitoring

Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019Puppet
 
Auditing in the Cloud
Auditing in the CloudAuditing in the Cloud
Auditing in the Cloudtcarrucan
 
Pete Marshall - casmadrid2015 - Continuous Delivery in Legacy Environments
Pete Marshall - casmadrid2015 - Continuous Delivery in Legacy EnvironmentsPete Marshall - casmadrid2015 - Continuous Delivery in Legacy Environments
Pete Marshall - casmadrid2015 - Continuous Delivery in Legacy EnvironmentsPeter Marshall
 
From Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.auFrom Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.auevanbottcher
 
Cloud Applications Management Nirvana
Cloud Applications Management NirvanaCloud Applications Management Nirvana
Cloud Applications Management NirvanaSeema Jethani
 
Raise the Bar! Reloaded
Raise the Bar! ReloadedRaise the Bar! Reloaded
Raise the Bar! ReloadedCodemotion
 
Securing Your Public Cloud Infrastructure
Securing Your Public Cloud InfrastructureSecuring Your Public Cloud Infrastructure
Securing Your Public Cloud InfrastructureQualys
 
NetFlow Auditor Anomaly Detection Plus Forensics February 2010 08
NetFlow Auditor Anomaly Detection Plus Forensics February 2010 08NetFlow Auditor Anomaly Detection Plus Forensics February 2010 08
NetFlow Auditor Anomaly Detection Plus Forensics February 2010 08NetFlowAuditor
 
Chaos Engineering and Systems Reliability
Chaos Engineering and Systems ReliabilityChaos Engineering and Systems Reliability
Chaos Engineering and Systems ReliabilitySylvain Hellegouarch
 
Peloton Cycle Streaming Live Spin Classes to Thousands with Loggly & AWS
Peloton Cycle  Streaming Live Spin Classes to Thousands with Loggly & AWSPeloton Cycle  Streaming Live Spin Classes to Thousands with Loggly & AWS
Peloton Cycle Streaming Live Spin Classes to Thousands with Loggly & AWSAmazon Web Services
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum JapanBrian Brazil
 
WSO2 Integration Platform: Vision and Roadmap
WSO2 Integration Platform: Vision and RoadmapWSO2 Integration Platform: Vision and Roadmap
WSO2 Integration Platform: Vision and RoadmapWSO2
 
Cyber Crime Conference 2017 - DFLabs Supervised Active Intelligence - Andrea ...
Cyber Crime Conference 2017 - DFLabs Supervised Active Intelligence - Andrea ...Cyber Crime Conference 2017 - DFLabs Supervised Active Intelligence - Andrea ...
Cyber Crime Conference 2017 - DFLabs Supervised Active Intelligence - Andrea ...DFLABS SRL
 
Enterprise platform 3.0v4 for webinar
Enterprise platform 3.0v4 for webinarEnterprise platform 3.0v4 for webinar
Enterprise platform 3.0v4 for webinarJohn Mathon
 
Grafana overview deck - Tech - 2023 May v1.pdf
Grafana overview deck  - Tech - 2023 May v1.pdfGrafana overview deck  - Tech - 2023 May v1.pdf
Grafana overview deck - Tech - 2023 May v1.pdfBillySin5
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...Amazon Web Services
 
Critical Considerations for Continuous Delivery 04.09.2018
Critical Considerations for Continuous Delivery 04.09.2018Critical Considerations for Continuous Delivery 04.09.2018
Critical Considerations for Continuous Delivery 04.09.2018Claire Priester Papas
 

Similar to DevOps Underground - Microservices Monitoring (20)

Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019
 
Auditing in the Cloud
Auditing in the CloudAuditing in the Cloud
Auditing in the Cloud
 
Pete Marshall - casmadrid2015 - Continuous Delivery in Legacy Environments
Pete Marshall - casmadrid2015 - Continuous Delivery in Legacy EnvironmentsPete Marshall - casmadrid2015 - Continuous Delivery in Legacy Environments
Pete Marshall - casmadrid2015 - Continuous Delivery in Legacy Environments
 
From Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.auFrom Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.au
 
Cloud Applications Management Nirvana
Cloud Applications Management NirvanaCloud Applications Management Nirvana
Cloud Applications Management Nirvana
 
Raise the Bar! Reloaded
Raise the Bar! ReloadedRaise the Bar! Reloaded
Raise the Bar! Reloaded
 
Raise the bar! Reloaded
Raise the bar! ReloadedRaise the bar! Reloaded
Raise the bar! Reloaded
 
Securing Your Public Cloud Infrastructure
Securing Your Public Cloud InfrastructureSecuring Your Public Cloud Infrastructure
Securing Your Public Cloud Infrastructure
 
NetFlow Auditor Anomaly Detection Plus Forensics February 2010 08
NetFlow Auditor Anomaly Detection Plus Forensics February 2010 08NetFlow Auditor Anomaly Detection Plus Forensics February 2010 08
NetFlow Auditor Anomaly Detection Plus Forensics February 2010 08
 
Introduction to DevOps
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOps
 
Chaos Engineering and Systems Reliability
Chaos Engineering and Systems ReliabilityChaos Engineering and Systems Reliability
Chaos Engineering and Systems Reliability
 
Peloton Cycle Streaming Live Spin Classes to Thousands with Loggly & AWS
Peloton Cycle  Streaming Live Spin Classes to Thousands with Loggly & AWSPeloton Cycle  Streaming Live Spin Classes to Thousands with Loggly & AWS
Peloton Cycle Streaming Live Spin Classes to Thousands with Loggly & AWS
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum Japan
 
WSO2 Integration Platform: Vision and Roadmap
WSO2 Integration Platform: Vision and RoadmapWSO2 Integration Platform: Vision and Roadmap
WSO2 Integration Platform: Vision and Roadmap
 
Cyber Crime Conference 2017 - DFLabs Supervised Active Intelligence - Andrea ...
Cyber Crime Conference 2017 - DFLabs Supervised Active Intelligence - Andrea ...Cyber Crime Conference 2017 - DFLabs Supervised Active Intelligence - Andrea ...
Cyber Crime Conference 2017 - DFLabs Supervised Active Intelligence - Andrea ...
 
Enterprise platform 3.0v4 for webinar
Enterprise platform 3.0v4 for webinarEnterprise platform 3.0v4 for webinar
Enterprise platform 3.0v4 for webinar
 
Grafana overview deck - Tech - 2023 May v1.pdf
Grafana overview deck  - Tech - 2023 May v1.pdfGrafana overview deck  - Tech - 2023 May v1.pdf
Grafana overview deck - Tech - 2023 May v1.pdf
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
 
Critical Considerations for Continuous Delivery 04.09.2018
Critical Considerations for Continuous Delivery 04.09.2018Critical Considerations for Continuous Delivery 04.09.2018
Critical Considerations for Continuous Delivery 04.09.2018
 

More from kloia

re:Invent recap - Application Modernization
re:Invent recap - Application Modernizationre:Invent recap - Application Modernization
re:Invent recap - Application Modernizationkloia
 
Isovalent-kloia Cilium Workshop
Isovalent-kloia Cilium WorkshopIsovalent-kloia Cilium Workshop
Isovalent-kloia Cilium Workshopkloia
 
Kloia - Why Microsoft Modernisation Matters
Kloia - Why Microsoft Modernisation MattersKloia - Why Microsoft Modernisation Matters
Kloia - Why Microsoft Modernisation Matterskloia
 
DotNetKonf23 - NET Modernization Problems & Solutions.pdf
DotNetKonf23 - NET Modernization Problems & Solutions.pdfDotNetKonf23 - NET Modernization Problems & Solutions.pdf
DotNetKonf23 - NET Modernization Problems & Solutions.pdfkloia
 
AWS User Group Meetup Feb2023.pptx
AWS User Group Meetup Feb2023.pptxAWS User Group Meetup Feb2023.pptx
AWS User Group Meetup Feb2023.pptxkloia
 
re:Invent Recap
re:Invent Recapre:Invent Recap
re:Invent Recapkloia
 
The New era in QA: k6
The New era in QA: k6The New era in QA: k6
The New era in QA: k6kloia
 
Etkili Blog Yazım Teknikleri - Tuğba Sertkaya
Etkili Blog Yazım Teknikleri - Tuğba SertkayaEtkili Blog Yazım Teknikleri - Tuğba Sertkaya
Etkili Blog Yazım Teknikleri - Tuğba Sertkayakloia
 
AWS re:Invent 2021 Recap by APN Ambassador
AWS re:Invent 2021 Recap by APN AmbassadorAWS re:Invent 2021 Recap by APN Ambassador
AWS re:Invent 2021 Recap by APN Ambassadorkloia
 
Camunda BPM - Said Mengi
Camunda BPM - Said MengiCamunda BPM - Said Mengi
Camunda BPM - Said Mengikloia
 
AlOps - Yetişkan Eliaçık
AlOps - Yetişkan EliaçıkAlOps - Yetişkan Eliaçık
AlOps - Yetişkan Eliaçıkkloia
 
Zaman Yönetimi - Aras Bilgen
Zaman Yönetimi - Aras Bilgen Zaman Yönetimi - Aras Bilgen
Zaman Yönetimi - Aras Bilgen kloia
 
Gravitee API Management - Ahmet AYDIN
 Gravitee API Management  -  Ahmet AYDIN Gravitee API Management  -  Ahmet AYDIN
Gravitee API Management - Ahmet AYDINkloia
 
React Bootcamp Day 2 - Yunus Demirpolat
React Bootcamp Day 2 - Yunus DemirpolatReact Bootcamp Day 2 - Yunus Demirpolat
React Bootcamp Day 2 - Yunus Demirpolatkloia
 
React Bootcamp Day 1 - Yunus Demirpolat
React Bootcamp Day 1 - Yunus DemirpolatReact Bootcamp Day 1 - Yunus Demirpolat
React Bootcamp Day 1 - Yunus Demirpolatkloia
 
Contract testing - Baran Gayretli
Contract testing - Baran Gayretli Contract testing - Baran Gayretli
Contract testing - Baran Gayretli kloia
 
Contract Testing
Contract TestingContract Testing
Contract Testingkloia
 
Using Design Methods to Establish Healthy DevOps Practices - Aras Bilgen
Using Design Methods to Establish Healthy DevOps Practices - Aras BilgenUsing Design Methods to Establish Healthy DevOps Practices - Aras Bilgen
Using Design Methods to Establish Healthy DevOps Practices - Aras Bilgenkloia
 
Kloia Quality Assurance
Kloia Quality AssuranceKloia Quality Assurance
Kloia Quality Assurancekloia
 
DevOps Turkey Test Automation with Docker and Seleniumhub
DevOps Turkey Test Automation with Docker and SeleniumhubDevOps Turkey Test Automation with Docker and Seleniumhub
DevOps Turkey Test Automation with Docker and Seleniumhubkloia
 

More from kloia (20)

re:Invent recap - Application Modernization
re:Invent recap - Application Modernizationre:Invent recap - Application Modernization
re:Invent recap - Application Modernization
 
Isovalent-kloia Cilium Workshop
Isovalent-kloia Cilium WorkshopIsovalent-kloia Cilium Workshop
Isovalent-kloia Cilium Workshop
 
Kloia - Why Microsoft Modernisation Matters
Kloia - Why Microsoft Modernisation MattersKloia - Why Microsoft Modernisation Matters
Kloia - Why Microsoft Modernisation Matters
 
DotNetKonf23 - NET Modernization Problems & Solutions.pdf
DotNetKonf23 - NET Modernization Problems & Solutions.pdfDotNetKonf23 - NET Modernization Problems & Solutions.pdf
DotNetKonf23 - NET Modernization Problems & Solutions.pdf
 
AWS User Group Meetup Feb2023.pptx
AWS User Group Meetup Feb2023.pptxAWS User Group Meetup Feb2023.pptx
AWS User Group Meetup Feb2023.pptx
 
re:Invent Recap
re:Invent Recapre:Invent Recap
re:Invent Recap
 
The New era in QA: k6
The New era in QA: k6The New era in QA: k6
The New era in QA: k6
 
Etkili Blog Yazım Teknikleri - Tuğba Sertkaya
Etkili Blog Yazım Teknikleri - Tuğba SertkayaEtkili Blog Yazım Teknikleri - Tuğba Sertkaya
Etkili Blog Yazım Teknikleri - Tuğba Sertkaya
 
AWS re:Invent 2021 Recap by APN Ambassador
AWS re:Invent 2021 Recap by APN AmbassadorAWS re:Invent 2021 Recap by APN Ambassador
AWS re:Invent 2021 Recap by APN Ambassador
 
Camunda BPM - Said Mengi
Camunda BPM - Said MengiCamunda BPM - Said Mengi
Camunda BPM - Said Mengi
 
AlOps - Yetişkan Eliaçık
AlOps - Yetişkan EliaçıkAlOps - Yetişkan Eliaçık
AlOps - Yetişkan Eliaçık
 
Zaman Yönetimi - Aras Bilgen
Zaman Yönetimi - Aras Bilgen Zaman Yönetimi - Aras Bilgen
Zaman Yönetimi - Aras Bilgen
 
Gravitee API Management - Ahmet AYDIN
 Gravitee API Management  -  Ahmet AYDIN Gravitee API Management  -  Ahmet AYDIN
Gravitee API Management - Ahmet AYDIN
 
React Bootcamp Day 2 - Yunus Demirpolat
React Bootcamp Day 2 - Yunus DemirpolatReact Bootcamp Day 2 - Yunus Demirpolat
React Bootcamp Day 2 - Yunus Demirpolat
 
React Bootcamp Day 1 - Yunus Demirpolat
React Bootcamp Day 1 - Yunus DemirpolatReact Bootcamp Day 1 - Yunus Demirpolat
React Bootcamp Day 1 - Yunus Demirpolat
 
Contract testing - Baran Gayretli
Contract testing - Baran Gayretli Contract testing - Baran Gayretli
Contract testing - Baran Gayretli
 
Contract Testing
Contract TestingContract Testing
Contract Testing
 
Using Design Methods to Establish Healthy DevOps Practices - Aras Bilgen
Using Design Methods to Establish Healthy DevOps Practices - Aras BilgenUsing Design Methods to Establish Healthy DevOps Practices - Aras Bilgen
Using Design Methods to Establish Healthy DevOps Practices - Aras Bilgen
 
Kloia Quality Assurance
Kloia Quality AssuranceKloia Quality Assurance
Kloia Quality Assurance
 
DevOps Turkey Test Automation with Docker and Seleniumhub
DevOps Turkey Test Automation with Docker and SeleniumhubDevOps Turkey Test Automation with Docker and Seleniumhub
DevOps Turkey Test Automation with Docker and Seleniumhub
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

DevOps Underground - Microservices Monitoring

  • 1. Barry Laffoy – Senior DevOps Engineer Scaling a Monitoring Strategy For a Microservices Architecture Thanks to Our Sponsors http://community.kloia.co.ukJoin Our Community Slack Channel
  • 2. Monitoring In A Microservices Environment Or how to scale your alerting strategy with your team and application
  • 3. Who AmI?  Why Should You Listen to me?  Physics  Actuarial Science  Build Engineering  Experience building and maintaining Excel and Jenkins  DevOps at ClearScore
  • 4. Who Are ClearScore  Aim to Solve Money for the World  Present people their data in a beautiful way, to empower financial decision making  Committed to best-in-class technical solutions  Committed to having fun while we do it
  • 5.
  • 8.
  • 9. Whywouldwe wantthem?  12 factor app  Scalable  Releasable  Loggable  Discoverable  Monitorable  And several more “–ables”  We handle  Scaling  Releasing  Logging  Discovering  Monitoring  Released independently  Empower ownership  Distribute risk
  • 11. CALMS  Culture  Automation  Lean  Monitoring  Sharing
  • 12. Business Objectives  Autonomous Cross Functional Teams  Increased releases  Deliver feature more quickly with less risk  Uptime
  • 13. ThePlatform  Drive developer ownership  Not just a scheduler (Nomad/k8s/etc)  CI/CD  Local dev tools  Cloud native resources  Logging  Metrics  Monitoring  Alerting
  • 14. ThePlatform  Amazon Web Services  Immutable infrastructure (Packer)  Infrastructure as code (Terraform)  Service discovery (Consul)  Scheduling (Nomad)  CI/CD (Jenkins)
  • 15. 2. How Hard Is Monitoring? Application Performance Monitoring
  • 16. TraditionalAPM  Worked great for bare-metal deployment of single Java app  Tracing  Alerts  Health dashboards
  • 17. Notsogreatfor microservices  Instrumented inside container (not 12 factor)  Paying for license per process (not scalable)  Manual configuration of alerting rules  Limited Language support  Tracing from service to service very difficult  Alerting on ”abnormal traffic” limited by simple statistical model
  • 18. 3. How to move forward?
  • 19. Tools,tools, tools  Pingdom  Liveness probes  CloudWatch  ElasticSearch  StatsD, influxdb, grafana  Next Gen APM, Instana
  • 20.
  • 22. OffTheShelf  External Synthetics with PingDom  Container security scanning with quay.io  Dependency security scanning with maven/npm  AMI security scanning with Inspector  Performance monitoring as part of CI pipeline  Internal Synthetics with consul-alerts/liveness-readiness probes
  • 23. Highly customizable  Cloud Native with CloudWatch  Annotating releases in Grafana  Self managed with statsd  Infrastructure metrics  Custom Application Metrics  Third party integration monitoring  Alerting rules are “all or nothing”
  • 25. Whatdowe need?  Light touch configuration  Scalable deployment model  Auto service discovery  Sensible default alerting rules  Flexible configuration  Tracing between asynchronous services
  • 26. Rollourown  Already collecting statsd  Means writing and supporting a lot of logic  Using some sort of ML?
  • 27. Traditional Vendors  Poor support for distributed microservices  Poor language support (Scala/akka)  Mixed results on configurability
  • 28. EnterInstana  Discovered quite by accident  Beautiful UI  Extremely easy to set up  Covered most of our desired features out of the box  Infrastructure monitoring  Microservice APM  End-User-Monitoring
  • 29.
  • 30.
  • 31.
  • 32. 5. Culture of Ownership (It’s not just about tools)
  • 33. Youbuildityour runit!  Delivery teams own their microservices  Responsible for performance and monitoring in dev/ci/stg environments  Ideally, incidents alert to dev team responsible  Unfortunately, we don’t quite do that  Sophisticated routing system <picture of me>
  • 34.
  • 35. Peoplecause problems  Things go wrong, when people change things  Luckily, this means things go wrong during business hours (mostly)  Everyone empowered to inspect monitoring tools  On-call teams supports problem resolution, doesn’t fix everything  Understanding teams and services drives platform improvement
  • 36. AlertGrooming  Lots of noise on alert channels  Alert Fatigue  ”Boy who cried wolf” syndrome  Requires proactive maintenance of alerts  Fix ALL annoying alerts, even if that means fixing the the alert, not the underlying service  Investment takes time, but pays dividends in productivity
  • 37. MajorIncidents  Zero blame retros  Involve stake-holders  Generate action points with owners (and follow up)  Detailed incident report with business-friendly summaries and cost estimates
  • 38. 6. Our Platform Future
  • 39. Replatforming  Hashicorp platform  Great choice to get us to the cloud  Focused on supporting zillions of containers in HPC environment  Limiting our scalability and speed of delivery  Encouraged anti-pattern of integrating platform details into services  Kubernetes migration  Solves many of our problems  Natively supports blue-green  Instana support for cluster health monitoring  Prometheus on-cluster monitoring  What to do with our statsd?
  • 40. Continuous Deployment2.0  Investigating CD platforms Spinnaker/Concourse/Drone  Routing non-prod alerts to development teams  Performance, Tracing, Vulnerability issues should be flagged
  • 41. GoingGlobal  Support across timezones  More and more services  More and more teams
  • 42. Serverless  Functions as a service (on AWS lambda)  Horizontal auto-scaling  “No Ops”  Cheap  Unsupported by traditional monitoring/tracing solutions  X-Ray tracing features with Instana