SlideShare a Scribd company logo
1 of 38
Observability for Modern
Applications.
- How does it help track KPIs & react faster?
About me
Abhinav Kapoor
• Principal Engineer, Nagarro
• Work as a Technical Architect for .NET Solutions.
• 15+ years of technical experience in architecting & developing solutions for Banking, Supply chain &
Embedded domains.
• I enjoy writing about software architecture, communication patterns, modernization & more on
Medium.com https://medium.com/@kapoorabhinav.
Accreditations
I‘m available on Linked at
https://www.linkedin.com/in/abhinav-kapoor
0394456/
What we’ll cover
1. What is observability ?
2. Why do we need observability?
3. How does it fit in a cloud native landscape?
4. Application Instrumentation – The pillars for getting insights
1. Logs
2. Traces
3. Metrics
4. Health Check
5. Future of Observability - OpenTelemetry
1. What is it & why is it needed?
2. Its components.
What is observability ?
Sir Lewis Carl Davidson
Hamilton
- 7-time F1 champion
Steering Wheel with controls and dashboard
- Image credit Mercedes X
What is observability ?
● Moving from Formula 1 to Modern Software applications
○ Complex systems – Lot of moving parts
○ High stakes
○ Cutting edge engineering
○ Fast Reaction
○ Several teams with different specializations at work
● Observability is the ability to observe what is going on inside the system.
● It can be considered as an NFR which helps measures other NFRs
Why do we need Observability?
● We want to avoid these reactions
Why do we need Observability?
● Issue Timeline in Operation Support
Detect Identify Fix Deploy
Issue
Start
Issue
End
Mean Time To Repair (MTTR)
Mean Time To Identify
(MTTI)
Mean Time To Detect (MTTD)
Why do we need Observability?
● How does it impact KPIs
○ Faster Issue Detection and Resolution
○ Enhanced System Performance
○ Proactive Capacity Planning
○ Security and Compliance
○ Better User Experience
○ Data-Driven Decision-Making
○ Continuous Improvement
Why is Observability more critical now?
● Evolution to smaller but more services - high cohesion and low coupling.
“Complexities arises when the dependencies among the elements become important”
- We have reduced code complexity but introduced much more moving parts
Why is Observability more critical now?
Technology transformation trends
Microservices Polyglot Applications Ephemeral & cloud native environments
How does it fit in a cloud native
landscape?
How does it fit in a cloud native landscape?
Instrument Capture Transport Analyze Act
SDK, Agent
Scrapping/
Agent
Dashboard Alerts / Notifications
How does it fit in a cloud native landscape?
Active Directory / IAM
Certificate Manager
Resources – DB, Broker, API Gateway
Service A
Sidecar
Platform / Orchestrator
Service B
Sidecar
Shared
Services,
Infrastructure
Applications
Audit logs,
Activity logs,
Metrics
Central Aggregation
& Collection
Application logs,
metrics, traces Monitoring & Alerts
Access Control
Deep Dive - Application
Instrumentation
Application Instrumentation
● Logs – What happened & who did it ?
● Traces – How did it happen ?
● Metrics – How much happened ?
● Health – How is the system doing ?
Big picture
● We don’t just want to have connected black boxes.
● While infrastructure & sidecars can give a good overview, there are always
blind spots.
● It speeds up development.
Logs, metrics,
Traces, Health
Logs, metrics,
Traces, Health
Logs, metrics,
Traces, Health
Logs, metrics,
Traces, Health
Logs – What Happened ?
Logs – Components of Logging
Logging Framework
Log Message
- Severity, tags
Application
SDK
Filter based on Severity
- Destination Specific
Log Providers
ILogger
Types of Logs
● Based on Information they contain – Governs where they are stored, how long
they are stored, who can access them
○ Technical – Startup configurations, system event that the application receives
○ Business - Business and User Information
● Based on the schema
○ Unstructured – Simple statements with severity.
○ Structured / Schematic logs - Logs with Metadata to provide context. Pod, Service, IP address
& more.
Duality of Logs - Readable messages &
Quarriable metrics
Traces - How did it happen ?
Traces – Types of traces
● Tracing – This is similar to logging. But noisier and collects more information
from deeper parts of the application.
● Distributed Trancing – It’s the method to observe requests as they propagate
through different services.
○ Its structured log which comes from different processes, nodes and can be stitched together
to give an end to end view.
Distributed Tracing
Trace Collector
Services Web API Broker Background Services
Traces
HTTP Request
with tracing
context
TCP message
with tracing
context
TCP message
with tracing
context
DB
Store tracing
context with
data in db
Services
Read tracing
context
• Correlation IDs are passed in the remote calls/messages.
• The receiver regenerates context proceeds with its unit of work.
• Depending upon business transaction, traces can be of few seconds to hours or days.
Metrics–How much is happening?
Metrics – Components
Vendor specific
Instrumentation &
collection
Metrics-
Name, description, Labels &
Value
Application
Backend – time series database
- Adds timestamp
- Supports aggregation (eg PromQL)
SDK
/Metrics
scrape
Visualization
PromQL expressions
Alerts
Types of Prometheus Metrics - Counters
Track the occurrence of an event in application.
● The absolute value may not be helpful.
● The delta between two timestamps or the rate of change over time is helpful. Functions like Increase(),
Rate().
● Consider the case when applications restart.
● Examples - Orders processed counter, unsuccessful counters, business error, technical errors.
● Sample output of meta-endpoint (/metrics)
# HELP http_requests_total Total number of http api requests
# TYPE http_requests_total
counter http_requests_total{api=“add_Product“, company=“ITC”} 3433
counter http_requests_total{api=“add_Product“, company =“TATA”} 2000
PromQL
increase(http_requests_total{api="add_product“, company=“ITC”}[5m])
Types of Prometheus Metrics - Gauges
Snapshots of a metric at a single point in time. Not an event.
● Example is message queue size, CPU utilization at a given time.
● Useful functions - avg_over_time, max_over_time,
min_over_time, and quantile_over_time
Types of Prometheus Metrics - Histograms
● Histograms divide the entire range of measurements into a set of intervals—
named buckets—and count how many measurements fall into each bucket.
● These buckets are defined at compile time, they are upper inclusive (le).
● Histogram looks like the following
http_request_duration_seconds_sum{api="add_product" instance=" host1 "} 8953.332
http_request_duration_seconds_count{api="add_product" instance=" host1"} 27892
http_request_duration_seconds_bucket{api="add_product", instance=" host1", le="0.05"}
1672
http_request_duration_seconds_bucket{api="add_product", instance=" host1", le="0.1"} 8954
http_request_duration_seconds_bucket{api="add_product", instance=" host1", le="0.25"}
14251
sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
Types of Prometheus Metrics - Summaries
● Similar to Histogram but they add quantile label to bound the summary.
● OpenTelemetry has been marked it legacy.
● Summary looks like the following
http_request_duration_seconds_sum{api="add_product" instance="host1.domain.com"} 8953.332
http_request_duration_seconds_count{api="add_product" instance="host1.domain.com"} 27892
http_request_duration_seconds{api="add_product" instance=" host1 " quantile="0"}
http_request_duration_seconds{api="add_product" instance="host1" quantile="0.5"} 0.232227334
● Considerations due to quantile.
Metrics - What happens after instrumentation?
● The orchestrator can enrich the application instrumentation with
infrastructure information such as Service Name, Pod, IP address.
● The backend saves it with timestamp
● Add Alerts & dashboards.
Health Check– How is the system
doing ?
Health check
● Basic use is liveness probe.
● More useful when we add dependencies.
● Prebuilt libraries depending upon the language
● Considerations
○ Orchestrator may restart the service while the issue is in a
dependency.
○ Checking all downstream services could be time consuming.
○ Meta-end points (/health) may not be exposed. Instead of
writing to the response metrics may be used to indicate
issue.
OpenTelemetry
What is OpenTelemetry ?
● It is an open-source observability framework for infrastructure & application
instrumentation.
● It’s the second-highest ranked CNCF project by activity and contributors after
K8s
● It aims to be vendor agnostic & the SDK is produced by OpenTelemetry.
● Its still young so not all features are available for all languages.
● As Open Telemetry doesn’t provide a backend implementation (its concern is
creating, collecting, and sending signals), the data will flow to another system
or systems for storage and querying.
Why OpenTelemetry?
● Before Open Telemetry
○ APM vendors had their own things – SDKs,
Agents, Grammer
○ Vendor lock-in.
○ Application developers, built wrappers,
event hooks to keep application isolated.
○ Competing Open standards – OpenTracing
(Traces), OpenCensus (Metrics, Traces)
OpenTelemetry Library -
Instrumentation (Automatic & Manual)
Exporters
Exporter
- Backends/APM
Application
Trace, metrics, Logs
SDK
Components of OpenTelemetry
Collector
Multiple receivers
Multiple exporters
Questions ?
Thank you!

More Related Content

Similar to Agile Gurugram 2023 | Observability for Modern Applications. How does it help track KPIs & react faster? - Abhinav Kapoor

MuleSoft Manchester Meetup #2 slides 29th October 2019
MuleSoft Manchester Meetup #2 slides 29th October 2019MuleSoft Manchester Meetup #2 slides 29th October 2019
MuleSoft Manchester Meetup #2 slides 29th October 2019Ieva Navickaite
 
Data Ops at TripActions
Data Ops at TripActionsData Ops at TripActions
Data Ops at TripActionsRob Winters
 
New Splunk Management Solutions Update: Splunk MINT and Splunk App for Stream
New Splunk Management Solutions Update: Splunk MINT and Splunk App for Stream New Splunk Management Solutions Update: Splunk MINT and Splunk App for Stream
New Splunk Management Solutions Update: Splunk MINT and Splunk App for Stream Splunk
 
Splunk App for Stream for Enhanced Operational Intelligence from Wire Data
Splunk App for Stream for Enhanced Operational Intelligence from Wire DataSplunk App for Stream for Enhanced Operational Intelligence from Wire Data
Splunk App for Stream for Enhanced Operational Intelligence from Wire DataSplunk
 
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16AppDynamics
 
Go Observability (in practice)
Go Observability (in practice)Go Observability (in practice)
Go Observability (in practice)Eran Levy
 
Webinar: Leveraging New Technologies with Migration
Webinar: Leveraging New Technologies with MigrationWebinar: Leveraging New Technologies with Migration
Webinar: Leveraging New Technologies with Migrationpanagenda
 
MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021Ieva Navickaite
 
Tracing-for-fun-and-profit.pptx
Tracing-for-fun-and-profit.pptxTracing-for-fun-and-profit.pptx
Tracing-for-fun-and-profit.pptxHai Nguyen Duy
 
SplunkLive! Zurich 2018: Monitoring the End User Experience with Splunk
SplunkLive! Zurich 2018: Monitoring the End User Experience with SplunkSplunkLive! Zurich 2018: Monitoring the End User Experience with Splunk
SplunkLive! Zurich 2018: Monitoring the End User Experience with SplunkSplunk
 
Ledingkart Meetup #1: Monolithic to microservices in action
Ledingkart Meetup #1: Monolithic to microservices in actionLedingkart Meetup #1: Monolithic to microservices in action
Ledingkart Meetup #1: Monolithic to microservices in actionMukesh Singh
 
SplunkLive! Munich 2018: Monitoring the End-User Experience with Splunk
SplunkLive! Munich 2018: Monitoring the End-User Experience with SplunkSplunkLive! Munich 2018: Monitoring the End-User Experience with Splunk
SplunkLive! Munich 2018: Monitoring the End-User Experience with SplunkSplunk
 
Lanka government cloud: what, why & how?
Lanka government cloud: what, why & how?Lanka government cloud: what, why & how?
Lanka government cloud: what, why & how?Wasantha Deshapriya
 
SplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunk
 
Elevate your Splunk Deployment by Better Understanding your Value Breakfast S...
Elevate your Splunk Deployment by Better Understanding your Value Breakfast S...Elevate your Splunk Deployment by Better Understanding your Value Breakfast S...
Elevate your Splunk Deployment by Better Understanding your Value Breakfast S...Splunk
 
Why we decided on RSA Security Analytics for network visibility
Why we decided on RSA Security Analytics for network visibilityWhy we decided on RSA Security Analytics for network visibility
Why we decided on RSA Security Analytics for network visibilityRecruit Technologies
 
OSMC 2023 | Journey to observability: tracking every function execution in pr...
OSMC 2023 | Journey to observability: tracking every function execution in pr...OSMC 2023 | Journey to observability: tracking every function execution in pr...
OSMC 2023 | Journey to observability: tracking every function execution in pr...NETWAYS
 
Top 8 Trends in Performance Engineering
Top 8 Trends in Performance EngineeringTop 8 Trends in Performance Engineering
Top 8 Trends in Performance EngineeringConvetit
 

Similar to Agile Gurugram 2023 | Observability for Modern Applications. How does it help track KPIs & react faster? - Abhinav Kapoor (20)

MuleSoft Manchester Meetup #2 slides 29th October 2019
MuleSoft Manchester Meetup #2 slides 29th October 2019MuleSoft Manchester Meetup #2 slides 29th October 2019
MuleSoft Manchester Meetup #2 slides 29th October 2019
 
Data Ops at TripActions
Data Ops at TripActionsData Ops at TripActions
Data Ops at TripActions
 
New Splunk Management Solutions Update: Splunk MINT and Splunk App for Stream
New Splunk Management Solutions Update: Splunk MINT and Splunk App for Stream New Splunk Management Solutions Update: Splunk MINT and Splunk App for Stream
New Splunk Management Solutions Update: Splunk MINT and Splunk App for Stream
 
Splunk App for Stream for Enhanced Operational Intelligence from Wire Data
Splunk App for Stream for Enhanced Operational Intelligence from Wire DataSplunk App for Stream for Enhanced Operational Intelligence from Wire Data
Splunk App for Stream for Enhanced Operational Intelligence from Wire Data
 
Thick client application security assessment
Thick client  application security assessmentThick client  application security assessment
Thick client application security assessment
 
UpdatedResume
UpdatedResumeUpdatedResume
UpdatedResume
 
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
 
Go Observability (in practice)
Go Observability (in practice)Go Observability (in practice)
Go Observability (in practice)
 
Webinar: Leveraging New Technologies with Migration
Webinar: Leveraging New Technologies with MigrationWebinar: Leveraging New Technologies with Migration
Webinar: Leveraging New Technologies with Migration
 
MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021
 
Tracing-for-fun-and-profit.pptx
Tracing-for-fun-and-profit.pptxTracing-for-fun-and-profit.pptx
Tracing-for-fun-and-profit.pptx
 
SplunkLive! Zurich 2018: Monitoring the End User Experience with Splunk
SplunkLive! Zurich 2018: Monitoring the End User Experience with SplunkSplunkLive! Zurich 2018: Monitoring the End User Experience with Splunk
SplunkLive! Zurich 2018: Monitoring the End User Experience with Splunk
 
Ledingkart Meetup #1: Monolithic to microservices in action
Ledingkart Meetup #1: Monolithic to microservices in actionLedingkart Meetup #1: Monolithic to microservices in action
Ledingkart Meetup #1: Monolithic to microservices in action
 
SplunkLive! Munich 2018: Monitoring the End-User Experience with Splunk
SplunkLive! Munich 2018: Monitoring the End-User Experience with SplunkSplunkLive! Munich 2018: Monitoring the End-User Experience with Splunk
SplunkLive! Munich 2018: Monitoring the End-User Experience with Splunk
 
Lanka government cloud: what, why & how?
Lanka government cloud: what, why & how?Lanka government cloud: what, why & how?
Lanka government cloud: what, why & how?
 
SplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT Breakout
 
Elevate your Splunk Deployment by Better Understanding your Value Breakfast S...
Elevate your Splunk Deployment by Better Understanding your Value Breakfast S...Elevate your Splunk Deployment by Better Understanding your Value Breakfast S...
Elevate your Splunk Deployment by Better Understanding your Value Breakfast S...
 
Why we decided on RSA Security Analytics for network visibility
Why we decided on RSA Security Analytics for network visibilityWhy we decided on RSA Security Analytics for network visibility
Why we decided on RSA Security Analytics for network visibility
 
OSMC 2023 | Journey to observability: tracking every function execution in pr...
OSMC 2023 | Journey to observability: tracking every function execution in pr...OSMC 2023 | Journey to observability: tracking every function execution in pr...
OSMC 2023 | Journey to observability: tracking every function execution in pr...
 
Top 8 Trends in Performance Engineering
Top 8 Trends in Performance EngineeringTop 8 Trends in Performance Engineering
Top 8 Trends in Performance Engineering
 

More from AgileNetwork

ANIn Chennai April 2024 |Agile Engineering: Modernizing Legacy Systems by Ana...
ANIn Chennai April 2024 |Agile Engineering: Modernizing Legacy Systems by Ana...ANIn Chennai April 2024 |Agile Engineering: Modernizing Legacy Systems by Ana...
ANIn Chennai April 2024 |Agile Engineering: Modernizing Legacy Systems by Ana...AgileNetwork
 
ANIn Chennai April 2024 |Beyond Big Bang: Technical Agility in Vintage Produc...
ANIn Chennai April 2024 |Beyond Big Bang: Technical Agility in Vintage Produc...ANIn Chennai April 2024 |Beyond Big Bang: Technical Agility in Vintage Produc...
ANIn Chennai April 2024 |Beyond Big Bang: Technical Agility in Vintage Produc...AgileNetwork
 
ANIn Gurugram April 2024 |Agile Adaptation: Driving Progress in Generative AI...
ANIn Gurugram April 2024 |Agile Adaptation: Driving Progress in Generative AI...ANIn Gurugram April 2024 |Agile Adaptation: Driving Progress in Generative AI...
ANIn Gurugram April 2024 |Agile Adaptation: Driving Progress in Generative AI...AgileNetwork
 
ANIn Noida Oct 2023 |AI Usage in Agile Transformation Journey by Kunal
ANIn Noida Oct 2023 |AI Usage in Agile Transformation Journey by KunalANIn Noida Oct 2023 |AI Usage in Agile Transformation Journey by Kunal
ANIn Noida Oct 2023 |AI Usage in Agile Transformation Journey by KunalAgileNetwork
 
ANIn Kolkata April 2024 |Ethics of AI by Abhishek Nandy
ANIn Kolkata April 2024 |Ethics of AI by Abhishek NandyANIn Kolkata April 2024 |Ethics of AI by Abhishek Nandy
ANIn Kolkata April 2024 |Ethics of AI by Abhishek NandyAgileNetwork
 
ANIn Kolkata April 2024 | AI Enabled Reflection in Agile Delivery by Indranil...
ANIn Kolkata April 2024 | AI Enabled Reflection in Agile Delivery by Indranil...ANIn Kolkata April 2024 | AI Enabled Reflection in Agile Delivery by Indranil...
ANIn Kolkata April 2024 | AI Enabled Reflection in Agile Delivery by Indranil...AgileNetwork
 
ANIn Gurugram April 2024 |Can Agile and AI work together? by Pramodkumar Shri...
ANIn Gurugram April 2024 |Can Agile and AI work together? by Pramodkumar Shri...ANIn Gurugram April 2024 |Can Agile and AI work together? by Pramodkumar Shri...
ANIn Gurugram April 2024 |Can Agile and AI work together? by Pramodkumar Shri...AgileNetwork
 
ANIn Pune April 2024 |L&D Accelerating business growth by Mukta Nalke
ANIn Pune April 2024 |L&D Accelerating business growth by Mukta NalkeANIn Pune April 2024 |L&D Accelerating business growth by Mukta Nalke
ANIn Pune April 2024 |L&D Accelerating business growth by Mukta NalkeAgileNetwork
 
ANIn Pune April 2024 | Meeting Modern Learning Needs with Innovation by Ankit...
ANIn Pune April 2024 | Meeting Modern Learning Needs with Innovation by Ankit...ANIn Pune April 2024 | Meeting Modern Learning Needs with Innovation by Ankit...
ANIn Pune April 2024 | Meeting Modern Learning Needs with Innovation by Ankit...AgileNetwork
 
ANIn Ahmedabad April 2024 | Powering Big Wins with Small, Agile Teams by Yoge...
ANIn Ahmedabad April 2024 | Powering Big Wins with Small, Agile Teams by Yoge...ANIn Ahmedabad April 2024 | Powering Big Wins with Small, Agile Teams by Yoge...
ANIn Ahmedabad April 2024 | Powering Big Wins with Small, Agile Teams by Yoge...AgileNetwork
 
ANIn Coimbatore March 2024 | Unlocking Agility with Gen AI by Balaprasanna S
ANIn Coimbatore March 2024 | Unlocking Agility with Gen AI by Balaprasanna SANIn Coimbatore March 2024 | Unlocking Agility with Gen AI by Balaprasanna S
ANIn Coimbatore March 2024 | Unlocking Agility with Gen AI by Balaprasanna SAgileNetwork
 
ANIn Coimbatore March 2024 | Agile & AI in Project Management by Dhilipkumar ...
ANIn Coimbatore March 2024 | Agile & AI in Project Management by Dhilipkumar ...ANIn Coimbatore March 2024 | Agile & AI in Project Management by Dhilipkumar ...
ANIn Coimbatore March 2024 | Agile & AI in Project Management by Dhilipkumar ...AgileNetwork
 
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...AgileNetwork
 
ANIn Chennai March 2024 |Oxygenating AI ecosystem with Agility by Gowtham Bal...
ANIn Chennai March 2024 |Oxygenating AI ecosystem with Agility by Gowtham Bal...ANIn Chennai March 2024 |Oxygenating AI ecosystem with Agility by Gowtham Bal...
ANIn Chennai March 2024 |Oxygenating AI ecosystem with Agility by Gowtham Bal...AgileNetwork
 
ANIn Ahmedabad March 2024 | The Power of Retrospection by Rakesh Mehta
ANIn Ahmedabad March 2024 | The Power of Retrospection by Rakesh MehtaANIn Ahmedabad March 2024 | The Power of Retrospection by Rakesh Mehta
ANIn Ahmedabad March 2024 | The Power of Retrospection by Rakesh MehtaAgileNetwork
 
ANIn Pune March 2024 | Customer Stratification for Business Growth by Manish ...
ANIn Pune March 2024 | Customer Stratification for Business Growth by Manish ...ANIn Pune March 2024 | Customer Stratification for Business Growth by Manish ...
ANIn Pune March 2024 | Customer Stratification for Business Growth by Manish ...AgileNetwork
 
ANIn Coimbatore July 2023 | Business Agility in Data Science by Dr.Selvaraaju...
ANIn Coimbatore July 2023 | Business Agility in Data Science by Dr.Selvaraaju...ANIn Coimbatore July 2023 | Business Agility in Data Science by Dr.Selvaraaju...
ANIn Coimbatore July 2023 | Business Agility in Data Science by Dr.Selvaraaju...AgileNetwork
 
ANIn Coimbatore May 2023 | Agile and Beyond by Nithya Sitharam
ANIn Coimbatore May 2023 | Agile and Beyond by Nithya SitharamANIn Coimbatore May 2023 | Agile and Beyond by Nithya Sitharam
ANIn Coimbatore May 2023 | Agile and Beyond by Nithya SitharamAgileNetwork
 
ANIn Hyderabad Jun 2023 |Humanizing Agile Transformation Beyond Process and T...
ANIn Hyderabad Jun 2023 |Humanizing Agile Transformation Beyond Process and T...ANIn Hyderabad Jun 2023 |Humanizing Agile Transformation Beyond Process and T...
ANIn Hyderabad Jun 2023 |Humanizing Agile Transformation Beyond Process and T...AgileNetwork
 
ANIn Coimbatore Jul 2023 |The Importance of Business Agility in the Current L...
ANIn Coimbatore Jul 2023 |The Importance of Business Agility in the Current L...ANIn Coimbatore Jul 2023 |The Importance of Business Agility in the Current L...
ANIn Coimbatore Jul 2023 |The Importance of Business Agility in the Current L...AgileNetwork
 

More from AgileNetwork (20)

ANIn Chennai April 2024 |Agile Engineering: Modernizing Legacy Systems by Ana...
ANIn Chennai April 2024 |Agile Engineering: Modernizing Legacy Systems by Ana...ANIn Chennai April 2024 |Agile Engineering: Modernizing Legacy Systems by Ana...
ANIn Chennai April 2024 |Agile Engineering: Modernizing Legacy Systems by Ana...
 
ANIn Chennai April 2024 |Beyond Big Bang: Technical Agility in Vintage Produc...
ANIn Chennai April 2024 |Beyond Big Bang: Technical Agility in Vintage Produc...ANIn Chennai April 2024 |Beyond Big Bang: Technical Agility in Vintage Produc...
ANIn Chennai April 2024 |Beyond Big Bang: Technical Agility in Vintage Produc...
 
ANIn Gurugram April 2024 |Agile Adaptation: Driving Progress in Generative AI...
ANIn Gurugram April 2024 |Agile Adaptation: Driving Progress in Generative AI...ANIn Gurugram April 2024 |Agile Adaptation: Driving Progress in Generative AI...
ANIn Gurugram April 2024 |Agile Adaptation: Driving Progress in Generative AI...
 
ANIn Noida Oct 2023 |AI Usage in Agile Transformation Journey by Kunal
ANIn Noida Oct 2023 |AI Usage in Agile Transformation Journey by KunalANIn Noida Oct 2023 |AI Usage in Agile Transformation Journey by Kunal
ANIn Noida Oct 2023 |AI Usage in Agile Transformation Journey by Kunal
 
ANIn Kolkata April 2024 |Ethics of AI by Abhishek Nandy
ANIn Kolkata April 2024 |Ethics of AI by Abhishek NandyANIn Kolkata April 2024 |Ethics of AI by Abhishek Nandy
ANIn Kolkata April 2024 |Ethics of AI by Abhishek Nandy
 
ANIn Kolkata April 2024 | AI Enabled Reflection in Agile Delivery by Indranil...
ANIn Kolkata April 2024 | AI Enabled Reflection in Agile Delivery by Indranil...ANIn Kolkata April 2024 | AI Enabled Reflection in Agile Delivery by Indranil...
ANIn Kolkata April 2024 | AI Enabled Reflection in Agile Delivery by Indranil...
 
ANIn Gurugram April 2024 |Can Agile and AI work together? by Pramodkumar Shri...
ANIn Gurugram April 2024 |Can Agile and AI work together? by Pramodkumar Shri...ANIn Gurugram April 2024 |Can Agile and AI work together? by Pramodkumar Shri...
ANIn Gurugram April 2024 |Can Agile and AI work together? by Pramodkumar Shri...
 
ANIn Pune April 2024 |L&D Accelerating business growth by Mukta Nalke
ANIn Pune April 2024 |L&D Accelerating business growth by Mukta NalkeANIn Pune April 2024 |L&D Accelerating business growth by Mukta Nalke
ANIn Pune April 2024 |L&D Accelerating business growth by Mukta Nalke
 
ANIn Pune April 2024 | Meeting Modern Learning Needs with Innovation by Ankit...
ANIn Pune April 2024 | Meeting Modern Learning Needs with Innovation by Ankit...ANIn Pune April 2024 | Meeting Modern Learning Needs with Innovation by Ankit...
ANIn Pune April 2024 | Meeting Modern Learning Needs with Innovation by Ankit...
 
ANIn Ahmedabad April 2024 | Powering Big Wins with Small, Agile Teams by Yoge...
ANIn Ahmedabad April 2024 | Powering Big Wins with Small, Agile Teams by Yoge...ANIn Ahmedabad April 2024 | Powering Big Wins with Small, Agile Teams by Yoge...
ANIn Ahmedabad April 2024 | Powering Big Wins with Small, Agile Teams by Yoge...
 
ANIn Coimbatore March 2024 | Unlocking Agility with Gen AI by Balaprasanna S
ANIn Coimbatore March 2024 | Unlocking Agility with Gen AI by Balaprasanna SANIn Coimbatore March 2024 | Unlocking Agility with Gen AI by Balaprasanna S
ANIn Coimbatore March 2024 | Unlocking Agility with Gen AI by Balaprasanna S
 
ANIn Coimbatore March 2024 | Agile & AI in Project Management by Dhilipkumar ...
ANIn Coimbatore March 2024 | Agile & AI in Project Management by Dhilipkumar ...ANIn Coimbatore March 2024 | Agile & AI in Project Management by Dhilipkumar ...
ANIn Coimbatore March 2024 | Agile & AI in Project Management by Dhilipkumar ...
 
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
 
ANIn Chennai March 2024 |Oxygenating AI ecosystem with Agility by Gowtham Bal...
ANIn Chennai March 2024 |Oxygenating AI ecosystem with Agility by Gowtham Bal...ANIn Chennai March 2024 |Oxygenating AI ecosystem with Agility by Gowtham Bal...
ANIn Chennai March 2024 |Oxygenating AI ecosystem with Agility by Gowtham Bal...
 
ANIn Ahmedabad March 2024 | The Power of Retrospection by Rakesh Mehta
ANIn Ahmedabad March 2024 | The Power of Retrospection by Rakesh MehtaANIn Ahmedabad March 2024 | The Power of Retrospection by Rakesh Mehta
ANIn Ahmedabad March 2024 | The Power of Retrospection by Rakesh Mehta
 
ANIn Pune March 2024 | Customer Stratification for Business Growth by Manish ...
ANIn Pune March 2024 | Customer Stratification for Business Growth by Manish ...ANIn Pune March 2024 | Customer Stratification for Business Growth by Manish ...
ANIn Pune March 2024 | Customer Stratification for Business Growth by Manish ...
 
ANIn Coimbatore July 2023 | Business Agility in Data Science by Dr.Selvaraaju...
ANIn Coimbatore July 2023 | Business Agility in Data Science by Dr.Selvaraaju...ANIn Coimbatore July 2023 | Business Agility in Data Science by Dr.Selvaraaju...
ANIn Coimbatore July 2023 | Business Agility in Data Science by Dr.Selvaraaju...
 
ANIn Coimbatore May 2023 | Agile and Beyond by Nithya Sitharam
ANIn Coimbatore May 2023 | Agile and Beyond by Nithya SitharamANIn Coimbatore May 2023 | Agile and Beyond by Nithya Sitharam
ANIn Coimbatore May 2023 | Agile and Beyond by Nithya Sitharam
 
ANIn Hyderabad Jun 2023 |Humanizing Agile Transformation Beyond Process and T...
ANIn Hyderabad Jun 2023 |Humanizing Agile Transformation Beyond Process and T...ANIn Hyderabad Jun 2023 |Humanizing Agile Transformation Beyond Process and T...
ANIn Hyderabad Jun 2023 |Humanizing Agile Transformation Beyond Process and T...
 
ANIn Coimbatore Jul 2023 |The Importance of Business Agility in the Current L...
ANIn Coimbatore Jul 2023 |The Importance of Business Agility in the Current L...ANIn Coimbatore Jul 2023 |The Importance of Business Agility in the Current L...
ANIn Coimbatore Jul 2023 |The Importance of Business Agility in the Current L...
 

Recently uploaded

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 

Agile Gurugram 2023 | Observability for Modern Applications. How does it help track KPIs & react faster? - Abhinav Kapoor

  • 1. Observability for Modern Applications. - How does it help track KPIs & react faster?
  • 2. About me Abhinav Kapoor • Principal Engineer, Nagarro • Work as a Technical Architect for .NET Solutions. • 15+ years of technical experience in architecting & developing solutions for Banking, Supply chain & Embedded domains. • I enjoy writing about software architecture, communication patterns, modernization & more on Medium.com https://medium.com/@kapoorabhinav. Accreditations I‘m available on Linked at https://www.linkedin.com/in/abhinav-kapoor 0394456/
  • 3. What we’ll cover 1. What is observability ? 2. Why do we need observability? 3. How does it fit in a cloud native landscape? 4. Application Instrumentation – The pillars for getting insights 1. Logs 2. Traces 3. Metrics 4. Health Check 5. Future of Observability - OpenTelemetry 1. What is it & why is it needed? 2. Its components.
  • 4. What is observability ? Sir Lewis Carl Davidson Hamilton - 7-time F1 champion Steering Wheel with controls and dashboard - Image credit Mercedes X
  • 5. What is observability ? ● Moving from Formula 1 to Modern Software applications ○ Complex systems – Lot of moving parts ○ High stakes ○ Cutting edge engineering ○ Fast Reaction ○ Several teams with different specializations at work ● Observability is the ability to observe what is going on inside the system. ● It can be considered as an NFR which helps measures other NFRs
  • 6. Why do we need Observability? ● We want to avoid these reactions
  • 7. Why do we need Observability? ● Issue Timeline in Operation Support Detect Identify Fix Deploy Issue Start Issue End Mean Time To Repair (MTTR) Mean Time To Identify (MTTI) Mean Time To Detect (MTTD)
  • 8. Why do we need Observability? ● How does it impact KPIs ○ Faster Issue Detection and Resolution ○ Enhanced System Performance ○ Proactive Capacity Planning ○ Security and Compliance ○ Better User Experience ○ Data-Driven Decision-Making ○ Continuous Improvement
  • 9. Why is Observability more critical now? ● Evolution to smaller but more services - high cohesion and low coupling. “Complexities arises when the dependencies among the elements become important” - We have reduced code complexity but introduced much more moving parts
  • 10. Why is Observability more critical now? Technology transformation trends Microservices Polyglot Applications Ephemeral & cloud native environments
  • 11. How does it fit in a cloud native landscape?
  • 12. How does it fit in a cloud native landscape? Instrument Capture Transport Analyze Act SDK, Agent Scrapping/ Agent Dashboard Alerts / Notifications
  • 13. How does it fit in a cloud native landscape? Active Directory / IAM Certificate Manager Resources – DB, Broker, API Gateway Service A Sidecar Platform / Orchestrator Service B Sidecar Shared Services, Infrastructure Applications Audit logs, Activity logs, Metrics Central Aggregation & Collection Application logs, metrics, traces Monitoring & Alerts Access Control
  • 14. Deep Dive - Application Instrumentation
  • 15. Application Instrumentation ● Logs – What happened & who did it ? ● Traces – How did it happen ? ● Metrics – How much happened ? ● Health – How is the system doing ?
  • 16. Big picture ● We don’t just want to have connected black boxes. ● While infrastructure & sidecars can give a good overview, there are always blind spots. ● It speeds up development. Logs, metrics, Traces, Health Logs, metrics, Traces, Health Logs, metrics, Traces, Health Logs, metrics, Traces, Health
  • 17. Logs – What Happened ?
  • 18. Logs – Components of Logging Logging Framework Log Message - Severity, tags Application SDK Filter based on Severity - Destination Specific Log Providers ILogger
  • 19. Types of Logs ● Based on Information they contain – Governs where they are stored, how long they are stored, who can access them ○ Technical – Startup configurations, system event that the application receives ○ Business - Business and User Information ● Based on the schema ○ Unstructured – Simple statements with severity. ○ Structured / Schematic logs - Logs with Metadata to provide context. Pod, Service, IP address & more.
  • 20. Duality of Logs - Readable messages & Quarriable metrics
  • 21. Traces - How did it happen ?
  • 22. Traces – Types of traces ● Tracing – This is similar to logging. But noisier and collects more information from deeper parts of the application. ● Distributed Trancing – It’s the method to observe requests as they propagate through different services. ○ Its structured log which comes from different processes, nodes and can be stitched together to give an end to end view.
  • 23. Distributed Tracing Trace Collector Services Web API Broker Background Services Traces HTTP Request with tracing context TCP message with tracing context TCP message with tracing context DB Store tracing context with data in db Services Read tracing context • Correlation IDs are passed in the remote calls/messages. • The receiver regenerates context proceeds with its unit of work. • Depending upon business transaction, traces can be of few seconds to hours or days.
  • 24. Metrics–How much is happening?
  • 25. Metrics – Components Vendor specific Instrumentation & collection Metrics- Name, description, Labels & Value Application Backend – time series database - Adds timestamp - Supports aggregation (eg PromQL) SDK /Metrics scrape Visualization PromQL expressions Alerts
  • 26. Types of Prometheus Metrics - Counters Track the occurrence of an event in application. ● The absolute value may not be helpful. ● The delta between two timestamps or the rate of change over time is helpful. Functions like Increase(), Rate(). ● Consider the case when applications restart. ● Examples - Orders processed counter, unsuccessful counters, business error, technical errors. ● Sample output of meta-endpoint (/metrics) # HELP http_requests_total Total number of http api requests # TYPE http_requests_total counter http_requests_total{api=“add_Product“, company=“ITC”} 3433 counter http_requests_total{api=“add_Product“, company =“TATA”} 2000 PromQL increase(http_requests_total{api="add_product“, company=“ITC”}[5m])
  • 27. Types of Prometheus Metrics - Gauges Snapshots of a metric at a single point in time. Not an event. ● Example is message queue size, CPU utilization at a given time. ● Useful functions - avg_over_time, max_over_time, min_over_time, and quantile_over_time
  • 28. Types of Prometheus Metrics - Histograms ● Histograms divide the entire range of measurements into a set of intervals— named buckets—and count how many measurements fall into each bucket. ● These buckets are defined at compile time, they are upper inclusive (le). ● Histogram looks like the following http_request_duration_seconds_sum{api="add_product" instance=" host1 "} 8953.332 http_request_duration_seconds_count{api="add_product" instance=" host1"} 27892 http_request_duration_seconds_bucket{api="add_product", instance=" host1", le="0.05"} 1672 http_request_duration_seconds_bucket{api="add_product", instance=" host1", le="0.1"} 8954 http_request_duration_seconds_bucket{api="add_product", instance=" host1", le="0.25"} 14251 sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
  • 29. Types of Prometheus Metrics - Summaries ● Similar to Histogram but they add quantile label to bound the summary. ● OpenTelemetry has been marked it legacy. ● Summary looks like the following http_request_duration_seconds_sum{api="add_product" instance="host1.domain.com"} 8953.332 http_request_duration_seconds_count{api="add_product" instance="host1.domain.com"} 27892 http_request_duration_seconds{api="add_product" instance=" host1 " quantile="0"} http_request_duration_seconds{api="add_product" instance="host1" quantile="0.5"} 0.232227334 ● Considerations due to quantile.
  • 30. Metrics - What happens after instrumentation? ● The orchestrator can enrich the application instrumentation with infrastructure information such as Service Name, Pod, IP address. ● The backend saves it with timestamp ● Add Alerts & dashboards.
  • 31. Health Check– How is the system doing ?
  • 32. Health check ● Basic use is liveness probe. ● More useful when we add dependencies. ● Prebuilt libraries depending upon the language ● Considerations ○ Orchestrator may restart the service while the issue is in a dependency. ○ Checking all downstream services could be time consuming. ○ Meta-end points (/health) may not be exposed. Instead of writing to the response metrics may be used to indicate issue.
  • 34. What is OpenTelemetry ? ● It is an open-source observability framework for infrastructure & application instrumentation. ● It’s the second-highest ranked CNCF project by activity and contributors after K8s ● It aims to be vendor agnostic & the SDK is produced by OpenTelemetry. ● Its still young so not all features are available for all languages. ● As Open Telemetry doesn’t provide a backend implementation (its concern is creating, collecting, and sending signals), the data will flow to another system or systems for storage and querying.
  • 35. Why OpenTelemetry? ● Before Open Telemetry ○ APM vendors had their own things – SDKs, Agents, Grammer ○ Vendor lock-in. ○ Application developers, built wrappers, event hooks to keep application isolated. ○ Competing Open standards – OpenTracing (Traces), OpenCensus (Metrics, Traces)
  • 36. OpenTelemetry Library - Instrumentation (Automatic & Manual) Exporters Exporter - Backends/APM Application Trace, metrics, Logs SDK Components of OpenTelemetry Collector Multiple receivers Multiple exporters

Editor's Notes

  1. We’ll take the questions at the end.
  2. Monitoring is collecting metrics & logs Observability is the ability to observe what is going on inside the system. Consider it an NFR.
  3. MTTD -> metrics or synthetic tests MTTI -> Traces
  4. Emergence for micro-services – Lot more Story of a certificate expiring somewhere instead of a single server.
  5. Everything writing to one file. Or few files is lot more simple. Lot more moving parts. Speed of development
  6. Conceptual View
  7. Its happening at multiple levels
  8. Logs – important events. State changes, who did. Trances – Are flow within the program or outside the program. Matrics – are numbers which can be aggregated to have formulas – SLOs, KPI & alerts Health – current state of health
  9. Automatic & manual – instrumentation
  10. Cost!!
  11. All services should have it.
  12. Do consider the situation when it turns to zero. https://grafana.com/grafana/dashboards/ including ones from OPSTREE
  13. May be used in cases where strict KPIs are enforced on duration for example
  14. https://grafana.com/grafana/dashboards/ Which help me detect issue & fix the issues faster. Which help management find datapoints. Which can report compliance & SLO.
  15. Alternatives 1. Have a monitoring on solution 2. cache the results
  16. Receivers can be used in migration
  17. As a developer, you don’t really need to worry about the models—but having a basic understanding helps.