Observability for Application
Developers
About me
Abhinav Kapoor
• Principal Engineer, Nagarro
• Work as a Technical Architect for .NET Solutions.
• 15+ years of technical experience in architecting & developing solutions for Banking, Supply chain &
Embedded domains.
• I enjoy writing about software architecture, communication patterns, modernization & more on
Medium.com https://medium.com/@kapoorabhinav.
Accreditations
I’m available on LinkedIn at
https://www.linkedin.com/in/abhinav-kapoor
0394456/
What we’ll cover
1. Why do we need Observability?
2. How does it fit in a cloud native landscape?
3. Application Instrumentation
   1. Logs
   2. Traces
   3. Metrics
   4. Health Check
4. OpenTelemetry
   1. What is it & why is it needed?
   2. Its components
   3. Its signals
Why do we need Observability?
● We want to avoid these reactions
Why do we need Observability?
● Issue timeline in operations support
Issue start → Detect → Identify → Fix → Deploy → Issue end
Measured along this timeline: Mean Time To Detect (MTTD), Mean Time To Identify (MTTI), Mean Time To Repair (MTTR)
Why is Observability more critical now?
● Evolution to smaller but more numerous services - high cohesion and low coupling.
“Complexity arises when the dependencies among the elements become important”
- We have reduced code complexity but introduced many more moving parts.
Why is Observability more critical now?
Technology transformation trends
Microservices Polyglot Applications Ephemeral & cloud native environments
How does it fit in a cloud native landscape?
How does it fit in a cloud native landscape?
Instrument → Capture → Transport → Analyze → Act
● Instrument – SDK, agent
● Capture & transport – scraping / agent
● Analyze – dashboard
● Act – alerts / notifications
How does it fit in a cloud native landscape?
● Shared services & infrastructure (Active Directory / IAM, Certificate Manager, resources such as DB, broker, API gateway) emit audit logs, activity logs, and metrics.
● Applications (Service A and Service B, each with a sidecar, running on the platform / orchestrator) emit application logs, metrics, and traces.
● Both streams flow into central aggregation & collection, which feeds monitoring & alerts under access control.
Application Instrumentation – What should I know & do as a developer?
Application Instrumentation
● Logs – What happened & who did it?
● Traces – How did it happen?
● Metrics – How much happened?
● Health – How is the system doing?
Big picture
● We don’t just want to have connected black boxes.
● While infrastructure & sidecars can give a good overview, there are always
blind spots.
● It speeds up development.
(Diagram: every service emits logs, metrics, traces, and health.)
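Logs – What happened?
Logs – Components of Logging
● Application – writes log messages through the SDK (ILogger).
● Log message – carries severity and tags.
● Logging framework – filters based on severity; filters are destination specific.
● Log providers – deliver the messages to their destinations.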
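The logging components listed above can be sketched with Python's standard `logging` module, which here stands in for the .NET `ILogger` pipeline the slide names. The logger name, the two in-memory destinations, and the messages are illustrative assumptions, not part of the original deck.

```python
import io
import logging

# The application logs through a single logger (the SDK-facing API).
logger = logging.getLogger("orders")
logger.setLevel(logging.DEBUG)   # let every severity reach the handlers
logger.propagate = False         # keep the sketch isolated from root handlers

# Two destinations ("providers"), each with its own severity filter.
console_stream = io.StringIO()   # stands in for a console sink
audit_stream = io.StringIO()     # stands in for a file or remote sink

console_handler = logging.StreamHandler(console_stream)
console_handler.setLevel(logging.INFO)    # this destination takes INFO and above

audit_handler = logging.StreamHandler(audit_stream)
audit_handler.setLevel(logging.ERROR)     # this destination takes ERROR and above

formatter = logging.Formatter("%(levelname)s %(name)s %(message)s")
for handler in (console_handler, audit_handler):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.debug("cache warmed")              # dropped by both destinations
logger.info("order accepted")             # reaches the console sink only
logger.error("payment gateway timeout")   # reaches both sinks

console_lines = console_stream.getvalue().splitlines()
audit_lines = audit_stream.getvalue().splitlines()
```

The application code never names a destination; severity filtering is configured per handler, which mirrors the destination-specific filter in the slide.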
Types of Logs
● Based on the information they contain – governs where they are stored, how long they are stored, and who can access them
○ Technical – Startup configurations, system events that the application receives
○ Business – Business and user information
● Based on the schema
○ Unstructured – Simple statements with severity.
○ Structured / schematic logs – Logs with metadata to provide context: pod, service, IP address & more.
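To make the unstructured/structured distinction concrete, here is a minimal sketch of a JSON formatter built on Python's standard `logging` module. The service name, pod name, IP address, and `order_id` field are hypothetical values chosen for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that carries context metadata."""

    def __init__(self, static_context):
        super().__init__()
        self.static_context = static_context

    def format(self, record):
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            **self.static_context,
            # Fields passed through `extra=` become attributes of the record.
            "order_id": getattr(record, "order_id", None),
        }
        return json.dumps(entry, sort_keys=True)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    JsonFormatter({"service": "orders", "pod": "orders-7d9f", "ip": "10.0.0.12"})
)

logger = logging.getLogger("orders.structured")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("Order %s accepted", "A-1001", extra={"order_id": "A-1001"})

structured = json.loads(stream.getvalue())
# The unstructured equivalent keeps only the severity and the statement.
unstructured = f"{structured['severity']}: {structured['message']}"
```

Because every field is a key, a log backend can filter on `pod` or `order_id` without parsing the message text.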
Duality of Logs - Readable messages & Queryable metrics
Traces - How did it happen?
Traces – Types of traces
● Tracing – This is similar to logging, but noisier, and it collects more information from deeper parts of the application.
● Distributed Tracing – The method of observing requests as they propagate through different services.
○ It is a structured log that comes from different processes and nodes and can be stitched together to give an end-to-end view.
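Distributed Tracing
(Diagram: Web API → Broker → Background Services → DB, with every service sending traces to a Trace Collector.)
● An HTTP request with tracing context arrives at the Web API.
● TCP messages with tracing context pass through the broker to the background services.
● The tracing context is stored with the data in the DB.
● Services that later read the data also read the tracing context.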
• Correlation IDs are passed in the remote calls/messages.
• The receiver regenerates the context and proceeds with its unit of work.
• Depending upon the business transaction, traces can last from a few seconds to hours or days.
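The hand-off described above can be sketched in a few lines. This example assumes the W3C Trace Context `traceparent` header layout (`version-traceid-parentid-flags`) as the carrier. The helper names `inject` and `extract` and the message shape are hypothetical and are not taken from any tracing library.

```python
import secrets


def new_trace_context():
    """Start a trace: a fresh 16-byte trace id and 8-byte span id, hex encoded."""
    return {"trace_id": secrets.token_hex(16), "span_id": secrets.token_hex(8)}


def inject(context, headers):
    """Sender side: write the current context into the outgoing headers."""
    headers["traceparent"] = f"00-{context['trace_id']}-{context['span_id']}-01"
    return headers


def extract(headers):
    """Receiver side: regenerate the context and open a new span under it."""
    _version, trace_id, parent_span_id, _flags = headers["traceparent"].split("-")
    return {
        "trace_id": trace_id,              # unchanged across every service
        "parent_span_id": parent_span_id,  # links this work back to the caller
        "span_id": secrets.token_hex(8),   # new id for this unit of work
    }


# Web API publishes a message; a background service picks it up later.
api_context = new_trace_context()
message = {"headers": inject(api_context, {}), "body": "order A-1001"}
worker_context = extract(message["headers"])
```

Storing the same header value next to a database row lets a process that reads the row hours later continue the same trace.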
Metrics – How much is happening?
Metrics – Components
● Application – instrumented through an SDK; exposes a /metrics endpoint. Instrumentation & collection are vendor specific.
● Metric – name, description, labels & value.
● Backend (time series database) – scrapes /metrics, adds the timestamp, and supports aggregation (e.g. PromQL).
● Visualization and alerts – driven by PromQL expressions.
Types of Prometheus Metrics - Counters
Track the occurrence of an event in application.
● The absolute value may not be helpful.
● The delta between two timestamps or the rate of change over time is helpful. Functions like increase() and rate() compute these.
● Consider the case when applications restart.
● Examples - Orders processed counter, unsuccessful counters, business error, technical errors.
● Sample output of meta-endpoint (/metrics)
# HELP http_requests_total Total number of http api requests
# TYPE http_requests_total counter
http_requests_total{api="add_product", company="ITC"} 3433
http_requests_total{api="add_product", company="TATA"} 2000
PromQL
increase(http_requests_total{api="add_product", company="ITC"}[5m])
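The restart consideration is easier to see with numbers. The sketch below is a simplified model of how a counter's growth can be computed across a reset. It is not Prometheus's implementation: PromQL's `increase()` also extrapolates to the window boundaries, which is omitted here. The scrape values are made up.

```python
def increase(samples):
    """Total growth of a counter across `samples`, tolerating resets.

    A drop in value is treated as a restart from zero, so the new value
    counts in full. No extrapolation to the window edges is applied.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current   # counter reset: the process restarted at 0
    return total


def rate(samples, window_seconds):
    """Average per-second growth over the window."""
    return increase(samples) / window_seconds


# Scrapes of http_requests_total over five minutes; the app restarts after 3440.
scrapes = [3433, 3437, 3440, 2, 6]
growth = increase(scrapes)          # 4 + 3 + 2 + 4
per_second = rate(scrapes, 300)
```

Subtracting the first sample from the last would give a negative number here, which is why the absolute counter value is rarely what a dashboard should show.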
Types of Prometheus Metrics - Gauges
Snapshots of a metric at a single point in time. Not an event.
● Examples: message queue size, CPU utilization at a given time.
● Useful functions - avg_over_time, max_over_time, min_over_time, and quantile_over_time
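As a rough illustration of what those functions return, the following sketch applies the equivalent aggregations to one window of gauge samples. The queue-depth values are hypothetical, and the median is used as a stand-in for a 0.5 quantile over the window.

```python
import statistics

# Hypothetical queue-depth samples scraped during one window.
queue_depth = [12, 40, 7, 25, 16]

window = {
    "avg_over_time": statistics.fmean(queue_depth),
    "max_over_time": max(queue_depth),
    "min_over_time": min(queue_depth),
    # Comparable to asking for the 0.5 quantile over the same window.
    "median": statistics.median(queue_depth),
}
```

Unlike a counter, each gauge sample is meaningful on its own, so the window is summarised rather than differenced.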
Types of Prometheus Metrics - Histograms
● Histograms divide the entire range of measurements into a set of intervals—
named buckets—and count how many measurements fall into each bucket.
● These buckets are defined in code when the metric is created, and their upper bounds are inclusive (le).
● Histogram looks like the following
http_request_duration_seconds_sum{api="add_product", instance="host1"} 8953.332
http_request_duration_seconds_count{api="add_product", instance="host1"} 27892
http_request_duration_seconds_bucket{api="add_product", instance="host1", le="0.05"} 1672
http_request_duration_seconds_bucket{api="add_product", instance="host1", le="0.1"} 8954
http_request_duration_seconds_bucket{api="add_product", instance="host1", le="0.25"} 14251
sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
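A small sketch may help show why the bucket counts in the sample only ever grow from one `le` to the next. The class below keeps cumulative, upper-inclusive buckets. `estimate_quantile` interpolates linearly inside the bucket that holds the requested rank, which is a simplified version of the approach behind PromQL's `histogram_quantile`. The class and function names are hypothetical, and the `+Inf` bucket in the usage lines is an assumption (taken to be equal to `_count`), since the slide does not show it.

```python
import bisect
import math


class Histogram:
    """Cumulative-bucket histogram in the style of a Prometheus histogram."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds) + [math.inf]  # +Inf catches everything
        self.bucket_counts = [0] * len(self.upper_bounds)      # cumulative counts
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # bisect_left returns the first bound >= value, so bounds are upper inclusive.
        first = bisect.bisect_left(self.upper_bounds, value)
        for i in range(first, len(self.bucket_counts)):
            self.bucket_counts[i] += 1


def estimate_quantile(q, upper_bounds, cumulative_counts):
    """Estimate the q-quantile (0 < q <= 1) from non-empty cumulative buckets."""
    rank = q * cumulative_counts[-1]
    index = next(i for i, c in enumerate(cumulative_counts) if c >= rank)
    if math.isinf(upper_bounds[index]):
        return upper_bounds[index - 1]   # cannot interpolate into the +Inf bucket
    lower_bound = upper_bounds[index - 1] if index > 0 else 0.0
    lower_count = cumulative_counts[index - 1] if index > 0 else 0
    in_bucket = cumulative_counts[index] - lower_count
    width = upper_bounds[index] - lower_bound
    return lower_bound + width * (rank - lower_count) / in_bucket


histogram = Histogram([0.05, 0.1, 0.25])
for seconds in (0.02, 0.05, 0.07, 0.2, 0.9):
    histogram.observe(seconds)

# Bucket counts from the sample above; the +Inf bucket is assumed to equal _count.
p50 = estimate_quantile(0.5, [0.05, 0.1, 0.25, math.inf], [1672, 8954, 14251, 27892])
```

The estimate can only be as precise as the bucket boundaries, so buckets are usually chosen around the latencies that matter for alerting.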
Types of Prometheus Metrics - Summaries
● Similar to histograms, but they add a quantile label to bound the summary.
● OpenTelemetry has marked it as legacy.
● A summary looks like the following
http_request_duration_seconds_sum{api="add_product", instance="host1.domain.com"} 8953.332
http_request_duration_seconds_count{api="add_product", instance="host1.domain.com"} 27892
http_request_duration_seconds{api="add_product", instance="host1", quantile="0"}
http_request_duration_seconds{api="add_product", instance="host1", quantile="0.5"} 0.232227334
● Considerations due to quantiles: they are calculated in the application, so they cannot be meaningfully aggregated across instances.
Metrics - What happens after instrumentation?
● Enrich the application instrumentation with infrastructure information such as service name, pod, and IP address.
● The backend saves it with a timestamp.
● Add alerts & dashboards. Help is available at https://grafana.com/grafana/dashboards/
Health Check – How is the system doing?
Health check
● The basic use is a liveness probe.
● More useful when we add dependency checks.
● Prebuilt libraries are available, depending upon the language.
● Considerations
○ The orchestrator may restart the service while the issue is in a dependency.
○ Checking all downstream services could be time consuming.
○ Meta-endpoints (/health) may not be exposed. Instead of writing to the response, metrics may be used to indicate an issue.
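One way to address the first two considerations is to keep the liveness answer separate from the dependency checks and to cap how long those checks may run. The sketch below assumes that split. The function names, the time budget, and the two example dependencies are hypothetical.

```python
import time


def liveness():
    """The process is up and can answer; no dependencies are consulted."""
    return {"status": "alive"}


def readiness(checks, budget_seconds=1.0):
    """Run dependency checks, skipping the rest once the time budget is spent."""
    started = time.monotonic()
    results = {}
    for name, check in checks.items():
        if time.monotonic() - started > budget_seconds:
            results[name] = "skipped"   # keep the probe itself fast
            continue
        try:
            results[name] = "ok" if check() else "failing"
        except Exception:
            results[name] = "failing"
    healthy = all(state == "ok" for state in results.values())
    return {"status": "ready" if healthy else "degraded", "dependencies": results}


def broker_down():
    raise ConnectionError("broker unreachable")


report = readiness({"database": lambda: True, "broker": broker_down})
```

With this split, a broken broker marks the service as degraded without making the liveness probe fail, so the orchestrator has no reason to restart a service that is itself healthy.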
OpenTelemetry
What is OpenTelemetry ?
● It is an open-source observability framework for infrastructure & application instrumentation.
● It’s the second-highest ranked CNCF project by activity and contributors, after Kubernetes (K8s).
● It aims to be vendor agnostic & the SDK is produced by OpenTelemetry.
● It’s still young, so not all features are available for all languages.
● As OpenTelemetry doesn’t provide a backend implementation (its concern is creating, collecting, and sending signals), the data flows to another system or systems for storage and querying.
Why OpenTelemetry?
● Before OpenTelemetry
○ APM vendors each had their own SDKs, agents, and grammar.
○ Vendor lock-in.
○ Application developers built wrappers and event hooks to keep the application isolated.
○ Competing open standards – OpenTracing (traces), OpenCensus (metrics, traces).
Components of OpenTelemetry
● Application – produces traces, metrics, and logs through the SDK.
● OpenTelemetry library – instrumentation (automatic & manual) and exporters.
● Exporter – sends signals to backends / APM.
● Collector – multiple receivers, multiple exporters.
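The receiver/exporter split can be illustrated with a toy pipeline. This is not the OpenTelemetry API: `Signal`, `InMemoryExporter`, and `Collector` are hypothetical names used only to show how an application can stay unaware of which backends receive its data.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    kind: str   # "trace", "metric", or "log"
    name: str
    attributes: dict = field(default_factory=dict)


class InMemoryExporter:
    """Hypothetical backend adapter; a real one would speak a vendor protocol."""

    def __init__(self, accepted_kinds):
        self.accepted_kinds = set(accepted_kinds)
        self.received = []

    def export(self, signal):
        if signal.kind in self.accepted_kinds:
            self.received.append(signal)


class Collector:
    """Receives signals from applications and fans them out to every exporter."""

    def __init__(self, exporters):
        self.exporters = list(exporters)

    def receive(self, signal):
        for exporter in self.exporters:
            exporter.export(signal)


tracing_backend = InMemoryExporter({"trace"})
metrics_backend = InMemoryExporter({"metric"})
collector = Collector([tracing_backend, metrics_backend])

# The application only knows the collector, never the backends behind it.
collector.receive(Signal("trace", "POST /orders"))
collector.receive(Signal("metric", "http_requests_total", {"api": "add_product"}))
collector.receive(Signal("log", "order accepted"))   # no exporter accepts logs here
```

Swapping a backend then means changing the exporter list, not the application code, which is the vendor-agnostic property the slides describe.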
Signals of OpenTelemetry - Metrics
● Differences from Prometheus metrics
○ Summary is marked deprecated
○ Some more functions & data type flexibility
Signals of OpenTelemetry - Tracing
Data model of an OpenTelemetry Span
Example tracing for a news article.
Source - https://www.timescale.com/blog/what-are-traces-and-how-sql-yes-sql-and-opentelemetry-can-help-us-get-more-value-out-of-traces-to-build-better-software/
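Since the span diagram did not survive extraction, here is a minimal sketch of the fields a span typically carries: a name, trace and span ids, a parent id, start and end times, attributes, events, and a status. Span links are left out for brevity. The ids, timings, and the `article.id` attribute are made-up values for a news-article request.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]   # None marks the root span
    start_ns: int
    end_ns: int
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    status: str = "UNSET"           # OpenTelemetry statuses: UNSET, OK, ERROR

    @property
    def duration_ms(self):
        return (self.end_ns - self.start_ns) / 1_000_000


spans = [
    Span("GET /article", "t1", "a", None, 0, 120_000_000),
    Span("load article", "t1", "b", "a", 5_000_000, 60_000_000, {"article.id": "42"}),
    Span("render", "t1", "c", "a", 61_000_000, 118_000_000),
]

# Stitch the trace together: find the root, then the spans that point to it.
root = next(s for s in spans if s.parent_span_id is None)
children = [s.name for s in spans if s.parent_span_id == root.span_id]
```

The shared `trace_id` groups the spans into one trace, and `parent_span_id` gives the tree that a tracing UI draws as a waterfall.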
Questions?
Thank you!

Observability for Application Developers (1)-1.pptx

  • 2. About me Abhinav Kapoor • Principal Engineer, Nagarro • Work as a Technical Architect for .NET Solutions. • 15+ years of technical experience in architecting & developing solutions for Banking, Supply chain & Embedded domains. • I enjoy writing about software architecture, communication patterns, modernization & more on Medium.com https://medium.com/@kapoorabhinav. Accreditations I'm available on LinkedIn at https://www.linkedin.com/in/abhinav-kapoor 0394456/
  • 3. What we’ll cover 1. Why do we need Observability? 2. How does it fit in a cloud native landscape? 3. Application Instrumentation 1. Logs 2. Traces 3. Metrics 4. Health Check 4. OpenTelemetry 1. What is it & why is it needed? 2. Its components. 3. Its signals
  • 4. Why do we need Observability? ● We want to avoid these reactions
  • 5. Why do we need Observability? ● Issue timeline in operations support: Issue Start, Detect, Identify, Fix, Deploy, Issue End. Measured as Mean Time To Detect (MTTD), Mean Time To Identify (MTTI) and Mean Time To Repair (MTTR).
  • 6. Why is Observability more critical now? ● Evolution to smaller but more numerous services - high cohesion and low coupling. “Complexity arises when the dependencies among the elements become important” - We have reduced code complexity but introduced many more moving parts
  • 7. Why is Observability more critical now? Technology transformation trends Microservices Polyglot Applications Ephemeral & cloud native environments
  • 8. How does it fit in a cloud native landscape?
  • 9. How does it fit in a cloud native landscape? Instrument Capture Transport Analyze Act SDK, Agent Scraping / Agent Dashboard Alerts / Notifications
  • 10. How does it fit in a cloud native landscape? Active Directory / IAM Certificate Manager Resources – DB, Broker, API Gateway Service A Sidecar Platform / Orchestrator Service B Sidecar Shared Services, Infrastructure Applications Audit logs, Activity logs, Metrics Central Aggregation & Collection Application logs, metrics, traces Monitoring & Alerts Access Control
  • 11. Application Instrumentation – What should I know & do as a developer?
  • 12. Application Instrumentation ● Logs – What happened & who did it? ● Traces – How did it happen? ● Metrics – How much happened? ● Health – How is the system doing?
  • 13. Big picture ● We don’t just want to have connected black boxes. ● While infrastructure & sidecars can give a good overview, there are always blind spots. ● It speeds up development. Diagram label on each service: Logs, Metrics, Traces, Health
  • 14. Logs – What happened?
  • 15. Logs – Components of Logging. Diagram labels: Application; Log Message (severity, tags); Logging Framework SDK (ILogger); filter based on severity; destination-specific log providers.
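The components on this slide can be sketched with Python's standard `logging` module, used here only as a stand-in for the `ILogger` abstraction the slide names. The logger name `orders`, the message texts and the two in-memory destinations are illustrative assumptions, not part of the talk.

```python
import io
import logging

# In-memory streams stand in for two real destinations (assumption).
console_stream = io.StringIO()
file_stream = io.StringIO()

logger = logging.getLogger("orders")  # hypothetical logger name
logger.setLevel(logging.DEBUG)        # let each handler decide what to keep
logger.propagate = False              # keep the example isolated from the root logger

# Destination 1 keeps everything from DEBUG upward.
console_handler = logging.StreamHandler(console_stream)
console_handler.setLevel(logging.DEBUG)

# Destination 2 filters on severity and keeps only WARNING and above.
file_handler = logging.StreamHandler(file_stream)
file_handler.setLevel(logging.WARNING)

formatter = logging.Formatter("%(levelname)s %(name)s %(message)s")
for handler in (console_handler, file_handler):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.debug("cache warmed")
logger.warning("payment provider slow")

console_lines = console_stream.getvalue().splitlines()
file_lines = file_stream.getvalue().splitlines()
```

The application code only talks to the logger; which messages reach which destination is decided by the severity level configured on each handler.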
  • 16. Types of Logs ● Based on the information they contain – governs where they are stored, how long they are stored, and who can access them ○ Technical – Startup configurations, system events that the application receives ○ Business - Business and user information ● Based on the schema ○ Unstructured – Simple statements with severity. ○ Structured / Schematic logs - Logs with metadata to provide context: Pod, Service, IP address & more.
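A minimal sketch of a structured log entry, assuming Python's standard `logging` and `json` modules. The `JsonFormatter` class, the `context` key and the metadata values (`orders`, `orders-7d9f`, `10.0.0.12`) are hypothetical and chosen only to mirror the Pod, Service and IP address metadata mentioned on the slide.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Values passed through `extra=` become attributes on the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("structured-demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Hypothetical metadata; a platform or sidecar would often add these fields.
logger.info(
    "order accepted",
    extra={"context": {"service": "orders", "pod": "orders-7d9f", "ip": "10.0.0.12"}},
)

entry = json.loads(stream.getvalue())
```

Because every entry is a JSON object, a log backend can filter on fields such as `pod` or `service` instead of parsing free text.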
  • 17. Duality of Logs - Readable messages & Queryable metrics
  • 18. Traces - How did it happen?
  • 19. Traces – Types of traces ● Tracing – This is similar to logging, but noisier, and it collects more information from deeper parts of the application. ● Distributed Tracing – It’s the method to observe requests as they propagate through different services. ○ It is a structured log that comes from different processes and nodes and can be stitched together to give an end-to-end view.
  • 20. Distributed Tracing. Diagram labels: Web API, Broker, Background Services, DB, Trace Collector; HTTP request with tracing context; TCP messages with tracing context; store tracing context with data in the DB; services read the tracing context. • Correlation IDs are passed in the remote calls/messages. • The receiver regenerates the context and proceeds with its unit of work. • Depending upon the business transaction, traces can last from a few seconds to hours or days.
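The passing of correlation IDs can be sketched as follows. The header layout is modelled on the W3C Trace Context `traceparent` value (version, trace ID, parent span ID, flags); the helper names `new_traceparent` and `child_traceparent` and the message shape are assumptions for illustration. A real service would normally let a tracing SDK do this.

```python
import secrets


def new_traceparent():
    """Start a new trace and return a traceparent-style header value."""
    trace_id = secrets.token_hex(16)  # 32 hex characters, shared by the whole trace
    span_id = secrets.token_hex(8)    # 16 hex characters, unique per unit of work
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming):
    """Receiver side: keep the trace ID, create a new span ID for its own work."""
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


# The Web API starts the trace when the HTTP request arrives.
api_context = new_traceparent()

# The context travels with the message to a background service (hypothetical shape).
message = {"headers": {"traceparent": api_context}, "body": "order-created"}
worker_context = child_traceparent(message["headers"]["traceparent"])
```

Both contexts share the same trace ID, which is what lets a trace collector stitch spans from different processes into one end-to-end view.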
  • 21. Metrics – How much is happening?
  • 22. Metrics – Components Vendor specific Instrumentation & collection Metrics - Name, description, Labels & Value Application Backend – time series database - Adds timestamp - Supports aggregation (e.g. PromQL) SDK / Metrics scrape Visualization PromQL expressions Alerts
  • 23. Types of Prometheus Metrics - Counters. Track the occurrence of an event in the application.
      ● The absolute value may not be helpful.
      ● The delta between two timestamps or the rate of change over time is helpful. Functions like increase(), rate().
      ● Consider the case when applications restart and the counter starts again from zero.
      ● Examples - orders processed, unsuccessful operations, business errors, technical errors.
      ● Sample output of the meta-endpoint (/metrics):
          # HELP http_requests_total Total number of http api requests
          # TYPE http_requests_total counter
          http_requests_total{api="add_product", company="ITC"} 3433
          http_requests_total{api="add_product", company="TATA"} 2000
      ● PromQL: increase(http_requests_total{api="add_product", company="ITC"}[5m])
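Why the delta matters more than the absolute value, and why restarts need care, can be shown with a simplified sketch. The functions `counter_increase` and `counter_rate` and the scrape values are hypothetical. Prometheus's own `increase()` and `rate()` additionally extrapolate to the window boundaries, which this sketch leaves out.

```python
def counter_increase(samples):
    """Total increase across ordered counter samples, tolerating resets.

    A drop in value is treated as an application restart: the counter began
    again from zero, so the whole post-restart value counts as new increase.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # counter reset detected
    return total


def counter_rate(samples, window_seconds):
    """Average per-second increase over the window."""
    return counter_increase(samples) / window_seconds


# Hypothetical scrapes of http_requests_total; the app restarts after 3440.
scrapes = [3433, 3437, 3440, 2, 6]
increase = counter_increase(scrapes)          # 4 + 3 + 2 + 4
rate_per_second = counter_rate(scrapes, 300)  # over a 5 minute window
```

A naive "last minus first" would give a negative number here, which is why reset handling belongs in the query layer rather than in each dashboard.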
  • 24. Types of Prometheus Metrics - Gauges. Snapshots of a metric at a single point in time. Not an event. ● Examples are message queue size and CPU utilization at a given time. ● Useful functions - avg_over_time, max_over_time, min_over_time, and quantile_over_time
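What the `*_over_time` functions do with gauge samples can be approximated with a short sketch. The sample values and the `over_time` helper are assumptions; the real PromQL functions operate on range vectors selected by the query.

```python
from statistics import mean

# Hypothetical (timestamp_seconds, queue_size) samples of a gauge.
samples = [(0, 12), (15, 40), (30, 25), (45, 7), (60, 16)]


def over_time(samples, start, end):
    """Summarise gauge values whose timestamps fall inside [start, end]."""
    values = [value for timestamp, value in samples if start <= timestamp <= end]
    return {"avg": mean(values), "max": max(values), "min": min(values)}


window = over_time(samples, 15, 60)
```

Unlike a counter, each gauge sample is meaningful on its own, so the summaries work directly on the values rather than on differences between them.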
  • 25. Types of Prometheus Metrics - Histograms
      ● Histograms divide the entire range of measurements into a set of intervals, named buckets, and count how many measurements fall into each bucket.
      ● These buckets are defined at compile time; their upper bounds are inclusive (le) and the counts are cumulative.
      ● A histogram looks like the following:
          http_request_duration_seconds_sum{api="add_product", instance="host1"} 8953.332
          http_request_duration_seconds_count{api="add_product", instance="host1"} 27892
          http_request_duration_seconds_bucket{api="add_product", instance="host1", le="0.05"} 1672
          http_request_duration_seconds_bucket{api="add_product", instance="host1", le="0.1"} 8954
          http_request_duration_seconds_bucket{api="add_product", instance="host1", le="0.25"} 14251
      ● PromQL: sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
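How a quantile can be estimated from cumulative buckets is sketched below, using linear interpolation inside the bucket that contains the target rank. The function is a simplified illustration in the spirit of PromQL's `histogram_quantile`, not its exact implementation. The bucket counts come from the sample on the slide; the `+Inf` bucket is an assumption set equal to the `_count` value.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with (math.inf, total).
    Values are assumed to be spread evenly inside each bucket.
    """
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into the open-ended bucket.
                return lower_bound
            if count == lower_count:
                return upper_bound  # empty bucket, nothing to interpolate
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Cumulative counts from the slide; the +Inf bucket is assumed to equal _count.
buckets = [(0.05, 1672), (0.1, 8954), (0.25, 14251), (math.inf, 27892)]
median = histogram_quantile(0.5, buckets)
```

The estimate can only be as precise as the bucket boundaries, so buckets are best chosen around the durations that matter for the KPI being tracked.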
  • 26. Types of Prometheus Metrics - Summaries
      ● Similar to histograms, but they expose precomputed quantiles through a quantile label.
      ● OpenTelemetry has marked it as legacy.
      ● A summary looks like the following:
          http_request_duration_seconds_sum{api="add_product", instance="host1.domain.com"} 8953.332
          http_request_duration_seconds_count{api="add_product", instance="host1.domain.com"} 27892
          http_request_duration_seconds{api="add_product", instance="host1", quantile="0"}
          http_request_duration_seconds{api="add_product", instance="host1", quantile="0.5"} 0.232227334
      ● Considerations due to quantiles: they are calculated inside the application, so they cannot be meaningfully aggregated across instances.
  • 27. Metrics - What happens after instrumentation? ● Enrich the application instrumentation with infrastructure information such as Service Name, Pod, IP address. ● The backend saves it with a timestamp. ● Add alerts & dashboards. Help available at https://grafana.com/grafana/dashboards/
  • 28. Health Check – How is the system doing?
  • 29. Health check ● Basic use is as a liveness probe. ● More useful when we add dependencies. ● Prebuilt libraries are available, depending upon the language. ● Considerations ○ The orchestrator may restart the service while the issue is in a dependency. ○ Checking all downstream services could be time consuming. ○ Meta-endpoints (/health) may not be exposed. Instead of writing to the response, metrics may be used to indicate an issue.
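One way to address the considerations on this slide is sketched below: dependency probes are aggregated into one result, a failing dependency is reported as `degraded` rather than as a dead service, and the result is reused for a short time so that downstream checks do not run on every request. The class `CachedHealthCheck`, the probe names and the status strings are hypothetical, not taken from any health-check library.

```python
import time


class CachedHealthCheck:
    """Aggregate dependency probes and reuse the result for a short TTL."""

    def __init__(self, probes, ttl_seconds=10.0, clock=time.monotonic):
        self._probes = probes  # name -> zero-argument callable returning bool
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached = None
        self._expires_at = 0.0

    def check(self):
        now = self._clock()
        if self._cached is not None and now < self._expires_at:
            return self._cached  # avoid probing downstream services again
        results = {}
        for name, probe in self._probes.items():
            try:
                results[name] = "healthy" if probe() else "unhealthy"
            except Exception:
                results[name] = "unhealthy"
        all_ok = all(state == "healthy" for state in results.values())
        self._cached = {
            "status": "healthy" if all_ok else "degraded",
            "dependencies": results,
        }
        self._expires_at = now + self._ttl
        return self._cached


calls = {"db": 0}


def probe_db():
    calls["db"] += 1
    return True


def probe_broker():
    raise ConnectionError("broker unreachable")


health = CachedHealthCheck({"db": probe_db, "broker": probe_broker}, ttl_seconds=60)
first = health.check()
second = health.check()  # served from the cache, probes are not called again
```

Keeping the liveness signal separate from the `degraded` dependency status is a design choice in this sketch: it gives the orchestrator no reason to restart a service whose only problem lies in a dependency.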
  • 31. What is OpenTelemetry? ● It is an open-source observability framework for infrastructure & application instrumentation. ● It’s the second-highest ranked CNCF project by activity and contributors, after Kubernetes (K8s). ● It aims to be vendor agnostic & the SDK is produced by OpenTelemetry. ● It’s still young, so not all features are available for all languages. ● As OpenTelemetry doesn’t provide a backend implementation (its concern is creating, collecting, and sending signals), the data will flow to another system or systems for storage and querying.
  • 32. Why OpenTelemetry? ● Before OpenTelemetry ○ APM vendors had their own things – SDKs, agents, grammar ○ Vendor lock-in. ○ Application developers built wrappers and event hooks to keep the application isolated. ○ Competing open standards – OpenTracing (Traces), OpenCensus (Metrics, Traces)
  • 33. OpenTelemetry Library - Instrumentation (Automatic & Manual) Exporters Exporter - Backends/APM Application Trace, metrics, Logs SDK Components of OpenTelemetry Collector Multiple receivers Multiple exporters
  • 34. Signals of OpenTelemetry - Metrics ● Differences from Prometheus metrics ○ Summary is marked deprecated ○ Some more functions & data type flexibility
  • 35. Signals of OpenTelemetry - Tracing. Data model of an OpenTelemetry Span. Example tracing for a news article. Source - https://www.timescale.com/blog/what-are-traces-and-how-sql-yes-sql-and-opentelemetry-can-help-us-get-more-value-out-of-traces-to-build-better-software/

Editor's Notes

  1. We’ll take the questions at the end.
  2. Monitoring is collecting metrics & logs. Observability is the ability to observe what is going on inside the system. Consider it a non-functional requirement (NFR).
  3. MTTD -> metrics or synthetic tests. MTTI -> traces.
  4. Emergence of microservices: a lot more moving parts. Story of a certificate expiring somewhere instead of on a single server.
  5. Everything writing to one file, or a few files, is a lot simpler. A lot more moving parts. Speed of development.
  6. Conceptual View
  7. It’s happening at multiple levels
  8. Logs – important events. State changes, who did it. Traces – are flows within the program or outside the program. Metrics – are numbers which can be aggregated into formulas – SLOs, KPIs & alerts. Health – current state of health.
  9. Automatic & manual – instrumentation
  10. Cost!!
  11. All services should have it.
  12. Do consider the situation when it turns to zero. https://grafana.com/grafana/dashboards/ including ones from OPSTREE
  13. May be used in cases where strict KPIs are enforced, on duration for example.
  14. https://grafana.com/grafana/dashboards/ Dashboards which help me detect & fix issues faster, which help management find data points, and which can report compliance & SLOs.
  15. Alternatives: 1. Have a monitoring solution. 2. Cache the results.
  16. Receivers can be used in migration
  17. As a developer, you don’t really need to worry about the models—but having a basic understanding helps.