Observability & More
Alon Fliess
Chief Architect
alonf@codevalue.net
@alon_fliess
http://alonfliess.me
http://codevalue.net
Cloudflare blames ‘bad software’ deployment for today’s outage
About Me
• Alon Fliess:
• Chief Software Architect & Co-Founder at OzCode & CodeValue
• More than 30 years of hands-on experience
• Microsoft Regional Director & Microsoft Azure MVP
• Spend most of my time in project analysis, architecture, design
• Code at night
Azure Israel
• https://www.meetup.com/AzureIsrael
Agenda
• DevOps, the true story
• Microservice Architecture, the complexity shift
• Ops & Monitoring
• Site Reliability Engineers
• Developers & Observability
• Business (marketing, sales, management) and observability
• Application Performance Monitoring
• How does it work?
• Distributed Tracing
• Production problem solving
The Essence of DevOps
• Better Software, Faster! When Development and Operations Synergize
• Covers the *entire* Application Lifecycle
Microservice Architecture == Complexity Shift
Ops → Vital Signs: Heartbeat, Blood Pressure, Temperature
What Do Site Reliability Engineers (SREs) Want?
What Do Developers Want?
What Do Marketing & Sales Teams Want?
What is Observability? (Twitter 2013)
Gartner
Critical Capabilities for APM (May 2019)
Business Analysis
Anomaly Detection
IT Operations
DevOps Release
Application Support
Application Development
Application Owner
Use Cases
APM Players
Dynatrace
AppDynamics (Cisco)
Datadog
Splunk
Broadcom (CA Technologies)
New Relic
Riverbed
IBM
Instana
Oracle
Tingyun
SolarWinds
ManageEngine
Micro Focus
How Does Monitoring & Tracing Work?
Operating Systems
    APM system tracking agent installed on the machine
    CPU, Memory, I/O, Network
Code Tracing
    Instrumentation
        Manual
        Auto
    Runtime data collection
Instrumentation – Original Pseudo Code
Function AddToBasket(var productId, var quantity)
    if (quantity < 0)
        return false
    var product = Dal.GetProductById(productId)
    BasketService.Add(product, quantity)
    return true
Instrumentation – Add Logging on Errors
Function AddToBasket(var productId, var quantity)
    if (quantity < 0)
        Log("Error: Negative quantity value")
        return false
    var product = Dal.GetProductById(productId)
    BasketService.Add(product, quantity)
    return true
Instrumentation – Add Metrics of Usage and Errors
Function AddToBasket(var productId, var quantity)
    metrics.Count("AddToBasket", 1)
    if (quantity < 0)
        Log("Error: Negative quantity value")
        metrics.Count("AddToBasketFailure", 1)
        return false
    var product = Dal.GetProductById(productId)
    BasketService.Add(product, quantity)
    return true
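One way the counting slide could look in runnable form is sketched below in Python. The `Metrics` class, the in-memory `PRODUCTS` table, and the `basket` list are hypothetical stand-ins for `metrics`, `Dal`, and `BasketService`; they are assumptions made for illustration and are not taken from the talk or from any particular APM SDK.

```python
import logging
from collections import Counter

logger = logging.getLogger("basket")


class Metrics:
    """Hypothetical in-memory metrics sink; a real agent would ship these to a backend."""

    def __init__(self):
        self.counters = Counter()

    def count(self, name, value=1):
        self.counters[name] += value


metrics = Metrics()
PRODUCTS = {42: "keyboard"}  # stand-in for Dal.GetProductById
basket = []                  # stand-in for BasketService


def add_to_basket(product_id, quantity):
    metrics.count("AddToBasket")             # usage counter: every call
    if quantity < 0:
        logger.error("Negative quantity value")
        metrics.count("AddToBasketFailure")  # error counter: failed calls only
        return False
    product = PRODUCTS[product_id]
    basket.append((product, quantity))
    return True


add_to_basket(42, 2)
add_to_basket(42, -1)
```

Dividing the failure counter by the usage counter gives an error rate, which is the kind of figure an APM dashboard typically charts.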
Instrumentation – Measure Latency
Function AddToBasket(var productId, var quantity)
    metrics.Count("AddToBasket", 1)
    start = time()
    if (quantity < 0)
        Log("Error: Negative quantity value")
        metrics.Count("AddToBasketFailure", 1)
        return false
    var product = Dal.GetProductById(productId)
    BasketService.Add(product, quantity)
    metrics.Measure("AddToBasket", time() - start)
    return true
Instrumentation – Measure Latency Everywhere
Function AddToBasket(var productId, var quantity)
    metrics.Count("AddToBasket", 1)
    start = time()
    if (quantity < 0)
        Log("Error: Negative quantity value")
        metrics.Count("AddToBasketFailure", 1)
        return false
    var product = Dal.GetProductById(productId)
    metrics.Measure("AddToBasket_GetProductById", time() - start)
    BasketService.Add(product, quantity)
    metrics.Measure("AddToBasket", time() - start)
    return true
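The latency slides can be sketched in Python with `time.perf_counter`. This version records the total duration in a `finally` block so that the failure path is timed too, which is a small departure from the pseudo code. The names `measurements`, `measure`, and `get_product_by_id` are hypothetical, and the `sleep` only simulates a slow data-access call.

```python
import time
from collections import defaultdict

measurements = defaultdict(list)  # hypothetical sink: metric name -> durations in seconds


def measure(name, seconds):
    measurements[name].append(seconds)


def get_product_by_id(product_id):
    time.sleep(0.01)  # simulate a slow data-access call
    return {"id": product_id}


def add_to_basket(product_id, quantity, basket):
    start = time.perf_counter()
    try:
        if quantity < 0:
            return False
        dal_start = time.perf_counter()
        product = get_product_by_id(product_id)
        measure("AddToBasket_GetProductById", time.perf_counter() - dal_start)
        basket.append((product, quantity))
        return True
    finally:
        # Runs on every exit path, so failed calls are timed as well.
        measure("AddToBasket", time.perf_counter() - start)


basket = []
ok = add_to_basket(7, 1, basket)
failed = add_to_basket(7, -1, basket)
```

Timing the inner call from its own start point, rather than from the start of the function, keeps the data-access measurement separate from the validation that precedes it.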
Instrumentation – Add Debugging Information
Function AddToBasket(var productId, var quantity)
    debug.AddParameters("AddToBasket", [["ProductId", productId],["quantity", quantity]])
    metrics.Count("AddToBasket", 1)
    start = time()
    if (quantity < 0)
        Log("Error: Negative quantity value")
        metrics.Count("AddToBasketFailure", 1)
        debug.AddError("AddToBasket", GetErrorData())
        return false
    var product = Dal.GetProductById(productId)
    debug.AddValue("AddToBasket", [["product", product]])
    metrics.Measure("AddToBasket_GetProductById", time() - start)
    BasketService.Add(product, quantity)
    metrics.Measure("AddToBasket", time() - start)
    return true
Instrumentation – Original vs. Instrumented Code
Function AddToBasket(var productId, var quantity)
    debug.AddParameters("AddToBasket", [["ProductId", productId],["quantity", quantity]])
    metrics.Count("AddToBasket", 1)
    start = time()
    if (quantity < 0)
        Log("Error: Negative quantity value")
        metrics.Count("AddToBasketFailure", 1)
        debug.AddError("AddToBasket", GetErrorData())
        return false
    var product = Dal.GetProductById(productId)
    debug.AddValue("AddToBasket", [["product", product]])
    metrics.Measure("AddToBasket_GetProductById", time() - start)
    BasketService.Add(product, quantity)
    metrics.Measure("AddToBasket", time() - start)
    return true
Instrumentation and Tracing Automation
• Aspect Oriented Approach
    • Communication level instrumentation
    • Pipeline interception – technology dependent
    • Resource performance counters – DB statistics for example
• Code Instrumentation
    • Manual – deploy a package and call it
    • Automatic – bytecode instrumentation libraries and tools
• Distributed Tracing
    • Passing call context between services
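The automatic option can be approximated in Python with a decorator, which is one aspect-oriented way to keep counting and timing out of the business code. Bytecode instrumentation tools work differently under the hood, so this is only a sketch of the idea; `instrumented`, `calls`, `failures`, and `durations` are hypothetical names.

```python
import functools
import time
from collections import Counter, defaultdict

calls = Counter()
failures = Counter()
durations = defaultdict(list)


def instrumented(func):
    """Wrap func so each call is counted, timed, and checked for exceptions."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        name = func.__name__
        calls[name] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            failures[name] += 1
            raise  # instrumentation must not change the function's behavior
        finally:
            durations[name].append(time.perf_counter() - start)

    return wrapper


@instrumented
def add_to_basket(product_id, quantity):
    # Business logic only: no logging, metrics, or timing code in here.
    if quantity < 0:
        raise ValueError("Negative quantity value")
    return True


add_to_basket(1, 2)
try:
    add_to_basket(1, -5)
except ValueError:
    pass
```

The function body stays as short as the original pseudo code, while the wrapper carries the cross-cutting concerns.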
Distributed Tracing
[Diagram: call Id:123 flowing through Application, Service A, and Service B, with three Spans marked]
Instrumentation – Call Context
Function AddToBasket(var productId, var quantity, var context)
    debug.AddParameters(context, "AddToBasket", [["ProductId", productId],["quantity", quantity]])
    metrics.Count(context, "AddToBasket", 1)
    start = time()
    if (quantity < 0)
        Log(context, "Error: Negative quantity value")
        metrics.Count(context, "AddToBasketFailure", 1)
        debug.AddError(context, "AddToBasket", GetErrorData())
        return false
    var product = Dal.GetProductById(context, productId)
    debug.AddValue(context, "AddToBasket", [["product", product]])
    metrics.Measure(context, "AddToBasket_GetProductById", time() - start)
    BasketService.Add(context, product, quantity)
    metrics.Measure(context, "AddToBasket", time() - start)
    return true
Context: Call Id, URL, HTTP Method, DB Host, User Info, Timing Info
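Passing `context` through every call, as the slide does, works but changes each signature. A possible alternative in Python is the standard `contextvars` module, sketched below. The header name `X-Call-Id` and the helper names are assumptions made for illustration; they are not a standard and not something the talk specifies.

```python
import contextvars
import uuid

# Holds the per-request call context without threading it through signatures.
call_context = contextvars.ContextVar("call_context", default=None)


def start_request(incoming_headers):
    """Reuse the caller's call id if present, otherwise start a new one."""
    call_id = incoming_headers.get("X-Call-Id") or uuid.uuid4().hex
    call_context.set({"call_id": call_id})
    return call_id


def outgoing_headers():
    """Headers to attach when calling the next service."""
    ctx = call_context.get()
    return {"X-Call-Id": ctx["call_id"]} if ctx else {}


def log(message):
    ctx = call_context.get() or {}
    return f"[{ctx.get('call_id', '-')}] {message}"


# Service A receives a request with no call id and calls Service B.
id_a = start_request({})
headers_to_b = outgoing_headers()
# Service B finds the same id in its incoming headers.
id_b = start_request(headers_to_b)
line = log("Negative quantity value")
```

Because both services tag their logs and metrics with the same call id, records from different machines can be joined into one request timeline.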
Instrumentation – Using Span
Function AddToBasket(var productId, var quantity, var context)
    span = trace.BeginSpan(context, {"AddToBasket", productId, quantity})
    if (quantity < 0)
        span.Error("Negative quantity value")
        return false
    var product = Dal.GetProductById(context, productId)
    span.AddValue("product", product)
    BasketService.Add(context, product, quantity)
    span.End()
    return true
Span: Call Id, URL, HTTP Method, DB Host, User Info, Timing Info
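A span can be modeled as a small object that records its own timing, values, and errors. The sketch below is a hypothetical, minimal Python `Span` written for illustration; it is not the OpenTracing or OpenCensus API. Using it as a context manager also closes the span on the early-return path, which the pseudo code leaves open.

```python
import time
import uuid

finished_spans = []  # hypothetical exporter: completed spans collect here


class Span:
    def __init__(self, name, trace_id=None, parent=None):
        self.name = name
        # A child span inherits the trace id of its parent.
        self.trace_id = trace_id or (parent.trace_id if parent else uuid.uuid4().hex)
        self.parent = parent
        self.values = {}
        self.error = None
        self.duration = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc is not None:
            self.error = str(exc)
        self.duration = time.perf_counter() - self._start
        finished_spans.append(self)
        return False  # never swallow exceptions


def add_to_basket(product_id, quantity, parent=None):
    with Span("AddToBasket", parent=parent) as span:
        span.values.update(product_id=product_id, quantity=quantity)
        if quantity < 0:
            span.error = "Negative quantity value"
            return False
        with Span("GetProductById", parent=span) as child:
            child.values["product"] = {"id": product_id}
        return True


ok = add_to_basket(42, 1)
bad = add_to_basket(42, -1)
```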
OpenTracing & OpenCensus
What Do SREs & Developers Want – From Each Other?
New Relic APM Dashboard
APM Error Analysis – Not Enough Information
Error Rate
Request information
Stack trace
• APM systems can assist in health monitoring and fault first aid
Production Problem Solving Challenges
Can't mess with data
No debugging tools
Code is optimized
Older source code version
Can't impact performance
Data must stay in a secure environment
Data is private and contains PII
Very hard to reproduce the bug
Problem Solving With APM
Production Problem Solving Platforms
• OzCode
• OverOps
• Rookout
• Application Insights
Problem Solving With a Production Debugger
OzCode Production Debugger
Summary
Q & A

 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 

Observability and more architecture next 2020

  • 1. Observability & More. Alon Fliess, Chief Architect. alonf@codevalue.net, @alon_fliess, http://alonfliess.me, http://codevalue.net
  • 2. Cloudflare blames ‘bad software’ deployment for today’s outage
  • 3. About Me
      • Alon Fliess:
          • Chief Software Architect & Co-Founder at OzCode & CodeValue
          • More than 30 years of hands-on experience
          • Microsoft Regional Director & Microsoft Azure MVP
          • Spend most of my time in project analysis, architecture, design
          • Code at night
  • 5. Agenda
      • DevOps, the true story
      • Microservice Architecture, the complexity shift
      • Ops & Monitoring
      • Site Reliability Managers
      • Developers & Observability
      • Business (marketing, sales, management) and observability
      • Application Performance Monitoring
      • How does it work?
      • Distributed Tracing
      • Production problem solving
  • 6. The Essence of DevOps
      • Better Software, Faster! When Development and Operations Synergize
      • Covers the *entire* Application Lifecycle
  • 7. Microservice Architecture == Complexity Shift
  • 8. Ops → Vital Signs: Heartbeat, Blood Pressure, Temperature
  • 9. What Do Site Reliability Managers (SRE) Want?
  • 10. What Do Developers Want?
  • 11. What Do Marketing & Sales Teams Want?
  • 12. What is Observability? (Twitter 2013)
  • 13. Gartner Critical Capabilities for APM (May 2019)
      • Capabilities shown: Business Analysis, Anomaly Detection
      • Use Cases: IT Operations, DevOps Release, Application Support, Application Development, Application Owner
  • 15. APM Players: Dynatrace, AppDynamics (Cisco), Datadog, Splunk, Broadcom (CA Technologies), New Relic, Riverbed, IBM, Instana, Oracle, Tingyun, SolarWinds, ManageEngine, Micro Focus
  • 16. How Does Monitoring & Tracing Work?
      • Operating Systems
          • APM system tracking agent installed on the machine
          • CPU, Memory, I/O, Network
      • Code Tracing
          • Instrumentation
              • Manual
              • Auto
          • Runtime data collection
  • 17. Instrumentation – Original Pseudo Code
        Function AddToBasket(var productId, var quantity)
            if (quantity < 0)
                return false
            var product = Dal.GetProductById(productId)
            BasketService.Add(product, quantity)
            return true
  • 18. Instrumentation – Add Logging on Errors
        Function AddToBasket(var productId, var quantity)
            if (quantity < 0)
                Log("Error: Negative quantity value")
                return false
            var product = Dal.GetProductById(productId)
            BasketService.Add(product, quantity)
            return true
  • 19. Instrumentation – Add Metrics of Usage and Errors
        Function AddToBasket(var productId, var quantity)
            metrics.Count("AddToBasket", 1)
            if (quantity < 0)
                Log("Error: Negative quantity value")
                metrics.Count("AddToBasketFailure", 1)
                return false
            var product = Dal.GetProductById(productId)
            BasketService.Add(product, quantity)
            return true
  • 20. Instrumentation – Measure Latency
        Function AddToBasket(var productId, var quantity)
            metrics.Count("AddToBasket", 1)
            start = time()
            if (quantity < 0)
                Log("Error: Negative quantity value")
                metrics.Count("AddToBasketFailure", 1)
                return false
            var product = Dal.GetProductById(productId)
            BasketService.Add(product, quantity)
            metrics.Measure("AddToBasket", time() - start)
            return true
  • 21. Instrumentation – Measure Latency Everywhere
        Function AddToBasket(var productId, var quantity)
            metrics.Count("AddToBasket", 1)
            start = time()
            if (quantity < 0)
                Log("Error: Negative quantity value")
                metrics.Count("AddToBasketFailure", 1)
                return false
            var product = Dal.GetProductById(productId)
            metrics.Measure("AddToBasket_GetProductById", time() - start)
            BasketService.Add(product, quantity)
            metrics.Measure("AddToBasket", time() - start)
            return true
  • 22. Instrumentation – Add Debugging Information
        Function AddToBasket(var productId, var quantity)
            debug.AddParameters("AddToBasket", [["productId", productId], ["quantity", quantity]])
            metrics.Count("AddToBasket", 1)
            start = time()
            if (quantity < 0)
                Log("Error: Negative quantity value")
                metrics.Count("AddToBasketFailure", 1)
                debug.AddError("AddToBasket", GetErrorData())
                return false
            var product = Dal.GetProductById(productId)
            debug.AddValue("AddToBasket", [["product", product]])
            metrics.Measure("AddToBasket_GetProductById", time() - start)
            BasketService.Add(product, quantity)
            metrics.Measure("AddToBasket", time() - start)
            return true
  • 23. Instrumentation – Original vs. Instrumented Code: the fully instrumented function from slide 22, set against the original function from slide 17.
  • 24. Instrumentation and Tracing Automation
      • Aspect Oriented Approach
      • Communication level instrumentation
          • Pipeline interception – technology dependent
          • Resource performance counters – DB statistics, for example
      • Code Instrumentation
          • Manual – deploy a package and call it
          • Automatic – bytecode instrumentation libraries and tools
      • Distributed Tracing
          • Passing call context between services
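
The aspect-oriented approach on slide 24 can be sketched in Python with a decorator, so that the counting and latency code from slides 19 to 21 lives outside the business function. This is a minimal sketch and not how any particular APM agent works: the `instrumented` decorator, the in-memory `counters` and `latencies` stores, and the cut-down `add_to_basket` are hypothetical names introduced for illustration.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory metric stores; a real agent would ship these elsewhere.
counters = defaultdict(int)
latencies = defaultdict(list)


def instrumented(func):
    """Record calls, failures, and latency without editing the function body."""
    name = func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        counters[name] += 1
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            if result is False:  # the slides signal failure by returning false
                counters[name + "Failure"] += 1
            return result
        except Exception:
            counters[name + "Failure"] += 1
            raise
        finally:
            # Runs on every exit path, so each call yields one latency sample.
            latencies[name].append(time.perf_counter() - start)

    return wrapper


@instrumented
def add_to_basket(product_id, quantity):
    # Business logic only; the Dal and BasketService calls from the slides are omitted.
    if quantity < 0:
        return False
    return True


add_to_basket(42, 1)
add_to_basket(42, -1)
```

After the two calls, `counters` holds two calls and one failure, and `latencies` holds one sample per call, while `add_to_basket` itself contains no instrumentation code.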
  • 26. Instrumentation – Call Context
        Function AddToBasket(var productId, var quantity, var context)
            debug.AddParameters(context, "AddToBasket", [["productId", productId], ["quantity", quantity]])
            metrics.Count(context, "AddToBasket", 1)
            start = time()
            if (quantity < 0)
                Log(context, "Error: Negative quantity value")
                metrics.Count(context, "AddToBasketFailure", 1)
                debug.AddError(context, "AddToBasket", GetErrorData())
                return false
            var product = Dal.GetProductById(context, productId)
            debug.AddValue(context, "AddToBasket", [["product", product]])
            metrics.Measure(context, "AddToBasket_GetProductById", time() - start)
            BasketService.Add(context, product, quantity)
            metrics.Measure(context, "AddToBasket", time() - start)
            return true
      Context: Call Id, URL, HTTP Method, DB Host, User Info, Timing Info
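
Passing call context between services, as slides 24 and 26 describe, usually means copying an identifier into the outgoing request and reading it back on the receiving side. The following is a minimal Python sketch under stated assumptions: the `CallContext` class, the `inject` and `extract` helpers, and the `X-Call-Id` header name are hypothetical, and only three of the context fields from the slide are modeled.

```python
import uuid
from dataclasses import dataclass, field

CALL_ID_HEADER = "X-Call-Id"  # hypothetical header name, not a standard


@dataclass
class CallContext:
    # A fresh call id is generated unless one is handed over by the caller.
    call_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    url: str = ""
    http_method: str = ""


def inject(context, headers):
    """Return a copy of the outgoing headers with the call id added."""
    out = dict(headers)
    out[CALL_ID_HEADER] = context.call_id
    return out


def extract(headers, url, http_method):
    """Rebuild the context in the receiving service, reusing the caller's id if present."""
    call_id = headers.get(CALL_ID_HEADER)
    if call_id:
        return CallContext(call_id=call_id, url=url, http_method=http_method)
    return CallContext(url=url, http_method=http_method)


# Service A starts a request and calls service B.
ctx_a = CallContext(url="/basket", http_method="POST")
outgoing = inject(ctx_a, {"Content-Type": "application/json"})

# Service B receives the headers and continues the same logical call.
ctx_b = extract(outgoing, url="/products/42", http_method="GET")
```

Both services now log and measure under the same call id, which is what lets a tracing backend stitch one request together across services. Standardized schemes such as the W3C Trace Context `traceparent` header serve the same purpose.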
  • 27. Instrumentation – Using Span
        Function AddToBasket(var productId, var quantity, var context)
            span = trace.BeginSpan(context, {"AddToBasket", productId, quantity})
            if (quantity < 0)
                span.Error("Negative quantity value")
                return false
            var product = Dal.GetProductById(context, productId)
            span.AddValue("product", product)
            BasketService.Add(context, product, quantity)
            span.End()
            return true
      Span: Call Id, URL, HTTP Method, DB Host, User Info, Timing Info
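
The span pattern on slide 27 maps naturally onto a Python context manager, which also guarantees that the span is ended on every exit path, including the early return. This is a sketch under stated assumptions rather than a real tracing API: `Span`, `begin_span`, and the `finished_spans` list are hypothetical, and the product lookup is replaced by a stand-in dictionary.

```python
import time
from contextlib import contextmanager

finished_spans = []  # hypothetical sink; a real tracer would export these


class Span:
    def __init__(self, call_id, name, **values):
        self.call_id = call_id
        self.name = name
        self.values = dict(values)
        self.error = None
        self.duration = None
        self._start = time.perf_counter()

    def add_value(self, key, value):
        self.values[key] = value

    def set_error(self, message):
        self.error = message

    def end(self):
        self.duration = time.perf_counter() - self._start
        finished_spans.append(self)


@contextmanager
def begin_span(call_id, name, **values):
    span = Span(call_id, name, **values)
    try:
        yield span
    except Exception as exc:
        span.set_error(repr(exc))
        raise
    finally:
        span.end()  # timing is captured even on early return or exception


def add_to_basket(product_id, quantity, call_id):
    with begin_span(call_id, "AddToBasket",
                    product_id=product_id, quantity=quantity) as span:
        if quantity < 0:
            span.set_error("Negative quantity value")
            return False
        # Stand-in for Dal.GetProductById from the slides.
        span.add_value("product", {"id": product_id})
        return True


add_to_basket(42, 1, call_id="abc")
add_to_basket(42, -1, call_id="abc")
```

Each finished span carries the call id, the recorded values, an optional error, and a duration, so the separate logging, counting, and timing calls from the earlier slides collapse into one object per operation.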
  • 29. What Do SREs & Developers Want – From Each Other?
  • 30. New Relic APM Dashboard
  • 31. APM Error Analysis – Not Enough Information
      • Error Rate, Request information, Stack trace
      • APM systems can assist in health monitoring and fault first aid
  • 32. Production Problem Solving Challenges
      • Can’t mess with data
      • No debugging tools
      • Code is optimized
      • Older source code version
      • Can’t impact performance
      • Data must stay in a secure environment
      • Data is private and contains PII
      • Very hard to reproduce the bug
  • 34. Production Problem Solving Platforms
      • OzCode
      • OverOps
      • Rookout
      • Application Insights
  • 35. Problem Solving With a Production Debugger

Editor's Notes

  1. MSA: many small parts that are deployed and communicate with each other. Simple components, complex combination. It is very hard to follow a request that spans many services. An automation process is a must to overcome the complexity, as are a health monitor, a performance monitor, and cross-service error handling. Tools!
  2. More than CI/CD. Ops → the first-aid medic who takes the vital signs: CPU, network, I/O, memory, request throughput, and latency.
  3. Wants an easy life. Eats the meal that the Dev team cooked; the customer of the Dev team. Bugs and problem solving: needs to know the current situation with the current problems. For example, they can roll back to a previous version, but need to know the status of the bug fix.
  4. Information and debuggability. Reproduce the problem.
  5. Analytics, business insights, usage.
  6. "As Twitter has moved from a monolithic to a distributed architecture, our scalability has increased dramatically." Zipkin: a distributed tracing system (https://zipkin.io/).
  7. Business Analysis: business-related KPIs.
      IT Service Monitoring: health of key services.
      Root Cause Analysis: of a failure or degradation.
      Anomaly Detection: identifying system observations that do not conform to an expected behavior.
      Distributed Profiling: tracking transactions across a mesh of interconnected nodes, followed by detection of where along the path the degradation appears to be happening.
      Application Debugging: production debugging capabilities, based on distributed data collection.
  8. Enables statements such as: 15% of our requests fail.
  9. Errors & problems root cause may be the result of
  10. The problem happens only with specific users or URLs.
  11. The problem happens only with specific users or URLs.
  12. OpenTelemetry makes robust, portable telemetry a built-in feature of cloud-native software. OpenTelemetry provides a single set of APIs, libraries, agents, and collector services to capture distributed traces and metrics from your application. You can analyze them using Prometheus, Jaeger, and other observability tools.
  13. DevOps, the true story; Microservice Architecture, the complexity shift; Ops & Monitoring; Site Reliability Managers; Developers & Observability; Business (marketing, sales, management) and observability; Application Performance Monitoring; How does it work?; Distributed Tracing; Production problem solving.
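
The statement in note 8 about the share of failing requests follows from two counters, such as the `AddToBasket` and `AddToBasketFailure` counts on slide 19. A small Python sketch of the arithmetic; the `failure_rate` helper is hypothetical and the numbers are illustrative, not measurements from the talk.

```python
def failure_rate(total_calls, failed_calls):
    """Return failed calls as a percentage of all calls; 0.0 when nothing was called."""
    if total_calls == 0:
        return 0.0
    return 100.0 * failed_calls / total_calls


# Illustrative numbers only.
rate = failure_rate(total_calls=2000, failed_calls=300)
print(f"{rate:.0f}% of our requests fail")  # prints "15% of our requests fail"
```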