SELF-AWARE APPLICATIONS
AUTOMATIC PRODUCTION DIAGNOSIS
DINA GOLDSHTEIN
Agenda
Motivation

Hierarchy of self-monitoring

CPU profiling

GC monitoring

Deadlock detection
Motivation
Why monitor?

Obviously a must for servers and/or anything at scale

But also a must for “simple” shrink-wrap software

Why our own?

Customized for specific business needs

Diagnostics flow from ground up

That’s what all the cool kids do :-)
Master Plan - Use Hierarchy!
Lightweight for continuous and frequent monitoring of all basics

Numerical resource consumption: CPU, memory, disk

Performance counters, Win32 APIs

Medium for less frequent events

Rare exceptions, deadlocks

ETW, ClrMD

Invasive for deep-dive and concrete diagnostics

Memory leaks, bulk call-stack data (e.g. CPU profiling)

CLR Profiling API, CLR Debugging API, hooks
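
The three tiers above can be read as an escalation policy: stay on the cheap tier until it reports something suspicious, then step up. What follows is a minimal, language-neutral sketch in Python rather than the .NET tooling the talk uses. The `Tier` enum, `next_tier`, and the one-step escalation rule are illustrative assumptions, not something the slides specify.

```python
from enum import IntEnum


class Tier(IntEnum):
    LIGHTWEIGHT = 1  # cheap counters, always on
    MEDIUM = 2       # event tracing or snapshot inspection, on suspicion
    INVASIVE = 3     # profiler or debugger hooks, only to confirm a diagnosis


def next_tier(current: Tier, anomaly_seen: bool) -> Tier:
    """Escalate one tier while an anomaly persists; otherwise drop back.

    Hypothetical policy: each tier is enabled only when the cheaper tier
    below it has reported something suspicious.
    """
    if anomaly_seen:
        return Tier(min(current + 1, Tier.INVASIVE))
    return Tier.LIGHTWEIGHT


def run(observations):
    """Feed a sequence of anomaly flags and return the tier used at each step."""
    tier = Tier.LIGHTWEIGHT
    history = []
    for anomaly_seen in observations:
        history.append(tier)
        tier = next_tier(tier, anomaly_seen)
    return history
```

Dropping straight back to the lightweight tier once the anomaly clears keeps the expensive tiers switched off for as long as possible, which is the point of the hierarchy.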
LET’S GET TO BUSINESS
(Self) CPU-Profiling
Monitor CPU using performance counters

Are we above a certain threshold for a certain amount of time? 

Turn on ETW and collect stacks (live using LiveStacks)

Find hot paths, produce flame graphs

Suggest recommendations
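
The "above a certain threshold for a certain amount of time" check can be sketched as a sliding window over counter readings. This is a minimal Python illustration, not the talk's implementation: the class name `SustainedThreshold`, the 90% limit, and the window size are assumptions, and the readings would come from whatever CPU counter the application already samples.

```python
from collections import deque


class SustainedThreshold:
    """Fires when every sample in the last `window` readings exceeds `limit`."""

    def __init__(self, limit: float, window: int):
        self.limit = limit
        self.samples = deque(maxlen=window)  # old readings fall off automatically

    def add(self, value: float) -> bool:
        self.samples.append(value)
        full = len(self.samples) == self.samples.maxlen
        return full and all(v > self.limit for v in self.samples)


# Hypothetical usage: one CPU% reading per second, trigger after 3 hot seconds.
detector = SustainedThreshold(limit=90.0, window=3)
triggered = [detector.add(v) for v in [95, 97, 50, 92, 93, 99]]
```

Requiring the whole window to be above the limit means a single dip resets the trigger, so short spikes do not start the more expensive stack collection.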
What Can Be Done?
AuthenticationController takes 95% CPU, maybe we're being DDoS'ed

Image processing component takes 100% CPU, need to auto-scale the app

Encoding this 30 second video takes 3 minutes at 100% CPU, tell the user she can send us a bug report
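
One way to get from collected stacks to recommendations like the ones above is to measure how often each frame appears in the samples and match the dominant frame against a rule table. The sketch below is in Python and rests on assumptions: stacks are plain lists of frame names ordered root to leaf, the rule table and the 0.9 cutoff are invented for illustration, and `fold` emits the semicolon-joined "folded" lines that common flame graph scripts accept as input.

```python
from collections import Counter


def fold(stacks):
    """Collapse sampled stacks into 'root;...;leaf count' lines."""
    counts = Counter(";".join(s) for s in stacks)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def inclusive_share(stacks):
    """Fraction of samples in which each frame appears anywhere on the stack."""
    seen = Counter()
    for s in stacks:
        seen.update(set(s))  # count a frame once per sample, even if recursive
    total = len(stacks)
    return {frame: n / total for frame, n in seen.items()}


def recommend(stacks, rules, min_share=0.9):
    """Return the advice for the first rule whose frame dominates the samples."""
    share = inclusive_share(stacks)
    for frame, advice in rules:
        if share.get(frame, 0.0) >= min_share:
            return advice
    return None


# Hypothetical samples and rules, loosely modelled on the examples above.
samples = [["Main", "AuthenticationController.Login", "Hash"]] * 19 + [["Main", "Render"]]
rules = [
    ("AuthenticationController.Login", "possible DDoS: check request sources"),
    ("ImageProcessor.Resize", "scale out the image workers"),
]
advice = recommend(samples, rules)
```

Inclusive share (frame anywhere on the stack) is used rather than leaf-only time so that a controller is blamed for the work done by the helpers it calls.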
DEMO
MONITOR FOR CPU SPIKES
This Is From Real Life
(Self) GC-Monitoring
Monitor GC performance using performance counters

Subscribe to ETW GC events such as GCAllocationTick

Types of objects allocated and their stacks(!)

Number of GCs of each kind and size of reclaimed memory

Duration of GC pauses

Attach ClrMD to get heap breakdown

Generations, segments, reserved/committed, number of objects
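
Once GC events are flowing, the numbers listed above reduce to a few aggregations. The Python sketch below shows the arithmetic only. `GcEvent` and the `(type_name, bytes)` allocation samples are hypothetical records invented for this example; they are not the real ETW payloads, whose field names should be taken from the tracing library in use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GcEvent:            # hypothetical record, not the real ETW payload
    generation: int
    pause_ms: float
    reclaimed_bytes: int


def summarize(events, wall_clock_ms):
    """Count GCs per generation, total reclaimed bytes, and % time paused."""
    per_gen = Counter(e.generation for e in events)
    reclaimed = sum(e.reclaimed_bytes for e in events)
    paused = sum(e.pause_ms for e in events)
    return {
        "gcs_per_generation": dict(per_gen),
        "reclaimed_bytes": reclaimed,
        "pause_percent": 100.0 * paused / wall_clock_ms,
    }


def top_allocators(samples, n=3):
    """samples: iterable of (type_name, bytes) allocation samples."""
    totals = Counter()
    for type_name, size in samples:
        totals[type_name] += size
    return totals.most_common(n)
```

A rising `pause_percent` or a sudden change in the top allocating types is the kind of signal that could justify moving on to the heavier heap breakdown.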
DEMO
MONITOR FOR ALLOCATION SPIKES
(Self) Deadlock Detection
Monitor for potential deadlock

Low CPU

Request timeouts

Increased thread count

Attach ClrMD to create wait chains and detect deadlocks

Report, try to break, pray for a miracle…
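
Detecting a deadlock from wait chains amounts to finding a cycle in a "who waits for whom" graph. The sketch below is a simplified Python model, not ClrMD code: it assumes each blocked thread waits on exactly one lock and each lock has at most one owner, and the two dictionaries stand in for whatever a heap or thread inspection tool would report.

```python
def find_deadlock(waiting_for, owner_of):
    """Return a list of thread ids forming a wait cycle, or None.

    waiting_for: thread id -> lock the thread is blocked on
    owner_of:    lock -> thread id currently holding it
    """
    for start in waiting_for:
        chain = []
        thread = start
        while thread in waiting_for and thread not in chain:
            chain.append(thread)
            thread = owner_of.get(waiting_for[thread])
        if thread in chain:
            # The chain looped back on itself: everything from the first
            # occurrence of `thread` onward is the deadlocked cycle.
            return chain[chain.index(thread):]
    return None


# Hypothetical snapshot: thread 1 holds B and wants A, thread 2 holds A and wants B.
cycle = find_deadlock({1: "A", 2: "B"}, {"A": 2, "B": 1})
```

Threads that are merely queued behind a deadlocked lock are left out of the returned cycle, which keeps the report focused on the threads that would have to be broken apart.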
DEMO
DETECT A DEADLOCK
Food for Thought
Many more scenarios are possible

Monitor heap fragmentation and compact large objects if needed

Memory leak analysis (both native and managed) 

Side notes:

ClrMD is also very suitable for automating crash dump analysis

You can automate opening tickets in a bug tracker and consolidate the same issue across different users, versions, etc.
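
Consolidating the same issue across users and versions usually needs a stable key per failure. One common approach, sketched here in Python under stated assumptions, is to hash the exception type together with the top few stack frames after stripping details that vary between builds. The report shape, the frame text formats matched by the regular expressions, and the depth of five frames are all invented for illustration.

```python
import hashlib
import re


def signature(exception_type, frames, depth=5):
    """Stable bucket id: exception type plus the top `depth` normalized frames."""
    normalized = []
    for frame in frames[:depth]:
        frame = re.sub(r"\+0x[0-9a-fA-F]+", "", frame)   # drop code offsets
        frame = re.sub(r":line \d+", "", frame)           # drop line numbers
        normalized.append(frame.strip())
    key = exception_type + "|" + "|".join(normalized)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def consolidate(reports):
    """Group reports (dicts with 'type', 'frames', 'user', 'version') by signature."""
    buckets = {}
    for r in reports:
        buckets.setdefault(signature(r["type"], r["frames"]), []).append(r)
    return buckets
```

Each bucket can then map to a single ticket, with the list of affected users and versions attached to it.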
Not Everything Is Perfect
The pros are obvious (visibility, easy scaling…)

But there are some cons as well…

Adds complexity (reduce risk by using separate process)

Adds overhead

Requires additional development
Summary
Self-monitoring is important for all kinds of software

Best to create a hierarchy of monitoring (and overhead and complexity)

Lots of scenarios: CPU, GC, memory, deadlocks

Demos: https://github.com/dinazil/self-aware-applications
THANK YOU
DINA GOLDSHTEIN
@DINAGOZIL

Self-Aware Applications: Automatic Production Monitoring (SDP November 2017)
