Beyond the % usage, an in-depth look into monitoring.
Twan Koot
Introduction
• Twan Koot, 26 years old
• Senior Performance tester / Engineer @
• 5 years of IT experience.
• Loves: fast IT solutions on small hardware.
• Hates: unfounded decisions on IT architecture.
• Apart from working, I love photography and riding motorcycles.
Beyond the % usage, an in-depth look into monitoring.
• Topics:
• A method for analyzing resource monitoring.
• Showcasing some monitoring metrics.
• Introduction to BCC tools.
Monitoring
Monitoring – the basics – Analyze
• So, you have run a performance test and even had monitoring running.
• Now we can do the basic three-step dance:
• Check CPU, RAM and I/O.
• Check whether any counter exceeds a static threshold, such as % usage.
• Check whether peaks in resource usage coincide with peaks in response times.
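The second and third steps of this dance can be sketched in a few lines of Python. This is only an illustration: the function names, the sample data and the limits of 80% and 500 ms are assumptions made for the example, not values from the presentation.

```python
from typing import List, Set


def samples_over(values: List[float], threshold: float) -> Set[int]:
    """Return the sample indexes where a counter exceeds a static threshold."""
    return {i for i, v in enumerate(values) if v > threshold}


def peak_overlap(resource: List[float], response_ms: List[float],
                 resource_limit: float, response_limit: float) -> float:
    """Fraction of slow-response samples that coincide with a resource peak."""
    busy = samples_over(resource, resource_limit)
    slow = samples_over(response_ms, response_limit)
    if not slow:
        return 0.0
    return len(busy & slow) / len(slow)


# Hypothetical data: CPU % and response times sampled at the same moments.
cpu_pct = [35.0, 42.0, 91.0, 95.0, 50.0, 88.0]
resp_ms = [120.0, 130.0, 900.0, 1100.0, 140.0, 150.0]

overlap = peak_overlap(cpu_pct, resp_ms, resource_limit=80.0, response_limit=500.0)
print(f"{overlap:.0%} of slow samples coincide with a CPU peak")
```

Note that the last sample is above the CPU limit while the response time is fine, which hints at why a static threshold alone says little.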
Monitoring – the basics – Dashboard hype
USE
Monitoring – USE – Brendan Gregg
• The USE method enables a methodical approach to analyzing resources.
• It was developed by Brendan Gregg. http://www.brendangregg.com
"Industry expert in computing performance and cloud computing. Solves
hard problems. Makes things faster."
Monitoring – USE – USE method
• Utilization – The average time a resource was busy servicing work.
• Saturation – The degree to which a resource has extra work it can't handle, which is often queued.
• Errors – The number of error events.
Resource | Utilization (easy) | Saturation (moderate) | Errors (hard)
CPU | CPU utilization (%) | Run-queue length / scheduler latency | Correctable CPU cache ECC events or faulted CPUs
Memory | Available free memory | Anonymous paging or thread swapping | Failed malloc()s
Storage device I/O | Device busy % | Wait queue length | Device errors
Monitoring – USE – The flow
• How do we apply the USE method?
• We can use the following flow:
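The flow diagram itself is not part of this text. As a stand-in, the sketch below walks each resource through utilization, then saturation, then errors, which is the order given in the recap of this deck. The `ResourceMetrics` class, the limits and the sample numbers are hypothetical and exist only for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceMetrics:
    name: str
    utilization_pct: float        # e.g. CPU busy %, device busy %
    saturation: float             # e.g. run-queue length, wait-queue length
    errors: Optional[int] = None  # None when no error counter is available


def use_check(res: ResourceMetrics, util_limit: float = 80.0,
              sat_limit: float = 1.0) -> List[str]:
    """Check one resource in the order utilization, saturation, errors."""
    findings = []
    if res.utilization_pct >= util_limit:
        findings.append(f"{res.name}: high utilization ({res.utilization_pct:.0f}%)")
    if res.saturation > sat_limit:
        findings.append(f"{res.name}: saturated (queue {res.saturation:g})")
    if res.errors:  # skipped when the counter is missing or zero
        findings.append(f"{res.name}: {res.errors} errors")
    return findings


# Made-up measurements for three resources.
resources = [
    ResourceMetrics("CPU", 62.0, 9.0),
    ResourceMetrics("RAM", 91.0, 0.0, errors=0),
    ResourceMetrics("I/O", 40.0, 43.0, errors=2),
]
report = [finding for r in resources for finding in use_check(r)]
print("\n".join(report))
```

In this made-up data the CPU stays below the utilization limit yet is clearly saturated, which is the kind of finding a pure % check would miss.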
Monitoring
Monitoring – Let's go deeper
Lots of tools
Monitoring – Let's go deeper – CPU Utilisation
• One of the most measured metrics during a performance test.
• What does 80% utilization even mean?
• Overloaded?
One of the most misread metrics!
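To make the 80% figure concrete, here is a minimal sketch of how a busy percentage can be derived on Linux from two samples of the aggregate `cpu` line in `/proc/stat`. The sample lines are invented, and treating idle plus iowait as "not busy" is one common convention rather than the only one. Cycles in which the CPU is stalled waiting on memory are still counted as busy ticks, which is part of why the number is easy to misread.

```python
def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def busy_percent(before: str, after: str) -> float:
    """CPU busy % between two samples; idle and iowait ticks count as not busy."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    # Counter order: user nice system idle iowait irq softirq steal [guest guest_nice]
    # Guest time is already included in user time, so only the first 8 are summed.
    delta = [y - x for x, y in zip(a[:8], b[:8])]
    total = sum(delta)
    not_busy = delta[3] + delta[4]
    return 100.0 * (total - not_busy) / total if total else 0.0


# Invented samples, as if read from /proc/stat some seconds apart.
sample_1 = "cpu 1000 0 500 8000 500 0 0 0 0 0"
sample_2 = "cpu 1600 0 700 8150 550 0 0 0 0 0"
print(f"busy: {busy_percent(sample_1, sample_2):.1f}%")
```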
[Graphic: Util / Sat grid for CPU, RAM and I/O]
Monitoring – Let's go deeper – CPU Utilisation
• When checking the utilization counter we may observe: Busy | Idle
• What is actually happening: Busy | Waiting ("stalled") | Waiting ("idle")
• What are we waiting for?
• I/O
• Memory
• So, is CPU utilization wrong? It is still a good starting point for monitoring.
Monitoring – Let's go deeper – CPU Saturation
• CPU saturation -> run queue
• Using Nmon "k"
• We see a run queue of 9; this queueing will cause latency.
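Outside of Nmon, a rough run-queue figure can also be read from `/proc/loadavg` on Linux. The sketch below parses the fourth field, which holds the number of currently runnable scheduling entities, and relates it to a CPU count. The sample line and the CPU count of 4 are made up, and the runnable count includes tasks that are already running on a CPU, so it is not a pure queue length.

```python
def runnable_tasks(loadavg_text: str) -> int:
    """Extract the number of currently runnable scheduling entities."""
    # /proc/loadavg looks like: "0.20 0.18 0.12 1/80 11206";
    # the fourth field is runnable/total scheduling entities.
    running, _total = loadavg_text.split()[3].split("/")
    return int(running)


def runnable_per_cpu(loadavg_text: str, cpus: int) -> float:
    """Runnable tasks per CPU; values well above 1 suggest tasks are waiting."""
    return runnable_tasks(loadavg_text) / cpus


sample = "9.10 7.52 5.01 9/512 12345"  # made-up sample line
print(runnable_per_cpu(sample, cpus=4))
```

On a live host, the text would come from reading `/proc/loadavg` and the CPU count from `os.cpu_count()`.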
Monitoring – Let's go deeper – Memory Utilisation
• Using Nmon “m”
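For comparison with the Nmon view, memory utilization can be approximated from `/proc/meminfo` on Linux. This sketch assumes a kernel that reports `MemAvailable` and uses invented numbers. Defining "used" as total minus available is an assumption of the example, not something taken from the slides.

```python
def parse_meminfo(text: str) -> dict:
    """Map /proc/meminfo keys to their numeric values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info


def memory_used_percent(text: str) -> float:
    """Share of memory that is not available for new allocations."""
    info = parse_meminfo(text)
    return 100.0 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]


# Invented excerpt in the /proc/meminfo layout.
sample = """MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    4000000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB"""
print(f"{memory_used_percent(sample):.1f}% used")
```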
Monitoring – Let's go deeper – Memory Saturation
• Using Nmon “m” “V”
• We can see a lot of paging activity and heavy use of swap space.
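Swap activity of this kind can also be quantified from the cumulative `pswpin` and `pswpout` counters in `/proc/vmstat` on Linux. The sketch below turns two samples into pages per second. The counter values and the 5-second interval are invented for the example.

```python
def parse_vmstat(text: str) -> dict:
    """Map /proc/vmstat counter names to integer values."""
    pairs = (line.split() for line in text.splitlines() if line.strip())
    return {name: int(value) for name, value in pairs}


def swap_rates(before: str, after: str, interval_s: float) -> dict:
    """Pages swapped in and out per second between two samples."""
    a, b = parse_vmstat(before), parse_vmstat(after)
    return {name: (b[name] - a[name]) / interval_s
            for name in ("pswpin", "pswpout")}


# Invented samples taken 5 seconds apart.
t0 = "pgfault 1000\npswpin 200\npswpout 500"
t1 = "pgfault 4000\npswpin 260\npswpout 800"
print(swap_rates(t0, t1, interval_s=5.0))
```

Sustained non-zero rates here are a sign that memory is saturated, even when the utilization number alone looks acceptable.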
Monitoring – Let's go deeper – I/O Utilisation
• Using Nmon “d”
• We can see multiple counters for measuring I/O utilization.
• We can measure the amount of data read from and written to the disk and compare this to the device's specs.
• Reading: 3726.2 transfers/sec.
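As a rough illustration of where such counters come from, the sketch below derives transfers per second and a device busy percentage from two samples of `/proc/diskstats` on Linux. The device name `sda`, the counter values and the 1-second interval are assumptions for the example, and the result is not meant to reproduce Nmon's exact output.

```python
def disk_counters(diskstats_text: str, device: str) -> list:
    """Return the integer counters of one device from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > 13 and fields[2] == device:
            return [int(x) for x in fields[3:]]
    raise KeyError(device)


def io_utilization(before: str, after: str, device: str, interval_s: float) -> dict:
    """Transfers per second and busy % for a device between two samples."""
    a, b = disk_counters(before, device), disk_counters(after, device)
    d = [y - x for x, y in zip(a, b)]
    return {
        # d[0] = reads completed, d[4] = writes completed
        "transfers_per_s": (d[0] + d[4]) / interval_s,
        # d[9] = milliseconds spent doing I/O during the interval
        "busy_pct": 100.0 * d[9] / (interval_s * 1000.0),
    }


# Invented samples taken 1 second apart.
t0 = "   8       0 sda 1000 0 80000 500 2000 0 160000 900 0 4000 9000"
t1 = "   8       0 sda 1500 0 120000 700 2300 0 184000 1100 0 4600 52000"
print(io_utilization(t0, t1, "sda", interval_s=1.0))
```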
Monitoring – Let's go deeper – I/O Saturation
• Using iostat -d, filtered to a specific disk.
• We can see a queue of ~43 requests (1-second interval).
• Queueing means latency.
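An average queue size like this can be approximated from the "weighted time spent doing I/Os" counter in `/proc/diskstats`, which to my understanding is close to how iostat derives its queue-size column. The sketch below is an assumption-laden illustration: the device name `sda` and the counter values are invented so that the result lands on 43.

```python
def weighted_io_ms(diskstats_text: str, device: str) -> int:
    """Return the 'weighted time spent doing I/Os' counter (ms) for a device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > 13 and fields[2] == device:
            return int(fields[13])
    raise KeyError(device)


def average_queue_size(before: str, after: str, device: str,
                       interval_s: float) -> float:
    """Average number of requests in flight or queued during the interval."""
    delta_ms = weighted_io_ms(after, device) - weighted_io_ms(before, device)
    return delta_ms / (interval_s * 1000.0)


# Invented samples taken 1 second apart.
t0 = "   8       0 sda 1000 0 80000 500 2000 0 160000 900 0 4000 9000"
t1 = "   8       0 sda 1500 0 120000 700 2300 0 184000 1100 0 4600 52000"
print(average_queue_size(t0, t1, "sda", interval_s=1.0))
```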
(e)BPF/BCC
Monitoring – Let's go deeper – eBPF
• ‘Extended’ Berkeley Packet Filter.
• An in-kernel virtual machine for running small, sandboxed programs.
• Gives access to many new metrics about the kernel, performance, the scheduler and more.
• Some use cases:
• Deep performance analysis
• Network tracing
• DDoS mitigation/detection
Monitoring – Let's go deeper – BCC
BPF Compiler Collection (BCC)
“BCC is a toolkit for creating efficient kernel tracing and manipulation
programs and includes several useful tools and examples.”
Monitoring – BCC – Overview
Lots of tools
Monitoring – BCC – CPU saturation / Runqlat
• Runqlat is used to measure scheduler latency.
• We can even filter to a specific PID:
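Runqlat prints its result as a power-of-two histogram of microsecond buckets. The screenshots are not part of this text, so the sketch below works on an invented histogram in that layout and estimates in which bucket a given percentile falls. The helper names and all counts are hypothetical.

```python
import re
from typing import List, Tuple

# Matches bucket rows such as "  4 -> 7   : 300   |*****   |"
BUCKET = re.compile(r"^\s*(\d+)\s*->\s*(\d+)\s*:\s*(\d+)")


def parse_histogram(text: str) -> List[Tuple[int, ...]]:
    """Return (low, high, count) tuples from a log2 histogram printout."""
    rows = []
    for line in text.splitlines():
        match = BUCKET.match(line)
        if match:
            rows.append(tuple(int(g) for g in match.groups()))
    return rows


def percentile_upper_bound(rows: List[Tuple[int, ...]], pct: float) -> int:
    """Upper edge of the bucket in which the given percentile falls."""
    total = sum(count for _, _, count in rows)
    if total == 0:
        raise ValueError("histogram is empty")
    seen = 0
    for _, high, count in rows:
        seen += count
        if seen >= total * pct / 100.0:
            return high
    return rows[-1][1]


# Invented output in the histogram layout used by the BCC tools.
sample = """
     usecs               : count     distribution
         0 -> 1          : 100      |*****               |
         2 -> 3          : 400      |********************|
         4 -> 7          : 300      |***************     |
         8 -> 15         : 150      |*******             |
        16 -> 31         : 50       |**                  |
"""
rows = parse_histogram(sample)
print("p50 <=", percentile_upper_bound(rows, 50), "usecs")
print("p99 <=", percentile_upper_bound(rows, 99), "usecs")
```

Because the buckets double in width, the result is only an upper bound for the percentile, not an exact latency.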
Monitoring – BCC – Cachestat / Cachetop
• With Cachestat we can observe cache statistics.
• With Cachetop we can show the same statistics with more information.
Monitoring – BCC – Biolatency / Filetop
• BCC contains powerful tools such as Biolatency, which measures disk I/O latency:
• Using Filetop we can observe metrics about file activity.
Monitoring – BCC – Fileslower / Filelife
• We can trace file reads and writes that are slower than a threshold:
• We can also measure the reads and writes to files using Filetop:
Monitoring – BCC – TCPlife
• Used for tracing TCP sessions that open and close.
Monitoring – Hopes for after this showcase
• More performance engineers start using a methodical approach for analyzing resource monitoring.
• Performance testers/engineers use this presentation as a starting point to learn more about in-depth monitoring.
• We can gaze upon more dashboards with metrics from eBPF or following the USE method.
Monitoring – Let's recap
• Use USE for analyzing resources.
• Begin by analyzing the Utilization of the resources.
• Go deeper by checking the Saturation metrics.
• When available, check the Error metrics for the resources.
• Use BCC tools to analyze even more metrics.
• When analyzing monitoring data, keep Yoda in mind.
Thank you!
Twan Koot - Beyond the % usage, an in-depth look into monitoring

  • 1. Beyond the % usage, an in-depth look into monitoring. Twan Koot
  • 2. Introduction • Twan Koot 26 Years • Senior Performance tester / Engineer @ • 5 years of IT experience. • Loves: fast IT solutions on small hardware. • Hates: unfounded decisions on IT architecture. • Apart from working, I love to photograph and drive motorcycle.
  • 3. Beyond the % usage, an in-depth look into monitoring. • Topics: • Method for analyzing recourse monitoring. • Showcasing some monitoring metrics. • Introduction into BCC tools.
  • 5. Monitoring- thebasics – Analyze • So, you have run a performance test and even had monitoring running. • Now we can do the basic 3 step dance. • Check CPU, RAM and IO. • Check if any counter exceeds a static threshold like % usage. • Match if peaks in recourses overlap peaks in response times.
  • 6. Monitoring- thebasics – Dashboard hype
  • 7. USE
  • 8. Monitoring– USE – Brendan Gregg • The USE method enables a Methodical approach to analyzing recourses. • It is developed by Brendan Gregg. http://www.brendangregg.com "Industry expert in computing performance and cloud computing. Solves hard problems. Makes things faster."
  • 9. Monitoring– USE – USE method • Utilization – Average time a recourse was busy servicing work. • Saturation – The degree of work which can't be handled, and which is being queued. • Errors – The amount of errors. Recourse Utilization(Easy) Saturation(Moderate) Errors(Hard) CPU CPU utilization (%) Run-queue Length / scheduler latency Correctable CPU cache ECC events or faulted CPUs Memory Available free memory Anonymous paging or thread swapping Failed malloc()s Storage device I/O Device busy % Wait queue length Device errors
  • 10. Monitoring– USE – The flow • How do we apply the USE method ? • We can use the following flow:
  • 13. Monitoring– Let'sgo deeper - CPU Utilisation • One of most measured metrics during a performance test. • What does 80% utilization even mean ? • Overloaded ? One of the most misread metrics ! Util Sat CPU RAM I/O
  • 14. Monitoring– Let'sgo deeper - CPU Utilisation • When checking the utilization counter we may observe: • What is happening: • What are we waiting for ? • IO • Memory • So, CPU utilization is wrong ? It’s a good starting point to begin monitoring Busy idle Busy Waiting"stalled" Waiting"Idle" Util Sat CPU RAM I/O
  • 15. Monitoring– Let'sgo deeper – CPU Saturation • CPU saturation -> run queue • Nmon "k" • We see a run queue of 9, this will cause latency Util Sat CPU RAM I/O
  • 16. Monitoring– Let'sgo deeper - Memory Utilisation • Using Nmon “m” Util Sat CPU RAM I/O
  • 17. Monitoring– Let'sgo deeper - Memory Saturation • Using Nmon “m” “V” • We can see lots of page activity and big usage of swap space. Util Sat CPU RAM I/O
  • 18. Monitoring – Let's go deeper – IO Utilisation • Using Nmon “d” • We can see multiple counters for measuring IO utilization. • We can measure the amount of data read from and written to the disk and compare this to the device specs. • Reading 3726.2 transfers/sec.
  • 19. Monitoring – Let's go deeper – IO Saturation • Using iostat -d with a device name to limit the report to a specific disk. • We can see we had a queue of ~43 requests (1 sec interval). • Queueing means latency.
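For readers who want to see where a disk queue figure can come from, here is a small Python sketch based on `/proc/diskstats`. It assumes the documented field layout, where the tenth statistic is the time spent doing I/O in milliseconds and the eleventh is the weighted time in milliseconds. Dividing the deltas by the interval gives utilization and average queue size. The two sample lines are invented so that the result matches the queue of about 43 from the slide.

```python
def parse_diskstats_line(line):
    """Pick the busy-time counters out of one /proc/diskstats line."""
    parts = line.split()
    return {
        "name": parts[2],
        "io_ticks_ms": int(parts[12]),   # time spent doing I/O
        "weighted_ms": int(parts[13]),   # weighted time spent doing I/O
    }


def queue_and_util(before, after, interval_ms):
    """Return (average queue size, utilization fraction) for the interval."""
    avg_queue = (after["weighted_ms"] - before["weighted_ms"]) / interval_ms
    util = (after["io_ticks_ms"] - before["io_ticks_ms"]) / interval_ms
    return avg_queue, util


# Hypothetical samples taken 1000 ms apart.
BEFORE = "   8       0 sda 1000 10 80000 5000 2000 20 160000 90000 0 40000 95000"
AFTER = "   8       0 sda 1500 10 90000 9000 4000 20 200000 129000 40 41000 138000"

avg_queue, util = queue_and_util(
    parse_diskstats_line(BEFORE), parse_diskstats_line(AFTER), interval_ms=1000
)
print("average queue size:", avg_queue)
print("utilization:", util)
```

With these numbers the disk is busy for the whole interval and on average 43 requests are in flight or waiting.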
  • 21. Monitoring – Let's go deeper – eBPF • ‘Extended’ Berkeley Packet Filter. • In-kernel virtual machine that runs small, sandboxed programs. • Gives access to many new metrics about the kernel, performance, the scheduler and more. • Some use cases: • Deep performance analysis • Network tracing • DDoS mitigation/detection
  • 22. Monitoring – Let's go deeper – BCC • BPF Compiler Collection (BCC) • “BCC is a toolkit for creating efficient kernel tracing and manipulation programs and includes several useful tools and examples.”
  • 23. Monitoring – BCC – Overview • Lots of tools
  • 24. Monitoring – BCC – CPU saturation / Runqlat • Runqlat is used to measure scheduler latency. • We can even filter to a specific PID.
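Runqlat prints scheduler latency as a power-of-two histogram, with rows such as "0 -> 1", "2 -> 3", "4 -> 7". The Python sketch below only mimics that bucketing to show how the output can be read. It does not use BCC itself, and the latency samples (in microseconds) are invented.

```python
from collections import Counter


def log2_bucket(value_us):
    """Return the (low, high) bounds of the power-of-two bucket.

    value_us must be a non-negative integer number of microseconds.
    """
    if value_us < 2:
        return (0, 1)
    low = 1 << (value_us.bit_length() - 1)
    return (low, 2 * low - 1)


def histogram(latencies_us):
    """Count samples per power-of-two bucket."""
    return Counter(log2_bucket(v) for v in latencies_us)


# Hypothetical run-queue latencies in microseconds.
samples = [0, 1, 3, 5, 6, 12, 300, 900, 1500]
hist = histogram(samples)
for (low, high), count in sorted(hist.items()):
    print(f"{low:>6} -> {high:<6} : {count}")
```

A long tail in the higher buckets means tasks regularly wait a noticeable time before they get a CPU, which is easier to compare across runs than a raw run-queue length.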
  • 25. Monitoring – BCC – Cachestat / Cachetop • With Cachestat we can observe cache stats. • With Cachetop we can show the same stats with more information.
  • 26. Monitoring – BCC – Biolatency / Filetop • BCC contains powerful tools such as Biolatency, which measures disk I/O latency. • Using Filetop we can observe metrics about file activity.
  • 27. Monitoring – BCC – Fileslower / Filelife • We can measure file reads and writes slower than a threshold. • We can also measure the reads and writes to files using Filetop.
  • 28. Monitoring – BCC – TCPlife • Used for tracing TCP sessions that open and close.
  • 29. Monitoring – Hopes for after this showcase • More performance engineers start using a methodical approach for analyzing resource monitoring. • Performance testers/engineers use this presentation as a starting point to learn more about in-depth monitoring. • We can gaze upon more dashboards with metrics from eBPF or following the USE method.
  • 30. Monitoring – Let's recap • Use USE for analyzing resources. • Begin with analyzing the Utilization of the resources. • Go deeper by checking the Saturation metrics. • When available, check the Error metrics for the resources. • Use BCC tools to analyze even more metrics. • When analyzing monitoring data, keep Yoda in mind. Thank you!

Editor's Notes

  1. Welcome again. My name is Twan Koot, I'm 26 years old and a Senior Performance Tester/Engineer at Sogeti Netherlands. I currently have 5 years of IT experience and started my career as a tester at a small IT firm. Quite quickly I became far more interested in the technical aspects of testing than in the functional side of software. Starting with test automation and security testing, I found my drive in discovering and learning new technology and technical skills. After 2 years I joined Sogeti, quickly encountered performance testing, and I've been hooked ever since. I'm currently active for multiple clients, where I'm implementing a CI/CD performance pipeline and coaching and training junior performance testers. I really love fast IT solutions on small hardware and really hate unfounded decisions on IT architecture. Apart from working, I love photography and riding my motorcycle.
  2. Now that the introduction is out of the way, we can focus on what's interesting. This presentation is mainly about monitoring hardware resources. I will also highlight a really interesting and handy method for analyzing performance issues, and we will finish with a selection of tools from the BCC collection to showcase some in-depth monitoring. After the presentation there is time for Q&A.
  3. All right, now we can really start. First I'm going to talk about my current view of how many performance testers monitor and analyze resource usage.
  4. So we have run our performance test, and we even had some cool monitoring running during the test. Now we can do what I call the 3 step dance. First we check CPU, RAM and IO. We grab our graphs and check whether any of our counters exceeded a threshold during the test. Next we try to explain peaks in response times and check whether we can correlate them with the monitoring data. This approach isn't based on any method or best practice; it's the normal approach because of how our brains work. Studies have shown that we are constantly looking for patterns to make better and faster decisions. Taking that into account, it is logical that we look at 2 graphs and try to find the correlation between CPU % and response times. Then again, in my opinion this is only the basics and doesn't deserve to be called an analysis.
  5. We all know these awesome dashboards full of charts. They are full of information and data about the performance test or live production. But what are we monitoring? Do we have a clear idea why these metrics were selected for the dashboard? In recent years, many APM tools and other convenient tools have gained popularity. But with them has come a lack of in-depth knowledge about what your fancy tool is showing you in the even fancier dashboard. This reminds me of the episode of How I Met Your Mother where Marshall shows his favorite charts using charts; I find that a quite accurate picture of how many performance testers use monitoring tools. It's like: look at that cool pie chart, or look, we need a fancy graph to show our CPU usage. Using one specific tool shifts your focus away from the metrics that aren't available in that particular tool, and with that creates a sort of tunnel vision on monitoring metrics. In this presentation we go beyond these fancy dashboard metrics and show some cool features that will help you in analyzing performance issues.
  6. First, we need a convenient way to help us analyze performance counters. So let's introduce a method to analyze performance issues or results in a structured way. So let's use USE!
  7. We'll start with, in my opinion, one of the gurus of performance engineering: Brendan Gregg. His posts and talks have triggered me to explore, learn and talk about the great possibilities that monitoring on a lower level has to offer. Brendan introduced the USE method; he has written about it on his personal website and in his book, Systems Performance. This book is filled to the brim with knowledge every performance engineer should know something about. His very short bio is: “Industry expert in computing performance and cloud computing. Solves hard problems. Makes things faster.”
  8. So, USE is in caps, by the way, because it's an abbreviation. The U stands for utilization: the amount of usage of a particular resource, or the better explanation, the average time a resource was busy servicing work. The S stands for saturation: the amount of work which can't be handled and which is being queued. To me this is the most important of the three, particularly the queued part, because we all know that standing in line is real-life latency: waiting for the process you need to be done. Like yesterday morning when I was waiting in line for a coffee; it's time which isn't spent on anything useful. And now we come to the last part of USE. The E stands for errors. This one is really easy to explain: it is the number of errors. So now we have three new metrics with which we can analyze resource counters. In the table on the slide you can see USE applied to CPU, memory and storage device IO. Let's start with the first column, utilization, one of the easiest metrics to monitor. For CPU we can use CPU utilization (%), for memory we can measure free memory, and for IO we can use device busy (%). These metrics are quite easy to measure and at first glance easy to understand. The second column is saturation. For CPU we can measure run-queue length or scheduler latency, for memory we can look at swapping and paging counters, and for IO we can check the wait-queue length. For errors we can look at CPU errors like ECC events or even faulted CPUs. On the memory side we can check failed malloc()s, and on the IO side device errors are a good metric. As stated, the columns are labelled from easy to hard. In the next slides we will go into detail on the first 2 columns of the table; because error counters are often unavailable in a cloud environment, we will focus on those 2.
  9. We now have a good understanding of what USE is about, so it's time to apply it during an analysis. There are a couple of benefits to using USE. It guides you to methodically analyze resource counters and performance issues. To use USE we can apply the following flow. It is a pretty straightforward flow which guides you through the process of resource monitoring analysis. The flow starts with selecting a resource; you can usually create your own top 5 to start with. Then you check the error counters first, because a faulty memory bank is an easy and quick win on performance. Then you check utilization and then saturation. As you can see, the flow is pretty self-explanatory.
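The flow from this note (pick a resource, check errors, then utilization, then saturation) can be sketched in a few lines of Python. This is only an illustration: the `ResourceSample` type, the 80% utilization limit and all sample values are assumptions made for the example, not figures from the talk or from the USE method itself.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    name: str
    errors: int          # error events seen during the test
    utilization: float   # fraction of time busy, 0.0 to 1.0
    saturation: float    # queued work, e.g. a queue length


def use_check(sample, util_limit=0.8, sat_limit=0.0):
    """Return the first USE finding for one resource, or None."""
    if sample.errors > 0:
        return "errors"
    if sample.utilization >= util_limit:
        return "utilization"
    if sample.saturation > sat_limit:
        return "saturation"
    return None


def use_walk(samples):
    """Apply the flow to every resource in the list."""
    return {s.name: use_check(s) for s in samples}


# Hypothetical measurements for four resources.
samples = [
    ResourceSample("cpu", errors=0, utilization=0.55, saturation=9),
    ResourceSample("memory", errors=0, utilization=0.92, saturation=0),
    ResourceSample("disk", errors=2, utilization=0.30, saturation=43),
    ResourceSample("network", errors=0, utilization=0.10, saturation=0),
]
findings = use_walk(samples)
for name, finding in findings.items():
    print(name, "->", finding or "ok")
```

Note how the CPU in this example looks fine on utilization alone and is only flagged by its run queue, which is exactly why the flow does not stop at the % usage.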
  10. Now I will go in depth into the metrics which were shown 2 slides back.
  11. Let's check some of the metrics which are used in the USE method. This overview shows a functional diagram of a system with all its hardware components. Each component has one or more tools which can be used to gather metrics on that component. As you can see, the number of tools is overwhelming. With all these tools we can measure a lot and go really deep into checking which components may be the bottleneck in a performance issue.
  12. Here again I have mentioned that queuing means latency. But how much saturation, or measured queueing, is a problem? When does queuing add a significant amount of time to the total transaction time of your web call? We can measure this too, with some handy tools and kernel features. In comes the next part of this presentation.
  13. Runqlat is ideal to combine with the run queue: we can translate the queue into real latency. We can even narrow it down to a specific program, so we can measure exactly what the run-queue latency was for that program. This way we can easily spot whether we have a bottleneck on the scheduler/CPU, and measure it in an understandable metric which is easily compared.
  14. The same applies to IO: we have measured the disk queue, but we have no idea what 43 queued requests mean in terms of latency. To measure the latency we can use Biolatency. This allows us to measure the number of calls and see how long the latency was, and we can combine this with the disk queue to get a good idea of when a disk queue will cause problems and how much it contributes to an increase in response time, for instance. We can also use Filetop to check all file reads and writes which are issued by certain programs. This helps in specifying the workload: we can see which files are accessed, how many times, and by which program.
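Before measuring with Biolatency, a rough first estimate of what a queue of 43 requests means can be made with Little's law: average queue length equals throughput times average time in the system. The Python sketch below assumes a steady state, and the throughput of 1000 completed requests per second is a made-up figure, not a number from the talk.

```python
def avg_time_in_queue_ms(avg_queue_len, completions_per_s):
    """Little's law: time in system = queue length / throughput."""
    if completions_per_s <= 0:
        raise ValueError("throughput must be positive")
    return avg_queue_len * 1000.0 / completions_per_s


# Queue length from the iostat slide, throughput is hypothetical.
estimate = avg_time_in_queue_ms(avg_queue_len=43, completions_per_s=1000)
print(f"estimated time per request: {estimate} ms")
```

This is only an average. The histogram from Biolatency shows the actual distribution, including the slow outliers that an average hides.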