Performance Analysis:
The USE Method

Brendan Gregg
Lead Performance Engineer, Joyent
brendan.gregg@joyent.com

FISL13
July, 2012
whoami
• I work at the top of the performance support chain
• I also write open source performance tools
out of necessity to solve issues

• http://github.com/brendangregg
• http://www.brendangregg.com/#software
• And books (DTrace, Solaris Performance and Tools)
• Was Brendan @ Sun Microsystems, Oracle,
now Joyent
Joyent
• Cloud computing provider
• Cloud computing software
• SmartOS
• host OS, and guest via OS virtualization
• Linux, Windows
• guest via KVM
Agenda
• Example Problem
• Performance Methodology
• Problem Statement
• The USE Method
• Workload Characterization
• Drill-Down Analysis
• Specific Tools
Example Problem
• Recent cloud-based performance issue
• Customer problem statement:
• “Database response time sometimes take multiple
seconds. Is the network dropping packets?”

• Tested network using traceroute, which showed some
packet drops
Example: Support Path
• Performance Analysis
• Top: my turn
• 2nd Level: “network looks ok, CPU also ok”
• 1st Level: “ran traceroute, can’t reproduce”
• Customer: “network drops?”
Example: Network Drops
• Old fashioned: network packet capture (sniffing)
• Performance overhead during capture (CPU, storage)
and post-processing (wireshark)

• Time consuming to analyze: not real-time
Example: Network Drops
• New: dynamic tracing
• Efficient: only drop/retransmit paths traced
• Context: kernel state readable
• Real-time: analysis and summaries
# ./tcplistendrop.d
TIME                  SRC-IP         PORT     DST-IP           PORT
2012 Jan 19 01:22:49  10.17.210.103  25691 -> 192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.108  18423 -> 192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.116  38883 -> 192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.117  10739 -> 192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.112  27988 -> 192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.106  28824 -> 192.192.240.212  80
2012 Jan 19 01:22:49  10.12.143.16   65070 -> 192.192.240.212  80
[...]
Example: Methodology
• Instead of network drop analysis, I began with the
USE method to check system health

• In < 5 minutes, I found:
• CPU: ok (light usage)
• network: ok (light usage)
• memory: available memory was exhausted, and the
system was paging

• disk: periodic bursts of 100% utilization
• The method is simple, fast, directs further analysis
Example: Other Methodologies
• Customer was surprised (are you sure?). I used
latency analysis to confirm. Details (if interesting):

• memory: using both microstate accounting and
dynamic tracing to confirm that anonymous page-ins
were hurting the database; worst case app thread
spent 97% of time waiting on disk (data faults).

• disk: using dynamic tracing to confirm latency at the
application / file system interface; included up to
1000ms fsync() calls.

• Different methodology, smaller audience (expertise),
more time (1 hour).
Example: Summary
• What happened:
• customer, 1st and 2nd level support spent much time
chasing network packet drops.

• What could have happened:
• customer or 1st level follows the USE method and
quickly discovers memory and disk issues

• memory: fixable by customer reconfig
• disk: could go back to 1st or 2nd level support for confirmation

• Faster resolution, frees time
Performance Methodology
• Not a tool -> but tools can be written to help
• Not a product -> could be in monitoring solutions
• Is a procedure (documentation)
Why Now: past
• Performance analysis circa ‘90s, metric-orientated:
• Vendor creates metrics and performance tools
• Users develop methods to interpret metrics
• Common method: “Tools Method”
• List available performance tools
• For each tool, list useful metrics
• For each metric, determine interpretation
• Problematic: vendors often don’t provide the best
metrics; can be blind to issue types
Why Now: changes
• Open Source
• Dynamic Tracing
• See anything, not just what the vendor gave you
• Only practical on open source software
• Hardest part is knowing what questions to ask
Why Now: present
• Performance analysis now (post dynamic tracing),
question-orientated:

• Users pose questions
• Check if vendor has provided metrics
• Develop custom metrics using dynamic tracing
• Methodologies pose the questions
• What would previously be an academic exercise is
now practical
Methodology Audience
• Beginners: provides a starting point
• Experts: provides a checklist/reminder
Performance Methodologies
• Suggested order of execution:
1. Problem Statement
2. The USE Method
3. Workload Characterization
4. Drill-Down Analysis (Latency)
Problem Statement
• Typical support procedure (1st Methodology):
1. What makes you think there is a problem?
2. Has this system ever performed well?
3. What changed? Software? Hardware? Load?
4. Can the performance degradation be expressed in
terms of latency or run time?
5. Does the problem affect other people or
applications?
6. What is the environment? What software and
hardware is used? Versions? Configuration?
The USE Method
• Quick System Health Check (2nd Methodology):
• For every resource, check:
• Utilization: time resource was busy, or degree used
• Saturation: degree of queued extra work
• Errors: any errors

[Figure: diagram of a resource labeled with Utilization, Saturation, and Errors]
The USE Method: Hardware Resources
• CPUs
• Main Memory
• Network Interfaces
• Storage Devices
• Controllers
• Interconnects
The USE Method: Hardware Resources
• A great way to determine resources is to find (or
draw) the server functional diagram

• The hardware team at vendors should have these
• Analyze every component in the data path
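The USE Method: Functional Diagrams, Generic Example
[Figure: generic server functional diagram. Components shown: CPU 1 and CPU 2 joined by a CPU Interconnect; DRAM attached to each CPU over a Memory Bus; an I/O Bus to an I/O Bridge; an Expander Interconnect to an I/O Controller and a Network Controller; Interface Transports to two Disks and two Ports.]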
The USE Method: Resource Types
• There are two different resource types; each defines
utilization differently:

• I/O Resource: eg, network interface
• utilization: time resource was busy.
current IOPS / max or current throughput / max
can be used in some cases

• Capacity Resource: eg, main memory
• utilization: space consumed
• Storage devices act as both resource types
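
As a rough illustration of the two definitions above, the following Python sketch computes utilization each way. The function names and sample numbers are hypothetical and are not taken from any tool mentioned in the talk.

```python
def io_utilization(busy_seconds, interval_seconds):
    """Time-based utilization of an I/O resource: fraction of the interval it was busy."""
    return busy_seconds / interval_seconds


def throughput_utilization(current, maximum):
    """Alternative for I/O resources: current IOPS or throughput over the known maximum."""
    return current / maximum


def capacity_utilization(used, total):
    """Space-based utilization of a capacity resource such as main memory."""
    return used / total


# Hypothetical samples: a disk busy for 0.45 s of a 1 s interval, a NIC moving
# 250 Mbit/s on a 1000 Mbit/s link, and 6 GB used out of 8 GB of memory.
disk_util = io_utilization(0.45, 1.0)
nic_util = throughput_utilization(250, 1000)
mem_util = capacity_utilization(6, 8)
print(f"disk {disk_util:.0%}, nic {nic_util:.0%}, memory {mem_util:.0%}")
```

A storage device would need both calculations: busy time for its I/O side and space consumed for its capacity side.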
The USE Method: Software Resources
• Mutex Locks
• Thread Pools
• Process/Thread Capacity
• File Descriptor Capacity
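The USE Method: Flow Diagram
[Flow diagram: Choose Resource -> Errors Present? (Y: Problem Identified; N: continue) -> High Utilization? (Y: Problem Identified; N: continue) -> Saturation? (Y: Problem Identified; N: back to Choose Resource)]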
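
The flow diagram can be read as a loop over resources. The sketch below is one possible rendering of it in Python, not the author's tooling: the `ResourceMetrics` type, the `use_check` function, the 0.7 threshold (borrowed from the 70% guideline for I/O resources), and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceMetrics:
    name: str
    errors: int          # error count over the sampling interval
    utilization: float   # 0.0 to 1.0
    saturation: float    # degree of queued extra work, e.g. queue length


def use_check(resources, high_utilization=0.7):
    """Check errors, then utilization, then saturation for every resource."""
    findings = []
    for r in resources:
        if r.errors > 0:
            findings.append((r.name, "errors"))
        elif r.utilization >= high_utilization:
            findings.append((r.name, "utilization"))
        elif r.saturation > 0:
            findings.append((r.name, "saturation"))
    return findings


# Made-up values, loosely shaped like the example problem earlier in the talk.
resources = [
    ResourceMetrics("cpu", errors=0, utilization=0.10, saturation=0),
    ResourceMetrics("network", errors=0, utilization=0.05, saturation=0),
    ResourceMetrics("memory", errors=0, utilization=0.65, saturation=12),
    ResourceMetrics("disk", errors=0, utilization=1.00, saturation=4),
]
findings = use_check(resources)
print(findings)
```

Each finding is only a lead for further analysis; the method says where to look next, not what the root cause is.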
The USE Method: Interpretation
• Utilization
• 100% usually a bottleneck
• 70%+ often a bottleneck for I/O resources, especially
when high priority work cannot easily interrupt lower
priority work (eg, disks)

• Beware of time intervals. 60% utilized over 5 minutes
may mean 100% utilized for 3 minutes then idle

• Best examined per-device (unbalanced workloads)
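
The time-interval warning can be checked with simple arithmetic. This Python sketch builds hypothetical per-second samples (3 minutes fully busy, then 2 minutes idle) and compares the 5-minute average with the peak.

```python
# Per-second utilization samples over 5 minutes: 180 s at 100%, then 120 s idle.
samples = [1.0] * 180 + [0.0] * 120

average = sum(samples) / len(samples)
peak = max(samples)
seconds_at_100 = sum(1 for s in samples if s >= 1.0)

print(f"5-minute average: {average:.0%}")
print(f"peak 1-second utilization: {peak:.0%}")
print(f"seconds at 100%: {seconds_at_100}")
```

The average alone reports 60%, which hides three full minutes of a saturated resource.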
The USE Method: Interpretation
• Saturation
• Any non-zero value adds latency
• Errors
• Should be obvious
The USE Method: Easy Combinations
Resource            Type         Metric
CPU                 utilization  CPU utilization
CPU                 saturation   run-queue length
Memory              utilization  available memory
Memory              saturation   paging or swapping
Network Interface   utilization  RX/TX tput/bandwidth
Storage Device I/O  utilization  device busy percent
Storage Device I/O  saturation   wait queue length
Storage Device I/O  errors       device errors
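
As one example of an easy combination, system-wide CPU utilization on Linux can be derived from two samples of the aggregate "cpu" line in /proc/stat. The Python sketch below assumes the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice); the function names and the two sample lines are made up for illustration.

```python
def parse_cpu_line(line):
    """Parse the aggregate "cpu" line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def cpu_utilization(before, after):
    """Busy fraction between two samples; idle and iowait count as not busy."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas[:8])        # user..steal; guest time is already part of user/nice
    idle = deltas[3] + deltas[4]   # idle + iowait
    return (total - idle) / total


# Hypothetical samples taken some interval apart.
before = "cpu  1000 0 500 8000 500 0 0 0 0 0"
after = "cpu  1300 0 600 8500 600 0 0 0 0 0"
util = cpu_utilization(before, after)
print(f"CPU utilization: {util:.0%}")
```

In practice the two strings would come from reading /proc/stat twice; tools such as vmstat and mpstat report the same information without any code.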
The USE Method: Harder Combinations
Resource            Type         Metric
CPU                 errors       eg, correctable CPU cache ECC events
Network             saturation   “nocanputs”, buffering
Storage Controller  utilization  active vs max controller IOPS and tput
CPU Interconnect    utilization  per port tput / max bandwidth
Mem. Interconnect   saturation   memory stall cycles
I/O Interconnect    saturation   bus throughput / max bandwidth
The USE Method: tools
• To be thorough, you will need to use:
• CPU performance counters
• For bus and interconnect activity; eg, perf events, cpustat

• Dynamic Tracing
• For missing saturation and error metrics; eg, DTrace

• Both can get tricky; tools can be developed to help
• Please, no more top variants! ... unless it is
interconnect-top or bus-top

• I’ve written dozens of open source tools for both CPC
and DTrace; much more can be done
Workload Characterization
• May use as a 3rd Methodology
• Characterize workload by:
• who is causing the load? PID, UID, IP addr, ...
• why is the load called? code path
• what is the load? IOPS, tput, type
• how is the load changing over time?
• Best performance wins are from eliminating
unnecessary work

• Identifies class of issues that are load-based, not
architecture-based
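
A minimal sketch of the "who" and "what" questions, assuming the load has already been captured as a list of event records. The record layout (pid, operation, bytes) and every value in it are hypothetical; real data would come from a tracing or accounting tool.

```python
from collections import Counter

# Hypothetical I/O event records: (pid, operation, bytes).
events = [
    (1201, "read", 8192),
    (1201, "read", 8192),
    (3307, "write", 131072),
    (1201, "write", 4096),
    (3307, "write", 131072),
]

ops_by_pid = Counter()
bytes_by_pid = Counter()
ops_by_type = Counter()
for pid, op, size in events:
    ops_by_pid[pid] += 1        # who is causing the load?
    bytes_by_pid[pid] += size   # what is the load (throughput)?
    ops_by_type[op] += 1        # what is the load (type)?

print(ops_by_pid.most_common())
print(bytes_by_pid.most_common())
print(ops_by_type.most_common())
```

Here PID 1201 issues the most operations while PID 3307 moves the most bytes, which is the kind of split that points at work that could be eliminated.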
Drill-Down Analysis
• May use as a 4th Methodology
• Peel away software layers to drill down on the issue
• Eg, software stack I/O latency analysis:
Application
System Call Interface
File System
Block Device Interface
Storage Device Drivers
Storage Devices
Drill-Down Analysis: Open Source
• With Dynamic Tracing, all function entry & return
points can be traced, with nanosecond timestamps.

• One Strategy is to measure latency pairs, to search
for the source; eg, A->B & C->D:

static int
arc_cksum_equal(arc_buf_t *buf)
{                                               /* A */
        zio_cksum_t zc;
        int equal;

        /* C */
        mutex_enter(&buf->b_hdr->b_freeze_lock);
        fletcher_2_native(buf->b_data, buf->b_hdr->b_size, &zc);
        /* D */
        equal = ZIO_CHECKSUM_EQUAL(*buf->b_hdr->b_freeze_cksum, zc);
        mutex_exit(&buf->b_hdr->b_freeze_lock);

        return (equal);
}                                               /* B */
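
The latency-pair idea can be imitated in user space. The Python sketch below takes timestamps at four points that correspond to A, B, C, and D and compares the outer span with the inner one. It is only an analogy: dynamic tracing gets the same timestamps without editing the code, and the `checksum` and `cksum_equal` functions here are hypothetical stand-ins, not the ZFS routines.

```python
import time


def checksum(data):
    """Stand-in for the expensive inner call (the C->D span)."""
    return sum(data) & 0xFFFFFFFF


def cksum_equal(data, expected):
    t_a = time.perf_counter_ns()   # A: function entry
    t_c = time.perf_counter_ns()   # C: before the inner call
    actual = checksum(data)
    t_d = time.perf_counter_ns()   # D: after the inner call
    equal = actual == expected
    t_b = time.perf_counter_ns()   # B: function return
    return equal, t_b - t_a, t_d - t_c


data = bytes(range(256)) * 4096
equal, outer_ns, inner_ns = cksum_equal(data, checksum(data))
print(f"A->B {outer_ns} ns, C->D {inner_ns} ns ({inner_ns / outer_ns:.0%} of total)")
```

If C->D accounts for nearly all of A->B, the search continues inside the inner call; if not, the latency lies elsewhere in the function.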
Other Methodologies
• Method R
• A latency-based analysis approach for Oracle
databases. See “Optimizing Oracle Performance" by
Cary Millsap and Jeff Holt (2003)

• Experimental approaches
• Can be very useful: eg, validating network throughput
using iperf
Specific Tools for the USE Method
illumos-based
• http://dtrace.org/blogs/brendan/2012/03/01/the-use-method-solaris-performance-checklist/

Resource  Type         Metric
CPU       Utilization  per-cpu: mpstat 1, “idl”; system-wide: vmstat 1, “id”;
                       per-process: prstat -c 1 (“CPU” == recent), prstat -mLc 1
                       (“USR” + “SYS”); per-kernel-thread: lockstat -Ii rate,
                       DTrace profile stack()
CPU       Saturation   system-wide: uptime, load averages; vmstat 1, “r”;
                       DTrace dispqlen.d (DTT) for a better “vmstat r”;
                       per-process: prstat -mLc 1, “LAT”
CPU       Errors       fmadm faulty; cpustat (CPC) for whatever error
                       counters are supported (eg, thermal throttling)
Memory    Saturation   system-wide: vmstat 1, “sr” (bad now), “w” (was very
                       bad); vmstat -p 1, “api” (anon page ins == pain), “apo”;
                       per-process: prstat -mLc 1, “DFL”; DTrace anonpgpid.d
                       (DTT), vminfo:::anonpgin on execname

• ... etc for all combinations (would span a dozen slides)
Linux-based
• http://dtrace.org/blogs/brendan/2012/03/07/the-use-method-linux-performance-checklist/

Resource  Type         Metric
CPU       Utilization  per-cpu: mpstat -P ALL 1, “%idle”; sar -P ALL, “%idle”;
                       system-wide: vmstat 1, “id”; sar -u, “%idle”;
                       dstat -c, “idl”; per-process: top, “%CPU”; htop, “CPU%”;
                       ps -o pcpu; pidstat 1, “%CPU”; per-kernel-thread:
                       top/htop (“K” to toggle), where VIRT == 0 (heuristic). [1]
CPU       Saturation   system-wide: vmstat 1, “r” > CPU count [2]; sar -q,
                       “runq-sz” > CPU count; dstat -p, “run” > CPU count;
                       per-process: /proc/PID/schedstat 2nd field
                       (sched_info.run_delay); perf sched latency (shows
                       “Average” and “Maximum” delay per-schedule); dynamic
                       tracing, eg, SystemTap schedtimes.stp “queued(us)” [3]
CPU       Errors       perf (LPE) if processor specific error events (CPC) are
                       available; eg, AMD64's “04Ah Single-bit ECC Errors Recorded
                       by Scrubber” [4]

• ... etc for all combinations (would span a dozen slides)
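
The per-process CPU saturation metric from the table, the 2nd field of /proc/PID/schedstat, can be turned into a ratio from two samples. This Python sketch assumes the three-field layout described in the kernel's scheduler statistics documentation (time on CPU, time waiting on a run queue, number of timeslices) with nanosecond units as on recent kernels. The function names and sample strings are made up.

```python
def parse_schedstat(text):
    """Parse /proc/PID/schedstat: on-CPU ns, run-queue wait ns, timeslice count."""
    on_cpu_ns, run_delay_ns, timeslices = (int(x) for x in text.split())
    return on_cpu_ns, run_delay_ns, timeslices


def run_delay_fraction(before, after):
    """Fraction of runnable time spent waiting for a CPU between two samples."""
    cpu0, delay0, _ = parse_schedstat(before)
    cpu1, delay1, _ = parse_schedstat(after)
    waited = delay1 - delay0
    ran = cpu1 - cpu0
    return waited / (waited + ran) if (waited + ran) else 0.0


# Made-up samples of the same process taken some interval apart.
before = "5000000000 200000000 1500"
after = "5600000000 600000000 1700"
fraction = run_delay_fraction(before, after)
print(f"{fraction:.0%} of runnable time was spent queued")
```

A fraction near zero suggests the process is not competing for CPUs; a large one is a saturation signal worth confirming with perf sched latency.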
Products
• Earlier I said methodologies could be supported by
monitoring solutions

• At Joyent we develop Cloud Analytics:
Future
• Methodologies for advanced performance issues
• I recently worked a complex KVM bandwidth issue where
no current methodologies really worked

• Innovative methods based on open source +
dynamic tracing

• Less performance mystery. Less guesswork.
• Better use of resources (price/performance)
• Easier for beginners to get started
Thank you
• Resources:
• http://dtrace.org/blogs/brendan
• http://dtrace.org/blogs/brendan/2012/02/29/the-use-method/
• http://dtrace.org/blogs/brendan/tag/usemethod/
• http://dtrace.org/blogs/brendan/2011/12/18/visualizing-device-utilization/ (ideas if you are a monitoring solution developer)

• brendan@joyent.com

More Related Content

What's hot

Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Brendan Gregg
 
Pentesting GraphQL Applications
Pentesting GraphQL ApplicationsPentesting GraphQL Applications
Pentesting GraphQL ApplicationsNeelu Tripathy
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf toolsBrendan Gregg
 
ColdFusion for Penetration Testers
ColdFusion for Penetration TestersColdFusion for Penetration Testers
ColdFusion for Penetration TestersChris Gates
 
Pentest Application With GraphQL | Null Bangalore Meetup
Pentest Application With GraphQL | Null Bangalore Meetup Pentest Application With GraphQL | Null Bangalore Meetup
Pentest Application With GraphQL | Null Bangalore Meetup Divyanshu
 
DNS exfiltration using sqlmap
DNS exfiltration using sqlmapDNS exfiltration using sqlmap
DNS exfiltration using sqlmapMiroslav Stampar
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance ToolsBrendan Gregg
 
(Ab)Using GPOs for Active Directory Pwnage
(Ab)Using GPOs for Active Directory Pwnage(Ab)Using GPOs for Active Directory Pwnage
(Ab)Using GPOs for Active Directory PwnagePetros Koutroumpis
 
FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기Jongwon Kim
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFBrendan Gregg
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesEd Hunter
 
Petit potam slides-rtfm-ossir
Petit potam slides-rtfm-ossirPetit potam slides-rtfm-ossir
Petit potam slides-rtfm-ossirLionelTopotam
 
Introduction to red team operations
Introduction to red team operationsIntroduction to red team operations
Introduction to red team operationsSunny Neo
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedBrendan Gregg
 
JDD 2017: Nginx + Lua = OpenResty (Marcin Stożek)
JDD 2017: Nginx + Lua = OpenResty (Marcin Stożek)JDD 2017: Nginx + Lua = OpenResty (Marcin Stożek)
JDD 2017: Nginx + Lua = OpenResty (Marcin Stożek)PROIDEA
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
 
Kernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uringKernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uringAnne Nicolas
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPFAlex Maestretti
 
redis 소개자료 - 네오클로바
redis 소개자료 - 네오클로바redis 소개자료 - 네오클로바
redis 소개자료 - 네오클로바NeoClova
 

What's hot (20)

Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
 
Pentesting GraphQL Applications
Pentesting GraphQL ApplicationsPentesting GraphQL Applications
Pentesting GraphQL Applications
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
ColdFusion for Penetration Testers
ColdFusion for Penetration TestersColdFusion for Penetration Testers
ColdFusion for Penetration Testers
 
Pentest Application With GraphQL | Null Bangalore Meetup
Pentest Application With GraphQL | Null Bangalore Meetup Pentest Application With GraphQL | Null Bangalore Meetup
Pentest Application With GraphQL | Null Bangalore Meetup
 
DNS exfiltration using sqlmap
DNS exfiltration using sqlmapDNS exfiltration using sqlmap
DNS exfiltration using sqlmap
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
(Ab)Using GPOs for Active Directory Pwnage
(Ab)Using GPOs for Active Directory Pwnage(Ab)Using GPOs for Active Directory Pwnage
(Ab)Using GPOs for Active Directory Pwnage
 
FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
 
Petit potam slides-rtfm-ossir
Petit potam slides-rtfm-ossirPetit potam slides-rtfm-ossir
Petit potam slides-rtfm-ossir
 
Introduction to red team operations
Introduction to red team operationsIntroduction to red team operations
Introduction to red team operations
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
 
JDD 2017: Nginx + Lua = OpenResty (Marcin Stożek)
JDD 2017: Nginx + Lua = OpenResty (Marcin Stożek)JDD 2017: Nginx + Lua = OpenResty (Marcin Stożek)
JDD 2017: Nginx + Lua = OpenResty (Marcin Stożek)
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
Kernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uringKernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uring
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPF
 
redis 소개자료 - 네오클로바
redis 소개자료 - 네오클로바redis 소개자료 - 네오클로바
redis 소개자료 - 네오클로바
 

Viewers also liked

Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologiesBrendan Gregg
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsBrendan Gregg
 
DTrace Topics: Introduction
DTrace Topics: IntroductionDTrace Topics: Introduction
DTrace Topics: IntroductionBrendan Gregg
 
ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016Brendan Gregg
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersBrendan Gregg
 
The New Systems Performance
The New Systems PerformanceThe New Systems Performance
The New Systems PerformanceBrendan Gregg
 
Performance analysis 2013
Performance analysis 2013Performance analysis 2013
Performance analysis 2013Kerry Harrison
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to LinuxBrendan Gregg
 
Open Source Systems Performance
Open Source Systems PerformanceOpen Source Systems Performance
Open Source Systems PerformanceBrendan Gregg
 
Systems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the CloudSystems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the CloudBrendan Gregg
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing ToolsBrendan Gregg
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisBrendan Gregg
 
Application Performance Monitoring
Application Performance MonitoringApplication Performance Monitoring
Application Performance MonitoringOlivier Gérardin
 
AppDynamics VS New Relic – The Complete Guide
AppDynamics VS New Relic – The Complete GuideAppDynamics VS New Relic – The Complete Guide
AppDynamics VS New Relic – The Complete GuideTakipi
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to RootsBrendan Gregg
 

Viewers also liked (20)

Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologies
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
 
DTrace Topics: Introduction
DTrace Topics: IntroductionDTrace Topics: Introduction
DTrace Topics: Introduction
 
ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF Superpowers
 
The New Systems Performance
The New Systems PerformanceThe New Systems Performance
The New Systems Performance
 
Performance analysis 2013
Performance analysis 2013Performance analysis 2013
Performance analysis 2013
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to Linux
 
Open Source Systems Performance
Open Source Systems PerformanceOpen Source Systems Performance
Open Source Systems Performance
 
Systems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the CloudSystems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the Cloud
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance Analysis
 
Application Performance Monitoring
Application Performance MonitoringApplication Performance Monitoring
Application Performance Monitoring
 
AppDynamics VS New Relic – The Complete Guide
AppDynamics VS New Relic – The Complete GuideAppDynamics VS New Relic – The Complete Guide
AppDynamics VS New Relic – The Complete Guide
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 

Similar to Performance Analysis: The USE Method

Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyserAlex Moskvin
 
Mixing d ps building architecture on the cross cutting example
Mixing d ps building architecture on the cross cutting exampleMixing d ps building architecture on the cross cutting example
Mixing d ps building architecture on the cross cutting examplecorehard_by
 
Computer system organization
Computer system organizationComputer system organization
Computer system organizationSyed Zaid Irshad
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applicationsAmit Kejriwal
 
05. performance-concepts
05. performance-concepts05. performance-concepts
05. performance-conceptsMuhammad Ahad
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Nikolay Savvinov
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1sqlserver.co.il
 
How to improve your Tizen native program
How to improve your Tizen native programHow to improve your Tizen native program
How to improve your Tizen native programRyo Jin
 
Guider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLGuider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLPeace Lee
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizationsBrendan Gregg
 
Operating Systems & Applications
Operating Systems & ApplicationsOperating Systems & Applications
Operating Systems & ApplicationsMaulen Bale
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisBrendan Gregg
 
Performance Tuning
Performance TuningPerformance Tuning
Performance TuningJannet Peetz
 
Chapter -2 operating system presentation
Chapter -2 operating system presentationChapter -2 operating system presentation
Chapter -2 operating system presentationchnrketan
 
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...DevOpsDays Tel Aviv
 
SOA with PHP and Symfony
SOA with PHP and SymfonySOA with PHP and Symfony
SOA with PHP and SymfonyMichalSchroeder
 
Testing - How Vital and How Easy to use
Testing - How Vital and How Easy to useTesting - How Vital and How Easy to use
Testing - How Vital and How Easy to useUma Ghotikar
 
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorksPerformance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorksThoughtworks
 

Similar to Performance Analysis: The USE Method (20)

Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
 
Mixing d ps building architecture on the cross cutting example
Mixing d ps building architecture on the cross cutting exampleMixing d ps building architecture on the cross cutting example
Mixing d ps building architecture on the cross cutting example
 
Computer system organization
Computer system organizationComputer system organization
Computer system organization
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
 
05. performance-concepts
05. performance-concepts05. performance-concepts
05. performance-concepts
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1
 
How to improve your Tizen native program
How to improve your Tizen native programHow to improve your Tizen native program
How to improve your Tizen native program
 
Guider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLGuider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGL
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizations
 
Operating Systems & Applications
Operating Systems & ApplicationsOperating Systems & Applications
Operating Systems & Applications
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance Analysis
 
Performance Tuning
Performance TuningPerformance Tuning
Performance Tuning
 
techniques.ppt
techniques.ppttechniques.ppt
techniques.ppt
 
Chapter -2 operating system presentation
Chapter -2 operating system presentationChapter -2 operating system presentation
Chapter -2 operating system presentation
 
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
 
SOA with PHP and Symfony
SOA with PHP and SymfonySOA with PHP and Symfony
SOA with PHP and Symfony
 
Testing - How Vital and How Easy to use
Testing - How Vital and How Easy to useTesting - How Vital and How Easy to use
Testing - How Vital and How Easy to use
 
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorksPerformance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
 
Why do Users kill HPC Jobs?
Why do Users kill HPC Jobs?Why do Users kill HPC Jobs?
Why do Users kill HPC Jobs?
 

Performance Analysis: The USE Method

  • 1. Performance Analysis: The USE Method Brendan Gregg Lead Performance Engineer, Joyent brendan.gregg@joyent.com FISL13 July, 2012
  • 2. whoami • I work at the top of the performance support chain • I also write open source performance tools out of necessity to solve issues • http://github.com/brendangregg • http://www.brendangregg.com/#software • And books (DTrace, Solaris Performance and Tools) • Was Brendan @ Sun Microsystems, Oracle, now Joyent
  • 3. Joyent • Cloud computing provider • Cloud computing software • SmartOS • host OS, and guest via OS virtualization • Linux, Windows • guest via KVM
  • 4. Agenda • Example Problem • Performance Methodology • Problem Statement • The USE Method • Workload Characterization • Drill-Down Analysis • Specific Tools
  • 5. Example Problem • Recent cloud-based performance issue • Customer problem statement: • “Database response time sometimes take multiple seconds. Is the network dropping packets?” • Tested network using traceroute, which showed some packet drops
  • 6. Example: Support Path • Performance Analysis support levels: Top • 2nd Level • 1st Level • Customer Issues
  • 7. Example: Support Path • Performance Analysis support levels: • Top: my turn • 2nd Level: “network looks ok, CPU also ok” • 1st Level: “ran traceroute, can’t reproduce” • Customer: “network drops?”
  • 8. Example: Network Drops • Old fashioned: network packet capture (sniffing) • Performance overhead during capture (CPU, storage) and post-processing (wireshark) • Time consuming to analyze: not real-time
  • 9. Example: Network Drops • New: dynamic tracing • Efficient: only drop/retransmit paths traced • Context: kernel state readable • Real-time: analysis and summaries
        # ./tcplistendrop.d
        TIME                  SRC-IP         PORT     DST-IP           PORT
        2012 Jan 19 01:22:49  10.17.210.103  25691 -> 192.192.240.212  80
        2012 Jan 19 01:22:49  10.17.210.108  18423 -> 192.192.240.212  80
        2012 Jan 19 01:22:49  10.17.210.116  38883 -> 192.192.240.212  80
        2012 Jan 19 01:22:49  10.17.210.117  10739 -> 192.192.240.212  80
        2012 Jan 19 01:22:49  10.17.210.112  27988 -> 192.192.240.212  80
        2012 Jan 19 01:22:49  10.17.210.106  28824 -> 192.192.240.212  80
        2012 Jan 19 01:22:49  10.12.143.16   65070 -> 192.192.240.212  80
        [...]
  • 10. Example: Methodology • Instead of network drop analysis, I began with the USE method to check system health
  • 11. Example: Methodology • Instead of network drop analysis, I began with the USE method to check system health • In < 5 minutes, I found: • CPU: ok (light usage) • network: ok (light usage) • memory: available memory was exhausted, and the system was paging • disk: periodic bursts of 100% utilization • The method is simple, fast, directs further analysis
  • 12. Example: Other Methodologies • Customer was surprised (“are you sure?”), so I used latency analysis to confirm. Details (if interesting): • memory: used both microstate accounting and dynamic tracing to confirm that anonymous page-ins were hurting the database; in the worst case, an app thread spent 97% of its time waiting on disk (data faults). • disk: used dynamic tracing to confirm latency at the application / file system interface; this included fsync() calls of up to 1000 ms. • Different methodology, smaller audience (expertise), more time (1 hour).
  • 13. Example: Summary • What happened: • customer, 1st and 2nd level support spent much time chasing network packet drops. • What could have happened: • customer or 1st level follows the USE method and quickly discovers the memory and disk issues • memory: fixable by customer reconfig • disk: could go back to 1st or 2nd level support for confirmation • Faster resolution, frees time
  • 14. Performance Methodology • Not a tool • Not a product • Is a procedure (documentation)
  • 15. Performance Methodology • Not a tool -> but tools can be written to help • Not a product -> could be in monitoring solutions • Is a procedure (documentation)
  • 16. Why Now: past • Performance analysis circa ‘90s, metric-orientated: • Vendor creates metrics and performance tools • Users develop methods to interpret metrics • Common method: “Tools Method” • List available performance tools • For each tool, list useful metrics • For each metric, determine interpretation • Problematic: vendors often don’t provide the best metrics; can be blind to issue types
  • 17. Why Now: changes • Open Source • Dynamic Tracing • See anything, not just what the vendor gave you • Only practical on open source software • Hardest part is knowing what questions to ask
  • 18. Why Now: present • Performance analysis now (post dynamic tracing), question-orientated: • Users pose questions • Check if vendor has provided metrics • Develop custom metrics using dynamic tracing • Methodologies pose the questions • What would previously be an academic exercise is now practical
  • 19. Methodology Audience • Beginners: provides a starting point • Experts: provides a checklist/reminder
  • 20. Performance Methodologies • Suggested order of execution: 1. Problem Statement 2. The USE Method 3. Workload Characterization 4. Drill-Down Analysis (Latency)
  • 21. Problem Statement • Typical support procedure (1st Methodology): 1. What makes you think there is a problem? 2. Has this system ever performed well? 3. What changed? Software? Hardware? Load? 4. Can the performance degradation be expressed in terms of latency or run time? 5. Does the problem affect other people or applications? 6. What is the environment? What software and hardware is used? Versions? Configuration?
  • 22. The USE Method • Quick System Health Check (2nd Methodology): • For every resource, check: • Utilization • Saturation • Errors
  • 23. The USE Method • Quick System Health Check (2nd Methodology): • For every resource, check: • Utilization: time resource was busy, or degree used • Saturation: degree of queued extra work • Errors: any errors [Figure labels: Utilization, Saturation, Errors]
  • 24. The USE Method: Hardware Resources • CPUs • Main Memory • Network Interfaces • Storage Devices • Controllers • Interconnects
  • 25. The USE Method: Hardware Resources • A great way to determine resources is to find (or draw) the server functional diagram • The hardware team at vendors should have these • Analyze every component in the data path
  • 26. The USE Method: Functional Diagrams, Generic Example [Diagram components: CPU 1, CPU 2, CPU Interconnect, DRAM, Memory Bus, I/O Bus, I/O Bridge, Expander Interconnect, I/O Controller, Network Controller, Interface Transports, Disks, Ports]
  • 27. The USE Method: Resource Types • There are two different resource types; each defines utilization differently: • I/O Resource: eg, network interface • utilization: time the resource was busy; current IOPS / max or current throughput / max can be used in some cases • Capacity Resource: eg, main memory • utilization: space consumed • Storage devices act as both resource types
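The two utilization definitions on this slide can be written out as small functions. This is only an illustrative sketch: the function and parameter names are hypothetical and do not come from the talk or from any particular tool.

```python
def io_utilization(busy_seconds: float, interval_seconds: float) -> float:
    """Time-based utilization of an I/O resource: share of the interval it was busy."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return busy_seconds / interval_seconds


def throughput_utilization(current: float, maximum: float) -> float:
    """Alternative for I/O resources: current IOPS or throughput over the known maximum."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return current / maximum


def capacity_utilization(used: float, total: float) -> float:
    """Utilization of a capacity resource (eg, main memory): space consumed over total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total
```

A storage device would be checked with both kinds of function: the time-based one for its I/O and the capacity one for its space.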
  • 28. The USE Method: Software Resources • Mutex Locks • Thread Pools • Process/Thread Capacity • File Descriptor Capacity
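  • 29. The USE Method: Flow Diagram • Choose Resource → Errors Present? (Y: Problem Identified; N: continue) → High Utilization? (Y: Problem Identified; N: continue) → Saturation? (Y: Problem Identified; N: choose the next resource)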
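The flow on this slide can be sketched as a loop over resources that checks errors first, then utilization, then saturation. The `ResourceMetrics` record, the `first_problem` function, and the 0.7 default threshold are assumptions made for illustration (the threshold borrows the 70% guideline from the interpretation slide); the talk does not prescribe an implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ResourceMetrics:
    """Hypothetical per-resource snapshot; field names are illustrative."""
    name: str
    errors: int          # error count seen in the interval
    utilization: float   # 0.0 to 1.0
    saturation: float    # eg, queue length; 0 means no queued work


def first_problem(resources: Iterable[ResourceMetrics],
                  high_utilization: float = 0.7) -> Optional[Tuple[str, str]]:
    """Walk the resources in order and return (resource name, reason) for the
    first check that trips, following the slide's order of checks."""
    for r in resources:
        if r.errors > 0:
            return (r.name, "errors")
        if r.utilization >= high_utilization:
            return (r.name, "utilization")
        if r.saturation > 0:
            return (r.name, "saturation")
    return None  # every resource passed all three checks
```

In practice the loop would continue past the first hit to collect every flagged resource; returning early just keeps the sketch close to the diagram.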
  • 30. The USE Method: Interpretation • Utilization • 100% usually a bottleneck • 70%+ often a bottleneck for I/O resources, especially when high priority work cannot easily interrupt lower priority work (eg, disks) • Beware of time intervals. 60% utilized over 5 minutes may mean 100% utilized for 3 minutes then idle • Best examined per-device (unbalanced workloads)
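The warning about time intervals can be checked with a little arithmetic: 3 minutes at 100% followed by 2 idle minutes averages to 180 / 300 = 60% over 5 minutes. The sketch below assumes one utilization sample per second; the function names are hypothetical.

```python
from typing import Sequence


def average_utilization(samples: Sequence[float]) -> float:
    """Mean utilization across all samples in the interval."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(samples) / len(samples)


def peak_window_utilization(samples: Sequence[float], window: int) -> float:
    """Highest mean utilization over consecutive, non-overlapping windows."""
    if window <= 0 or len(samples) < window:
        raise ValueError("window must be positive and no longer than samples")
    starts = range(0, len(samples) - window + 1, window)
    return max(average_utilization(samples[i:i + window]) for i in starts)


# One sample per second: 180 s fully busy, then 120 s idle.
samples = [1.0] * 180 + [0.0] * 120
five_minute_average = average_utilization(samples)          # about 0.6
one_minute_peak = peak_window_utilization(samples, 60)      # 1.0
```

The 5-minute average hides the fully busy minutes that the 1-minute windows expose, which is why shorter intervals are worth examining.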
  • 31. The USE Method: Interpretation • Saturation • Any non-zero value adds latency • Errors • Should be obvious
  • 32. The USE Method: Easy Combinations
        Resource            Type          Metric
        CPU                 utilization
        CPU                 saturation
        Memory              utilization
        Memory              saturation
        Network Interface   utilization
        Storage Device I/O  utilization
        Storage Device I/O  saturation
        Storage Device I/O  errors
  • 33. The USE Method: Easy Combinations
        Resource            Type          Metric
        CPU                 utilization   CPU utilization
        CPU                 saturation    run-queue length
        Memory              utilization   available memory
        Memory              saturation    paging or swapping
        Network Interface   utilization   RX/TX tput/bandwidth
        Storage Device I/O  utilization   device busy percent
        Storage Device I/O  saturation    wait queue length
        Storage Device I/O  errors        device errors
  • 34. The USE Method: Harder Combinations
        Resource            Type          Metric
        CPU                 errors
        Network             saturation
        Storage Controller  utilization
        CPU Interconnect    utilization
        Mem. Interconnect   saturation
        I/O Interconnect    saturation
  • 35. The USE Method: Harder Combinations
        Resource            Type          Metric
        CPU                 errors        eg, correctable CPU cache ECC events
        Network             saturation    “nocanputs”, buffering
        Storage Controller  utilization   active vs max controller IOPS and tput
        CPU Interconnect    utilization   per port tput / max bandwidth
        Mem. Interconnect   saturation    memory stall cycles
        I/O Interconnect    saturation    bus throughput / max bandwidth
  • 36. The USE Method: tools • To be thorough, you will need to use: • CPU performance counters • For bus and interconnect activity; eg, perf events, cpustat • Dynamic Tracing • For missing saturation and error metrics; eg, DTrace • Both can get tricky; tools can be developed to help • Please, no more top variants! ... unless it is interconnect-top or bus-top • I’ve written dozens of open source tools for both CPC and DTrace; much more can be done
  • 37. Workload Characterization • May use as a 3rd Methodology • Characterize workload by: • who is causing the load? PID, UID, IP addr, ... • why is the load called? code path • what is the load? IOPS, tput, type • how is the load changing over time? • Best performance wins are from eliminating unnecessary work • Identifies class of issues that are load-based, not architecture-based
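The “who” and “what” questions on this slide amount to grouping load by its source and its type. A minimal sketch follows, assuming I/O events have already been collected into records; the `IOEvent` type and its fields are hypothetical and not the output format of any real tracer.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class IOEvent(NamedTuple):
    """Hypothetical record of one I/O request."""
    pid: int          # who: process that issued the I/O
    kind: str         # what: "read" or "write"
    size_bytes: int   # what: request size


def characterize(events: Iterable[IOEvent]) -> Tuple[Counter, Counter]:
    """Return (bytes per PID, request count per I/O type)."""
    bytes_by_pid: Counter = Counter()
    count_by_kind: Counter = Counter()
    for event in events:
        bytes_by_pid[event.pid] += event.size_bytes
        count_by_kind[event.kind] += 1
    return bytes_by_pid, count_by_kind
```

Calling `bytes_by_pid.most_common(1)` on the first result gives the heaviest contributor, which is often the first candidate for eliminating unnecessary work.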
  • 38. Drill-Down Analysis • May use as a 4th Methodology • Peel away software layers to drill down on the issue • Eg, software stack I/O latency analysis: Application → System Call Interface → File System → Block Device Interface → Storage Device Drivers → Storage Devices
  • 39. Drill-Down Analysis: Open Source • With Dynamic Tracing, all function entry & return points can be traced, with nanosecond timestamps. • One strategy is to measure latency pairs, to search for the source; eg, A->B & C->D:
        static int
        arc_cksum_equal(arc_buf_t *buf)
     A  {
            zio_cksum_t zc;
            int equal;

     C      mutex_enter(&buf->b_hdr->b_freeze_lock);
            fletcher_2_native(buf->b_data, buf->b_hdr->b_size, &zc);
     D      equal = ZIO_CHECKSUM_EQUAL(*buf->b_hdr->b_freeze_cksum, zc);
            mutex_exit(&buf->b_hdr->b_freeze_lock);

            return (equal);
     B  }
  • 40. Other Methodologies • Method R • A latency-based analysis approach for Oracle databases. See “Optimizing Oracle Performance” by Cary Millsap and Jeff Holt (2003) • Experimental approaches • Can be very useful: eg, validating network throughput using iperf
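The latency-pair idea can be imitated at application level by timestamping function entry and return. The sketch below is a user-space analogue, not dynamic tracing and not how DTrace instruments the kernel; the `traced` decorator and the `verify` and `checksum` functions are hypothetical stand-ins for the outer (A->B) and inner (C->D) pairs.

```python
import time
from collections import defaultdict
from functools import wraps

# Function name -> list of observed latencies in nanoseconds.
latencies_ns = defaultdict(list)


def traced(func):
    """Record entry-to-return latency for every call of func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()
        try:
            return func(*args, **kwargs)
        finally:
            latencies_ns[func.__name__].append(time.perf_counter_ns() - start)
    return wrapper


@traced
def checksum(data: bytes) -> int:
    """Inner pair (C->D): the work suspected of being slow."""
    return sum(data) & 0xFFFFFFFF


@traced
def verify(data: bytes, expected: int) -> bool:
    """Outer pair (A->B): includes the inner call plus its own overhead."""
    return checksum(data) == expected
```

If the inner latency accounts for most of the outer latency, the search moves down into `checksum`; if not, the time is being spent elsewhere in `verify`.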
  • 41. Specific Tools for the USE Method
  • 42. illumos-based • http://dtrace.org/blogs/brendan/2012/03/01/the-use-method-solaris-performance-checklist/
        Resource, Type: Metric
        CPU, Utilization: per-cpu: mpstat 1, “idl”; system-wide: vmstat 1, “id”; per-process: prstat -c 1 (“CPU” == recent), prstat -mLc 1 (“USR” + “SYS”); per-kernel-thread: lockstat -Ii rate, DTrace profile stack()
        CPU, Saturation: system-wide: uptime, load averages; vmstat 1, “r”; DTrace dispqlen.d (DTT) for a better “vmstat r”; per-process: prstat -mLc 1, “LAT”
        CPU, Errors: fmadm faulty; cpustat (CPC) for whatever error counters are supported (eg, thermal throttling)
        Memory, Saturation: system-wide: vmstat 1, “sr” (bad now), “w” (was very bad); vmstat -p 1, “api” (anon page ins == pain), “apo”; per-process: prstat -mLc 1, “DFL”; DTrace anonpgpid.d (DTT), vminfo:::anonpgin on execname
        • ... etc for all combinations (would span a dozen slides)
  • 43. Linux-based • http://dtrace.org/blogs/brendan/2012/03/07/the-use-method-linux-performance-checklist/
        Resource, Type: Metric
        CPU, Utilization: per-cpu: mpstat -P ALL 1, “%idle”; sar -P ALL, “%idle”; system-wide: vmstat 1, “id”; sar -u, “%idle”; dstat -c, “idl”; per-process: top, “%CPU”; htop, “CPU%”; ps -o pcpu; pidstat 1, “%CPU”; per-kernel-thread: top/htop (“K” to toggle), where VIRT == 0 (heuristic). [1]
        CPU, Saturation: system-wide: vmstat 1, “r” > CPU count [2]; sar -q, “runq-sz” > CPU count; dstat -p, “run” > CPU count; per-process: /proc/PID/schedstat 2nd field (sched_info.run_delay); perf sched latency (shows “Average” and “Maximum” delay per-schedule); dynamic tracing, eg, SystemTap schedtimes.stp “queued(us)” [3]
        CPU, Errors: perf (LPE) if processor specific error events (CPC) are available; eg, AMD64's “04Ah Single-bit ECC Errors Recorded by Scrubber” [4]
        • ... etc for all combinations (would span a dozen slides)
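The per-process CPU saturation check via /proc/PID/schedstat can be sketched as below. It assumes the three-field layout documented for recent Linux kernels (time on CPU in nanoseconds, time waiting on a run queue in nanoseconds, number of timeslices); the function names and the two-snapshot approach are illustrative choices, not part of the checklist.

```python
from typing import Dict


def parse_schedstat(text: str) -> Dict[str, int]:
    """Parse the contents of /proc/PID/schedstat into named fields."""
    fields = text.split()
    if len(fields) < 3:
        raise ValueError("expected at least three fields")
    on_cpu_ns, run_delay_ns, timeslices = (int(f) for f in fields[:3])
    return {
        "on_cpu_ns": on_cpu_ns,        # 1st field: time spent running
        "run_delay_ns": run_delay_ns,  # 2nd field: time waiting on a run queue
        "timeslices": timeslices,      # 3rd field: timeslices run
    }


def run_delay_fraction(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Share of runnable time spent waiting for a CPU between two snapshots."""
    cpu = after["on_cpu_ns"] - before["on_cpu_ns"]
    delay = after["run_delay_ns"] - before["run_delay_ns"]
    total = cpu + delay
    return delay / total if total > 0 else 0.0


def read_schedstat(pid: int) -> Dict[str, int]:
    """Read a live snapshot; only works on Linux with schedstats available."""
    with open(f"/proc/{pid}/schedstat") as f:
        return parse_schedstat(f.read())
```

Taking two snapshots a second apart and passing them to `run_delay_fraction` gives a rough per-process saturation figure: a value near zero means the process rarely waited for a CPU.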
  • 44. Products • Earlier I said methodologies could be supported by monitoring solutions • At Joyent we develop Cloud Analytics:
  • 45. Future • Methodologies for advanced performance issues • I recently worked a complex KVM bandwidth issue where no current methodologies really worked • Innovative methods based on open source + dynamic tracing • Less performance mystery. Less guesswork. • Better use of resources (price/performance) • Easier for beginners to get started
  • 46. Thank you • Resources: • http://dtrace.org/blogs/brendan • http://dtrace.org/blogs/brendan/2012/02/29/the-use-method/ • http://dtrace.org/blogs/brendan/tag/usemethod/ • http://dtrace.org/blogs/brendan/2011/12/18/visualizing-device-utilization/ - ideas if you are a monitoring solution developer • brendan@joyent.com