© Copyright Azul Systems 2015
@azulsystems
How NOT to
Measure Latency
Matt Schuetze
Product Management Director, Azul Systems
6/12/2015
QCon NY
Brooklyn, New York
Understanding Latency and
Application Responsiveness
The Oh $@%T! talk.
About me: Matt Schuetze
- Product Management Director at Azul Systems
- Translate Voice of Customer into Zing and Zulu requirements and work items
- Sing the praises of Azul efforts through product launches
- Azul alternate on JCP exec committee, co-lead Detroit Java User Group
- Stand on the shoulders of giants and admit it
Philosophy and motivation
What do we actually care about? And why?
Latency Behavior
- Latency: The time it took one operation to happen
- Each operation occurrence has its own latency
- What we care about is how latency behaves
- Behavior is a lot more than “the common case was X”
We like to look at charts
[Chart: The “We only want to show good things” chart, annotated at the 95%’ile]
What do you care about?
Do you:
- Care about latency in your system?
- Care about the worst case?
- Care about the 99.99%’ile?
- Only care about the fastest thing in the day?
- Only care about the best 50%?
- Only need 90% of operations to meet requirements?
We like to rant about latency
99%’ile is ~60 usec
(but mean is ~210 usec)
“outliers”, “averages” and other nonsense
We nicknamed these spikes “hiccups”
Dispelling standard deviation
Mean = 0.06 msec
Std. Deviation (σ) = 0.21 msec
99.999%’ile = 38.66 msec
In a normal distribution, the 99.999%’ile falls within 4.5 σ
~184 σ (!!!) away from the mean
These are NOT normal distributions
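The “~184 σ” figure follows directly from the three numbers on the slide. A minimal sketch of that arithmetic, with illustrative variable names; the comparison value for a true normal distribution comes from Python's standard `statistics.NormalDist`, which is an addition here and not something the talk uses.

```python
from statistics import NormalDist

# Figures quoted on the slide, in milliseconds.
mean_ms = 0.06
std_dev_ms = 0.21
p99999_ms = 38.66

# How many standard deviations the observed 99.999%'ile sits above the mean.
sigmas_observed = (p99999_ms - mean_ms) / std_dev_ms

# Where the 99.999%'ile would sit if latency really were normally distributed.
sigmas_if_normal = NormalDist().inv_cdf(0.99999)

print(f"observed 99.999%'ile: {sigmas_observed:.0f} sigma above the mean")
print(f"normal distribution:  {sigmas_if_normal:.2f} sigma above the mean")
```

The gap between the two printed values is the point of the slide: a mean and a standard deviation say almost nothing about where the tail of this data actually is.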
Is the 99%’ile “rare”?
Cumulative probability…
What are the chances of a single web page view experiencing the 99%’ile latency of:
- A single search engine node?
- A single Key/Value store node?
- A single Database node?
- A single CDN request?
Which HTTP response time metric is more “representative” of user experience?
The 95%’ile or the 99.9%’ile?
Gauging user experience
Example: A typical user session involves 5 page loads, averaging 40 resources per page.
- How many of our users will NOT experience something worse than the 95%’ile?
  Answer: ~0.003%
- How many of our users will experience at least one response that is longer than the 99.9%’ile?
  Answer: ~18%
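Both answers in this example can be reproduced with a few lines of arithmetic. This is a sketch under one stated assumption that the slide does not spell out: each of the 5 × 40 = 200 requests in a session is treated as an independent draw from the same latency distribution. The function and variable names are illustrative.

```python
page_loads = 5
resources_per_page = 40
requests = page_loads * resources_per_page  # 200 requests per session


def chance_all_within(percentile: float, n: int) -> float:
    """Probability that all n independent requests land at or below the percentile."""
    return (percentile / 100.0) ** n


# Users who never see anything worse than the 95%'ile.
never_worse_than_p95 = chance_all_within(95.0, requests)

# Users who see at least one response slower than the 99.9%'ile.
at_least_one_beyond_p999 = 1.0 - chance_all_within(99.9, requests)

print(f"never worse than the 95%'ile:      {never_worse_than_p95:.4%}")
print(f"at least one beyond the 99.9%'ile: {at_least_one_beyond_p999:.1%}")
```

Under that independence assumption, the printed values line up with the slide's ~0.003% and ~18%.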
Classic look at response time behavior
Response time as a function of load
source: IBM CICS server documentation, “understanding response times”
Average? Max? Median? 90%? 99.9%?
Hiccups are strongly multi-modal
- They don’t look anything like a normal distribution
- A complete shift from one mode/behavior to another
- Mode A: “good”
- Mode B: “somewhat bad”
- Mode C: “terrible”, ...
- The real world is not a gentle, smooth curve
- Mode transitions are “phase changes”
Proven ways to deal with hiccups
Actually characterizing latency
[Chart: response time percentile plot line, with requirements marked]
Comparing Behavior
Different throughputs, configurations, or other parameters on one graph
Shameless Bragging
Comparing Behaviors - Actual
Latency sensitive messaging distribution application: HotSpot vs. Zing
Zing
- A standards-compliant JVM for Linux/x86 servers
- Eliminates Garbage Collection as a concern for enterprise applications in Java, Scala, or any JVM language
- Very wide operating range: Used in both low latency and large scale enterprise application spaces
- Decouples scale metrics from response time concerns: transaction rate, data set size, concurrent users, heap size, allocation rate, mutation rate, etc.
- Leverages elastic memory for resilient operation
What is Zing good for?
- If you have a server-based Java application
- And you are running on Linux
- And you are using more than ~300MB of memory, up to as high as 1TB of memory,
- Then Zing will likely deliver superior behavior metrics
Where Zing shines
- Low latency
  Eliminate behavior blips down to the sub-millisecond-units level
- Machine-to-machine “stuff”
  Support higher *sustainable* throughput (one that meets SLAs)
  Messaging, queues, market data feeds, fraud detection, analytics
- Human response times
  Eliminate user-annoying response time blips. Multi-second and even fraction-of-a-second blips will be completely gone.
  Support larger memory JVMs *if needed* (e.g. larger virtual user counts, or larger cache, in-memory state, or consolidating multiple instances)
- “Large” data and in-memory analytics
  Make batch stuff “business real time”. Gain super-efficiencies.
  Cassandra, Spark, Solr, DataGrid, any large dataset in fast motion
An accidental conspiracy...
The coordinated omission
problem
The coordinated omission problem
- Common load testing example:
  – each “client” issues requests at a certain rate
  – measure/log response time for each request
- So what’s wrong with that?
  – works only if ALL responses fit within interval
  – implicit “automatic back off” coordination
Begin audience participation exercise now…
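The “automatic back off” effect can be reproduced in simulated time. The sketch below is an illustration, not any particular load tool's logic: a single synchronous client intends to send one request every 10 ms, the hypothetical server normally answers in 1 ms, and exactly one request stalls for an extra second. The naive number is what the client logs (completion minus actual send time); the corrected number measures from the time the request was supposed to be sent. All parameters and names are assumptions chosen for the example.

```python
import math


def simulate(n_requests=1000, interval_ms=10.0, service_ms=1.0,
             stalled_request=500, stall_ms=1000.0):
    """Return (naive, corrected) latency lists for one synchronous client."""
    naive, corrected = [], []
    free_at = 0.0  # simulated time at which the client is able to send again
    for i in range(n_requests):
        intended = i * interval_ms      # when the request should have gone out
        start = max(intended, free_at)  # a blocked client sends late ("backs off")
        extra = stall_ms if i == stalled_request else 0.0
        done = start + service_ms + extra
        naive.append(done - start)          # what the tool logs
        corrected.append(done - intended)   # what a waiting user would experience
        free_at = done
    return naive, corrected


def percentile(values, pct):
    """Nearest-rank percentile; the rounding guards against float noise."""
    ordered = sorted(values)
    rank = max(1, math.ceil(round(len(ordered) * pct / 100, 6)))
    return ordered[rank - 1]


naive, corrected = simulate()
for name, data in (("naive", naive), ("corrected", corrected)):
    print(f"{name:>9}: 99%'ile = {percentile(data, 99):7.1f} ms, "
          f"max = {max(data):7.1f} ms")
```

Both views agree on the max, but only the corrected view shows that the requests queued up behind the stall were also slow, which is why the two 99%'ile values differ so sharply.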
Is coordinated omission rare?
It is MUCH more common than you may think...
JMeter makes this mistake... And so do other tools!
[Charts: before correction vs. after correcting for omission]
Real World Coordinated Omission effects
[Chart: before correction vs. after correction, annotated “Wrong by 7x”]
[Chart: uncorrected data]
[Chart: uncorrected data vs. data corrected for coordinated omission]
Why I care
[Chart: Zing vs. “other” JVM. A ~2500x difference in reported percentile levels for the problem that Zing eliminates]
How “real” people react
Suggestions
- Whatever your measurement technique is, test it.
- Run your measurement method against artificial systems that create hypothetical pause scenarios. See if your reported results agree with how you would describe that system behavior.
- Don’t waste time analyzing until you establish sanity
- Don’t ever use or derive from standard deviation
- Always measure Max time. Consider what it means...
- Be suspicious.
- Measure %’iles. Lots of them.
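As one way to act on “measure max” and “measure %’iles, lots of them”, the sketch below reports a spread of percentile levels plus the max for a set of samples. The data set is hypothetical (9,990 operations at 1 ms and 10 at 500 ms), and the helper name and chosen levels are illustrative rather than taken from the talk.

```python
import math
import statistics


def percentile_report(samples, levels=(50, 90, 99, 99.9, 99.99)):
    """Nearest-rank value at each percentile level, plus the max."""
    ordered = sorted(samples)
    n = len(ordered)
    report = {}
    for level in levels:
        # round() guards the ceil against floating-point noise in n * level / 100
        rank = max(1, math.ceil(round(n * level / 100, 6)))
        report[f"{level}%'ile"] = ordered[rank - 1]
    report["max"] = ordered[-1]
    return report


# Hypothetical measurements in milliseconds: mostly fast, with a few long hiccups.
samples = [1.0] * 9990 + [500.0] * 10
report = percentile_report(samples)
mean = statistics.fmean(samples)

for label, value in report.items():
    print(f"{label:>11}: {value:6.1f} ms")
print(f"{'mean':>11}: {mean:6.3f} ms")
```

In this made-up data set the hiccups only become visible past the 99.9%'ile and in the max; the mean alone gives no hint that any operation took half a second.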
HdrHistogram
HdrHistogram
If you want to be able to produce charts like this...
Then you need both good dynamic range and good resolution
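To make “good dynamic range and good resolution” concrete, here is a toy histogram that keeps a fixed number of significant bits per recorded value, so bucket width grows with the value while the relative error stays bounded. This is a simplified illustration of the general idea only; it is not HdrHistogram's code or API, and the class name, method names, and `precision_bits` parameter are all hypothetical.

```python
import math
from collections import Counter


class LogLinearHistogram:
    """Toy histogram for non-negative ints with bounded relative error."""

    def __init__(self, precision_bits=7):
        self.precision_bits = precision_bits
        self.counts = Counter()  # bucket floor -> number of recorded values
        self.total = 0

    def bucket_floor(self, value):
        # Keep only the top (precision_bits + 1) significant bits of the value,
        # so the error relative to the value stays below 2 ** -precision_bits.
        shift = max(0, value.bit_length() - (self.precision_bits + 1))
        return (value >> shift) << shift

    def record(self, value):
        self.counts[self.bucket_floor(value)] += 1
        self.total += 1

    def value_at_percentile(self, pct):
        target = max(1, math.ceil(round(self.total * pct / 100, 6)))
        seen = 0
        for floor in sorted(self.counts):
            seen += self.counts[floor]
            if seen >= target:
                return floor
        return max(self.counts)


# Hypothetical usage: values in microseconds, from 60 usec up to a 38.66 msec hiccup.
hist = LogLinearHistogram()
for _ in range(9999):
    hist.record(60)
hist.record(38_660)

print("median bucket:", hist.value_at_percentile(50))
print("max bucket:   ", hist.value_at_percentile(100))
print("buckets used: ", len(hist.counts))
```

The design choice being illustrated is that storage depends on the number of distinct buckets touched, not on the largest value seen, while small and large values are both recorded to within a fixed relative precision.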
Shape of Constant latency
10K fixed line latency
Shape of Gaussian latency
10K fixed line latency with added
Gaussian noise (std dev. = 5K)
Shape of Random latency
10K fixed line latency with added
Gaussian (std dev. = 5K) vs. random (+5K)
Shape of Stalling latency
10K fixed base, stall magnitude of 50K
stall likelihood = 0.00005 (interval = 100)
Shape of Queuing latency
10K fixed base, occasional bursts of 500 msgs
handling time = 100, burst likelihood = 0.00005
Shape of Multi Modal latency
10K mode0, 70K mode1 (likelihood 0.01),
180K mode2 (likelihood 0.00001)
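A synthetic latency source with these three modes is easy to generate, which is useful for checking whether a measurement setup reports what was put in. The sketch below uses the mode values and likelihoods from the slide; the sample count, the fixed seed, and the assumption that mode0 takes all the remaining probability are choices made for the example. The values are kept unitless, as on the slide.

```python
import random
from collections import Counter

MODES = [10_000, 70_000, 180_000]
# mode0 is assumed to take whatever probability the other two modes leave over.
LIKELIHOODS = [1 - 0.01 - 0.00001, 0.01, 0.00001]

rng = random.Random(42)  # fixed seed so repeated runs give the same samples
samples = rng.choices(MODES, weights=LIKELIHOODS, k=200_000)
counts = Counter(samples)

for mode in MODES:
    share = counts[mode] / len(samples)
    print(f"{mode:>7}: {share:.4%} of samples")
```

With these likelihoods the 70K mode only appears at around the 99%'ile, and the 180K mode around the 99.999%'ile, so a report that stops at the median or the 90%'ile would show a flat 10K line.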
And this is what the real(?) world sometimes looks like…
Real world “deductive reasoning”
http://www.jhiccup.org
jHiccup
Discontinuity in Java execution
Examples
Oracle HotSpot ParallelGC vs. Oracle HotSpot G1
1GB live set in 8GB heap, same app, same HotSpot, different GC
Oracle HotSpot CMS vs. Zing Pauseless GC
1GB live set in 8GB heap, same app, different JVM/GC
Oracle HotSpot CMS vs. Zing Pauseless GC
1GB live set in 8GB heap, same app, different JVM/GC - drawn to scale
Low latency trading application
Oracle HotSpot (pure NewGen) vs. Zing Pauseless GC
Low latency trading application – drawn to scale
Oracle HotSpot (pure NewGen) vs. Zing Pauseless GC
Q & A
www.azul.com
www.jhiccup.com
www.hdrhistogram.com
@schuetzematt
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 

How NOT to Measure Latency

  • 1. © Copyright Azul Systems 2015 @azulsystems How NOT to Measure Latency. Matt Schuetze, Product Management Director, Azul Systems. QCon NY, Brooklyn, New York, 6/12/2015
  • 2. Understanding Latency and Application Responsiveness
  • 3. The Oh $@%T! talk.
  • 4. About me: Matt Schuetze - Product Management Director at Azul Systems - Translate Voice of Customer into Zing and Zulu requirements and work items - Sing the praises of Azul efforts through product launches - Azul alternate on JCP exec committee, co-lead Detroit Java User Group - Stand on the shoulders of giants and admit it
  • 5. Philosophy and motivation: What do we actually care about? And why?
  • 6. Latency Behavior - Latency: the time it took one operation to happen - Each operation occurrence has its own latency - What we care about is how latency behaves - Behavior is a lot more than “the common case was X”
  • 7. We like to look at charts. 95%’ile: the “We only want to show good things” chart
  • 8. What do you care about? Do you: - Care about latency in your system? - Care about the worst case? - Care about the 99.99%’ile? - Only care about the fastest thing in the day? - Only care about the best 50%? - Only need 90% of operations to meet requirements?
  • 9. We like to rant about latency
  • 10. “outliers”, “averages” and other nonsense. 99%’ile is ~60 usec (but mean is ~210 usec). We nicknamed these spikes “hiccups”
  • 11. Dispelling standard deviation
  • 12. Dispelling standard deviation: Mean = 0.06 msec, Std. Deviation (σ) = 0.21 msec, 99.999% = 38.66 msec, which is ~184 σ (!!!) away from the mean. In a normal distribution, the 99.999%’ile falls within 4.5 σ. These are NOT normal distributions.
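
The "~184 σ" figure on slide 12 can be checked from the three numbers quoted there. The sketch below is only that arithmetic, plus a comparison against where the 99.999%'ile of a true normal distribution would sit; the variable names are illustrative and not from the talk.

```python
from statistics import NormalDist

# Figures quoted on the slide, in milliseconds.
mean_ms = 0.06
std_dev_ms = 0.21
p99999_ms = 38.66

# How many standard deviations above the mean the observed 99.999%'ile sits.
observed_sigmas = (p99999_ms - mean_ms) / std_dev_ms

# Where the 99.999%'ile of a genuinely normal distribution would sit (one-sided).
normal_sigmas = NormalDist().inv_cdf(0.99999)

print(f"observed: {observed_sigmas:.0f} sigma, normal: {normal_sigmas:.2f} sigma")
```

The observed value lands roughly 184 standard deviations out, while a normal distribution would put the same percentile a little over 4 standard deviations from the mean, consistent with the slide's "within 4.5 σ".
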
  • 13. Is the 99%’ile “rare”?
  • 14. Cumulative probability… What are the chances of a single web page view experiencing the 99%’ile latency of: - A single search engine node? - A single Key/Value store node? - A single Database node? - A single CDN request?
  • 15. [No text on slide]
  • 16. Which HTTP response time metric is more “representative” of user experience? The 95%’ile or the 99.9%’ile?
  • 17. Gauging user experience. Example: A typical user session involves 5 page loads, averaging 40 resources per page. - How many of our users will NOT experience something worse than the 95%’ile? Answer: ~0.003% - How many of our users will experience at least one response that is longer than the 99.9%’ile? Answer: ~18%
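
Both answers on slide 17 follow from treating a session as 5 × 40 = 200 responses. A minimal sketch of that calculation, assuming (this is not stated on the slide) that each response lands above or below a given percentile independently of the others:

```python
page_loads = 5
resources_per_page = 40
requests = page_loads * resources_per_page  # 200 responses per session

# Assumption: responses are independent, so per-request probabilities multiply.
# Chance that every one of the 200 responses is at or below the 95%'ile.
p_never_worse_than_p95 = 0.95 ** requests

# Chance that at least one of the 200 responses exceeds the 99.9%'ile.
p_at_least_one_above_p999 = 1 - 0.999 ** requests

print(f"never worse than the 95%'ile: {p_never_worse_than_p95:.4%}")
print(f"at least one above the 99.9%'ile: {p_at_least_one_above_p999:.1%}")
```

The first probability comes out at about 0.0035% and the second at about 18%, matching the rounded answers on the slide. Almost every user sees something worse than the 95%'ile, which is why the higher percentile is the more representative metric.
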
  • 18. Classic look at response time behavior: response time as a function of load. Average? Max? Median? 90%? 99.9%? Source: IBM CICS server documentation, “understanding response times”
  • 19. Hiccups are strongly multi-modal - They don’t look anything like a normal distribution - A complete shift from one mode/behavior to another - Mode A: “good” - Mode B: “somewhat bad” - Mode C: “terrible”, ... - The real world is not a gentle, smooth curve - Mode transitions are “phase changes”
  • 20. Proven ways to deal with hiccups: Actually characterizing latency. [Chart: Requirements; Response Time Percentile plot line]
  • 21. Comparing Behavior: Different throughputs, configurations, or other parameters on one graph
  • 22. Shameless Bragging
  • 23. Comparing Behaviors - Actual. Latency sensitive messaging distribution application: HotSpot vs. Zing
  • 24. Zing - A standards-compliant JVM for Linux/x86 servers - Eliminates Garbage Collection as a concern for enterprise applications in Java, Scala, or any JVM language - Very wide operating range: Used in both low latency and large scale enterprise application spaces - Decouples scale metrics from response time concerns (transaction rate, data set size, concurrent users, heap size, allocation rate, mutation rate, etc.) - Leverages elastic memory for resilient operation
  • 25. What is Zing good for? - If you have a server-based Java application - And you are running on Linux - And you are using more than ~300MB of memory, up to as high as 1TB of memory - Then Zing will likely deliver superior behavior metrics
  • 26. Where Zing shines - Low latency: Eliminate behavior blips down to the sub-millisecond-units level - Machine-to-machine “stuff”: Support higher *sustainable* throughput (one that meets SLAs). Messaging, queues, market data feeds, fraud detection, analytics - Human response times: Eliminate user-annoying response time blips. Multi-second and even fraction-of-a-second blips will be completely gone. Support larger memory JVMs *if needed* (e.g. larger virtual user counts, or larger cache, in-memory state, or consolidating multiple instances) - “Large” data and in-memory analytics: Make batch stuff “business real time”. Gain super-efficiencies. Cassandra, Spark, Solr, DataGrid, any large dataset in fast motion
  • 27. The coordinated omission problem: An accidental conspiracy...
  • 28. The coordinated omission problem - Common load testing example: – each “client” issues requests at a certain rate – measure/log response time for each request - So what’s wrong with that? – works only if ALL responses fit within interval – implicit “automatic back off” coordination. Begin audience participation exercise now…
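
The "automatic back off" on slide 28 can be made concrete with a small deterministic simulation. This is a sketch under stated assumptions, not any tool's actual implementation: the server, its single 1-second stall, the 10 ms request interval, and the names `simulate` and `percentile` are all hypothetical. The client is synchronous, so while the server is frozen it sends nothing. The naive log times each request from when it was actually sent. The corrected log times it from when it was scheduled to be sent, which is one common way to account for the omitted samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a list of numbers (p in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def simulate(duration_ms=10_000, interval_ms=10, service_ms=1,
             stall_start_ms=5_000, stall_ms=1_000):
    """Simulate a synchronous client; return (naive, corrected) latencies in ms."""
    naive, corrected = [], []
    client_free_at = 0
    for intended in range(0, duration_ms, interval_ms):
        # The client cannot send until the previous response has arrived.
        sent = max(intended, client_free_at)
        # Hypothetical server: frozen for one window, otherwise a fixed service time.
        started = sent
        if stall_start_ms <= sent < stall_start_ms + stall_ms:
            started = stall_start_ms + stall_ms
        done = started + service_ms
        naive.append(done - sent)          # measured from the actual send time
        corrected.append(done - intended)  # measured from the scheduled send time
        client_free_at = done
    return naive, corrected


naive, corrected = simulate()
for name, data in (("naive", naive), ("corrected", corrected)):
    print(name, "p50:", percentile(data, 50),
          "p99:", percentile(data, 99), "max:", max(data))
```

Both logs agree on the maximum, but the naive log contains only one slow sample out of 1000, so its 99%'ile stays at the 1 ms service time. In the corrected log, every request that was scheduled during or shortly after the stall counts as late, which pushes the 99%'ile to most of a second. Running a measurement method against an artificial stall like this is the kind of sanity test the talk recommends.
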
  • 29. Is coordinated omission rare? It is MUCH more common than you may think...
  • 30. JMeter makes this mistake... And so do other tools! [Charts: Before Correction; After Correcting for Omission]
  • 31. Real World Coordinated Omission effects. Wrong by 7x. [Charts: Before Correction; After Correction]
  • 32. Real World Coordinated Omission effects. [Chart: Uncorrected Data]
  • 33. Real World Coordinated Omission effects. [Charts: Uncorrected Data; Corrected for Coordinated Omission]
  • 34. Real World Coordinated Omission effects: Why I care. A ~2500x difference in reported percentile levels for the problem that Zing eliminates. [Chart: Zing; “other” JVM]
  • 35. How “real” people react
  • 36. Suggestions - Whatever your measurement technique is, test it. - Run your measurement method against artificial systems that create hypothetical pause scenarios. See if your reported results agree with how you would describe that system behavior - Don’t waste time analyzing until you establish sanity - Don’t ever use or derive from standard deviation - Always measure Max time. Consider what it means... - Be suspicious. - Measure %’iles. Lots of them.
  • 37. HdrHistogram
  • 38. HdrHistogram: If you want to be able to produce charts like this... Then you need both good dynamic range and good resolution
  • 39. [No text on slide]
  • 40. [No text on slide]
  • 41. Shape of Constant latency: 10K fixed line latency
  • 42. Shape of Gaussian latency: 10K fixed line latency with added Gaussian noise (std dev. = 5K)
  • 43. Shape of Random latency: 10K fixed line latency with added Gaussian (std dev. = 5K) vs. random (+5K)
  • 44. Shape of Stalling latency: 10K fixed base, stall magnitude of 50K, stall likelihood = 0.00005 (interval = 100)
  • 45. Shape of Queuing latency: 10K fixed base, occasional bursts of 500 msgs, handling time = 100, burst likelihood = 0.00005
  • 46. Shape of Multi Modal latency: 10K mode0, 70K mode1 (likelihood 0.01), 180K mode2 (likelihood 0.00001)
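
To see why a multi-modal shape like the one on slide 46 defeats "common case" summaries, the sketch below builds a sample set from the three modes and their stated likelihoods. It is an illustration, not the generator used for the slide: it uses exact expected counts over one million samples instead of random draws, the sample size is an assumption, and the slide gives no unit for the latency values.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a list of numbers (p in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


TOTAL = 1_000_000                # assumed sample count, chosen for round numbers
mode1_count = TOTAL // 100       # likelihood 0.01    -> 10,000 samples at 70K
mode2_count = TOTAL // 100_000   # likelihood 0.00001 -> 10 samples at 180K
mode0_count = TOTAL - mode1_count - mode2_count

samples = ([10_000] * mode0_count
           + [70_000] * mode1_count
           + [180_000] * mode2_count)

mean_latency = sum(samples) / len(samples)
median_latency = percentile(samples, 50)
p99_latency = percentile(samples, 99)
max_latency = max(samples)

print("mean:", mean_latency, "median:", median_latency,
      "p99:", p99_latency, "max:", max_latency)
```

The mean and median both sit at or near mode0, so neither hints at the other two modes. The 99%'ile lands on mode1 and only the maximum reveals mode2, which is the argument for reporting many percentiles along with the max.
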
  • 47. And this is what the real(?) world sometimes looks like…
  • 48. Real world “deductive reasoning”
  • 49. http://www.jhiccup.org
  • 50. jHiccup
  • 51. Discontinuity in Java execution
  • 52. Examples
  • 53. 1GB live set in 8GB heap, same app, same HotSpot, different GC. [Charts: Oracle HotSpot ParallelGC; Oracle HotSpot G1]
  • 54. 1GB live set in 8GB heap, same app, different JVM/GC. [Charts: Oracle HotSpot CMS; Zing Pauseless GC]
  • 55. 1GB live set in 8GB heap, same app, different JVM/GC - drawn to scale. [Charts: Oracle HotSpot CMS; Zing Pauseless GC]
  • 56. Low latency trading application. [Charts: Oracle HotSpot (pure NewGen); Zing Pauseless GC]
  • 57. Low latency trading application. [Charts: Oracle HotSpot (pure NewGen); Zing Pauseless GC]
  • 58. Low latency trading application – drawn to scale. [Charts: Oracle HotSpot (pure NewGen); Zing Pauseless GC]
  • 59. Q & A. www.azul.com www.jhiccup.com www.hdrhistogram.com @schuetzematt