SlideShare a Scribd company logo
1 of 50
Download to read offline
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Understanding Latency and
Response Time Behavior
Pitfalls, Lessons and Tools
Matt Schuetze
Director of Product Management
Azul Systems
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency Behavior
Latency: The time it took one operation to happen
Each operation occurrence has its own latency
So we need to measure again, and again, and again...
What we care about is how latency behaves
Behavior is more than “what was the common case?”
©2014 Azul Systems, Inc.
What do you care about?
Do you :
Care about latency in your system?
Care about the worst case?
Care about the 99.99%?
Only care about the fastest thing in the day?
Only care about the best 50%
Only need 90% of operations to meet needs?
Care if “only” 1% of operations are painfully slow?
Care if 90% of users see outliers every hour?
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency “wishful thinking”
We know how to compute
averages & std. deviation, etc.
Wouldn’t it be nice if latency had a
normal distribution?
The average, 90%, 99%, std.
deviation, etc. can give us a “feel”
for the rest of the distribution,
right?
If 99% of the stuff behaves well,
how bad can the rest be, really?
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
99% better than 0.5 msec
99.99% better than 5 msec
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
Mean = 0.06 msec
Std. Deviation (𝞂) = 0.21msec
99.999% = 38.66msec
~184 σ (!!!) away from the mean
In a normal distribution,
the 99.999%’ile falls within 4.5 σ
These are NOT normal distributions
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: “outliers”
99%‘ile is ~60 usec
Max is ~30,000% higher
than “typical”
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
A classic look at response time
behavior
Response time as a function of load
source: IBM CICS server documentation, “understanding response times”
Average?
Max?
Median?
90%?
99.9%
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Response time over time
When we measure behavior over time, we often see:
source: ZOHO QEngine White Paper: performance testing report analysis
“Hiccups”
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Hiccups are [typically]
strongly multi-modal
They don’t look anything like a normal distribution
They usually look like periodic freezes
A complete shift from one mode/behavior to another
Mode A: “good”.
Mode B: “Somewhat bad”
Mode C: “terrible”, ...
....
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Common ways people deal with hiccups
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Common ways people deal with hiccups
Averages and Standard Deviation
Always
Wrong!
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Better ways people can deal with hiccups
Actually characterizing latency
Requirements
Response Time
Percentile plot
line
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Requirements
Why we measure latency and response times
to begin with...
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency:
Stating Requirements
Requirements describe how latency should behave
Useful Latency requirements are usually stated as a
PASS/FAIL test against some predefined criteria
Different applications have different needs
Requirements should reflect application needs
Measurements should provide data to evaluate requirements
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Establishing Requirements
an interactive interview (or thought) process
Q: What are your latency requirements?
A: We need an avg. response of 20 msec
Q: Ok. Typical/average of 20 msec... So what is the worst case requirement?
A: We don’t have one
Q: So it’s ok for some things to take more than 5 hours?
A: No way in H%%&!
Q: So I’ll write down “5 hours worst case...”
A: No. That’s not what I said. Make that “nothing worse than 100 msec”
Q: Are you sure? Even if it’s only two times a day?
A: Ok... Make it “nothing worse than 2 seconds...”
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Establishing Requirements
an interactive interview (or thought) process
Ok. So we need a typical of 20msec, and a worst case of 2 seconds. How often is it ok to
have a 1 second response?
A: (Annoyed) I thought you said only a few times a day
Q: That was for the worst case. But if half the results are better than 20 msec, is it ok for the
other half to be just short of 2 seconds? What % of the time are you willing to take a 1
second, or a half second hiccup? Or some other level?
A: Oh. Let’s see. We have to better than 50 msec 90% of the time, or we’ll be losing money
even when we are fast the rest of the time. We need to be better than 500 msec 99.9% of
the time, or our customers will complain and go elsewhere
Now we have a service level expectation:
50% better than 20 msec
90% better than 50 msec
99.9% better than 500 msec
100% better than 2 seconds
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency does not live in a vacuum
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Remember this?
How much load can this system handle?
Where the
sysadmin is
willing to go
What the
marketing
benchmarks
will say
Where users
complain
Sustainable
Throughput
Level
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Comparing behavior under different throughputs
and/or configurations
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency behavior under different throughputs, configurations
latency sensitive messaging distribution application
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The coordinated omission problem
An accidental conspiracy...
©2014 Azul Systems, Inc.
Coordinated Omission (“CO”) is the measurement error which
is introduced by naively recording requests, sorting them and
reporting the result as the percentile distribution of the request
latency.
Any recording method, synchronous or asynchronous, which
results in even partially coordinating sample times with the
system under test, and as a result avoids recording some of
the originally intended samples will exhibit CO. Synchronous
methods tend to naturally exhibit this sort of nativity when
intended sample time are not kept far apart from each other.
Synopsis of Coordinated Omission
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Real World Coordinated Omission effects
Uncorrected Data
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Real World Coordinated Omission effects
Uncorrected Data
Corrected for
Coordinated
Omission
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Real World Coordinated Omission effects
(Why I care)
A ~2500x
difference in
reported
percentile levels
for the problem
that Zing
eliminates
Zing
“other” JVM
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Suggestions
Whatever your measurement technique is, TEST IT.
Run your measurement method against artificial systems
that create hypothetical pauses scenarios. See if your
reported results agree with how you would describe that
system behavior
Don’t waste time analyzing until you establish sanity
Don’t EVER use or derive from std. deviation
ALWAYS measure Max time. Consider what it means... Be
suspicious.
Measure %‘iles. Lots of them.
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
HdrHistogram
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
HdrHistogram
If you want to be able to produce graphs like this...
You need both good dynamic range
and good resolution
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
HdrHistogram background
Goal: Collect data for good latency characterization...
Including acceptable precision at and between varying percentile levels
Existing alternatives
Record all data, analyze later (e.g. sort and get 99.9%‘ile).
Record in traditional histograms
Traditional Histograms: Linear bins, Logarithmic bins, or
Arbitrary bins
Linear requires lots of storage to cover range with good resolution
Logarithmic covers wide range but has terrible precisions
Arbitrary is.... arbitrary. Works only when you have a good feel for the
interesting parts of the value range
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
HdrHistogram
A High Dynamic Range Histogram
Covers a configurable dynamic value range
At configurable precision (expressed as number of significant digits)
For Example:
Track values between 1 microsecond and 1 hour
With 3 decimal points of resolution
Built-in [optional] compensation for Coordinated
Omission
Open Source
On github, released to the public domain, creative commons CC0
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Plotting HdrHistogram output
Convenient for plotting and comparing test results
X-Axis
Y-Axis
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
jHiccup
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
jHiccup
A tool for capturing and displaying platform hiccups
Records any observed non-continuity of the underlying platform
Plots results in simple, consistent format
Simple, non-intrusive
As simple as adding jHiccup.jar as a java agent:
% java -javaagent=jHiccup.jar myApp myflags
or attaching jHiccup to a running process:
% jHiccup -p <pid>
Adds a background thread that samples time @ 1000/sec into an
HdrHistogram
Open Source. Released to the public domain
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Examples
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Drawn to scale
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Oracle HotSpot (pure newgen) Zing
Low latency trading application
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Oracle HotSpot (pure newgen) Zing
Low latency trading application
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Low latency - Drawn to scale
Oracle HotSpot (pure newgen) Zing
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Shameless bragging
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Zing
A JVM for Linux/x86 servers
ELIMINATES Garbage Collection as a concern for enterprise
applications
Very wide operating range: Used in both low latency and
large scale enterprise application spaces
Decouples scale metrics from response time concerns
Transaction rate, data set size, concurrent users, heap
size, allocation rate, mutation rate, etc.
Leverages elastic memory for resilient operation
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Zing in Low Latency Java
A full solution to GC-related issues
Allows you to code in Java instead of “Java”
You can actually use third party code, even if it allocates stuff
You can code in idiomatic Java without worries
Faster time-to-market
less duct-tape engineering
Not just about GC.
We look at anything that makes a JVM “hiccup”.
“ReadyNow!”: Addresses “Warmup” problems
E.g. deoptimization storms at market open
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Takeaways
Standard Deviation and application latency should never
show up on the same page...
If you haven’t stated percentiles and a Max, you haven’t
specified your requirements
Measuring throughput without latency behavior is [usually]
meaningless
Mistakes in measurement/analysis can cause orders-of-
magnitude errors and lead to bad business decisions
jHiccup and HdrHistogram are pretty useful
The Zing JVM is cool...
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Q & A
http://www.azul.com
http://www.jhiccup.com
https://giltene.github.com/HdrHistogram
https://github.com/LatencyUtils/LatencyUtils

More Related Content

Viewers also liked

Attitudes and job_satisfaction
Attitudes and job_satisfactionAttitudes and job_satisfaction
Attitudes and job_satisfactionAbhishek Bhalla
 
Attitude - Organizational Behaviour
Attitude - Organizational Behaviour Attitude - Organizational Behaviour
Attitude - Organizational Behaviour Dr. Rajasshrie Pillai
 
Conflict management1
Conflict management1Conflict management1
Conflict management1renudhawan
 
Chapter 3 attitudes & job satisfaction (2)
Chapter 3  attitudes & job satisfaction (2)Chapter 3  attitudes & job satisfaction (2)
Chapter 3 attitudes & job satisfaction (2)Qamar Farooq
 
Attitude & Job Satisfaction
Attitude & Job SatisfactionAttitude & Job Satisfaction
Attitude & Job SatisfactionSaravanan rulez
 
Job satisfaction in Organizational behaviour
Job satisfaction in Organizational behaviourJob satisfaction in Organizational behaviour
Job satisfaction in Organizational behaviourRajesh Gautham
 

Viewers also liked (6)

Attitudes and job_satisfaction
Attitudes and job_satisfactionAttitudes and job_satisfaction
Attitudes and job_satisfaction
 
Attitude - Organizational Behaviour
Attitude - Organizational Behaviour Attitude - Organizational Behaviour
Attitude - Organizational Behaviour
 
Conflict management1
Conflict management1Conflict management1
Conflict management1
 
Chapter 3 attitudes & job satisfaction (2)
Chapter 3  attitudes & job satisfaction (2)Chapter 3  attitudes & job satisfaction (2)
Chapter 3 attitudes & job satisfaction (2)
 
Attitude & Job Satisfaction
Attitude & Job SatisfactionAttitude & Job Satisfaction
Attitude & Job Satisfaction
 
Job satisfaction in Organizational behaviour
Job satisfaction in Organizational behaviourJob satisfaction in Organizational behaviour
Job satisfaction in Organizational behaviour
 

Similar to Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools

Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul SystemsEnabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul SystemszuluJDK
 
Enabling Java in Latency Sensitive Environments
Enabling Java in Latency Sensitive EnvironmentsEnabling Java in Latency Sensitive Environments
Enabling Java in Latency Sensitive EnvironmentsC4Media
 
Understanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About ThemUnderstanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About ThemAzul Systems Inc.
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Brian Brazil
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum JapanBrian Brazil
 
Architectural Patterns of Resilient Distributed Systems
 Architectural Patterns of Resilient Distributed Systems Architectural Patterns of Resilient Distributed Systems
Architectural Patterns of Resilient Distributed SystemsInes Sombra
 
Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Brian Troutwine
 
Understanding Hardware Transactional Memory
Understanding Hardware Transactional MemoryUnderstanding Hardware Transactional Memory
Understanding Hardware Transactional MemoryC4Media
 
Ruby3x3: How are we going to measure 3x
Ruby3x3: How are we going to measure 3xRuby3x3: How are we going to measure 3x
Ruby3x3: How are we going to measure 3xMatthew Gaudet
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesAshutosh Agarwal
 
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...Puppet
 
Automatic Assessment of Failure Recovery in Erlang Applications
Automatic Assessment of Failure Recovery in Erlang ApplicationsAutomatic Assessment of Failure Recovery in Erlang Applications
Automatic Assessment of Failure Recovery in Erlang ApplicationsJan Henry Nystrom
 
[CLASS 2014] Palestra Técnica - Jason Larsen
[CLASS 2014] Palestra Técnica - Jason Larsen[CLASS 2014] Palestra Técnica - Jason Larsen
[CLASS 2014] Palestra Técnica - Jason LarsenTI Safe
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Brian Brazil
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
Misery Metrics & Consequences
Misery Metrics & ConsequencesMisery Metrics & Consequences
Misery Metrics & ConsequencesScyllaDB
 
How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)Siglos
 
DC JUG: Understanding Java Garbage Collection
DC JUG: Understanding Java Garbage CollectionDC JUG: Understanding Java Garbage Collection
DC JUG: Understanding Java Garbage CollectionAzul Systems, Inc.
 
Normal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesNormal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesJonathan Creasy
 

Similar to Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools (20)

Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul SystemsEnabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
 
Enabling Java in Latency Sensitive Environments
Enabling Java in Latency Sensitive EnvironmentsEnabling Java in Latency Sensitive Environments
Enabling Java in Latency Sensitive Environments
 
Understanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About ThemUnderstanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About Them
 
How NOT to Measure Latency
How NOT to Measure LatencyHow NOT to Measure Latency
How NOT to Measure Latency
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum Japan
 
Architectural Patterns of Resilient Distributed Systems
 Architectural Patterns of Resilient Distributed Systems Architectural Patterns of Resilient Distributed Systems
Architectural Patterns of Resilient Distributed Systems
 
Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014
 
Understanding Hardware Transactional Memory
Understanding Hardware Transactional MemoryUnderstanding Hardware Transactional Memory
Understanding Hardware Transactional Memory
 
Ruby3x3: How are we going to measure 3x
Ruby3x3: How are we going to measure 3xRuby3x3: How are we going to measure 3x
Ruby3x3: How are we going to measure 3x
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
 
Automatic Assessment of Failure Recovery in Erlang Applications
Automatic Assessment of Failure Recovery in Erlang ApplicationsAutomatic Assessment of Failure Recovery in Erlang Applications
Automatic Assessment of Failure Recovery in Erlang Applications
 
[CLASS 2014] Palestra Técnica - Jason Larsen
[CLASS 2014] Palestra Técnica - Jason Larsen[CLASS 2014] Palestra Técnica - Jason Larsen
[CLASS 2014] Palestra Técnica - Jason Larsen
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Misery Metrics & Consequences
Misery Metrics & ConsequencesMisery Metrics & Consequences
Misery Metrics & Consequences
 
How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)
 
DC JUG: Understanding Java Garbage Collection
DC JUG: Understanding Java Garbage CollectionDC JUG: Understanding Java Garbage Collection
DC JUG: Understanding Java Garbage Collection
 
Normal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesNormal accidents and outpatient surgeries
Normal accidents and outpatient surgeries
 

More from Azul Systems Inc.

Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017Azul Systems Inc.
 
Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017Azul Systems Inc.
 
What's New in the JVM in Java 8?
What's New in the JVM in Java 8?What's New in the JVM in Java 8?
What's New in the JVM in Java 8?Azul Systems Inc.
 
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gapObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gapAzul Systems Inc.
 
Priming Java for Speed at Market Open
Priming Java for Speed at Market OpenPriming Java for Speed at Market Open
Priming Java for Speed at Market OpenAzul Systems Inc.
 
Azul Systems open source guide
Azul Systems open source guideAzul Systems open source guide
Azul Systems open source guideAzul Systems Inc.
 
Understanding Java Garbage Collection
Understanding Java Garbage CollectionUnderstanding Java Garbage Collection
Understanding Java Garbage CollectionAzul Systems Inc.
 
The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014Azul Systems Inc.
 
Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingAzul Systems Inc.
 
Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...Azul Systems Inc.
 
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)Azul Systems Inc.
 
The Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVMThe Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVMAzul Systems Inc.
 
Towards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding StyleTowards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding StyleAzul Systems Inc.
 
Experiences with Debugging Data Races
Experiences with Debugging Data RacesExperiences with Debugging Data Races
Experiences with Debugging Data RacesAzul Systems Inc.
 
Lock-Free, Wait-Free Hash Table
Lock-Free, Wait-Free Hash TableLock-Free, Wait-Free Hash Table
Lock-Free, Wait-Free Hash TableAzul Systems Inc.
 
How NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkHow NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkAzul Systems Inc.
 
The Art of Java Benchmarking
The Art of Java BenchmarkingThe Art of Java Benchmarking
The Art of Java BenchmarkingAzul Systems Inc.
 
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, PolandAzul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, PolandAzul Systems Inc.
 

More from Azul Systems Inc. (20)

Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017
 
Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017
 
What's New in the JVM in Java 8?
What's New in the JVM in Java 8?What's New in the JVM in Java 8?
What's New in the JVM in Java 8?
 
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gapObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
 
Priming Java for Speed at Market Open
Priming Java for Speed at Market OpenPriming Java for Speed at Market Open
Priming Java for Speed at Market Open
 
Azul Systems open source guide
Azul Systems open source guideAzul Systems open source guide
Azul Systems open source guide
 
Understanding Java Garbage Collection
Understanding Java Garbage CollectionUnderstanding Java Garbage Collection
Understanding Java Garbage Collection
 
The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014
 
Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and Zing
 
Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...
 
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
 
Java vs. C/C++
Java vs. C/C++Java vs. C/C++
Java vs. C/C++
 
What's Inside a JVM?
What's Inside a JVM?What's Inside a JVM?
What's Inside a JVM?
 
The Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVMThe Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVM
 
Towards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding StyleTowards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding Style
 
Experiences with Debugging Data Races
Experiences with Debugging Data RacesExperiences with Debugging Data Races
Experiences with Debugging Data Races
 
Lock-Free, Wait-Free Hash Table
Lock-Free, Wait-Free Hash TableLock-Free, Wait-Free Hash Table
Lock-Free, Wait-Free Hash Table
 
How NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkHow NOT to Write a Microbenchmark
How NOT to Write a Microbenchmark
 
The Art of Java Benchmarking
The Art of Java BenchmarkingThe Art of Java Benchmarking
The Art of Java Benchmarking
 
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, PolandAzul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
 

Recently uploaded

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools

  • 1. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Understanding Latency and Response Time Behavior Pitfalls, Lessons and Tools Matt Schuetze Director of Product Management Azul Systems
  • 2. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency Behavior Latency: The time it took one operation to happen Each operation occurrence has its own latency So we need to measure again, and again, and again... What we care about is how latency behaves Behavior is more than “what was the common case?”
  • 3. ©2014 Azul Systems, Inc. What do you care about? Do you : Care about latency in your system? Care about the worst case? Care about the 99.99%? Only care about the fastest thing in the day? Only care about the best 50% Only need 90% of operations to meet needs? Care if “only” 1% of operations are painfully slow? Care if 90% of users see outliers every hour?
  • 4. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency “wishful thinking” We know how to compute averages & std. deviation, etc. Wouldn’t it be nice if latency had a normal distribution? The average, 90%, 99%, std. deviation, etc. can give us a “feel” for the rest of the distribution, right? If 99% of the stuff behaves well, how bad can the rest be, really?
  • 5. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution
  • 6. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution
  • 7. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution 99% better than 0.5 msec 99.99% better than 5 msec
  • 8. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution
  • 9. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution Mean = 0.06 msec Std. Deviation (𝞂) = 0.21msec 99.999% = 38.66msec ~184 σ (!!!) away from the mean In a normal distribution, the 99.999%’ile falls within 4.5 σ These are NOT normal distributions
  • 10. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: “outliers” 99%‘ile is ~60 usec Max is ~30,000% higher than “typical”
  • 11. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. A classic look at response time behavior Response time as a function of load source: IBM CICS server documentation, “understanding response times” Average? Max? Median? 90%? 99.9%
  • 12. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Response time over time When we measure behavior over time, we often see: source: ZOHO QEngine White Paper: performance testing report analysis “Hiccups”
  • 13. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Hiccups are [typically] strongly multi-modal They don’t look anything like a normal distribution They usually look like periodic freezes A complete shift from one mode/behavior to another Mode A: “good”. Mode B: “Somewhat bad” Mode C: “terrible”, ... ....
  • 14. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Common ways people deal with hiccups
  • 15. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Common ways people deal with hiccups Averages and Standard Deviation Always Wrong!
  • 16. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Better ways people can deal with hiccups Actually characterizing latency Requirements Response Time Percentile plot line
  • 17. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Requirements Why we measure latency and response times to begin with...
  • 18. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency: Stating Requirements Requirements describe how latency should behave Useful Latency requirements are usually stated as a PASS/FAIL test against some predefined criteria Different applications have different needs Requirements should reflect application needs Measurements should provide data to evaluate requirements
  • 19. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Establishing Requirements an interactive interview (or thought) process Q: What are your latency requirements? A: We need an avg. response of 20 msec Q: Ok. Typical/average of 20 msec... So what is the worst case requirement? A: We don’t have one Q: So it’s ok for some things to take more than 5 hours? A: No way in H%%&! Q: So I’ll write down “5 hours worst case...” A: No. That’s not what I said. Make that “nothing worse than 100 msec” Q: Are you sure? Even if it’s only two times a day? A: Ok... Make it “nothing worse than 2 seconds...”
  • 20. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Establishing Requirements an interactive interview (or thought) process Ok. So we need a typical of 20msec, and a worst case of 2 seconds. How often is it ok to have a 1 second response? A: (Annoyed) I thought you said only a few times a day Q: That was for the worst case. But if half the results are better than 20 msec, is it ok for the other half to be just short of 2 seconds? What % of the time are you willing to take a 1 second, or a half second hiccup? Or some other level? A: Oh. Let’s see. We have to better than 50 msec 90% of the time, or we’ll be losing money even when we are fast the rest of the time. We need to be better than 500 msec 99.9% of the time, or our customers will complain and go elsewhere Now we have a service level expectation: 50% better than 20 msec 90% better than 50 msec 99.9% better than 500 msec 100% better than 2 seconds
  • 21. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency does not live in a vacuum
  • 22. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Remember this? How much load can this system handle? Where the sysadmin is willing to go What the marketing benchmarks will say Where users complain Sustainable Throughput Level
  • 23. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Comparing behavior under different throughputs and/or configurations
  • 24. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency behavior under different throughputs, configurations latency sensitive messaging distribution application
  • 25. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The coordinated omission problem An accidental conspiracy...
  • 26. ©2014 Azul Systems, Inc. Coordinated Omission (“CO”) is the measurement error which is introduced by naively recording requests, sorting them and reporting the result as the percentile distribution of the request latency. Any recording method, synchronous or asynchronous, which results in even partially coordinating sample times with the system under test, and as a result avoids recording some of the originally intended samples will exhibit CO. Synchronous methods tend to naturally exhibit this sort of nativity when intended sample time are not kept far apart from each other. Synopsis of Coordinated Omission
  • 27. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Real World Coordinated Omission effects Uncorrected Data
  • 28. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Real World Coordinated Omission effects Uncorrected Data Corrected for Coordinated Omission
  • 29. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Real World Coordinated Omission effects (Why I care) A ~2500x difference in reported percentile levels for the problem that Zing eliminates Zing “other” JVM
  • 30. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Suggestions Whatever your measurement technique is, TEST IT. Run your measurement method against artificial systems that create hypothetical pauses scenarios. See if your reported results agree with how you would describe that system behavior Don’t waste time analyzing until you establish sanity Don’t EVER use or derive from std. deviation ALWAYS measure Max time. Consider what it means... Be suspicious. Measure %‘iles. Lots of them.
  • 31. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. HdrHistogram
  • 32. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. HdrHistogram If you want to be able to produce graphs like this... You need both good dynamic range and good resolution
  • 33. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. HdrHistogram background Goal: Collect data for good latency characterization... Including acceptable precision at and between varying percentile levels Existing alternatives Record all data, analyze later (e.g. sort and get 99.9%‘ile). Record in traditional histograms Traditional Histograms: Linear bins, Logarithmic bins, or Arbitrary bins Linear requires lots of storage to cover range with good resolution Logarithmic covers wide range but has terrible precisions Arbitrary is.... arbitrary. Works only when you have a good feel for the interesting parts of the value range
  • 34. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. HdrHistogram A High Dynamic Range Histogram Covers a configurable dynamic value range At configurable precision (expressed as number of significant digits) For Example: Track values between 1 microsecond and 1 hour With 3 decimal points of resolution Built-in [optional] compensation for Coordinated Omission Open Source On github, released to the public domain, creative commons CC0
  • 35. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Plotting HdrHistogram output Convenient for plotting and comparing test results X-Axis Y-Axis
  • 36. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. jHiccup
  • 37. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. jHiccup A tool for capturing and displaying platform hiccups Records any observed non-continuity of the underlying platform Plots results in simple, consistent format Simple, non-intrusive As simple as adding jHiccup.jar as a java agent: % java -javaagent=jHiccup.jar myApp myflags or attaching jHiccup to a running process: % jHiccup -p <pid> Adds a background thread that samples time @ 1000/sec into an HdrHistogram Open Source. Released to the public domain
  • 38. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Examples
  • 39. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
  • 40. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
  • 41. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
  • 42. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Drawn to scale
  • 43. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Oracle HotSpot (pure newgen) Zing Low latency trading application
  • 44. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Oracle HotSpot (pure newgen) Zing Low latency trading application
  • 45. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Low latency - Drawn to scale Oracle HotSpot (pure newgen) Zing
  • 46. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Shameless bragging
  • 47. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Zing A JVM for Linux/x86 servers ELIMINATES Garbage Collection as a concern for enterprise applications Very wide operating range: Used in both low latency and large scale enterprise application spaces Decouples scale metrics from response time concerns Transaction rate, data set size, concurrent users, heap size, allocation rate, mutation rate, etc. Leverages elastic memory for resilient operation
  • 48. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Zing in Low Latency Java A full solution to GC-related issues Allows you to code in Java instead of “Java” You can actually use third party code, even if it allocates stuff You can code in idiomatic Java without worries Faster time-to-market less duct-tape engineering Not just about GC. We look at anything that makes a JVM “hiccup”. “ReadyNow!”: Addresses “Warmup” problems E.g. deoptimization storms at market open
  • 49. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Takeaways Standard Deviation and application latency should never show up on the same page... If you haven’t stated percentiles and a Max, you haven’t specified your requirements Measuring throughput without latency behavior is [usually] meaningless Mistakes in measurement/analysis can cause orders-of- magnitude errors and lead to bad business decisions jHiccup and HdrHistogram are pretty useful The Zing JVM is cool...
  • 50. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Q & A http://www.azul.com http://www.jhiccup.com https://giltene.github.com/HdrHistogram https://github.com/LatencyUtils/LatencyUtils