© Copyright Azul Systems 2015
@azulsystems
Enabling Java
in
Latency Sensitive
Environments
Matt Schuetze
Azul Director of Product Management
Matt Schuetze, Product Manager, Azul Systems
Utah JUG, Murray UT, November 20, 2014
Austin Java Users Group
Austin, Texas
5/3/2015
High Level Agenda
 Intro, jitter vs. JITTER
 Java in a low latency application world
 The (historical) fundamental problems
 What people have done to try to get around
them
 What if the fundamental problems were
eliminated?
 What 2015 looks like for Low Latency Java
developers
 Real World Case Studies
Welcome to all Austin JUG members!
Is “jitter” a proper word for this?
99%‘ile is
~60 usec
Max is ~30,000%
higher than
“typical”
Answer: no, it's not jitter at all. It's phase changes.
About Azul Systems
Vega
C4
 We make scalable Virtual
Machines
 Have built “whatever it
takes to get job done” since
2002
 3 generations of custom
SMP Multi-core HW (Vega)
 Now Pure software for
commodity x86 (Zing)
 Certified OpenJDK (Zulu)
 Known for Low Latency,
Consistent execution, and
Large data set excellence
Zing, Zulu, and everything about Java Virtual Machines
Java in the low latency world
Java in a low latency world
 Why do people use Java for low latency apps?
 Are they crazy?
 No. There are good, easy to articulate reasons
 Projected lifetime cost
 Developer productivity
 Time-to-product, Time-to-market, ...
 Leverage, ecosystem, ability to hire
Yep, low latency Java is goin’ down for real…
e.g. customer answer to:
“Why do you use Java in Algo Trading?”
 Strategies have a shelf life
 We have to keep developing and deploying new
ones
 Only one out of N is actually productive
 Profitability therefore depends on ability to
successfully deploy new strategies, and on the
cost of doing so
 Our developers seem to be able to produce 2x-3x
as much when using a Java environment as they
would with C++ ...
So what is the problem?
Is Java Slow?
 No
 A good programmer will get roughly the same
speed from both Java and C++
 A bad programmer won’t get you fast code on
either
 The 50%‘ile and 90%‘ile are typically excellent...
 It’s those pesky occasional stutters and stammers
and stalls that are the problem...
 Ever hear of Garbage Collection?
Java’s Achilles heel
Stop-The-World Garbage Collection:
How bad is it?
 Let’s ignore the bad multi-second pauses for now...
 Low latency applications regularly experience
“small”, “minor” GC events that range in the 10s of
msec
 Frequency directly related to allocation rate
 In turn, directly related to throughput
 So we have great 50%, 90%. Maybe even 99%
 But 99.9%, 99.99%, Max, all “suck”
 So bad that it affects risk, profitability, service
expectations, etc.
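The gap between a healthy 99th percentile and a poor tail can be made concrete with a small sketch. The latency samples below are invented for illustration (they are not measurements from this talk), and `LatencyPercentiles` and `percentile` are hypothetical names. The sketch uses the nearest-rank method on a sorted array.

```java
import java.util.Arrays;

// Illustrative only: the latency samples are invented, not measured.
final class LatencyPercentiles {

    // Nearest-rank percentile over an already sorted array (p in (0, 100]).
    static long percentile(long[] sortedMicros, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedMicros.length);
        return sortedMicros[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // 10,000 hypothetical samples: most take 50 usec,
        // 20 of them (0.2%) are stalled by a 20 msec pause.
        long[] micros = new long[10_000];
        Arrays.fill(micros, 50);
        for (int i = 0; i < 20; i++) {
            micros[i * 500] = 20_000;
        }
        Arrays.sort(micros);

        System.out.println("50%:    " + percentile(micros, 50.0) + " usec");
        System.out.println("99%:    " + percentile(micros, 99.0) + " usec");
        System.out.println("99.9%:  " + percentile(micros, 99.9) + " usec");
        System.out.println("Max:    " + percentile(micros, 100.0) + " usec");
    }
}
```

With this made-up data the 50% and 99% values stay at the common-case latency, while 99.9% and Max land on the pause duration.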
STW-GC effects in a low latency application
99%‘ile is
~60 usec
Max is ~30,000%
higher than
“typical”
One way to deal with Stop-The-World GC
I cannot see it, so it cannot see me.
More Stop-The-World GC avoidance
Time for a bigger rug.
What do actual low latency developers
do about it?
 They use “Java” instead of Java
 They write “in the Java syntax”
 They avoid allocation as much as possible
 E.g. They build their own object pools for
everything
 They write all the code they use (no 3rd party libs)
 They train developers for their local discipline
 In short: They revert to many of the practices that
hurt productivity. They lose out on much of Java.
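For readers who have not seen the object-pool style mentioned above, here is a minimal sketch of what it tends to look like. It is a hypothetical, single-threaded pool (the names `ObjectPool`, `acquire`, and `release` are assumptions, not taken from any particular codebase), shown only to illustrate the manual discipline involved.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical, single-threaded pool; names are illustrative.
final class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    ObjectPool(Supplier<T> factory, int preallocate) {
        this.factory = factory;
        for (int i = 0; i < preallocate; i++) {
            free.push(factory.get()); // allocate up front, outside the hot path
        }
    }

    // Reuse a pooled instance if one is available; allocate only when empty.
    T acquire() {
        T obj = free.poll();
        return obj != null ? obj : factory.get();
    }

    // The caller must reset state and must not touch obj after releasing it.
    void release(T obj) {
        free.push(obj);
    }

    public static void main(String[] args) {
        ObjectPool<StringBuilder> pool = new ObjectPool<>(StringBuilder::new, 2);
        StringBuilder sb = pool.acquire();
        sb.append("order-1");
        sb.setLength(0);   // manual reset before handing the object back
        pool.release(sb);
        System.out.println(pool.acquire() == sb); // the same instance is handed out again
    }
}
```

The reset-and-release obligations are exactly the kind of malloc/free-style bookkeeping that garbage collection was meant to remove.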
Another way to cope: “Creative Language”
“Guarantee a worst case of 5 msec, 99% of the time”
Translation: “1% will be far worse than worst case”
“Mostly” Concurrent, “Mostly” Incremental
Translation: “Will at times exhibit long monolithic stop-the-
world pauses”
“Fairly Consistent”
Translation: “Will sometimes show results well outside this
range”
“Typical pauses in the tens of milliseconds”
Translation: “Some pauses are much longer than tens of
milliseconds”
Drawn from evil vendor marketing literature
What do low latency (Java) developers
get for all their effort?
 They still see pauses (usually ranging to tens of
msec)
 But they get fewer (as in less frequent) pauses
 And they see fewer people able to do the job
 And they have to write EVERYTHING themselves
 And they get to debug malloc/free patterns again
 And they can only use memory in certain ways
 ...
 Some call it “fun”... Others “duct tape
engineering”...
There is a fundamental problem:
Stop-The-World GC mechanisms
are contradictory to the fundamental
requirements of
low latency & low jitter apps
Unsustainable
Throughput
Sustainable Throughput
The throughput achieved while safely maintaining service levels
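One way to read this definition is as a selection over load-test results: of all tested rates, take the highest one whose worst-case latency still meets the service level. The sketch below assumes that reading; the rates, latencies, and the 10 msec limit are invented, and `SustainableThroughput` is a hypothetical name.

```java
// Illustrative only: the load-test numbers are invented.
final class SustainableThroughput {

    // Returns the highest tested rate whose worst-case latency stays
    // within the limit, or 0 if no tested rate does.
    static int sustainable(int[] ratePerSec, double[] worstLatencyMs, double limitMs) {
        int best = 0;
        for (int i = 0; i < ratePerSec.length; i++) {
            if (worstLatencyMs[i] <= limitMs && ratePerSec[i] > best) {
                best = ratePerSec[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] rates = {1_000, 5_000, 10_000, 20_000};
        double[] worstMs = {2.0, 4.0, 45.0, 900.0};
        // The system can be pushed to 20,000/sec, but only 5,000/sec
        // keeps the worst case within the 10 msec limit.
        System.out.println(sustainable(rates, worstMs, 10.0));
    }
}
```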
It’s an industry-wide
problem
It was an industry-wide
problem
It’s 2015... Now we have Zing®.
The common GC behavior across ALL
currently shipping (non-Zing) JVMs
 ALL use a Monolithic Stop-the-world NewGen
– “small” periodic pauses (small as in 10s of msec)
– pauses more frequent with higher throughput or allocation rates
 Development focus for ALL is on OldGen collectors
– Focus is on trying to address the many-second pause problem
– Usually by sweeping it farther and farther under the rug
– “Mostly X” (e.g. “mostly concurrent”) hides the fact that they refer
only to the OldGen part of the collector
– E.g. CMS, G1, Balanced.... all are OldGen-only efforts
 ALL use a Fallback to Full Stop-the-world Collection
– Used to recover when other mechanisms (inevitably) fail
– Also hidden under the term “Mostly”...
At Azul, STW-GC was addressed head-on
 We decided to focus on the right core problems
– Scale & productivity being limited by responsiveness
– Even “short” GC pauses are considered a problem
 Responsiveness must be unlinked from key
metrics:
– Transaction Rate, Concurrent users, Data set size, etc.
– Heap size, Live Set size, Allocation rate, Mutation rate
– Responsiveness must be continually sustainable
– Can’t ignore “rare but periodic” events
 Eliminate ALL Stop-The-World Fallbacks
– Any STW fallback is a real-world failure
Trivia: Azul as a company was founded predominantly around this one problem, which plagued Java servers at the time
The Zing “C4” Collector
Continuously Concurrent Compacting Collector
 Concurrent, compacting old generation
 Concurrent, compacting new generation
 No stop-the-world fallback
– Always compacts, and always does so concurrently
Benefits
Stay Responsive
Even when traffic patterns change without warning
7x Load
Increase
30 minute span shows
elevated load long after
event, yet no pauses.
Handle Real World traffic patterns
One second view of transactions. Not constant. Not random either. Bursty is normal.
Red line shows where
order pricing arrival rate
would be if constant
Achieve Measurable Benefits
 Zing helped LMAX tame GC-related
latency outlier pauses
– Highly-engineered system: 4ms every 30 seconds
down to 1ms every 2 hours
– Less well-tuned system: 50ms every 30 seconds down
to 3ms every 15 minutes
 No more unexpected/unwanted old-gen
pauses caused by external behavior
– CMS STW intra-day, generally ~500ms, gone
– Removed source of backpressure on latency critical
path.
– Pre-Azul these would occur less predictably, but
multiple times a week.
From joint LMAX/Azul talk at QCon London, March 2015
This is not “just Theory”
jHiccup
A tool that measures and reports
(as your application is running)
if your JVM is running all the time
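The idea behind this kind of measurement can be sketched in a few lines: a thread repeatedly sleeps for a short, fixed interval and records how much later than expected it wakes up. Anything that stalls the whole JVM (or the OS) shows up as overshoot. This is a simplified illustration of the principle, not jHiccup's actual code; `HiccupSketch` and `maxHiccupNanos` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Simplified sketch of the measurement idea; this is not jHiccup's actual code.
final class HiccupSketch {

    // Sleeps for intervalNanos repeatedly and returns the worst observed
    // wake-up overshoot in nanoseconds.
    static long maxHiccupNanos(long intervalNanos, int samples) {
        long worst = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                TimeUnit.NANOSECONDS.sleep(intervalNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
            long overshoot = System.nanoTime() - start - intervalNanos;
            worst = Math.max(worst, overshoot);
        }
        return worst;
    }

    public static void main(String[] args) {
        long worst = maxHiccupNanos(TimeUnit.MILLISECONDS.toNanos(1), 1_000);
        System.out.println("worst hiccup: " + TimeUnit.NANOSECONDS.toMicros(worst) + " usec");
    }
}
```

A real tool would record every sample into a histogram rather than keep only the maximum, so that percentiles can be reported as well.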
Discontinuities in Java execution - Easy To Measure
A telco
App with a
bit of a
“problem”
We call these
“hiccups”
Oracle HotSpot™ (pure
newgen)
Zing
Low latency trading application
Oracle HotSpot (pure newgen) Zing
Low latency trading application
Low latency - Drawn to scale
Oracle HotSpot (pure newgen) Zing
It’s not just for
Low Latency
Just as easy to demonstrate for human-
response-time apps
Portal Application, slow Ehcache “churn”
Oracle HotSpot CMS, 1GB in an 8GB heap Zing, 1GB in an 8GB heap
Portal Application, slow Ehcache “churn”
Oracle HotSpot CMS, 1GB in an 8GB heap Zing, 1GB in an 8GB heap
Portal Application - Drawn to scale
Oracle HotSpot CMS, 1GB in an 8GB heap Zing, 1GB in an 8GB heap
A Recent E-Commerce Case
Study
Cyber Monday comes earlier every year…
General trends of real world e-commerce traffic
Human-Time Real World Latency Case
 Web retail site faces spike loads every year over
Thanksgiving through Cyber Monday.
 Site latency suffers at peak viewing and buying
times, discouraging shoppers and leaving
abandoned carts.
 Hard to predict height of surge, just know it's big,
far higher than regular traffic 362 other days of the
year.
 New features like gallery search (Solr/Lucene)
added higher memory footprint, longer GC times.
 Staff spent lots of effort tuning HotSpot.
Specific e-tail customer based in Salt Lake City, Utah.
Real World Latency Results
 Customer studied Azul, met at Strata, NYC
 Discussion led to Zing as viable alternative
 Customer ran pilot tests with positive results.
Needed one Linux adjustment, otherwise same
server gear.
 POC on customer live system showed better than
expected latency profiles.
 No more GC tuning!
 Experienced a stable and profitable Thanksgiving
2014 weekend.
Timeframe was fall 2014.
Remind me how GC tuning sucks
Java GC tuning is “hard”…
 Examples of actual command line GC tuning terms:
java -Xmx12g -XX:MaxPermSize=64M -XX:PermSize=32M -XX:MaxNewSize=2g
-XX:NewSize=1g -XX:SurvivorRatio=128 -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0
-XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
-XX:+UseCMSInitiatingOccupancyOnly -XX:ParallelGCThreads=12
-XX:LargePageSizeInBytes=256m …
java -Xms8g -Xmx8g -Xmn2g -XX:PermSize=64M -XX:MaxPermSize=256M
-XX:-OmitStackTraceInFastThrow -XX:SurvivorRatio=2 -XX:-UseAdaptiveSizePolicy
-XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled
-XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled
-XX:CMSMaxAbortablePrecleanTime=10000 -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=63 -XX:+UseParNewGC -Xnoclassgc …
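With command lines this long, it can be worth checking from inside the process which collectors the flags actually selected and how much work they have done so far. A minimal sketch using the standard `java.lang.management` API follows; `GcReport` is a hypothetical name, and note that `getCollectionTime()` reports approximate accumulated collection time, which is not necessarily the same as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Lists the collectors active in this JVM with their counts and accumulated time.
final class GcReport {

    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", accumulated time=").append(gc.getCollectionTime())
              .append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

The collector names printed depend on the JVM and the flags in use, so no particular output is implied here.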
A few GC tuning flags
Source: Word Cloud created by Frank Pavageau in his Devoxx FR 2012 presentation titled “Death by Pauses”
Complete guide to Zing GC tuning
java -Xmx40g
Any other problems beyond GC?
JVMs make many tradeoffs
often trading speed vs. outliers
 Some speed techniques come at extreme outlier
costs
– E.g. (“regular”) biased locking
– E.g. counted loops optimizations
 Deoptimization
 Lock deflation
 Weak References, Soft References, Finalizers
 Time To Safe Point (TTSP)
Time To Safepoint: Your new #1 enemy
 Many things in a JVM (still) use a global safepoint
 All threads brought to a halt, at a “safe to analyze”
point in code, and then released after work is
done.
 E.g. GC phase shifts, Deoptimization, Class
unloading, Thread Dumps, Lock Deflation, etc.
etc.
 A single thread with a long time-to-safepoint path
can cause an effective pause for all other threads.
Consider this a variation on Amdahl’s law.
 Many code paths in the JVM are long...
Once GC itself was taken care of
Time To Safepoint (TTSP),
the most common examples
 Array copies and object clone()
 Counted loops
 Many other variants in the runtime...
 Measure, Measure, Measure...
 Zing has a built-in TTSP profiler
 At Azul, the CTO walks around with a 0.5msec
beat down stick...
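The counted-loop case is often illustrated as follows. HotSpot JIT compilers of this era commonly omit the safepoint poll from loops with an `int` induction variable and a fixed stride, so a long-running loop of that shape can hold up a global safepoint until it finishes. Switching the index to `long` is a widely used workaround that usually keeps the poll in place. Whether either behavior applies depends on the JVM version and flags, so treat this sketch as an assumption to verify by measurement; the class and method names are hypothetical.

```java
// Two loops with identical results but different loop shapes.
final class CountedLoopSketch {

    // int-indexed, fixed-stride loop: a "counted loop" candidate for the JIT,
    // which may be compiled without a safepoint poll inside the loop body.
    static long sumCounted(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // long-indexed variant: same result, commonly used so that the loop
    // is not treated as a counted loop and keeps its safepoint poll.
    static long sumWithLongIndex(int[] data) {
        long sum = 0;
        for (long i = 0; i < data.length; i++) {
            sum += data[(int) i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 1);
        System.out.println(sumCounted(data) == sumWithLongIndex(data)); // both sums agree
    }
}
```

The long-indexed form may run somewhat slower, which is the speed-versus-outlier tradeoff this part of the talk is about.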
OS related stuff
 OS related hiccups tend to dominate once GC
and TTSP are removed as issues.
 Take scheduling pressure seriously (Duh?)
 Hyper-threading (good? bad?)
 Swapping (Duh!)
 Power management
 Transparent Huge Pages (THP).
 ...
Once GC and TTSP are taken care of
Takeaway: In 2015, “Real” Java is finally
viable for low latency applications
 GC is no longer a dominant issue, even for
outliers
 2-3 msec worst case with “easy” tuning
 < 1 msec worst case is very doable
 No need to code in special ways any more
– You can finally use “real” Java for everything
– You can finally use 3rd party libraries without worries
– You can finally use as much memory as you want
– You can finally use regular (good) programmers
One-liner Takeaway:
Zing: the cure for your
Java hiccups
Compulsory Marketing Pitch
Azul Hot Topics
Zing 15.05 imminent
 1TB heap
 ReadyNow!
 JMX
 Oracle Linux
Zing for Cloud
 Amazon AMIs
 Rackspace
OnMetal compat
 Docker in R&D
Zing for Big Data
 Cloudera CDH5 cert
 Cassandra paper
 Spark is in Zing open
source program
Zulu®
 Azure Gallery
 JSE Embedded
 8u45 in the chute
Q&A and In Closing…
Go get some Zing today!
azul.com/trial
At the very least, download jHiccup.
azul.com/jhiccup
Grab a Zing Free Trial card.
Let’s talk about best BBQ in Texas
azul.com
@schuetzematt

Automate your Kamailio Test Calls - Kamailio World 2024
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 

Enabling Java in Latency-Sensitive Environments - Austin JUG April 2015

  • 1. © Copyright Azul Systems 2015 @azulsystems
      Enabling Java in Latency Sensitive Environments
      Matt Schuetze, Azul Director of Product Management
      Matt Schuetze, Product Manager, Azul Systems
      Utah JUG, Murray UT, November 20, 2014
      Austin Java Users Group, Austin, Texas, 5/3/2015
  • 2. High Level Agenda
      • Intro, jitter vs. JITTER
      • Java in a low latency application world
      • The (historical) fundamental problems
      • What people have done to try to get around them
      • What if the fundamental problems were eliminated?
      • What 2015 looks like for Low Latency Java developers
      • Real World Case Studies
      Welcome to all Austin JUG members!
  • 3. [No text on slide]
  • 4. Is “jitter” a proper word for this?
      99%‘ile is ~60 usec
      Max is ~30,000% higher than “typical”
      Answer: no, it’s not jitter at all. It’s phase changes.
  • 5. About Azul Systems
      Vega C4
      • We make scalable Virtual Machines
      • Have built “whatever it takes to get job done” since 2002
      • 3 generations of custom SMP Multi-core HW (Vega)
      • Now Pure software for commodity x86 (Zing)
      • Certified OpenJDK (Zulu)
      • Known for Low Latency, Consistent execution, and Large data set excellence
      Zing, Zulu, and everything about Java Virtual Machines
  • 6. Java in the low latency world
  • 7. Java in a low latency world
      • Why do people use Java for low latency apps?
      • Are they crazy?
      • No. There are good, easy to articulate reasons
      • Projected lifetime cost
      • Developer productivity
      • Time-to-product, Time-to-market, ...
      • Leverage, ecosystem, ability to hire
      Yep, low latency Java is goin’ down for real…
  • 8. e.g. customer answer to: “Why do you use Java in Algo Trading?”
      • Strategies have a shelf life
      • We have to keep developing and deploying new ones
      • Only one out of N is actually productive
      • Profitability therefore depends on ability to successfully deploy new strategies, and on the cost of doing so
      • Our developers seem to be able to produce 2x-3x as much when using a Java environment as they would with C++ ...
  • 9. So what is the problem? Is Java Slow?
      • No
      • A good programmer will get roughly the same speed from both Java and C++
      • A bad programmer won’t get you fast code on either
      • The 50%‘ile and 90%‘ile are typically excellent...
      • It’s those pesky occasional stutters and stammers and stalls that are the problem...
      • Ever hear of Garbage Collection?
  • 10. Java’s Achilles heel
  • 11. Stop-The-World Garbage Collection: How bad is it?
      • Let’s ignore the bad multi-second pauses for now...
      • Low latency applications regularly experience “small”, “minor” GC events that range in the 10s of msec
      • Frequency directly related to allocation rate
      • In turn, directly related to throughput
      • So we have great 50%, 90%. Maybe even 99%
      • But 99.9%, 99.99%, Max, all “suck”
      • So bad that it affects risk, profitability, service expectations, etc.
  • 12. STW-GC effects in a low latency application
      99%‘ile is ~60 usec
      Max is ~30,000% higher than “typical”
  • 13. One way to deal with Stop-The-World GC
      I cannot see it, so it cannot see me.
  • 14. More Stop-The-World GC avoidance
      Time for a bigger rug.
  • 15. What do actual low latency developers do about it?
      • They use “Java” instead of Java
      • They write “in the Java syntax”
      • They avoid allocation as much as possible
      • E.g. They build their own object pools for everything
      • They write all the code they use (no 3rd party libs)
      • They train developers for their local discipline
      • In short: They revert to many of the practices that hurt productivity. They lose out on much of Java.
  • 16. Another way to cope: “Creative Language”
      “Guarantee a worst case of 5 msec, 99% of the time”
        Translation: “1% will be far worse than worst case”
      “Mostly” Concurrent, “Mostly” Incremental
        Translation: “Will at times exhibit long monolithic stop-the-world pauses”
      “Fairly Consistent”
        Translation: “Will sometimes show results well outside this range”
      “Typical pauses in the tens of milliseconds”
        Translation: “Some pauses are much longer than tens of milliseconds”
      Drawn from evil vendor marketing literature
  • 17. What do low latency (Java) developers get for all their effort?
      • They still see pauses (usually ranging to tens of msec)
      • But they get fewer (as in less frequent) pauses
      • And they see fewer people able to do the job
      • And they have to write EVERYTHING themselves
      • And they get to debug malloc/free patterns again
      • And they can only use memory in certain ways
      • ...
      • Some call it “fun”... Others “duct tape engineering”...
  • 18. There is a fundamental problem: Stop-The-World GC mechanisms are contradictory to the fundamental requirements of low latency & low jitter apps
  • 19. Unsustainable Throughput
      Sustainable Throughput: The throughput achieved while safely maintaining service levels
  • 20. It’s an industry-wide problem
  • 21. It was an industry-wide problem
      It’s 2015... Now we have Zing®.
  • 22. The common GC behavior across ALL currently shipping (non-Zing) JVMs
      • ALL use a Monolithic Stop-the-world NewGen
        – “small” periodic pauses (small as in 10s of msec)
        – pauses more frequent with higher throughput or allocation rates
      • Development focus for ALL is on OldGen collectors
        – Focus is on trying to address the many-second pause problem
        – Usually by sweeping it farther and farther under the rug
        – “Mostly X” (e.g. “mostly concurrent”) hides the fact that they refer only to the OldGen part of the collector
        – E.g. CMS, G1, Balanced.... all are OldGen-only efforts
      • ALL use a Fallback to Full Stop-the-world Collection
        – Used to recover when other mechanisms (inevitably) fail
        – Also hidden under the term “Mostly”...
  • 23. At Azul, STW-GC was addressed head-on
      • We decided to focus on the right core problems
        – Scale & productivity being limited by responsiveness
        – Even “short” GC pauses are considered a problem
      • Responsiveness must be unlinked from key metrics:
        – Transaction Rate, Concurrent users, Data set size, etc.
        – Heap size, Live Set size, Allocation rate, Mutation rate
        – Responsiveness must be continually sustainable
        – Can’t ignore “rare but periodic” events
      • Eliminate ALL Stop-The-World Fallbacks
        – Any STW fallback is a real-world failure
      Trivia: Azul as a company was founded predominantly around this one premise, which was then plaguing Java servers
  • 24. The Zing “C4” Collector: Continuously Concurrent Compacting Collector
      • Concurrent, compacting old generation
      • Concurrent, compacting new generation
      • No stop-the-world fallback
        – Always compacts, and always does so concurrently
  • 25. Benefits
  • 26. Stay Responsive
      Even when traffic patterns change without warning
      7x Load Increase
      30 minute span shows elevated load long after event, yet no pauses.
  • 27. Handle Real World traffic patterns
      One second view of transactions. Not constant. Not random either. Bursty is normal.
      Red line shows where order pricing arrival rate would be if constant
  • 28. Achieve Measurable Benefits
      • Zing helped LMAX tame GC-related latency outlier pauses
        – Highly-engineered system: 4ms every 30 seconds down to 1ms every 2 hours
        – Less well-tuned system: 50ms every 30 seconds down to 3ms every 15 minutes
      • No more unexpected/unwanted old-gen pauses caused by external behavior
        – CMS STW intra-day, generally ~500ms, gone
        – Removed source of backpressure on latency critical path.
        – Pre-Azul these would occur less predictably, but multiple times a week.
      From joint LMAX/Azul talk at QCon London, March 2015
  • 29. This is not “just Theory”
      jHiccup: A tool that measures and reports (as your application is running) if your JVM is running all the time
  • 30. Discontinuities in Java execution - Easy To Measure
      A telco App with a bit of a “problem”
      We call these “hiccups”
  • 31. Oracle HotSpot™ (pure newgen) / Zing
      Low latency trading application
  • 32. Oracle HotSpot (pure newgen) / Zing
      Low latency trading application
  • 33. Low latency - Drawn to scale
      Oracle HotSpot (pure newgen) / Zing
  • 34. It’s not just for Low Latency
      Just as easy to demonstrate for human-response-time apps
  • 35. Portal Application, slow Ehcache “churn”
      Oracle HotSpot CMS, 1GB in an 8GB heap / Zing, 1GB in an 8GB heap
  • 36. Portal Application, slow Ehcache “churn”
      Oracle HotSpot CMS, 1GB in an 8GB heap / Zing, 1GB in an 8GB heap
  • 37. Portal Application - Drawn to scale
      Oracle HotSpot CMS, 1GB in an 8GB heap / Zing, 1GB in an 8GB heap
  • 38. A Recent E-Commerce Case Study
  • 39. Cyber Monday comes earlier every year…
      General trends of real world e-commerce traffic
  • 40. Human-Time Real World Latency Case
      • Web retail site faces spike loads every year over Thanksgiving through Cyber Monday.
      • Site latency suffers at peak viewing and buying times, discouraging shoppers and leaving abandoned carts.
      • Hard to predict height of surge, just know it’s big, far higher than regular traffic 362 other days of the year.
      • New features like gallery search (Solr/Lucene) added higher memory footprint, longer GC times.
      • Staff spent lots of effort tuning HotSpot.
      Specific e-tail customer based in Salt Lake City, Utah.
  • 41. Real World Latency Results
      • Customer studied Azul, met at Strata, NYC
      • Discussion led to Zing as viable alternative
      • Customer ran pilot tests with positive results. Needed one Linux adjustment, otherwise same server gear.
      • POC on customer live system showed better than expected latency profiles.
      • No more GC tuning!
      • Experienced a stable and profitable Thanksgiving 2014 weekend.
      Timeframe was fall 2014.
  • 42. Remind me how GC tuning sucks
  • 43. Java GC tuning is “hard”…
      • Examples of actual command line GC tuning terms:
        java -Xmx12g -XX:MaxPermSize=64M -XX:PermSize=32M -XX:MaxNewSize=2g -XX:NewSize=1g -XX:SurvivorRatio=128 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:ParallelGCThreads=12 -XX:LargePageSizeInBytes=256m …
        java -Xms8g -Xmx8g -Xmn2g -XX:PermSize=64M -XX:MaxPermSize=256M -XX:-OmitStackTraceInFastThrow -XX:SurvivorRatio=2 -XX:-UseAdaptiveSizePolicy -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled -XX:CMSMaxAbortablePrecleanTime=10000 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=63 -XX:+UseParNewGC -Xnoclassgc …
  • 44. A few GC tuning flags
      Source: Word Cloud created by Frank Pavageau in his Devoxx FR 2012 presentation titled “Death by Pauses”
  • 45. Complete guide to Zing GC tuning
        java -Xmx40g
  • 46. Any other problems beyond GC?
  • 47. JVMs make many tradeoffs, often trading speed vs. outliers
      • Some speed techniques come at extreme outlier costs
        – E.g. (“regular”) biased locking
        – E.g. counted loops optimizations
      • Deoptimization
      • Lock deflation
      • Weak References, Soft References, Finalizers
      • Time To Safe Point (TTSP)
  • 48. Time To Safepoint: Your new #1 enemy (once GC itself was taken care of)
      • Many things in a JVM (still) use a global safepoint
      • All threads brought to a halt, at a “safe to analyze” point in code, and then released after work is done.
      • E.g. GC phase shifts, Deoptimization, Class unloading, Thread Dumps, Lock Deflation, etc. etc.
      • A single thread with a long time-to-safepoint path can cause an effective pause for all other threads. Consider this a variation on Amdahl’s law.
      • Many code paths in the JVM are long...
  • 49. Time To Safepoint (TTSP), the most common examples
      • Array copies and object clone()
      • Counted loops
      • Many other variants in the runtime...
      • Measure, Measure, Measure...
      • Zing has a built-in TTSP profiler
      • At Azul, the CTO walks around with a 0.5msec beat down stick...
  • 50. OS related stuff (once GC and TTSP are taken care of)
      • OS related hiccups tend to dominate once GC and TTSP are removed as issues.
      • Take scheduling pressure seriously (Duh?)
      • Hyper-threading (good? bad?)
      • Swapping (Duh!)
      • Power management
      • Transparent Huge Pages (THP).
      • ...
  • 51. Takeaway: In 2015, “Real” Java is finally viable for low latency applications
      • GC is no longer a dominant issue, even for outliers
      • 2-3 msec worst case with “easy” tuning
      • < 1 msec worst case is very doable
      • No need to code in special ways any more
        – You can finally use “real” Java for everything
        – You can finally use 3rd party libraries without worries
        – You can finally use as much memory as you want
        – You can finally use regular (good) programmers
  • 52. One-liner Takeaway: Zing: the cure for your Java hiccups
  • 53. Compulsory Marketing Pitch
  • 54. Azul Hot Topics
      • Zing 15.05 imminent
        – 1TB heap
        – ReadyNow!
        – JMX
        – Oracle Linux
      • Zing for Cloud
        – Amazon AMIs
        – Rackspace OnMetal compat
        – Docker in R&D
      • Zing for Big Data
        – Cloudera CDH5 cert
        – Cassandra paper
        – Spark is in Zing open source program
      • Zulu®
        – Azure Gallery
        – JSE Embedded
        – 8u45 in the chute
  • 55. Q&A and In Closing…
      Go get some Zing today! azul.com/trial
      At very least download JHiccup. azul.com/jhiccup
      Grab a Zing Free Trial card.
      Let’s talk about best BBQ in Texas
      azul.com @schuetzematt
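
Slide 29 describes jHiccup as a tool that reports whether the JVM is running all the time, and slide 11 points at the 99.9%, 99.99% and max figures as the ones that hurt. The idea can be illustrated with a rough sketch: a thread repeatedly asks to sleep for a short interval and records how much longer than requested it was actually away. This is only an illustration under stated assumptions, not jHiccup's actual code; the class name `HiccupSketch`, its method names, and the 1 ms interval are all hypothetical choices.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

public class HiccupSketch {

    /**
     * Sleeps 'samples' times for 'intervalNanos' and records, for each sleep,
     * how much longer than requested the thread was actually paused.
     * Large values suggest the whole process (or this thread) was stalled.
     */
    public static long[] sampleHiccupsNanos(int samples, long intervalNanos)
            throws InterruptedException {
        long[] hiccups = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            TimeUnit.NANOSECONDS.sleep(intervalNanos);
            long elapsed = System.nanoTime() - start;
            hiccups[i] = Math.max(0L, elapsed - intervalNanos);
        }
        return hiccups;
    }

    /** Nearest-rank percentile (0 < pct <= 100) computed on a sorted copy. */
    public static long percentile(long[] values, double pct) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical settings: 2000 samples at a 1 ms requested interval.
        long[] h = sampleHiccupsNanos(2000, TimeUnit.MILLISECONDS.toNanos(1));
        System.out.printf("50%%: %d us, 99%%: %d us, 99.9%%: %d us, max: %d us%n",
                percentile(h, 50) / 1000, percentile(h, 99) / 1000,
                percentile(h, 99.9) / 1000, percentile(h, 100) / 1000);
    }
}
```

Reporting the high percentiles and the max alongside the median follows the deck's point that the 50%'ile and 90%'ile usually look fine while the outliers carry the problem. The overshoot measured this way also includes ordinary OS timer and scheduling delay, so it is an upper bound on JVM-caused stalls rather than a pure GC measurement.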

Editor's Notes

  1. The main chart covers half an hour on one day. The inset is the exchange rate between the Swiss franc and the euro. The point of the slide is that there was no real warning: events happened, so traffic moved rapidly. I.e. nothing was happening, then came a 7x increase, then a higher floor throughout the day. Even so, latencies stayed stable.
  2. One second’s worth of data from a typical day. The X axis is which millisecond within every second. Arrival rate is not smooth. Every second. Every 250ms. Every 100 seconds. Bursty traffic is the norm. The red line shows the average if the arrival rate were constant.
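
The view described in note 2 (arrivals plotted by millisecond within each second, against a flat line for a constant arrival rate) can be approximated with a small sketch. It assumes arrival times are available as epoch-millisecond timestamps; the class `ArrivalProfile`, its methods, and the sample data are hypothetical and are not taken from the talk.

```java
public class ArrivalProfile {

    /** Counts arrivals per millisecond-of-second slot (0..999). */
    public static long[] countByMillisOfSecond(long[] arrivalEpochMillis) {
        long[] counts = new long[1000];
        for (long t : arrivalEpochMillis) {
            counts[(int) Math.floorMod(t, 1000L)]++;
        }
        return counts;
    }

    /** Per-slot count the same traffic would produce if it were evenly spread. */
    public static double constantRatePerSlot(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return (double) total / counts.length;
    }

    /** Largest per-slot count, i.e. the height of the tallest burst. */
    public static long peak(long[] counts) {
        long max = 0;
        for (long c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical sample: most arrivals land exactly on a second boundary.
        long[] arrivals = {1000L, 2000L, 3000L, 1250L, 5999L};
        long[] counts = countByMillisOfSecond(arrivals);
        System.out.println("peak slot count: " + peak(counts));
        System.out.println("constant-rate slot count: " + constantRatePerSlot(counts));
    }
}
```

A large gap between the peak slot and the constant-rate value is one simple way to quantify the burstiness the note describes.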