Enabling Java in Latency Sensitive Environments

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1fgUGfx.

Gil Tene examines the core issues that have historically kept Java environments from performing well in low latency environments and how it can perform now without trade-offs and compromises. Filmed at qconsf.com.

Gil Tene is CTO and co-founder of Azul Systems. He has been involved with virtual machine technologies for the past 20 years. Gil pioneered Azul's Continuously Concurrent Compacting Collector (C4), and various managed runtime and systems stack technologies that combine to deliver the industry's most scalable and robust Java platforms.

Transcript

  • 1. Enabling Java in Latency Sensitive Applications Gil Tene, CTO & co-Founder, Azul Systems ©2013 Azul Systems, Inc.
  • 2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/java-tuning-latency InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 3. Presented at QCon San Francisco www.qconsf.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. High level agenda Low Latency? In Java? Seriously? The problem with . . . What happens when you solve that problem? The next problems A bit of shameless bragging ©2013 Azul Systems, Inc.
  • 5. About me: Gil Tene co-founder, CTO @Azul Systems Have been working on “think different” GC approaches since 2002 Created Pauseless & C4 core GC algorithms (Tene, Wolf) A long history building Virtual & Physical Machines, Operating Systems, Enterprise apps, etc... JCP EC Member... ©2013 Azul Systems, Inc. * working on real-world trash compaction issues, circa 2004
  • 6. About Azul We make scalable Virtual Machines Have built “whatever it takes to get the job done” since 2002 3 generations of custom SMP Multi-core HW (Vega) Zing: Pure software for commodity x86 Known for Low Latency, Consistent execution, and Large data set excellence ©2013 Azul Systems, Inc.
  • 7. Java in the low latency world? ©2013 Azul Systems, Inc.
  • 8. Java in the low latency world? Why do people use Java for low latency apps? Are they crazy? No. There are good, easy to articulate reasons Projected lifetime cost Developer productivity Time-to-product, Time-to-market, ... Leverage, ecosystem, ability to hire ©2013 Azul Systems, Inc.
  • 9. E.g. Customer answer to: “Why do you use Java in Algo Trading?” Strategies have a shelf life We have to keep developing and deploying new ones Only one out of N is actually productive Profitability therefore depends on ability to successfully deploy new strategies, and on the cost of doing so Our developers seem to be able to produce 2x-3x as much when using a Java environment as they would with C/C++ ... ©2013 Azul Systems, Inc.
  • 10. So what is the problem? Is Java Slow? No A good programmer will get roughly the same speed from both Java and C++ A bad programmer won’t get you fast code on either The 50%‘ile and 90%‘ile are typically excellent... It’s those pesky occasional stutters and stammers and stalls that are the problem... Ever hear of Garbage Collection? ©2013 Azul Systems, Inc.
  • 11. Is “jitter” even the right word for this? [jHiccup charts: “Hiccups by Time Interval” and “Hiccups by Percentile Distribution”. The 99%‘ile is ~60 usec higher than “typical”; the Max is ~30,000% higher.] ©2012 Azul Systems, Inc.
  • 12. Stop-The-World Garbage Collection: Java’s Achilles heel Let’s ignore the bad multi-second pauses for now... Low latency applications regularly experience “small”, “minor” GC events that range in the 10s of msec Frequency directly related to allocation rate So we have great 50%, 90%. Maybe even 99% But 99.9%, 99.99%, Max, all “suck” So bad that it affects risk, profitability, service expectations, etc. ©2013 Azul Systems, Inc.
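The “frequency directly related to allocation rate” point is easy to verify on any stock JVM using the standard GarbageCollectorMXBean API. The following is a minimal sketch, not from the talk; the class name, allocation size, and run length are illustrative choices:

      import java.lang.management.GarbageCollectorMXBean;
      import java.lang.management.ManagementFactory;
      import java.util.concurrent.ThreadLocalRandom;

      // Illustrative only: allocate at a steady rate and watch how often the
      // collectors run and how much cumulative time they report.
      public class MinorGcRateDemo {
          // Small rotating buffer so allocations stay reachable briefly and are
          // harder for the JIT to optimize away.
          static final byte[][] recent = new byte[64][];

          public static void main(String[] args) {
              for (int second = 1; second <= 30; second++) {
                  long start = System.nanoTime();
                  while (System.nanoTime() - start < 1_000_000_000L) {
                      recent[ThreadLocalRandom.current().nextInt(recent.length)] = new byte[64 * 1024];
                  }
                  long count = 0, millis = 0;
                  for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                      count += gc.getCollectionCount();   // cumulative GC events so far
                      millis += gc.getCollectionTime();   // cumulative GC time so far
                  }
                  System.out.printf("after %2ds: %d collections, %d ms cumulative GC time%n",
                          second, count, millis);
              }
          }
      }

Halve or double the allocation size and the per-second collection count moves with it, while the individual pauses stay in the “small but not small enough” range the slide describes.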
  • 13. One way to deal with Stop-The-World GC ©2011 Azul Systems, Inc.
  • 14. A common way to “deal” with STW-GC Averages and Standard Deviation ©2011 Azul Systems, Inc.
  • 15. Reality: Latency is usually strongly “multi-modal” Usually doesn’t look anything like a normal distribution In software systems, usually shows periodic freezes Complete shifts from one mode/behavior to another Mode A: “good”. Mode B: “Somewhat bad” Mode C: “terrible”, ... ©2012 Azul Systems, Inc.
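Capturing the full shape of a multi-modal distribution, rather than a mean and a standard deviation, is exactly what HdrHistogram (linked at the end of this deck) is for. A minimal sketch assuming the HdrHistogram library is on the classpath; the recorded values are synthetic, invented to mimic a “mostly ~60 usec, occasionally ~20 msec” bimodal shape:

      import org.HdrHistogram.Histogram;
      import java.util.concurrent.ThreadLocalRandom;

      public class PercentilesNotAverages {
          public static void main(String[] args) {
              // Track values from 1 nsec up to 1 hour, with 3 significant digits.
              Histogram h = new Histogram(3_600_000_000_000L, 3);

              ThreadLocalRandom rnd = ThreadLocalRandom.current();
              for (int i = 0; i < 1_000_000; i++) {
                  // Synthetic bimodal latencies: mostly ~50-70 usec, rarely a ~20-25 msec "freeze".
                  long nanos = (rnd.nextInt(10_000) == 0)
                          ? 20_000_000L + rnd.nextLong(5_000_000L)
                          : 50_000L + rnd.nextLong(20_000L);
                  h.recordValue(nanos);
              }

              System.out.printf("mean    = %.0f ns, stddev = %.0f ns%n", h.getMean(), h.getStdDeviation());
              System.out.printf("99%%     = %d ns%n", h.getValueAtPercentile(99.0));
              System.out.printf("99.99%%  = %d ns%n", h.getValueAtPercentile(99.99));
              System.out.printf("max     = %d ns%n", h.getMaxValue());
          }
      }

The mean sits comfortably near the fast mode and the standard deviation is just a confusing number, while the 99.99%‘ile and max expose the second mode directly.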
  • 16. Another way to cope: “Creative Language” “Guarantee a worst case of 5 msec, 99% of the time” “Mostly” Concurrent, “Mostly” Incremental Translation: “Will at times exhibit long monolithic stop-the-world pauses” “Fairly Consistent” Translation: “Will sometimes show results well outside this range” “Typical pauses in the tens of milliseconds” Translation: “Some pauses are much longer than tens of milliseconds” ©2012 Azul Systems, Inc.
  • 17. Another way to deal with STW-GC ©2012 Azul Systems, Inc.
  • 18. What do actual low latency developers do about it? They use “Java” instead of Java They write “in the Java syntax” They avoid allocation as much as possible E.g. They build their own object pools for everything They write all the code they use (no 3rd party libs) They train developers for their local discipline In short: They revert to many of the practices that hurt productivity. They lose out on much of Java. ©2013 Azul Systems, Inc.
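For concreteness, here is a minimal sketch of the kind of hand-rolled pool such teams end up writing for every message and event type (the class and method names are hypothetical, not from the talk). It is exactly the boilerplate that erodes the productivity advantage that brought them to Java in the first place:

      import java.util.ArrayDeque;

      // Hypothetical example of the "build your own object pools for everything" style.
      // Deliberately not thread-safe; low-latency shops typically keep one pool per thread.
      final class MessagePool {
          static final class Message {
              final byte[] payload = new byte[512];
              int length;
              void clear() { length = 0; }
          }

          private final ArrayDeque<Message> free = new ArrayDeque<>();

          MessagePool(int size) {
              for (int i = 0; i < size; i++) {
                  free.push(new Message());            // pre-allocate everything up front
              }
          }

          Message acquire() {
              Message m = free.poll();
              return (m != null) ? m : new Message();  // pool exhausted: reluctantly allocate
          }

          void release(Message m) {
              m.clear();                               // scrub state before reuse
              free.push(m);
          }
      }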
  • 19. It is (was**) an industry-wide problem Stop-The-World GC mechanisms contradict the fundamental requirements of low latency & low jitter apps (** It’s 2013... We now have Zing.) ©2013 Azul Systems, Inc.
  • 20. The common GC behavior across ALL currently shipping (non-Zing) JVMs ALL use a Monolithic Stop-the-world NewGen “small” periodic pauses (small as in 10s of msec) pauses more frequent with higher throughput or allocation rates Development focus for ALL is on Oldgen collectors Focus is on trying to address the many-second pause problem Usually by sweeping it farther and farther under the rug “Mostly X” (e.g. “mostly concurrent”) hides the fact that they refer only to the OldGen part of the collector E.g. CMS, G1, Balanced.... all are OldGen-only efforts ALL use a Fallback to Full Stop-the-world Collection Used to recover when other mechanisms (inevitably) fail Also hidden under the term “Mostly”... ©2013 Azul Systems, Inc.
  • 21. A Recipe: address STW-GC head-on E.g. at Azul, we decided to focus on the core problems Scale & productivity limited by responsiveness/latency And it’s not the “typical” latency, it’s the outliers... Even “short” GC pauses must be considered a problem Responsiveness must be unlinked from key metrics: Transaction Rate, Concurrent users, Data set size, etc. Heap size, Live Set size, Allocation rate, Mutation rate Responsiveness must be continually sustainable Can’t ignore “rare but periodic” events Eliminate ALL Stop-The-World Fallbacks Any STW fallback is a real-world failure ©2013 Azul Systems, Inc.
  • 22. Existence proof: The Zing “C4” Collector Continuously Concurrent Compacting Collector Concurrent, compacting old generation Concurrent, compacting new generation No stop-the-world fallback Always compacts, and always does so concurrently ©2013 Azul Systems, Inc.
  • 23. Benefits ©2013 Azul Systems, Inc.
  • 24. An example of “First day’s run” behavior E-Commerce application 5 msec ©2013 Azul Systems, Inc.
  • 25. An example of behavior after 4 days of system tuning Low latency application 1 msec ©2013 Azul Systems, Inc.
  • 26. Measuring Theory in Practice jHiccup: A tool that measures and reports (as your application is running) if your JVM is actually running all the time ©2013 Azul Systems, Inc.
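The measurement idea behind jHiccup is simple enough to sketch: an otherwise idle thread repeatedly asks to sleep for a short, fixed interval and records how much longer than that it actually took to wake up; the excess is time during which the platform could not have run your code either. The following is a stripped-down, hypothetical illustration, not jHiccup itself (the real tool, linked at the end of the deck, records into HdrHistogram and produces the percentile charts shown on the next slides):

      // Stripped-down illustration of the hiccup-measuring idea, not jHiccup itself.
      public class MiniHiccupMeter implements Runnable {
          private static final long INTERVAL_NS = 1_000_000L;        // aim to wake every 1 msec

          @Override
          public void run() {
              long worstNs = 0;
              long nextReport = System.nanoTime() + 10_000_000_000L; // report every 10 seconds
              while (!Thread.currentThread().isInterrupted()) {
                  long before = System.nanoTime();
                  try {
                      Thread.sleep(1);                               // ask for ~1 msec of sleep
                  } catch (InterruptedException e) {
                      return;
                  }
                  long hiccupNs = (System.nanoTime() - before) - INTERVAL_NS;
                  if (hiccupNs > worstNs) worstNs = hiccupNs;        // excess delay = "hiccup"
                  if (System.nanoTime() >= nextReport) {
                      System.out.printf("worst hiccup in last 10s: %.3f msec%n", worstNs / 1_000_000.0);
                      worstNs = 0;
                      nextReport += 10_000_000_000L;
                  }
              }
          }

          public static void main(String[] args) throws InterruptedException {
              Thread meter = new Thread(new MiniHiccupMeter(), "hiccup-meter");
              meter.setDaemon(true);
              meter.start();
              Thread.sleep(60_000);   // stand-in for a real workload; the meter reports in the background
          }
      }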
  • 27. Discontinuities in Java platform execution - Easy To Measure. I call these “hiccups”. [jHiccup charts: “Hiccups by Time Interval” and “Hiccups by Percentile Distribution”; Max = 1665.024 msec.] A telco App with a bit of a “problem” ©2012 Azul Systems, Inc.
  • 28. Fun with jHiccup ©2012 Azul Systems, Inc.
  • 29. Oracle HotSpot (pure newgen) vs. Zing: low latency trading application [jHiccup charts: “Hiccups by Time Interval” and “Hiccups by Percentile Distribution”; HotSpot Max = 22.656 msec, Zing Max = 1.568 msec] ©2012 Azul Systems, Inc.
  • 30. (Identical to slide 29.)
  • 31. Oracle HotSpot (pure newgen) vs. Zing: Low latency - Drawn to scale [the same data with both sets of charts on a common 0-25 msec axis] ©2012 Azul Systems, Inc.
  • 32. Takeaway: In 2013, “Real” Java is finally viable for low latency applications GC is no longer a dominant issue, even for outliers 2-3msec worst observed case with “easy” tuning < 1 msec worst observed case is very doable No need to code in special ways any more You can finally use “real” Java for everything You can finally use 3rd party libraries without worries You can finally use as much memory as you want You can finally use regular (good) programmers ©2012 Azul Systems, Inc.
  • 33. Let’s not forget about GC tuning ©2013 Azul Systems, Inc.
  • 34. Java GC tuning is “hard”… Examples of actual command line GC tuning parameters: Java -Xmx12g -XX:MaxPermSize=64M -XX:PermSize=32M -XX:MaxNewSize=2g -XX:NewSize=1g -XX:SurvivorRatio=128 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:ParallelGCThreads=12 -XX:LargePageSizeInBytes=256m … Java -Xms8g -Xmx8g -Xmn2g -XX:PermSize=64M -XX:MaxPermSize=256M -XX:-OmitStackTraceInFastThrow -XX:SurvivorRatio=2 -XX:-UseAdaptiveSizePolicy -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled -XX:CMSMaxAbortablePrecleanTime=10000 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=63 -XX:+UseParNewGC -Xnoclassgc … ©2013 Azul Systems, Inc.
  • 35. A few GC tuning flags Source: Word Cloud created by Frank Pavageau in his Devoxx FR 2012 presentation titled “Death by Pauses”
  • 36. The complete guide to modern GC tuning** java -Xmx40g (** It’s 2013... We now have Zing.) ©2013 Azul Systems, Inc. ©2013 Azul Systems, Inc.
  • 37. So what’s next? GC is only the biggest problem... ©2013 Azul Systems, Inc. ©2013 Azul Systems, Inc.
  • 38. JVMs make many tradeoffs often trading speed vs. outliers Some speed techniques come at extreme outlier costs E.g. (“regular”) biased locking E.g. counted loops optimizations Deoptimization Lock deflation Weak References, Soft References, Finalizers Time To Safe Point (TTSP) ©2012 Azul Systems, Inc.
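Biased locking is a concrete illustration of that speed-versus-outlier tradeoff: an uncontended synchronized block gets cheaper, but when a second thread eventually touches the lock the bias must be revoked, and in the HotSpot of this era revocation could involve bringing threads to a safepoint. A hypothetical sketch of the pattern; -XX:-UseBiasedLocking is a real HotSpot flag for turning the optimization off when outliers matter more than the fast path:

      // Illustrative only: the kind of code where biased locking speeds up the common
      // case but can add outliers when the bias is later revoked.
      public class BiasedLockingTradeoff {
          private final Object lock = new Object();
          private long counter;

          // Hot path: for a long time only one thread calls this, so the monitor is
          // biased toward that thread and acquisition is nearly free.
          void record() {
              synchronized (lock) {
                  counter++;
              }
          }

          public static void main(String[] args) throws InterruptedException {
              BiasedLockingTradeoff t = new BiasedLockingTradeoff();
              for (int i = 0; i < 10_000_000; i++) {
                  t.record();                          // single-threaded phase: bias in effect
              }
              Thread second = new Thread(t::record, "second-thread");
              second.start();                          // first acquisition by another thread:
              second.join();                           // bias revocation, historically at a safepoint
              System.out.println(t.counter);
              // Compare runs with and without -XX:-UseBiasedLocking to trade peak speed for fewer outliers.
          }
      }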
  • 39. Time To Safepoint (TTSP) Your new #1 enemy (Once GC itself was taken care of) Many things in a JVM (still) use a global safepoint All threads brought to a halt, at a “safe to analyze” point in code, and then released after work is done. E.g. GC phase shifts, Deoptimization, Class Unloading, Thread Dumps, Lock Deflation, Deadlock Detection, etc. A single thread with a long time-to-safepoint path can cause an effective pause for all other threads Many code paths in the JVM are long... ©2012 Azul Systems, Inc.
  • 40. Time To Safepoint (TTSP) the most common examples Array copies and object clone() Counted loops Many other variants in the runtime... Measure, Measure, Measure... Zing has a built-in TTSP profiler At Azul, I walk around with a 0.5msec stick... ©2012 Azul Systems, Inc.
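The counted-loop entry deserves a sketch. A loop with an int induction variable and a simple bound is a “counted loop”, and the HotSpot JIT of this era emitted no safepoint poll inside its body, so one thread grinding through a very long counted loop could hold every other thread waiting for a safepoint until the loop finished. Behavior varies by JVM and version (later HotSpot releases bound this with loop strip mining); the long-index variant below is the classic workaround because it is not treated as a counted loop and keeps its safepoint polls. The class name and array size are illustrative:

      public class CountedLoopTtsp {
          // ~400 MB of ints, sized so a single pass takes a while; run with enough heap (e.g. -Xmx1g).
          static final int[] data = new int[100_000_000];

          // Counted loop: int index, constant stride. Historically compiled with no safepoint
          // poll in the body, so time-to-safepoint can span the entire pass over the array.
          static long sumCounted() {
              long sum = 0;
              for (int i = 0; i < data.length; i++) {
                  sum += data[i];
              }
              return sum;
          }

          // Same work with a long index: not a counted loop, so the JIT keeps a safepoint
          // poll in the loop and other threads are released far sooner.
          static long sumWithLongIndex() {
              long sum = 0;
              for (long i = 0; i < data.length; i++) {
                  sum += data[(int) i];
              }
              return sum;
          }

          public static void main(String[] args) {
              System.out.println(sumCounted() + " " + sumWithLongIndex());
          }
      }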
  • 41. OS related stuff (once GC and TTSP are taken care of) OS related hiccups tend to dominate once GC and TTSP are removed as issues. Take scheduling pressure seriously (Duh?) Hyper-threading (good? bad? yes!) Swapping (Duh!) Power management Transparent Huge Pages (THP). ... ©2012 Azul Systems, Inc.
  • 42. Shameless bragging ©2013 Azul Systems, Inc.
  • 43. Zing A JVM for Linux/x86 servers ELIMINATES Garbage Collection as a concern for enterprise applications Very wide operating range: Used in both low latency and large scale enterprise application spaces Decouples scale metrics from response time concerns Transaction rate, data set size, concurrent users, heap size, allocation rate, mutation rate, etc. Leverages elastic memory for resilient operation ©2013 Azul Systems, Inc.
  • 44. What is Zing good for? If you have a server-based Java application And you are running on Linux And you are using more than ~300MB of memory Then Zing will likely deliver superior behavior metrics ©2013 Azul Systems, Inc.
  • 45. Where Zing shines Low latency Eliminate behavior blips down to the sub-millisecond-units level Machine-to-machine “stuff” Support higher *sustainable* throughput (the one that meets SLAs) Human response times Eliminate user-annoying response time blips. Multi-second and even fraction-of-a-second blips will be completely gone. Support larger memory JVMs *if needed* (e.g. larger virtual user counts, or larger cache, in-memory state, or consolidating multiple instances) “Large” data and in-memory analytics Make batch stuff “business real time”. Gain super-efficiencies. ©2013 Azul Systems, Inc.
  • 46. Takeaway: In 2013, “Real” Java is finally viable for low latency applications GC is no longer a dominant issue, even for outliers 2-3msec worst observed case with “easy” tuning < 1 msec worst observed case is very doable No need to code in special ways any more You can finally use “real” Java for everything You can finally use 3rd party libraries without worries You can finally use as much memory as you want You can finally use regular (good) programmers ©2012 Azul Systems, Inc.
  • 47. One-liner Takeaway: Zing: A cure for the Java hiccups ©2013 Azul Systems, Inc.
  • 48. Q&A http://www.azulsystems.com http://www.jhiccup.com http://giltene.github.com/HdrHistogram http://latencyutils.github.com/LatencyUtils ©2013 Azul Systems, Inc.
  • 49. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/java-tuning-latency
