Get Lower Latency and Higher Throughput for Java Applications

Getting the best performance out of your Java applications can be a challenge because of the managed nature of the Java Virtual Machine and the non-deterministic behaviour it introduces. Automatic garbage collection (GC) can seriously affect the ability to hit SLAs at the 99th percentile and above.

This session starts by looking at what we mean by speed and why the JVM, whilst extremely powerful, doesn't always deliver the performance characteristics we want. We'll then discuss some critical features and tools that address these issues, such as garbage collection and JIT compilation. By the end of the session, attendees will have a clear understanding of the challenges of, and solutions for, low-latency Java.

  1. Brought to you by
     Get Lower Latency and Higher Throughput for Java Applications
     Simon Ritter, Deputy CTO at Azul
  2. Simon Ritter, Deputy CTO
     ■ Java Champion and two-time JavaOne Rockstar
     ■ 99th percentile is the hard part of performance
     ■ Away from work, my son and I are restoring a Classic Mini
  3. JVM Performance Challenges
     ■ Latency
       ● Biggest issue is garbage collection
       ● Stop-the-world pauses for almost all collectors
       ● Pauses are typically proportional to heap size, not live data
     ■ Throughput
       ● Adaptive JIT compilation: interpreted, C1-compiled, C2-compiled
       ● Deoptimisations
       ● Level of optimisation is key
     ■ Warmup
       ● Time taken to get to fully optimised code for all hot methods
       ● Restarting an application requires the same warmup work to be carried out
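The warmup effect described above can be observed with a crude timing harness (a sketch only; the class name and iteration counts are illustrative, and exact timings vary by JVM and hardware). Early batches run interpreted or C1-compiled code; later batches run fully optimised code and are typically faster.

```java
// Sketch: observe JIT warmup by timing repeated batches of the same hot method.
public class WarmupProbe {
    // A deliberately hot method for the JIT to optimise.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int batch = 0; batch < 5; batch++) {
            long start = System.nanoTime();
            long result = 0;
            for (int i = 0; i < 10_000; i++) {
                result += sumOfSquares(1_000);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("batch " + batch + ": " + micros + " us (result " + result + ")");
        }
    }
}
```

Running with `-XX:+PrintCompilation` shows the tier transitions happening while the early batches execute.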
  4. Azul Platform Prime: An Alternative JVM
     ■ Based on OpenJDK source code
     ■ Passes all Java SE TCK/JCK tests
       ● Drop-in replacement for other JVMs
       ● No application code changes, no recompilation
     ■ HotSpot collectors replaced with C4
     ■ C2 JIT compiler replaced with Falcon
     ■ ReadyNow! warmup elimination technology
  5. Azul Continuous Concurrent Compacting Collector (C4)
  6. C4 Basics
     ■ Generational (young and old)
       ● Uses the same GC algorithm for both
       ● For efficiency rather than pause containment
     ■ All phases are parallel
     ■ No stop-the-world compacting fallback
       ● Heap scales from 512 MB to 12 TB with no change to GC latency
     ■ Algorithm is mark, relocate, remap
     ■ Only supported on Linux
       ● Sophisticated OS memory management interaction
  7. Loaded Value Barrier
     ■ Read barrier
       ● Tests all object references as they are loaded
     ■ Enforces two invariants
       ● Reference is marked through
       ● Reference points to the correct object position
     ■ Minimal performance overhead
       ● Test and jump (2 instructions)
       ● x86 architecture reduces this to one micro-op
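The barrier's two invariants can be sketched in plain Java (a conceptual model only: real C4 implements this as a two-instruction test on metadata bits in the reference itself, not as object fields, and all names here are illustrative):

```java
// Conceptual sketch of a loaded value barrier (LVB).
// On every reference load we verify two invariants:
//   1. the reference is "marked through" for the current GC cycle
//   2. the reference points at the object's current (post-relocation) address
// If either fails, the barrier "self-heals" the reference before use.
public class LvbSketch {
    static class Obj {
        Obj forwardedTo;        // non-null if the object has been relocated
        boolean markedThrough;  // set once the marker has traced through it
    }

    // The barrier: a self-healing load of a reference.
    static Obj loadBarrier(Obj ref) {
        if (ref == null) return null;
        if (ref.forwardedTo != null) {
            ref = ref.forwardedTo;   // remap to the relocated copy
        }
        ref.markedThrough = true;    // ensure the mark invariant holds
        return ref;
    }

    public static void main(String[] args) {
        Obj stale = new Obj();
        Obj relocated = new Obj();
        stale.forwardedTo = relocated;  // GC moved the object

        Obj seen = loadBarrier(stale);
        System.out.println(seen == relocated);   // true: reference was remapped
        System.out.println(seen.markedThrough);  // true: reference was marked through
    }
}
```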
  8. Concurrent Mark Phase
     (diagram: GC threads trace live objects from the root set while application threads keep running)
  9. Relocation Phase
     (diagram: live objects A–E are compacted to new locations A'–E'; a forwarding table records A -> A', B -> B', C -> C', D -> D', E -> E')
  10. Remapping Phase
     (diagram: GC and application threads update stale references using the forwarding table)
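The relocate-then-remap flow on slides 9 and 10 can be modelled with a forwarding table (a toy model with strings standing in for addresses; the real collector keeps forwarding data off-heap and heals references lazily via the read barrier rather than in one pass):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of C4's relocation and remapping phases.
public class ForwardingSketch {
    // Remapping phase: rewrite each stale reference via the forwarding table.
    static String[] remap(String[] refs, Map<String, String> forwarding) {
        String[] healed = new String[refs.length];
        for (int i = 0; i < refs.length; i++) {
            // References to unmoved objects pass through unchanged.
            healed[i] = forwarding.getOrDefault(refs[i], refs[i]);
        }
        return healed;
    }

    public static void main(String[] args) {
        // References held by the application, pointing at old addresses.
        String[] appRefs = {"A", "B", "C", "D", "E"};

        // Relocation phase: live objects are copied and the forwarding
        // table records old address -> new address.
        Map<String, String> forwarding = new HashMap<>();
        for (String obj : appRefs) forwarding.put(obj, obj + "'");

        // Remapping phase: heal the application's references so the
        // old pages can be reclaimed.
        System.out.println(String.join(" ", remap(appRefs, forwarding)));
        // prints: A' B' C' D' E'
    }
}
```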
  11. Measuring Platform Performance
     ■ jHiccup
     ■ Spends most of its time asleep
       ● Minimal effect on performance
       ● Wakes every 1 ms
       ● Records the delta from the time it expected to wake up
       ● Measured effect is what your application would experience
     ■ Generates histogram log files
       ● These can be graphed for easy evaluation
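The core jHiccup idea fits in a few lines of Java (a minimal sketch, not jHiccup itself; the real tool records into an HdrHistogram and writes log files):

```java
// Minimal sketch of the jHiccup idea: sleep for a fixed interval and record
// how much longer than expected the sleep actually took. Any excess (a
// "hiccup") reflects platform stalls, such as GC pauses, that application
// threads would also have experienced.
public class HiccupMeter {
    // Returns observed hiccups (excess delay beyond the interval), in nanos.
    static long[] sample(int samples, long intervalNanos) {
        long[] hiccups = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalNanos / 1_000_000,
                             (int) (intervalNanos % 1_000_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            long observed = System.nanoTime() - start;
            hiccups[i] = Math.max(0, observed - intervalNanos);
        }
        return hiccups;
    }

    public static void main(String[] args) {
        long[] hiccups = sample(100, 1_000_000); // 100 samples at 1 ms
        long max = 0;
        for (long h : hiccups) max = Math.max(max, h);
        System.out.println("worst hiccup: " + (max / 1_000) + " us");
    }
}
```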
  12. Eliminating Elasticsearch Latency
     (chart: jHiccup comparison of HotSpot vs Azul Prime with a 128 GB heap)
  13. Eliminating Elasticsearch Latency
     (chart: jHiccup comparison of HotSpot vs Azul Prime with a 128 GB heap, continued)
  14. Azul Falcon JIT Compiler
  15. Advancing Adaptive Compilation
     ■ Replacement for the C2 JIT compiler
     ■ Azul Falcon compiler
       ● Based on the latest compiler research
       ● LLVM project
     ■ Better performance
       ● Better intrinsics
       ● More inlining
       ● Fewer compiler excludes
  16. Vector Code Example
     ■ Conditional array cell addition loop
       ● Hard for the compiler to identify for vector instruction use

     private void addArraysIfEven(int a[], int b[]) {
         if (a.length != b.length)
             throw new RuntimeException("length mismatch");

         for (int i = 0; i < a.length; i++)
             if ((b[i] & 0x1) == 0)
                 a[i] += b[i];
     }
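The loop from slide 16 can be exercised with a small harness (the method body is copied from the slide, made static here so the harness can call it directly):

```java
import java.util.Arrays;

public class VectorExample {
    // From the slide: add b[i] into a[i] only where b[i] is even.
    // The data-dependent condition makes this loop hard for a JIT to
    // auto-vectorise, which is the point of the following two slides.
    static void addArraysIfEven(int[] a, int[] b) {
        if (a.length != b.length)
            throw new RuntimeException("length mismatch");

        for (int i = 0; i < a.length; i++)
            if ((b[i] & 0x1) == 0)
                a[i] += b[i];
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 1, 1};
        int[] b = {2, 3, 4, 5};
        addArraysIfEven(a, b);
        System.out.println(Arrays.toString(a)); // prints [3, 1, 5, 1]
    }
}
```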
  17. Traditional JVM JIT
     (generated code: per-element jumps, 2 elements per iteration)
  18. Falcon JIT
     (generated code using AVX2 vector instructions: 32 elements per iteration, measured on a Broadwell E5-2690 v4)
  19. Recent Customer Success Story
     ■ Leading cloud-based IT security company
       ● Cloud security, compliance and other services
     ■ Big Kafka user
       ● 2.5 billion messages across Kafka clusters daily
       ● Initially approached us about eliminating latency in their Cassandra clusters
     ■ Kafka improvements
       ● 20% performance gain, out of the box, with no tuning
       ● Falcon improved code generation
       ● Resulted in a 15% saving in cloud hardware costs
       ● Platform Core was effectively cheaper than free!
  20. ReadyNow! Warmup Elimination Technology
     ■ Saves JVM JIT profiling information
       ● Classes loaded
       ● Classes initialised
       ● Instruction profiling data
       ● Speculative optimisation failure data
     ■ Data can be gathered over a much longer period
       ● JVM/JIT profiles quickly
       ● Significant reduction in deoptimisations
     ■ Able to load, initialise and compile most code before main()
  21. Impact on Latency
     (charts: latency before and after ReadyNow!)
  22. Compile Stashing Effect
     (charts: performance over time, without vs with compile stashing)
     Up to 80% reduction in compile time and 60% reduction in CPU load
  23. Summary
  24. Improving Java Performance
     ■ Collect and re-use profiles to reduce warm-up time
     ■ Use alternative JIT compilation strategies
     ■ Eliminate GC stop-the-world pauses through use of a read barrier
     ■ Azul is working to deliver better Java performance
  25. Brought to you by
     Simon Ritter
     sritter@azul.com
     @speakjava
