So You Want To Write Your Own Benchmark

Performance has always been a major concern in software development and should not be taken lightly even when commodity computers have multicore CPUs and a few gigabytes of RAM. One of the most handy, simple tools for performance testing are microbenchmarks. Unfortunately, developing correct Java microbenchmarks is a complex task with many pitfalls on the way. This presentation is about the Do's and Don'ts of Java microbenchmarking and about what tools are out there to help with this tricky task.



  1. So You Want To Write Your Own Benchmark – December 8th, 2008
  2. Agenda • Introduction • Java™ micro benchmarking pitfalls • Writing your own benchmark • Micro benchmarking tools • Summary 2
  3. Microbenchmark – simple definition 1. Start the clock 2. Run the code 3. Stop the clock 4. Report 3
  4. Better microbenchmark definition • Small program • Goal: Measure something about a few lines of code • All other variables should be removed • Returns some kind of a numeric result 4
  5. Why do I need microbenchmarks? • Discover something about my code: • How fast is it • Calculate throughput – TPS, KB/s • Measure the result of changing my code: • Should I replace a HashMap with a TreeMap? • What is the cost of synchronizing a method? 5
  6. Why are you talking about this? • It’s hard to write a robust microbenchmark • It’s even harder to do it in Java™ • There are not enough Java microbenchmarking tools • There are too many flawed microbenchmarks out there 6
  7. Agenda • Introduction • Java micro benchmarking pitfalls • Writing your own benchmark • Micro benchmarking tools • Summary 7
  8. A microbenchmark story: the problem The boss asks you to solve a performance issue in one of the components Blah, blah … 8
  9. A microbenchmark story: the cause You find out that the cause is excessive use of Math.sqrt() 9
  10. A microbenchmark story: a solution? • You decide to develop a state of the art square root approximation • After developing the square root approximation you want to benchmark it against the java.lang.Math implementation 10
  11. SQRT approximation microbenchmark — Let’s run this little piece of code in a loop and see what happens …
      public static void main(String[] args) {
          long start = System.currentTimeMillis(); // start the clock
          for (double i = 0; i < 10 * 1000 * 1000; i++) {
              mySqrt(i); // little piece of code
          }
          long end = System.currentTimeMillis(); // stop the clock
          long duration = end - start;
          System.out.format("Test duration: %d (ms) %n", duration);
      }
  12. SQRT microbenchmark results Wow, this is really fast ! Test duration: 0 (ms) 12
  13. Flawed microbenchmark 13
  14. SQRT microbenchmark: what’s wrong? Dynamic optimizations Garbage collection Dead code elimination The Java™ HotSpot virtual machine Classloading Dynamic Compilation On Stack Replacement 14
  15. The HotSpot: a mixed mode system — [diagram: 1. code is interpreted → 2. profiling → 3. dynamic compilation → 4. stuff happens → 5. interpreted again or recompiled]
  16. Dynamic compilation • Dynamic compilation is unpredictable • Don’t know when the compiler will run • Don’t know how long the compiler will run • Same code may be compiled more than once • The JVM can switch to compiled code at will 16
  17. Dynamic compilation cont. • Dynamic compilation can seriously influence microbenchmark results — [diagram: interpreted execution + dynamic compilation + continuous recompilation of mixed compiled/interpreted code ≠ steady-state compiled code execution]
  18. Dynamic optimizations • The HotSpot server compiler performs a large variety of optimizations: • loop unrolling • range check elimination • dead-code elimination • code hoisting … 18
  19. Code hoisting ? Did he just say “code hoisting”? 19
  20. What the heck is code hoisting ? • Hoist = to raise or lift • Size optimization • Eliminate duplicated pieces of code in method bodies by hoisting expressions or statements 20
  21. Code hoisting example — a + b is a busy expression; after hoisting the expression a + b, a new local variable t has been introduced. (Optimizing Java for Size: Compiler Techniques for Code Compaction, Samuli Heilala) 21
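The before/after transformation on slide 21 can be sketched in plain Java. The names `a`, `b`, and the new local `t` follow the slide; the branch structure is an illustrative assumption, since the slide's figure is not reproduced here:

```java
public class HoistingExample {
    // Before: a + b is a "busy" expression, computed on both paths
    static int before(int a, int b, boolean flag) {
        if (flag) {
            return (a + b) * 2;
        } else {
            return (a + b) - 1;
        }
    }

    // After hoisting: a + b is computed once into a new local variable t
    static int after(int a, int b, boolean flag) {
        int t = a + b; // hoisted expression
        if (flag) {
            return t * 2;
        } else {
            return t - 1;
        }
    }

    public static void main(String[] args) {
        // Both versions compute the same results; only code size changes
        System.out.println(before(3, 4, true) == after(3, 4, true));
        System.out.println(before(3, 4, false) == after(3, 4, false));
    }
}
```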
  22. Dynamic optimizations cont. • Most of the optimizations are performed at runtime • Profiling data is used by the compiler to improve optimization decisions • You don’t have access to the dynamically compiled code 22
  23. Example: Very fast square root? 10,000,000 calls to Math.sqrt() ~ 4 ms
      public static void main(String[] args) {
          long start = System.nanoTime();
          int result = 0;
          for (int i = 0; i < 10 * 1000 * 1000; i++) {
              result += Math.sqrt(i);
          }
          long duration = (System.nanoTime() - start) / 1000000;
          System.out.format("Test duration: %d (ms) %n", duration);
      }
  24. Example: not so fast? Now it takes ~ 2000 ms ?!?
      public static void main(String[] args) {
          long start = System.nanoTime();
          int result = 0;
          for (int i = 0; i < 10 * 1000 * 1000; i++) {
              result += Math.sqrt(i);
          }
          System.out.format("Result: %d %n", result); // single line of code added
          long duration = (System.nanoTime() - start) / 1000000;
          System.out.format("Test duration: %d (ms) %n", duration);
      }
  25. DCE - Dead Code Elimination • Dead code - code that has no effect on the outcome of the program execution
      public static void main(String[] args) {
          long start = System.nanoTime();
          int result = 0;
          for (int i = 0; i < 10 * 1000 * 1000; i++) {
              result += Math.sqrt(i);
          } // dead code: result is never used
          long duration = (System.nanoTime() - start) / 1000000;
          System.out.format("Test duration: %d (ms) %n", duration);
      }
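The standard countermeasure, which slide 24 hints at with its "single line of code added", is to make the accumulated result observable so the JIT cannot prove the loop dead. A minimal sketch (class and method names are illustrative):

```java
public class DceSafeBenchmark {
    static long run(int iterations) {
        long result = 0;
        for (int i = 0; i < iterations; i++) {
            result += (long) Math.sqrt(i);
        }
        // Returning the result makes the loop's outcome observable to callers
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = run(10 * 1000 * 1000);
        long durationMs = (System.nanoTime() - start) / 1000000;
        // Printing the result keeps the whole computation from being eliminated
        System.out.format("Result: %d, duration: %d (ms)%n", result, durationMs);
    }
}
```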
  26. OSR - On Stack Replacement • Methods are HOT if they cumulatively execute more than 10,000 loop iterations • Older JVM versions did not switch to the compiled version until the method exited and was re-entered • OSR - switch from interpretation to compiled code in the middle of a loop 26
  27. OSR and microbenchmarking • OSR’d code may be less performant • Some optimizations are not performed • OSR usually happens when you put everything into one long method • Developers tend to write long main() methods when benchmarking • Real life applications are hopefully divided into more fine grained methods 27
  28. Classloading • Classes are usually loaded only when they are first used • Class loading takes time • I/O • Parsing • Verification • May skew your benchmark results 28
  29. Garbage Collection • The JVM automatically reclaims resources by • Garbage collection • Object finalization • Outside of the developer’s control • Unpredictable • Should be measured if invoked as a result of the benchmarked code 29
  30. Time measurement — How long is one millisecond?
      public static void main(String[] args) throws InterruptedException {
          long start = System.currentTimeMillis();
          Thread.sleep(1);
          final long end = System.currentTimeMillis();
          final long duration = (end - start);
          System.out.format("Test duration: %d (ms) %n", duration);
      }
      Test duration: 16 (ms)
  31. System.currentTimeMillis() • Accuracy varies with platform
      Resolution | Platform                 | Source
      55 ms      | Windows 95/98            | Java Glossary
      10–15 ms   | Windows NT, 2K, XP, 2003 | David Holmes
      1 ms       | Mac OS X                 | Java Glossary
      1 ms       | Linux – 2.6 kernel       | Markus Kobler
  32. Wrong target platform • Choosing the wrong platform for your microbenchmark • Benchmarking on Windows when your target platform is Linux • Benchmarking a highly threaded application on a single core machine • Benchmarking on a Sun JVM when the target platform is Oracle (BEA) JRockit 32
  33. Caching • Caching • Hardware – CPU caching • Operating System – File system caching • Database – query caching 33
  34. Caching: CPU L1 and L2 caches • The farther the accessed data is from the CPU, the higher the access latency • Size of dataset affects access cost
      Array size | Time (us) | Cost (ns)
      16k        | 413451    | 9.821
      8192K      | 5743812   | 136.446
      Jcachev2 results for Intel® Core™2 Duo T8300, L1 = 32 KB, L2 = 3 MB
  35. Busy environment • Running in a busy environment – CPU, IO, Memory 35
  36. Agenda • Introduction • Java micro benchmarking pitfalls • Writing your own benchmark • Micro benchmarking tools • Summary 36
  37. Warm-up your code 37
  38. Warm up your code • Let the JVM reach a steady state execution profile before you start benchmarking • All classes should be loaded before benchmarking • Usually executing your code for ~10 seconds should be enough 38
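The warm-up advice above can be sketched as a tiny harness: run the measured code repeatedly first, then take the actual measurement. The class name, the workload, and the warm-up round count are illustrative assumptions (the slide suggests roughly 10 seconds of warm-up):

```java
public class WarmupHarness {
    static long runLoop(int iterations) {
        double sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += Math.sqrt(i); // workload under test
        }
        long duration = System.nanoTime() - start;
        if (sink == -1) System.out.println(sink); // consume result to avoid DCE
        return duration;
    }

    public static void main(String[] args) {
        // Warm-up rounds: let classloading and JIT compilation settle first
        for (int round = 0; round < 20; round++) {
            runLoop(1000000);
        }
        // Measured run, after the JVM has (hopefully) reached steady state
        long nanos = runLoop(1000000);
        System.out.format("Steady-state duration: %d (us)%n", nanos / 1000);
    }
}
```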
  39. Warm up your code – cont. • Detect JIT compilations by using • CompilationMXBean.getTotalCompilationTime() • -XX:+PrintCompilation • Measure classloading time • Use the ClassLoadingMXBean 39
  40. CompilationMXBean usage
      import java.lang.management.ManagementFactory;
      import java.lang.management.CompilationMXBean;

      long compilationTimeTotal;
      CompilationMXBean compBean = ManagementFactory.getCompilationMXBean();
      if (compBean.isCompilationTimeMonitoringSupported())
          compilationTimeTotal = compBean.getTotalCompilationTime();
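One way to put the bean from slide 40 to work is to sample total compilation time around each warm-up round and keep warming up until it stops growing. This is a sketch under the assumption that compilation time monitoring is supported on the JVM; the class name and round workload are illustrative:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class CompilationWatch {
    // Returns true if the JIT compiler was active while warmupRound ran
    static boolean compiledDuring(Runnable warmupRound) {
        CompilationMXBean compBean = ManagementFactory.getCompilationMXBean();
        if (compBean == null || !compBean.isCompilationTimeMonitoringSupported()) {
            return false; // cannot tell; assume no compilation happened
        }
        long before = compBean.getTotalCompilationTime();
        warmupRound.run();
        // If total compilation time grew, the JIT compiled something during the round
        return compBean.getTotalCompilationTime() > before;
    }

    public static void main(String[] args) {
        Runnable round = () -> {
            double sink = 0;
            for (int i = 0; i < 1000000; i++) sink += Math.sqrt(i);
            if (sink == -1) System.out.println(sink); // consume result
        };
        // Repeat warm-up rounds until one completes with no JIT activity (bounded)
        int rounds = 0;
        while (compiledDuring(round) && rounds++ < 50) { /* keep warming up */ }
        System.out.println("Warm-up rounds until quiet: " + rounds);
    }
}
```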
  41. Dynamic optimizations • Avoid on stack replacement • Don’t put all your benchmark code in one big main() method • Avoid dead code elimination • Print the final result • Report unreasonable speedups 41
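The "don't put everything in one big main()" advice on slide 41 amounts to moving the timed loop into its own small method, so after warm-up it is entered through a normal call to fully compiled code rather than being OSR'd mid-loop. A sketch (names and counts are illustrative):

```java
public class NoOsrBenchmark {
    // The hot loop lives in its own method: once warm, the JIT compiles it
    // as a whole and each call enters compiled code from the start, instead
    // of on-stack replacement firing in the middle of a long-running main()
    static long measuredLoop(int iterations) {
        long result = 0;
        for (int i = 0; i < iterations; i++) {
            result += (long) Math.sqrt(i);
        }
        return result; // returned so the loop is not dead code
    }

    public static void main(String[] args) {
        for (int round = 0; round < 20; round++) {
            measuredLoop(1000000); // warm-up calls
        }
        long start = System.nanoTime();
        long result = measuredLoop(1000000);
        long durationUs = (System.nanoTime() - start) / 1000;
        System.out.format("Result: %d, duration: %d (us)%n", result, durationUs);
    }
}
```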
  42. Garbage Collection • Measure garbage collection time • Force garbage collection and finalization before benchmarking • Perform enough iterations to reach garbage collection steady state • Gather GC stats: -XX:+PrintGCTimeStamps -XX:+PrintGCDetails 42
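Forcing collection and finalization before the measured run, as slide 42 suggests, is commonly done with a few System.gc() passes. A sketch; note that System.gc() is only a hint to the JVM, and the repeat count and pauses here are illustrative assumptions:

```java
public class GcBeforeBenchmark {
    // Ask the JVM to collect garbage and run finalizers before measuring.
    // System.gc() is only a request, so it is typically repeated a few times.
    static void cleanHeap() {
        for (int i = 0; i < 3; i++) {
            System.gc();
            System.runFinalization();
            try {
                Thread.sleep(100); // give background GC threads a moment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        cleanHeap(); // start the measurement from a quiet heap
        long start = System.nanoTime();
        byte[][] garbage = new byte[1000][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1024]; // allocation under test
        }
        long durationUs = (System.nanoTime() - start) / 1000;
        System.out.format("Allocated %d KB in %d (us)%n", garbage.length, durationUs);
    }
}
```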
  43. Time measurement • Use System.nanoTime() • Microsecond accuracy on modern operating systems and hardware • Not worse than currentTimeMillis() • Caveat for Windows users • A nanoTime() call itself executes in microseconds • Don’t overuse it ! 43
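The difference between the two clocks from slides 30–31 and 43 can be seen by timing the same one-millisecond sleep with both; the exact numbers printed are platform-dependent, and the class name is illustrative:

```java
public class ClockResolution {
    public static void main(String[] args) {
        long startNanos = System.nanoTime();
        long startMillis = System.currentTimeMillis();
        try {
            Thread.sleep(1); // intended: one millisecond
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        long nanosElapsed = System.nanoTime() - startNanos;
        long millisElapsed = System.currentTimeMillis() - startMillis;
        // nanoTime() resolves the sleep in sub-millisecond detail, while
        // currentTimeMillis() may round to its platform tick (10-16 ms on older Windows)
        System.out.format("nanoTime: %.3f ms, currentTimeMillis: %d ms%n",
                nanosElapsed / 1e6, millisElapsed);
    }
}
```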
  44. JVM configuration • Use similar JVM options to your target environment: • -server or -client JVM • Enough heap space (-Xmx) • Garbage collection options • Thread stack size (-Xss) • JIT compiling options 44
  45. Other issues • Use fixed size data sets • Too large data sets can cause L1 cache blowout • Notice system load • Don’t play GTA while benchmarking ! 45
  46. Agenda • Introduction • Java micro benchmarking pitfalls • Writing your own benchmark • Micro benchmarking tools • Summary 46
  47. Java™ benchmarking tools • Various specialized benchmarks • SPECjAppServer ® • SPECjvm™ • CaffeineMark 3.0™ • SciMark 2.0 • Only a few benchmarking frameworks 47
  48. Japex Micro-Benchmark framework • Similar in spirit to JUnit • Measures throughput – work over time • Transactions Per Second (Default) • KBs per second • XML based configuration • XML/HTML reports 48
  49. Japex: Drivers • Encapsulates knowledge about a specific algorithm implementation • Must extend JapexDriverBase
      public interface JapexDriver extends Runnable {
          public void initializeDriver();
          public void prepare(TestCase testCase);
          public void warmup(TestCase testCase);
          public void run(TestCase testCase);
          public void finish(TestCase testCase);
          public void terminateDriver();
      }
  50. Japex: Writing your own driver
      public class SqrtNewtonApproxDriver extends JapexDriverBase {
          private long tmp;
          …
          @Override
          public void warmup(TestCase testCase) {
              tmp += sqrt(getNextRandomNumber());
          }
          …
      }
  51. Japex: Test suite
      <testSuite name="SQRT Test Suite"
             xmlns="http://www.sun.com/japex/testSuite" …>
          <param name="libraryDir" value="C:/java/japex/lib"/>
          <param name="japex.classPath" value="./target/classes"/>
          <param name="japex.runIterations" value="1000000" />
          <driver name="SqrtApproxNewtonDriver">
              <param name="Description" value="Newton Driver"/>
              <param name="japex.driverClass"
                     value="com.alphacsp.javaedge.benchmark.japex.driver.SqrtNewtonApproxDriver"/>
          </driver>
          <testCase name="testcase1"/>
      </testSuite>
  52. Japex: HTML Reports 52
  53. Japex: more chart types Scatter chart Line chart 53
  54. Japex: pros and cons • Pros • Similar to JUnit • Nice HTML reports • Cons • Last stable release in March 2007 • HotSpot issues are not handled • XML configuration 54
  55. Brent Boyer’s Benchmark framework • Part of the “Robust Java benchmarking” article by Brent Boyer • Automate as many aspects as possible: • Resource reclamation • Class loading • Dead code elimination • Statistics 55
  56. Benchmark framework example
      Benchmark.Params params = new Benchmark.Params(true);
      params.setExecutionTimeGoal(0.5);
      params.setNumberMeasurements(50);

      Runnable task = new Runnable() {
          public void run() {
              sqrt(getNextRandomNumber());
          }
      };

      Benchmark benchmark = new Benchmark(task, params);
      System.out.println(benchmark.toString());
  57. Benchmark single line summary Benchmark output: first = 25.702 us, mean = 91.070 ns (CI deltas: -115.591 ps, +171.423 ps) sd = 1.451 us (CI deltas: -461.523 ns, +676.964 ns) WARNING: execution times have mild outliers, SD VALUES MAY BE INACCURATE 57
  58. Outlier and serial correlation issues • Records outlier and serial correlation issues • Outliers indicate that a major measurement error happened • Large outliers - some other activity started on the computer during measurement • Small outliers might hint that DCE occurred • Serial correlation indicates that the JVM has not reached its steady-state performance profile 58
  59. Benchmark : pros and cons • Pros • Handles HotSpot related issues • Detailed statistics • Cons • Each run takes a lot of time • Not a formal project • Lacks documentation 59
  60. Agenda • Introduction • Java micro benchmarking pitfalls • Writing your own benchmark • Micro benchmarking tools • Summary 60
  61. Summary 1 • Micro benchmarking is hard when it comes to Java™ • Define what you want to measure and how you want to do it, pick your goals • Know what you are doing • Always warm-up your code • Handle DCE, OSR, GC issues • Use fixed size data sets and fixed work 61
  62. Summary 2 • Do not rely solely on microbenchmark results • Sanity check results • Use a profiler • Test your code in real life scenarios under realistic load (macro-benchmark) 62
  63. Summary: resources • http://www.ibm.com/developerworks/java/library/j-benchmark1.html • http://www.azulsystems.com/events/javaone_2002/microbenchmarks.pdf • https://japex.dev.java.net/ • http://www.ibm.com/developerworks/java/library/j-jtp12214/ • http://www.dei.unipd.it/~bertasi/jcache/ 63
  64. Thank You ! 64
