Your SlideShare is downloading. ×
0
Java Performance           Zdeněk Troníček  Faculty of Information Technology Czech Technical University in Prague        ...
Who am I?    Zdeněk Troníček, tronicek@fit.cvut.cz      Assistant professor at Faculty of Information Technology, Czech   ...
You will learn…       How to approach performance problems       How to find a bottleneck in Java program       How to sol...
Tools                                   disassembler                jmap          jconsole                                ...
Prerequisites       Enthusiasm       Knowledge of the Java programming language (we will not do any       programming but ...
Course Map       Introduction       Java Virtual Machine       Memory management       Threads and synchronization6   Zden...
Introductions       Name       Affiliation       Experience related to topics presented in this course       Expectations ...
Questions & Answers    Next module: Introduction8   Zdeněk Troníček: Java Performance
1 Introduction9   Zdeněk Troníček: Java Performance
Module Content        Performance improvements        Premature optimization        Coding in Java        Hardware monitor...
Introduction        Our goal is to improve user experience        Java performance tuning is more an art than a science   ...
User Experience        User experience is subjective        Focus on improving the user experience        User experience ...
Demo: User Experience     1.    Run the project in demo/Introduction/UserExperience     2.    Compare your perception of o...
Performance Tips        Do not trust them        Do not use them unless they are justified by measurement        Beware of...
Premature Optimization     D. E. Knuth (1974): “We should forget about small efficiencies, say     about 97% of the time: ...
Premature Optimization (cont.)     Let’s assume that findMax is O( m )…     for ( int i : values ) {       if ( i > findMa...
Premature Optimization (cont.)     for ( int i = 0, n = p.size(); i < n; i++ ) { … }                                      ...
How to Code in Java?        Focus on architecture        Use efficient technologies and frameworks        Use sound design...
Performance Tuning        The goal of performance tuning is better (subjective) performance of        application        H...
Performance Tuning (cont.)     Performance problem can be in        Hardware         o CPU         o Memory         o I/O ...
Hardware Monitoring                   Tools                     Utilization of CPU                                        ...
Hardware Monitoring (cont.)                                         Utilization of network                                ...
vmstat procs     procs      memory          page            disk       faults     cpu     r bw     swap free re mf pi po f...
vmstat memory     procs      memory          page            disk       faults     cpu     r bw     swap free re mf pi po ...
vmstat page     procs      memory          page            disk       faults     cpu     r bw     swap free re mf pi po fr...
vmstat disk     procs      memory          page            disk       faults     cpu     r bw     swap free re mf pi po fr...
vmstat faults     procs      memory          page            disk       faults     cpu     r bw     swap free re mf pi po ...
vmstat cpu     procs      memory          page            disk       faults     cpu     r bw     swap free re mf pi po fr ...
Performance Bounds     Performance can be bound by        CPU        Memory        I/O (disk or network)        Locks     ...
Performance Patterns     Manifestation                       Possible cause     CPU utilization in user space       Applic...
CPU Utilization in User Space               Utilization of CPU                                           User             ...
GC Monitoring (Visual VM)            Observation                   Next step            GC activity is < 5% of CPU    CPU ...
CPU Utilization in Kernel Space                 Utilization of CPU                                            User        ...
Idle CPU                                         Utilization of CPU              Observation                          Next...
CPU Profiling      Methods with largest time are candidates for optimization35   Zdeněk Troníček: Java Performance
Profiler        Sampling – the profiler agent periodically samples the stack of all        running threads or periodically...
Instrumentation     before                              after     public void m1() {                  public void m1() {  ...
Instrumentation: systematic error     void a() { while (b()) { c(); … } }     boolean b() { for (…) { d(); … } return … } ...
Sampling vs. Instrumentation                                  Sampling                        Instrumentation     Code mod...
Monitoring Threads       • Sleeping: Thread.sleep(…)       • Wait: Object.wait()       • Monitor: trying to enter a guarde...
Demo: Visual VM     1.    Run Visual VM from jdk/bin     2.    Open some application in Visual VM     3.    Monitor the ap...
Lab: Monitoring     Classify programs in labs/lab1 into four categories:        CPU utilized by application        CPU uti...
Conclusion        Performance improvements        Premature optimization        Hardware monitoring        Performance pat...
Questions & Answers     Next module: Java Virtual Machine44   Zdeněk Troníček: Java Performance
2 Java Virtual Machine45   Zdeněk Troníček: Java Performance
Module Content        Java bytecode        Java disassembler        Just-in-Time (JIT) compiler        Micro-benchmark46  ...
Java Virtual Machine (JVM)                           javac       A.java                            A.class     bytecode   ...
Java Bytecode                                                    Operand stack Local variables                int x = 42; ...
Java Bytecode (cont.)                                                              x      int y = x + 1;                  ...
Class File Structure        Magic number (0xCAFEBABE, 4 bytes)        Minor and major versions of the class file (2 bytes,...
Constant Pool                                                          Constant pool     StringBuilder sb =               ...
Bytecode Descriptors      Java type Bytecode descriptor        Java type Bytecode descriptor      int             I       ...
Java Disassembler     javap <options> <classes>         -c                    Disassemble the code         -public        ...
Lab: String Concatenation     1.    Open the project in labs/lab2/StringConcatenation in NetBeans     2.    Get familiar w...
Evaluation     String concat( String s1, String s2 ) {         return s1 + " " + s2;     }                                ...
Evaluation (cont.)     String concatNames( String... names ) {        String s = "";        for ( String n : names ) {    ...
Discussion        What does the result say about performance?        Should we use the + operator or StringBuilder for str...
Measuring Time        System.currentTimeMillis() – returns the number of milliseconds since        January 1, 1970        ...
How to measure?     long start = System.nanoTime();     // some code     long end = System.nanoTime();     long time = end...
Resolution     resolution of System.currentTimeMillis() is        10 ms on Windows NT, Windows 2000        15.6 ms on Wind...
ThreadMXBean     getCurrentThreadCpuTime() returns the total CPU time for the current     thread in nanoseconds      Threa...
Sun’s HotSpot JVM        Sun’s HotSpot JVM combines interpretation and Just-in-time (JIT)        compilation        Initia...
Sun’s HotSpot JVM (cont.)        Two modes in JDK 6        HotSpot client, -client         o default on Windows         o ...
Hotspot JIT Compiler     You can change the default JIT compiler behavior using the following     switches (don’t use them...
-XX:+PrintCompilation     java -XX:+PrintCompilation …      1    java.lang.String::charAt (33 bytes)      2    java.lang.S...
-XX:+LogCompilation     java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation …        the log file can be set by -XX:Lo...
CompilationMXBean     getTotalCompilationTime() returns the number of milliseconds that JIT     compiler spent compiling t...
Demo: Compilation     1.    Run the project in jdk/demo/jfc/Java2D     2.    Run jconsole in jdk/bin     3.    Inspect the...
-XX:+PrintAssembly        Must be preceded by -XX:+UnlockDiagnosticVMOptions        Requires disassembler plugin        Mo...
Dynamic Optimization        On stack replacement        Dead-code elimination        Escape analysis        Autobox elisio...
On Stack Replacement     JVM can replace the current code with new compiled code when     executing the method     public ...
Dead-code Elimination     Dead code can be eliminated either by javac or by JVM     private static final boolean debug = f...
Escape Analysis        An object escapes the thread that allocated it, if some other thread can        see it        Many ...
Autobox Elision        Autoboxing is conversion from primitive type to an instance of        wrapper class        Under so...
Inlining     Inlining is an optimization that replaces a function call with the body of     the function     static int ma...
Loop Unrolling     Loop unrolling is an optimization that optimize program’s execution     speed at the expense of its siz...
Micro-benchmark        Measures performance of small piece of code        JIT compiler can change what we measure (warm-up...
Demo: Inlining     1.    Open the project in demo/JavaVirtualMachine/Inlining in NetBeans     2.    Get familiar with the ...
Demo: Autoboxing Performance     1.    Run the project in demo/JavaVirtualMachine/AutoboxingPerformance     2.    Run the ...
Lab: Micro-benchmark     1.    Open the project in labs/lab2/Microbenchmark in NetBeans     2.    Get familiar with the so...
Evaluation     3: aload_0     4: invokeinterface #11, 1; //InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator; ...
Evaluation (cont.)     3: iconst_0     4: istore_2     5: iload_2     6: aload_0     7: invokeinterface #16, 1; //Interfac...
Lab: Lazy Bezier     1.    Run the project in labs/lab2/LazyBezier     2.    Monitor the project in Visual VM     3.    Fi...
Lab: Lazy Textures     1.    Run the project in labs/lab2/LazyTextures     2.    Profile the project     3.    Find the ho...
Conclusion        Java Virtual Machine (JVM)        Java disassembler        Just-in-Time (JIT) compiler        Micro-benc...
Questions & Answers     Next module: Memory management86   Zdeněk Troníček: Java Performance
3 Memory Management87   Zdeněk Troníček: Java Performance
Module Content        Java heap        Garbage collectors        Monitoring garbage collector     Oracle JVM        Memory...
Weak Generational Hypothesis        Most objects die young        Few references from older to younger objects exist      ...
Generational Algorithm        Young generation – “young” objects        Tenured (old) generation – “old” or large objects ...
Garbage Collections        “Stop the world” – all Java application threads are blocked for the        duration of the garb...
Mark & Sweep                                             Local variables                                             Metho...
Young Generation        Three areas: Eden and two Survivor spaces (From and To)        Copying collector      Eden        ...
Minor Collection                        1                           2     Eden                                Eden        ...
Minor Collection (cont.)                        5                                              6     Eden                 ...
Minor Collection (cont.)        When the Eden space is full, minor garbage collection event occurs        Each object that...
Minor Collection (cont.)        When there is not enough room in survivor spaces, the full garbage        collection event...
Sizing Java Heap                                                   Tenured                                         virtual...
Sizing Java Heap (cont.)        -XX:NewSize – initial size of young generation        -XX:MaxNewSize – maximum size of you...
Sizing Java Heap (cont.)      For application performance we almost always set             o -Xms and -Xmx             o -...
Demo: Visual GC      1.      Run the project in jdk/demo/jfc/Java2D with              -XX:+DisableExplicitGC      2.      ...
GC Features         Serial vs. Parallel         Stop-the-world vs. Concurrent         Copying vs. Compacting vs. Non-compa...
GC Performance Metrics         Throughput – the percentage of total time not spent in garbage         collection, consider...
GC Performance Metrics (cont.)         Frequency of collection – how often collection occurs within some time         peri...
Garbage Collectors         Serial collector         Throughput collector         Concurrent collector105   Zdeněk Troníček...
Serial Collector         Single threaded collector         Selected by -XX:+UseSerialGC (default on client machines)      ...
Throughput (Parallel) Collector         Selected by -XX:+UseParallelGC (default on server machines)         Young generati...
Serial vs. Parallel Collector                            serial                   parallel                                ...
Demo: Memory Consumer      1.       Run the project in demo/MemoryManagement/MemoryConsumer               with the followi...
Demo: Memory Consumer (cont.)      1.       Run the project in demo/MemoryManagement/MemoryConsumer               with the...
Parallel Compacting Collector         Selected by -XX:+UseParallelOldGC         Young generation: multi-threaded copy coll...
Concurrent Collector (CMS)         Selected by -XX:UseConcMarkSweepGC         Multi-threaded young generation collector (s...
Concurrent Collector Phases         Initial mark phase: marks objects directly reachable from application         Concurre...
Serial vs. CMS Collector                               major collection:                  serial                          ...
CMS Incremental Mode         Selected by -XX:+CMSIncrementalMode         Concurrent phases are done incrementally         ...
Explicit GC         Explicit GC can be disabled with -XX:+DisableExplicitGC         Do not use System.gc() unless you have...
Lab: Serial & Parallel GC      1.       Run the project in jdk/demo/jfc/Java2D with the following settings:           o   ...
Finalizers         Finalizers are not destructors         Object cannot be collected until the finalizer is executed      ...
Ergonomics         Evaluates the system and chooses defaults for the HotSpot JVM         Uses definition of “server class ...
Server Mode         Server JIT compiler         Throughput collector         Initial heap size is 1/64th of the physical m...
Client Mode         Client JIT compiler         Serial garbage collector         Initial heap size is 4 MB         Maximum...
Parallel Collector Tuning         -XX:MaxGCPauseMillis=n – specifies maximum desired pause time         -XX:GCTimeRatio=n ...
GC Monitoring         -verbose:gc, -XX:+PrintGC         -XX:+PrintGCDetails         -XX:+PrintGCTimeStamps, -XX:+PrintGCDa...
-verbose:gc, +XX:+PrintGC                                          Occupancy                                              ...
-XX:+PrintGCDetails        Minor collection       [GC [DefNew: 4415K->512K(4928K), 0.0036650 secs]       4415K->4285K(1587...
-XX:+PrintGCDetails (cont.)        Minor collection with promotion       [GC [DefNew: 4906K->503K(4928K), 0.0031987 secs] ...
-XX:+PrintGCTimeStamps       java -XX:+PrintGC -XX:+PrintGCTimeStamps …                         Time when                 ...
-XX:+PrintGCApplicationStoppedTime       java -XX:+PrintGC -XX:+PrintGCTimeStamps               -XX:+PrintGCApplicationSto...
Common Problems         Young generation too small         Old generation too small         Perm generation too small129  ...
Young Generation Too Small         Young generation is filled up before temporary objects get unreachable         Temporar...
Young Generation Too Small (cont.)        Young generation: 4096 KB        Eden: 3328 KB                                  ...
Old Generation Too Small         Manifestation: major collections are frequent and occur at the same         time as minor...
Demo: Old Generation Too Small         Run the project in jdk/demo/jfc/Java2D with the following settings:          o Init...
Perm Generation Too Small         You get an exception:         java.lang.OutOfMemoryError: PermGen space         Solution...
Demo: Perm Generation Too Small         Run the project in jdk/demo/jfc/Java2D with the following settings:          o Ini...
GC Tuning         Are minor collection pauses too long? Try parallel collector.         Are major collection pauses too lo...
Heap Size Settings                                                                 total size                             ...
Memory Leak      Also known as unintentional object retention (the object that is not      needed anymore and should be un...
Basic Types of Memory Leaks         A single large object that is unintentionally held in memory         Small objects tha...
Finding Memory Leaks         Generations in memory profiler         Heap snapshots comparison         Heap dumps140   Zden...
Generations in Memory Profiler        Object age – the number of garbage collections that the object        survived      ...
Demo: Memory Leak      1.    Run the project in demo/MemoryManagement/MemoryLeak      2.    Run Visual VM and open the pro...
Heap Snapshots Comparison         Take memory snapshots in different points of application (it is         advisable to run...
Demo: Lucky Numbers      1.    Open the project in demo/MemoryManagement/LuckyNumbers in            NetBeans      2.    Pr...
Demo: Stop-time Game      1.    Open the project in demo/MemoryManagement/StopTimeGame in            NetBeans      2.    P...
Lab: Nasty Ellipses      1.    Open the project in labs/lab3/NastyEllipses in NetBeans      2.    Run the project      3. ...
Lab: Slow Baker      1.    Open the project in labs/lab3/SlowBaker in NetBeans      2.    Run the project      3.    Check...
References (java.lang.ref)         Strong reference – a common Java reference         Car c = new Car( id );         Weak ...
Object Reachability         Strong reachability – an object is strongly reachable if it can be reached         without tra...
Object Reachability (cont.)                  created                  strongly   unreachable                              ...
Object Reachability (cont.)                                              softly                                           ...
Demo: References      1.    Open the project in demo/MemoryManagement/References in            NetBeans      2.    Run the...
String         Immutable sequence of characters         Compiler places string literals into the constant pool         Str...
WeakHashMap         A hash-table based Map implementation with weak keys         When a key is no longer in use, the entry...
Demo: WeakHashMap      1.    Open the project in demo/MemoryManagement/WeakHashMap in            NetBeans      2.    Open ...
Lab: Web Browser      1.    Open the project in labs/lab3/WebBrowser in NetBeans      2.    Run the project      3.    Fin...
OutOfMemoryError      Additional message:         Java heap space         PermGen space         Requested array size excee...
Heap Dump      We can dump all reachable objects to a file by         Visual VM         jmap         -XX:+HeapDumpOnOutOfM...
Lab: Hungry Duke      1.    Run HungryDuke.jar with the following settings:                Maximum heap size 32 MB        ...
Questions & Answers      Next module: Threads and Synchronization160   Zdeněk Troníček: Java Performance
Upcoming SlideShare
Loading in...5
×

Java Performance

3,476

Published on

Published in: Technology, News & Politics
1 Comment
7 Likes
Statistics
Notes
No Downloads
Views
Total Views
3,476
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
322
Comments
1
Likes
7
Embeds 0
No embeds

No notes for slide

Transcript of "Java Performance"

  1. 1. Java Performance Zdeněk Troníček Faculty of Information Technology Czech Technical University in Prague Czech Republic tronicek@fit.cvut.cz http://users.fit.cvut.cz/~tronicek © Zdeněk Troníček, 2011
  2. 2. Who am I? Zdeněk Troníček, tronicek@fit.cvut.cz Assistant professor at Faculty of Information Technology, Czech Technical University in Prague Teaching courses Enterprise Java, Programming in Java, Programming languages and Compilers, Database systems 10+ years experience with Java trainings Certifications: o Sun Certified Java Programmer o Sun Certified Web Component Developer o Sun Certified Business Component Developer o Sun Certified Specialist for NetBeans IDE Blog http://tronicek.blogspot.com/2 Zdeněk Troníček: Java Performance
  3. 3. You will learn… How to approach performance problems How to find a bottleneck in Java program How to solve some common problems Something about how Java Virtual Machine (JVM) works Warning! • This course does not teach tools • This course does not contain any performance tips you should follow3 Zdeněk Troníček: Java Performance
  4. 4. Tools disassembler jmap jconsole sampler profiler jhat debugger jps jstack NetBeans VisualVM You can use any tool, provided that it gives information you need. Motto: Tools do not matter. Methodology does.4 Zdeněk Troníček: Java Performance
  5. 5. Prerequisites Enthusiasm Knowledge of the Java programming language (we will not do any programming but we will read source codes)5 Zdeněk Troníček: Java Performance
  6. 6. Course Map Introduction Java Virtual Machine Memory management Threads and synchronization6 Zdeněk Troníček: Java Performance
  7. 7. Introductions Name Affiliation Experience related to topics presented in this course Expectations for this course Something interesting about your country7 Zdeněk Troníček: Java Performance
  8. 8. Questions & Answers Next module: Introduction8 Zdeněk Troníček: Java Performance
  9. 9. 1 Introduction9 Zdeněk Troníček: Java Performance
  10. 10. Module Content Performance improvements Premature optimization Coding in Java Hardware monitoring Performance patterns10 Zdeněk Troníček: Java Performance
  11. 11. Introduction Our goal is to improve user experience Java performance tuning is more an art than a science Do not optimize prematurely Correctness is always over performance Kirk Pepperdine: “Measure, don’t guess”11 Zdeněk Troníček: Java Performance
  12. 12. User Experience User experience is subjective Focus on improving the user experience User experience can be improved by o optimization o tuning o other means (e.g. a progress bar)12 Zdeněk Troníček: Java Performance
  13. 13. Demo: User Experience 1. Run the project in demo/Introduction/UserExperience 2. Compare your perception of operation time if the operation is being performed with and without a progress bar13 Zdeněk Troníček: Java Performance
  14. 14. Performance Tips Do not trust them Do not use them unless they are justified by measurement Beware of premature optimization14 Zdeněk Troníček: Java Performance
  15. 15. Premature Optimization D. E. Knuth (1974): “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.“ M. A. Jackson (1975): “We follow two rules in the matter of optimizations: Rule 1. Dont do it. Rule 2 (for experts only). Dont do it yet - that is, not until you have a perfectly clear and unoptimized solution.“ Why is premature optimization bad? It seldom has any effect on performance It usually worsens readability15 Zdeněk Troníček: Java Performance
  16. 16. Premature Optimization (cont.) Let’s assume that findMax is O( m )… for ( int i : values ) { if ( i > findMax ( anotherValues ) ) { … } } O( m * n ) vs. O( m + n ) int max = findMax ( anotherValues ); for ( int i : values ) { if ( i > max ) { … } }16 Zdeněk Troníček: Java Performance
  17. 17. Premature Optimization (cont.) for ( int i = 0, n = p.size(); i < n; i++ ) { … } vs. for ( int i = 0; i < p.size(); i++ ) { … } This is okay if size() is O( 1 )17 Zdeněk Troníček: Java Performance
  18. 18. How to Code in Java? Focus on architecture Use efficient technologies and frameworks Use sound design principles and patterns Use efficient algorithms Prefer readable and understandable code Brian Goetz: “Write dumb code”18 Zdeněk Troníček: Java Performance
  19. 19. Performance Tuning The goal of performance tuning is better (subjective) performance of application How to arrange it? o Optimize the application o Increase hardware utilization o Tune JVM (garbage collector, JIT, …) o Use other means (progress bar, …) o Use more efficient hardware19 Zdeněk Troníček: Java Performance
  20. 20. Performance Tuning (cont.) Performance problem can be in Hardware o CPU o Memory o I/O (disk, network) JVM configuration Application20 Zdeněk Troníček: Java Performance
  21. 21. Hardware Monitoring Tools Utilization of CPU User • Task Manager space • vmstat Kernel space Utilization of memory Physical memory21 Zdeněk Troníček: Java Performance
  22. 22. Hardware Monitoring (cont.) Utilization of network Utilization of disk22 Zdeněk Troníček: Java Performance
  23. 23. vmstat procs procs memory page disk faults cpu r bw swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id 00 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82 00 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62 00 1 10132 4616 0 0 20 0 0 0 0 0 19 0 0 150 172 146 3 33 64 00 1 10132 5292 0 0 9 0 0 0 0 0 21 0 0 165 105 130 1 21 78 r = runnable threads in run queue b = blocked threads w = runnable but swapped23 Zdeněk Troníček: Java Performance
  24. 24. vmstat memory procs memory page disk faults cpu r bw swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id 00 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82 00 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62 00 1 10132 4616 0 0 20 0 0 0 0 0 19 0 0 150 172 146 3 33 64 00 1 10132 5292 0 0 9 0 0 0 0 0 21 0 0 165 105 130 1 21 78 swap = swap space available (in kilobytes) free = memory available (in kilobytes)24 Zdeněk Troníček: Java Performance
  25. 25. vmstat page procs memory page disk faults cpu r bw swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id 00 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82 00 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62 00 1 10132 4616 0 0 20 0 0 0 0 0 19 0 0 150 172 146 3 33 64 00 1 10132 5292 0 0 9 0 0 0 0 0 21 0 0 165 105 130 1 21 78 pi = kilobytes paged in po = kilobytes paged out25 Zdeněk Troníček: Java Performance
  26. 26. vmstat disk procs memory page disk faults cpu r bw swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id 00 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82 00 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62 00 1 10132 4616 0 0 20 0 0 0 0 0 19 0 0 150 172 146 3 33 64 00 1 10132 5292 0 0 9 0 0 0 0 0 21 0 0 165 105 130 1 21 78 number of disk operations per second26 Zdeněk Troníček: Java Performance
  27. 27. vmstat faults procs memory page disk faults cpu r bw swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id 00 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82 00 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62 00 1 10132 4616 0 0 20 0 0 0 0 0 19 0 0 150 172 146 3 33 64 00 1 10132 5292 0 0 9 0 0 0 0 0 21 0 0 165 105 130 1 21 78 in = device interrupts sy = system calls cs = context switches27 Zdeněk Troníček: Java Performance
  28. 28. vmstat cpu procs memory page disk faults cpu r bw swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id 00 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82 00 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62 00 1 10132 4616 0 0 20 0 0 0 0 0 19 0 0 150 172 146 3 33 64 00 1 10132 5292 0 0 9 0 0 0 0 0 21 0 0 165 105 130 1 21 78 us = user time sy = system time id = idle time28 Zdeněk Troníček: Java Performance
  29. 29. Performance Bounds Performance can be bound by CPU Memory I/O (disk or network) Locks These bounds manifest themselves in different ways29 Zdeněk Troníček: Java Performance
  30. 30. Performance Patterns Manifestation Possible cause CPU utilization in user space Application or JVM CPU utilization in kernel space Bad interaction with operating system Locks Idle CPU Locks Small thread pool Small connection pool Slow I/O30 Zdeněk Troníček: Java Performance
  31. 31. CPU Utilization in User Space Utilization of CPU User space Kernel space Observation Next step CPU utilization is near 100% Monitoring garbage collector and mostly in user space31 Zdeněk Troníček: Java Performance
  32. 32. GC Monitoring (Visual VM) Observation Next step GC activity is < 5% of CPU CPU profiling GC activity is > 10% of CPU Tuning garbage collector32 Zdeněk Troníček: Java Performance
  33. 33. CPU Utilization in Kernel Space Utilization of CPU User space Kernel space Observation Next step CPU utilization in kernel space CPU profiling is more than 10% or more than ½ of total time33 Zdeněk Troníček: Java Performance
  34. 34. Idle CPU Utilization of CPU Observation Next step CPU is not fully utilized Monitoring threads34 Zdeněk Troníček: Java Performance
  35. 35. CPU Profiling Methods with largest time are candidates for optimization35 Zdeněk Troníček: Java Performance
  36. 36. Profiler Sampling – the profiler agent periodically samples the stack of all running threads or periodically polls for class histogram Instrumentation – the profiler agent instruments class files so that it can, for example, record method invocations36 Zdeněk Troníček: Java Performance
  37. 37. Instrumentation before after public void m1() { public void m1() { m2(); long start = currentTime(); m3(); try { } m2(); m3(); } finally { long time = currentTime() – start; log( "m1", time ); } }37 Zdeněk Troníček: Java Performance
  38. 38. Instrumentation: systematic error void a() { while (b()) { c(); … } } boolean b() { for (…) { d(); … } return … } void c() { … } void d() { e(); f(); } void e() { … } void f() { … } Even if we were able to subtract the overhead from the measurements, the code was changed and JVM would behave differently38 Zdeněk Troníček: Java Performance
  39. 39. Sampling vs. Instrumentation Sampling Instrumentation Code modification None Byte-code is modified (JVM does not work the same way) Overhead Only when sampling Large overhead at byte-code modification, small constant overhead while running Accuracy Results are analyzed Produces systematic error that statistically, error ratio is is caused by embedded small and decreasing instructions Repeatability Different session might Different session produces produce different results similar results Recommendation: start with sampling to get the first idea39 Zdeněk Troníček: Java Performance
  40. 40. Monitoring Threads • Sleeping: Thread.sleep(…) • Wait: Object.wait() • Monitor: trying to enter a guarded section40 Zdeněk Troníček: Java Performance
  41. 41. Demo: Visual VM 1. Run Visual VM from jdk/bin 2. Open some application in Visual VM 3. Monitor the application41 Zdeněk Troníček: Java Performance
  42. 42. Lab: Monitoring Classify programs in labs/lab1 into four categories: CPU utilized by application CPU utilized by garbage collector CPU utilized by operating system Idle CPU42 Zdeněk Troníček: Java Performance
  43. 43. Conclusion Performance improvements Premature optimization Hardware monitoring Performance patterns43 Zdeněk Troníček: Java Performance
  44. 44. Questions & Answers Next module: Java Virtual Machine44 Zdeněk Troníček: Java Performance
  45. 45. 2 Java Virtual Machine45 Zdeněk Troníček: Java Performance
  46. 46. Module Content Java bytecode Java disassembler Just-in-Time (JIT) compiler Micro-benchmark46 Zdeněk Troníček: Java Performance
  47. 47. Java Virtual Machine (JVM) javac A.java A.class bytecode Java Virtual Machine Bytecode Interpret Just-in-time or interpreter compile compiler47 Zdeněk Troníček: Java Performance
  48. 48. Java Bytecode Operand stack Local variables int x = 42; x bipush 42 istore_0 Instructions are typed: i = int, d = double, … bipush 42 x istore 0 x 42 4248 Zdeněk Troníček: Java Performance
  49. 49. Java Bytecode (cont.) x int y = x + 1; iload_0 42 iload_0 0x1a y iconst_1 iload 0 iadd 0x15 istore_1 0x00 iload 0 iconst_1 x x 42 iadd 42 y 1 y 43 4249 Zdeněk Troníček: Java Performance
  50. 50. Class File Structure Magic number (0xCAFEBABE, 4 bytes) Minor and major versions of the class file (2 bytes, 2 bytes) Constant pool Access flags (2 bytes) The class name (2 bytes) The superclass name (2 bytes) Interfaces Fields Methods Attributes50 Zdeněk Troníček: Java Performance
  51. 51. Constant Pool Constant pool StringBuilder sb = Index Type Value new StringBuilder(); 1 Class #8 sb.append( "tortilla" ); 2 Method #1.#7 3 String #9 4 Method #1.#12 new #1 5 Asciz <init> dup 6 Asciz ()V invokespecial #2 7 NameAndType #5:#6 astore_1 8 Asciz java/lang/StringBuilder ldc #3 9 Asciz tortilla invokevirtual #4 … … …51 Zdeněk Troníček: Java Performance
  52. 52. Bytecode Descriptors Java type Bytecode descriptor Java type Bytecode descriptor int I boolean Z long J String Ljava/lang/String; float F int[] [I double D int[][] [[I char C void V Method Bytecode descriptor int m1() p1/A.m1:()I void m2( int x ) p1/A.m2:(I)V String m3( int x, int y ) p1/A.m3:(II)Ljava/lang/String; void m4( String s ) p1/A.m4:( Ljava/lang/String; )V52 Zdeněk Troníček: Java Performance
  53. 53. Java Disassembler javap <options> <classes> -c Disassemble the code -public Show only public classes and members -private Show all classes and members -verbose Print stack size, number of locals and arguments for methods53 Zdeněk Troníček: Java Performance
  54. 54. Lab: String Concatenation 1. Open the project in labs/lab2/StringConcatenation in NetBeans 2. Get familiar with sources 3. Compile the project 4. Use disassembler to compare the + operator and the StringBuilder class54 Zdeněk Troníček: Java Performance
  55. 55. Evaluation String concat( String s1, String s2 ) { return s1 + " " + s2; } These two compile the String concat( String s1, String s2 ) { same way return new StringBuilder().append( s1 ).append( " " ).append( s2 ).toString(); }55 Zdeněk Troníček: Java Performance
  56. 56. Evaluation (cont.) String concatNames( String... names ) { String s = ""; for ( String n : names ) { s += n + " "; } return s; } These two compile the String concatNames( String... names ) { same way String s = ""; for ( String n : names ) { s = new StringBuilder().append( s ).append( n ).append( " " ).toString(); } return s; }56 Zdeněk Troníček: Java Performance
  57. 57. Discussion What does the result say about performance? Should we use the + operator or StringBuilder for string concatenation?57 Zdeněk Troníček: Java Performance
  58. 58. Measuring Time System.currentTimeMillis() – returns the number of milliseconds since January 1, 1970 System.nanoTime() – returns the number of nanoseconds since some fixed but arbitrary time interface ThreadMXBean o long getCurrentThreadCpuTime() o long getThreadCpuTime( long id )58 Zdeněk Troníček: Java Performance
  59. 59. How to measure? long start = System.nanoTime(); // some code long end = System.nanoTime(); long time = end – start; Both currentTimeMillis() and nanoTime() are implemented through JNI and their costs vary from system to system59 Zdeněk Troníček: Java Performance
  60. 60. Resolution resolution of System.currentTimeMillis() is 10 ms on Windows NT, Windows 2000 15.6 ms on Windows XP and Windows 7 1 ms on Linux 2.6, Solaris, and other Unix’s resolution of System.nanoTime() depends on the operating system should not be worse than System.currentTimeMillis() usually is in microseconds range60 Zdeněk Troníček: Java Performance
  61. 61. ThreadMXBean getCurrentThreadCpuTime() returns the total CPU time for the current thread in nanoseconds ThreadMXBean bean = ManagementFactory.getThreadMXBean(); long time1 = bean.getCurrentThreadCpuTime(); // some code long time2 = bean.getCurrentThreadCpuTime();61 Zdeněk Troníček: Java Performance
  62. 62. Sun’s HotSpot JVM Sun’s HotSpot JVM combines interpretation and Just-in-time (JIT) compilation Initially the bytecode is interpreted and JVM monitors which parts are executed frequently After reaching a compile threshold, “hot” methods and loops are compiled into native code JIT compiler uses the execution statistics collected by JVM during the interpretation to perform optimization62 Zdeněk Troníček: Java Performance
  63. 63. Sun’s HotSpot JVM (cont.) Two modes in JDK 6 HotSpot client, -client o default on Windows o focused on reduced startup time and footprint size o simple, quick optimizations HotSpot server, -server o focused on long running applications o sophisticated optimizations If not specified, ergonomics will choose automatically based on the hardware and software configuration63 Zdeněk Troníček: Java Performance
  64. 64. Hotspot JIT Compiler You can change the default JIT compiler behavior using the following switches (don’t use them except for experiments!): -Xint – don’t compile, interpret only -Xcomp – compile everything, don’t interpret You can monitor the JIT compiler: -XX:+PrintCompilation -XX:+LogCompilation CompilationMXBean64 Zdeněk Troníček: Java Performance
  65. 65. -XX:+PrintCompilation java -XX:+PrintCompilation … 1 java.lang.String::charAt (33 bytes) 2 java.lang.String::hashCode (64 bytes) [id][flags][class name::method name] [@ bci] ([size]) • id is ordinal number of compilation • flags is none or more of: o % indicates “on stack replacement” o ! indicates that method has an exception handler o * indicates that method is native o b means that compilation is not done in parallel with execution o s means that method is synchronized • bci is byte code index for an “on stack replacement” operation • size is the size of the code generated65 Zdeněk Troníček: Java Performance
  66. 66. -XX:+LogCompilation java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation … the log file can be set by -XX:LogFile=… (default is hotspot.log) more detailed output than -XX:+PrintCompilation <task compile_id=2 method=java/lang/String hashCode ()I bytes=60 count=240 backedge_count=937 iicount=0 stamp=0.111> <task_done success=1 nmsize=176 count=240 backedge_count=937 stamp=0.111/> </task> <task compile_id=3 method=java/lang/String indexOf (II)I bytes=151 count=92 backedge_count=864 iicount=0 stamp=0.120> <task_done success=1 nmsize=492 count=92 backedge_count=864 stamp=0.121/> </task>66 Zdeněk Troníček: Java Performance
  67. 67. CompilationMXBean getTotalCompilationTime() returns the number of milliseconds that JIT compiler spent compiling the bytecode CompilationMXBean bean = ManagementFactory.getCompilationMXBean(); long time1 = bean.getTotalCompilationTime(); // some code long time2 = bean.getTotalCompilationTime();67 Zdeněk Troníček: Java Performance
  68. 68. Demo: Compilation 1. Run the project in jdk/demo/jfc/Java2D 2. Run jconsole in jdk/bin 3. Inspect the VM summary tab 4. Inspect the java.lang.Compilation node in the Mbeans tab 5. Run the project again with the PrintCompilation flag68 Zdeněk Troníček: Java Performance
  69. 69. -XX:+PrintAssembly Must be preceded by -XX:+UnlockDiagnosticVMOptions Requires disassembler plugin More info on http://wikis.sun.com/display/HotSpotInternals/PrintAssembly69 Zdeněk Troníček: Java Performance
  70. 70. Dynamic Optimization On stack replacement Dead-code elimination Escape analysis Autobox elision Inlining Loop unrolling70 Zdeněk Troníček: Java Performance
  71. 71. On Stack Replacement JVM can replace the current code with new compiled code when executing the method public static void main( String[] args ) { int result = 0; for ( int i = 0; i < 1000000; i++ ) { for ( int j = 0; j < values.length; j++ ) { result ^= values[j]; } } … }71 Zdeněk Troníček: Java Performance
  72. 72. Dead-code Elimination Dead code can be eliminated either by javac or by JVM private static final boolean debug = false; public static void main( String[] args ) { if ( debug ) { System.out.println( "start" ); } … }72 Zdeněk Troníček: Java Performance
  73. 73. Escape Analysis An object escapes the thread that allocated it, if some other thread can see it Many objects are accessed by one thread only If an object does not escape, JVM can perform: o Thread stack allocation – fields are stored in the stack frame o Scalar replacement – scalar fields are stored in registers o Object explosion – object’s fields are allocated in different places o Eliminate synchronization o Eliminate memory read / write barriers73 Zdeněk Troníček: Java Performance
  74. 74. Autobox Elision Autoboxing is conversion from primitive type to an instance of wrapper class Under some circumstances, JIT compiler can eliminate object allocation HashMap<Integer, String> HashMap<int, String>74 Zdeněk Troníček: Java Performance
  75. 75. Inlining Inlining is an optimization that replaces a function call with the body of the function static int max( int x, int y ) { int m = max( i, j ); return x > y ? x : y; } int m = i > j ? i : j;75 Zdeněk Troníček: Java Performance
  76. 76. Loop Unrolling Loop unrolling is an optimization that optimize program’s execution speed at the expense of its size for ( int i = 0; i < 1000; i++ ) { for ( int i = 0; i < 3000; i++ ) { p[ i ] = 42; p[i] = 42; p[ i + 1 ] = 42; } p[ i + 2 ] = 42; }76 Zdeněk Troníček: Java Performance
  77. 77. Micro-benchmark Measures performance of small piece of code JIT compiler can change what we measure (warm-up the JVM) Beware of the garbage collection System.currentTimeMillis() does not have millisecond accuracy System.nanoTime() does not have nanosecond accuracy Measure several times and interpret data statistically77 Zdeněk Troníček: Java Performance
  78. 78. Demo: Inlining 1. Open the project in demo/JavaVirtualMachine/Inlining in NetBeans 2. Get familiar with the source code 3. Run the project with the PrintCompilation flag 4. Run the project again with -XX:MaxInlineSize=128 5. Compare the results78 Zdeněk Troníček: Java Performance
  79. 79. Demo: Autoboxing Performance 1. Run the project in demo/JavaVirtualMachine/AutoboxingPerformance 2. Run the project again with -XX:+PrintGC 3. Run the project again with the following settings: Initial heap size 1400 MB (-Xms1400M) Maximum heap size 1400 MB (-Xmx1400M) Young generation size 1000 MB (-Xmn1000M) 4. Run the project again with -Djava.lang.Integer.IntegerCache.high=1000 5. Compare the times [From http://www.javaspecialists.eu/archive/Issue191.html]79 Zdeněk Troníček: Java Performance
  80. 80. Lab: Micro-benchmark 1. Open the project in labs/lab2/Microbenchmark in NetBeans 2. Get familiar with the source code (the micro-benchmark measures the times of iteration through List using Iterator and using get( index )) 3. Run the project and compare the times80 Zdeněk Troníček: Java Performance
  81. 81. Evaluation 3: aload_0 4: invokeinterface #11, 1; //InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator; 9: astore_2 10: aload_2 11: invokeinterface #12, 1; //InterfaceMethod java/util/Iterator.hasNext:()Z 16: ifeq 45 19: aload_2 20: invokeinterface #13, 1; //InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object; 25: checkcast #14; //class java/lang/Integer 28: astore_3 … 42: goto 1081 Zdeněk Troníček: Java Performance
  82. 82. Evaluation (cont.) 3: iconst_0 4: istore_2 5: iload_2 6: aload_0 7: invokeinterface #16, 1; //InterfaceMethod java/util/List.size:()I 12: if_icmpge 45 15: aload_0 16: iload_2 17: invokeinterface #17, 2; //InterfaceMethod java/util/List.get:(I)Ljava/lang/Object; 22: checkcast #14; //class java/lang/Integer 25: astore_3 … 39: iinc 2, 1 42: goto 582 Zdeněk Troníček: Java Performance
  83. 83. Lab: Lazy Bezier 1. Run the project in labs/lab2/LazyBezier 2. Monitor the project in Visual VM 3. Find out what is keeping threads out of CPU 4. Propose code changes so that CPU is fully utilized83 Zdeněk Troníček: Java Performance
  84. 84. Lab: Lazy Textures 1. Run the project in labs/lab2/LazyTextures 2. Profile the project 3. Find the hot spot84 Zdeněk Troníček: Java Performance
  85. 85. Conclusion Java Virtual Machine (JVM) Java disassembler Just-in-Time (JIT) compiler Micro-benchmark85 Zdeněk Troníček: Java Performance
  86. 86. Questions & Answers Next module: Memory management86 Zdeněk Troníček: Java Performance
  87. 87. 3 Memory Management87 Zdeněk Troníček: Java Performance
  88. 88. Module Content Java heap Garbage collectors Monitoring garbage collector Oracle JVM Memory management tuning88 Zdeněk Troníček: Java Performance
  89. 89. Weak Generational Hypothesis Most objects die young Few references from older to younger objects exist Allocation Young generation Promotion Tenured (old) generation89 Zdeněk Troníček: Java Performance
  90. 90. Generational Algorithm Young generation – “young” objects Tenured (old) generation – “old” or large objects Permanent generation – “permanent” objects, e.g. instances of Class, Method, and Field90 Zdeněk Troníček: Java Performance
  91. 91. Garbage Collections “Stop the world” – all Java application threads are blocked for the duration of the garbage collection Minor collection: young generation Major (full) collection: all heap Usually different algorithms are used for minor and major collections91 Zdeněk Troníček: Java Performance
  92. 92. Mark & Sweep Local variables Method parameters Active threads Classes loaded by the system class loader Root references JNI references In-flight exceptions The finalizer queue Constant-pool strings Heap X X X X X X92 Zdeněk Troníček: Java Performance
  93. 93. Young Generation Three areas: Eden and two Survivor spaces (From and To) Copying collector Eden New objects are allocated in Eden From To93 Zdeněk Troníček: Java Performance
  94. 94. Minor Collection 1 2 Eden Eden X X X X From To From To 3 4 Eden Eden To From To From94 Zdeněk Troníček: Java Performance
  95. 95. Minor Collection (cont.) 5 6 Eden Eden X X X X To From From To X old generation 7 8 Eden Eden X X X X X From To From To X95 Zdeněk Troníček: Java Performance old generation
  96. 96. Minor Collection (cont.) When the Eden space is full, minor garbage collection event occurs Each object that survives a garbage collection has its age incremented Objects exceeding a defined age threshold are promoted to the tenured generation space96 Zdeněk Troníček: Java Performance
  97. 97. Minor Collection (cont.) When there is not enough room in survivor spaces, the full garbage collection event occurs Minor garbage collections are very quick compared to full garbage collections97 Zdeněk Troníček: Java Performance
  98. 98. Sizing Java Heap Tenured virtual virtual virtual From Eden To Young Perm -Xmx – maximum size of Java heap (young generation + tenured generation) -Xms – initial size of Java heap (young generation + tenured generation) -Xmn – size of young generation98 Zdeněk Troníček: Java Performance
  99. 99. Sizing Java Heap (cont.) -XX:NewSize – initial size of young generation -XX:MaxNewSize – maximum size of young generation -XX:NewRatio – ratio of young generation to tenured generation -XX:PermSize – initial size of permanent generation -XX:MaxPermSize – maximum size of permanent generation99 Zdeněk Troníček: Java Performance
  100. 100. Sizing Java Heap (cont.) For application performance we almost always set o -Xms and -Xmx o -XX:PermSize and -XX:MaxPermSize to the same values because growing the space requires full garbage collection100 Zdeněk Troníček: Java Performance
  101. 101. Demo: Visual GC 1. Run the project in jdk/demo/jfc/Java2D with -XX:+DisableExplicitGC 2. Run Visual VM in jdk/bin 3. Open the project in Visual VM and switch to the Visual GC tab 4. Observe a saw tooth pattern on the Eden space101 Zdeněk Troníček: Java Performance
  102. 102. GC Features Serial vs. Parallel Stop-the-world vs. Concurrent Copying vs. Compacting vs. Non-compacting X X X X X X Non-compacting free Compacting102 Zdeněk Troníček: Java Performance
  103. 103. GC Performance Metrics Throughput – the percentage of total time not spent in garbage collection, considered over long periods of time GC overhead – the percentage of total time spent in garbage collection Pause time – the length of time when application is stopped due to garbage collection103 Zdeněk Troníček: Java Performance
  104. 104. GC Performance Metrics (cont.) Frequency of collection – how often collection occurs within some time period Footprint – how much memory the garbage collector needs Promptness – the time between when an object becomes garbage and when it is collected104 Zdeněk Troníček: Java Performance
  105. 105. Garbage Collectors Serial collector Throughput collector Concurrent collector105 Zdeněk Troníček: Java Performance
  106. 106. Serial Collector Single threaded collector Selected by -XX:+UseSerialGC (default on client machines) Young generation: copy collector Tenured generation: mark-sweep-compact Suitable for single processor core machine106 Zdeněk Troníček: Java Performance
  107. 107. Throughput (Parallel) Collector Selected by -XX:+UseParallelGC (default on server machines) Young generation: multi-threaded copy collector Tenured generation: single-threaded mark-sweep-compact Suitable for multi-core processor systems107 Zdeněk Troníček: Java Performance
  108. 108. Serial vs. Parallel Collector serial parallel stop the world108 Zdeněk Troníček: Java Performance
  109. 109. Demo: Memory Consumer 1. Run the project in demo/MemoryManagement/MemoryConsumer with the following settings: o Initial and maximum heap size 1024 MB o Serial garbage collector 2. Run Visual VM and open the project 3. Observe the Visual GC tab109 Zdeněk Troníček: Java Performance
  110. 110. Demo: Memory Consumer (cont.) 1. Run the project in demo/MemoryManagement/MemoryConsumer with the following settings: o Initial and maximum heap size 1024 MB o Parallel garbage collector 2. Open the project in Visual VM 3. Observe the Visual GC tab110 Zdeněk Troníček: Java Performance
  111. 111. Parallel Compacting Collector Selected by -XX:+UseParallelOldGC Young generation: multi-threaded copy collector Tenured generation: multi-threaded mark-sweep-compact Suitable for multi-core processor systems The number of threads can be controlled with -XX:ParallelGCThreads Default settings ParallelGCThreads = (nps <= 8) ? nps : 3 + ((nps * 5) / 8) where nps is the number of processors111 Zdeněk Troníček: Java Performance
  112. 112. Concurrent Collector (CMS) Selected by -XX:UseConcMarkSweepGC Multi-threaded young generation collector (same as in throughput collector) Single threaded tenure generation collector that runs concurrently with Java application threads (most of the work is done concurrently with application) Does not compact memory Suitable when application responsiveness is more important than application throughput112 Zdeněk Troníček: Java Performance
  113. 113. Concurrent Collector Phases Initial mark phase: marks objects directly reachable from application Concurrent mark phase: traverses the object graph concurrently with Java application threads Remark: finds objects that were missed by the concurrent mark phase due to updates by Java application threads Concurrent sweep: collects the objects identified as unreachable during marking phases Concurrent reset: prepares for next concurrent collection113 Zdeněk Troníček: Java Performance
  114. 114. Serial vs. CMS Collector major collection: serial cms stop the initial mark world concurrent mark remark concurrent sweep114 Zdeněk Troníček: Java Performance
  115. 115. CMS Incremental Mode Selected by -XX:+CMSIncrementalMode Concurrent phases are done incrementally Collector periodically gives processor back to the application115 Zdeněk Troníček: Java Performance
  116. 116. Explicit GC Explicit GC can be disabled with -XX:+DisableExplicitGC Do not use System.gc() unless you have a strong reason (e.g. in performance tests) -XX:+DisableExplicitGC disables RMI distributed garbage collection116 Zdeněk Troníček: Java Performance
  117. 117. Lab: Serial & Parallel GC 1. Run the project in jdk/demo/jfc/Java2D with the following settings: o Initial and maximum heap size 32 MB o Young generation size 16 MB o Initial and maximum permanent generation size 16 MB o Explicit GC disabled o Serial or parallel GC 2. Measure time for 10 minor collections 3. Compare performance of the serial and parallel GCs117 Zdeněk Troníček: Java Performance
  118. 118. Finalizers Finalizers are not destructors Object cannot be collected until the finalizer is executed Do not use finalizers to manage resources other than memory Finalizers can be a safety net (but try to limit their use) If you use finalizers, keep the work being done in them as small as possible118 Zdeněk Troníček: Java Performance
  119. 119. Ergonomics Evaluates the system and chooses defaults for the HotSpot JVM Uses definition of “server class machine”: o 2 or more processor cores and 2 or more GB of physical memory o special case: 32-bit Windows machine are never server class o special case: 64-bit machine are always server class If a machine is considered server class, the JVM runs in server mode Otherwise, the JVM runs in client mode119 Zdeněk Troníček: Java Performance
  120. 120. Server Mode Server JIT compiler Throughput collector Initial heap size is 1/64th of the physical memory up to 1 GB (i.e. the minimum initial heap size is 32 MB) Maximum heap size is 1/4th of the physical memory up to 1 GB120 Zdeněk Troníček: Java Performance
  121. 121. Client Mode Client JIT compiler Serial garbage collector Initial heap size is 4 MB Maximum heap size is 64 MB121 Zdeněk Troníček: Java Performance
  122. 122. Parallel Collector Tuning -XX:MaxGCPauseMillis=n – specifies maximum desired pause time -XX:GCTimeRatio=n – specifies desired throughput The ratio of garbage collection time to application time is 1 / (1 + n) Example: n=19 means a goal of 5 % of the total time for GC JVM will automatically and dynamically modify the heap sizes in order to achieve the desired goals122 Zdeněk Troníček: Java Performance
  123. 123. GC Monitoring -verbose:gc, -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps, -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime Visual VM, Visual GC jconsole123 Zdeněk Troníček: Java Performance
  124. 124. -verbose:gc, +XX:+PrintGC Occupancy Collection after time collection Minor collection [GC 21088K->20974K(30524K), 0.0048330 secs] Occupancy Generation before capacity collection Major collection [Full GC 29089K->29089K(37692K), 0.0061875 secs]124 Zdeněk Troníček: Java Performance
  125. 125. -XX:+PrintGCDetails Minor collection [GC [DefNew: 4415K->512K(4928K), 0.0036650 secs] 4415K->4285K(15872K), 0.0038643 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] Young generation [GC [<collector>: <starting occupancy1> -> <ending occupancy1> (generation size), <pause time1> secs] <starting occupancy3> -> <ending occupancy3>(heap size), <pause time3> secs] Entire heap125 Zdeněk Troníček: Java Performance
  126. 126. -XX:+PrintGCDetails (cont.) Minor collection with promotion [GC [DefNew: 4906K->503K(4928K), 0.0031987 secs] [Tenured: 12565K->12591K(12608K), 0.0066095 secs] 13080K->13068K(17536K), [Perm : 44K->44K(12288K)], 0.0102299 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] Major collection [Full GC [Tenured: 174774K->174774K(174784K), 0.0226224 secs] 253429K->252856K(253440K), [Perm : 43K->43K(12288K)], 0.0228810 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]126 Zdeněk Troníček: Java Performance
  127. 127. -XX:+PrintGCTimeStamps java -XX:+PrintGC -XX:+PrintGCTimeStamps … Time when Duration of collection started collection 11.875: [GC 12178K->7777K(17588K), 0.0039663 secs] 15.855: [Full GC 8943K->8643K(21312K), 0.0620489 secs]127 Zdeněk Troníček: Java Performance
  128. 128. -XX:+PrintGCApplicationStoppedTime java -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime … 12.777: [Full GC 8217K->6877K(16620K), 0.0466502 secs] Total time for which application threads were stopped: 0.0477718 seconds 13.456: [Full GC 7569K->7280K(16620K), 0.0520881 secs] Total time for which application threads were stopped: 0.0532108 seconds Total time for which application threads were stopped: 0.0001046 seconds Total time for which application threads were stopped: 0.0000757 seconds Application was stopped but no collection started128 Zdeněk Troníček: Java Performance
  129. 129. Common Problems Young generation too small Old generation too small Perm generation too small129 Zdeněk Troníček: Java Performance
  130. 130. Young Generation Too Small Young generation is filled up before temporary objects get unreachable Temporary objects do not die in young generation and are promoted to tenured generation Consequences: garbage collection takes more time due to excessive copying, major collections are frequent130 Zdeněk Troníček: Java Performance
  131. 131. Young Generation Too Small (cont.) Young generation: 4096 KB Eden: 3328 KB Capacity Survivor 0: 384 KB of young Survivor 1: 384 KB generation [GC [DefNew: 3328K->384K(3712K), 0.0045470 secs] 5376K->4416K(32384K), 0.0047182 secs] Change of occupancy in young generation: 3328 – 384 = 2944 KB Change of occupancy over entire heap: 5376 – 4416 = 960 KB Size of promoted objects: 2944 – 960 = 1984 KB Ratio of promoted objects: 1984 / 2944 = 67 %131 Zdeněk Troníček: Java Performance
  132. 132. Old Generation Too Small Manifestation: major collections are frequent and occur at the same time as minor collections Before a minor collection starts, garbage collector checks, if there is enough room for objects that will probably be promoted to the old generation If there is not enough room, a major collection happens132 Zdeněk Troníček: Java Performance
  133. 133. Demo: Old Generation Too Small Run the project in jdk/demo/jfc/Java2D with the following settings: o Initial and maximum heap size 12 MB o Initial and maximum size of young generation 4 MB o Verbose GC o Explicit GC disabled Observe the output133 Zdeněk Troníček: Java Performance
  134. 134. Perm Generation Too Small You get an exception: java.lang.OutOfMemoryError: PermGen space Solution: increase the size of permanent generation134 Zdeněk Troníček: Java Performance
  135. 135. Demo: Perm Generation Too Small Run the project in jdk/demo/jfc/Java2D with the following settings: o Initial and maximum size of permanent generation 3 MB Use the application until OutOfMemoryError appears135 Zdeněk Troníček: Java Performance
  136. 136. GC Tuning Are minor collection pauses too long? Try parallel collector. Are major collection pauses too long? Try concurrent collector.136 Zdeněk Troníček: Java Performance
  137. 137. Heap Size Settings total size committed vs. virtual tenured to young ratio from to Eden virtual tenured virtual perm virtual Eden to survivor space ratio permanent generation137 Zdeněk Troníček: Java Performance
  138. 138. Memory Leak Also known as unintentional object retention (the object that is not needed anymore and should be unreachable is still reachable and cannot be collected)138 Zdeněk Troníček: Java Performance
  139. 139. Basic Types of Memory Leaks A single large object that is unintentionally held in memory Small objects that are accumulated over time (e.g. they are being added to a collection)139 Zdeněk Troníček: Java Performance
  140. 140. Finding Memory Leaks Generations in memory profiler Heap snapshots comparison Heap dumps140 Zdeněk Troníček: Java Performance
  141. 141. Generations in Memory Profiler Object age – the number of garbage collections that the object survived Generations – the number of different object ages for objects of given class (counted over all alive objects on the heap)141 Zdeněk Troníček: Java Performance
  142. 142. Demo: Memory Leak 1. Run the project in demo/MemoryManagement/MemoryLeak 2. Run Visual VM and open the project 3. Monitor the project142 Zdeněk Troníček: Java Performance
  143. 143. Heap Snapshots Comparison Take memory snapshots in different points of application (it is advisable to run garbage collector before) Compare the snapshots (e.g. in NetBeans or in Visual VM) Search for leaking objects within the objects that were identified as new143 Zdeněk Troníček: Java Performance
  144. 144. Demo: Lucky Numbers 1. Open the project in demo/MemoryManagement/LuckyNumbers in NetBeans 2. Profile the project in NetBeans 3. Use memory snapshots comparison to identify the memory leak144 Zdeněk Troníček: Java Performance
  145. 145. Demo: Stop-time Game 1. Open the project in demo/MemoryManagement/StopTimeGame in NetBeans 2. Profile the project in NetBeans 3. Find and explain the memory leak145 Zdeněk Troníček: Java Performance
  146. 146. Lab: Nasty Ellipses 1. Open the project in labs/lab3/NastyEllipses in NetBeans 2. Run the project 3. Check if there is a memory leak and if yes find the cause146 Zdeněk Troníček: Java Performance
  147. 147. Lab: Slow Baker 1. Open the project in labs/lab3/SlowBaker in NetBeans 2. Run the project 3. Check if there is a memory leak and if yes find the cause147 Zdeněk Troníček: Java Performance
  148. 148. References (java.lang.ref) Strong reference – a common Java reference Car c = new Car( id ); Weak reference – a reference that does not prevent garbage collector to collect the object which the reference refers to WeakReference<Car> p = new WeakReference<Car>( c ); Soft reference – like a weak reference, except that the garbage collector is less eager to collect the object which the reference refers to SoftReference<Car> p = new SoftReference<Car>( c ); Phantom reference – a reference that is enqueued after the garbage collector determines that the object which the reference refers to may be reclaimed ReferenceQueue<Car> queue = new ReferenceQueue<Car>(); PhantomReference<Car> p = new PhantomReference<Car>( c, queue );148 Zdeněk Troníček: Java Performance
  149. 149. Object Reachability Strong reachability – an object is strongly reachable if it can be reached without traversing any reference object Soft reachability – an object is softly reachable if it is not strongly reachable and can be reached by traversing a soft reference Weak reachability – an object is weakly reachable if it is neither strongly nor softly reachable and can be reached by traversing a weak reference Phantom reachability – an object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized and some phantom reference refers to it149 Zdeněk Troníček: Java Performance
  150. 150. Object Reachability (cont.) created strongly unreachable reachable reclaimed finalized collected150 Zdeněk Troníček: Java Performance
  151. 151. Object Reachability (cont.) softly reachable created strongly unreachable reachable weakly reachable reclaimed phantom finalized collected reachable151 Zdeněk Troníček: Java Performance
  152. 152. Demo: References 1. Open the project in demo/MemoryManagement/References in NetBeans 2. Run the project 3. Use jmap to create heap dump 4. Use VisualVM to run garbage collector 5. Create another heap dump 6. Open heap dumps in VisualVM and compare them152 Zdeněk Troníček: Java Performance
  153. 153. String Immutable sequence of characters Compiler places string literals into the constant pool Strings in constant pool are never garbage collected String.intern() returns the canonical representation Interned strings are garbage collected153 Zdeněk Troníček: Java Performance
  154. 154. WeakHashMap A hash-table based Map implementation with weak keys When a key is no longer in use, the entry is automatically removed by the garbage collector154 Zdeněk Troníček: Java Performance
  155. 155. Demo: WeakHashMap 1. Open the project in demo/MemoryManagement/WeakHashMap in NetBeans 2. Open Demo1 and determine the output 3. Run Demo1 and verify your answer 4. Open Demo2 and determine the output 5. Run Demo2 and verify you answer155 Zdeněk Troníček: Java Performance
  156. 156. Lab: Web Browser 1. Open the project in labs/lab3/WebBrowser in NetBeans 2. Run the project 3. Find the memory leak and its cause156 Zdeněk Troníček: Java Performance
  157. 157. OutOfMemoryError Additional message: Java heap space PermGen space Requested array size exceeds VM limit157 Zdeněk Troníček: Java Performance
  158. 158. Heap Dump We can dump all reachable objects to a file by Visual VM jmap -XX:+HeapDumpOnOutOfMemoryError The file can be eventually analyzed158 Zdeněk Troníček: Java Performance
  159. 159. Lab: Hungry Duke 1. Run HungryDuke.jar with the following settings: Maximum heap size 32 MB Flag HeapDumpOnOutOfMemoryError 2. Wait for OutOfMemoryError 3. Analyze the heap dump in Visual VM 4. Find the cause of memory leak159 Zdeněk Troníček: Java Performance
  160. 160. Questions & Answers Next module: Threads and Synchronization160 Zdeněk Troníček: Java Performance
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×