JavaStudy Network Daehyub Cho JVM  [Java Virtual Machine] Performance Tuning
AGENDA
    1. Basic concept of JVM tuning
    2. Hotspot compiler
    3. Threading model
    4. Memory model
Basic Concept of JVM Tuning
Basics of performance tuning
    1. Decide what performance level is "good enough".
    2. Test and measure: scenario-based tests, a stress tool (LoadRunner), a profiling tool (JProbe, etc.).
    3. Profile the application to find bottlenecks.
    4. Tune: application (*), middleware (WAS), OS, JVM.
    5. Return to step 2 (feedback).
JVM Tuning
    Can improve performance by about 10~20%.
    Find the appropriate parameters for your application:
        Hotspot compiler options
        Thread model options (*)
        GC and memory related options (**)
    Changing parameters is risky: it needs more testing and feedback.
    Reference: spec.org
Hotspot Compiler
JVM Layout
    Hotspot VM (from JDK 1.3)
    Components: client compiler, server compiler, runtime, GC, interpreter, threading and locking, ...
Hotspot compiler
    JIT (Just-In-Time) compiler
        Compiles bytecode to native code.
        Compiles by fixed optimization rules ("not thinking").
        Compiles at execution or installation time.
    Hotspot
        Compiles bytecode to native code.
        "Thinks": tries to find where optimization can take place.
        Adaptive optimization at run time.
Hotspot Detection
    Hotspot detection
    Method inlining
    Dynamic deoptimization
Hotspot Detection and Method Inlining
    Literal constants are folded:
        int foo = 9 * 10;                    ->  int foo = 90;
    String concatenation is sometimes folded:
        String foo = "Hello " + (9 * 10);    ->  String foo = "Hello 90";
    Constant fields are inlined:
        public class A { public static final int VALUE = 99; }
        public class B { static int VALUE2 = A.VALUE; }
      after compiling class B this becomes
        public class B { static int VALUE2 = 99; }
Hotspot detection / Method Inlining
    Dead code branches are eliminated:
        public class A {
            static final boolean DEBUG = false;
            public void methodA() {
                if (DEBUG) System.out.println("DEBUG MODE");
                System.out.println("Say Hello");
            } // method A
        } // class A
      becomes
        public class A {
            static final boolean DEBUG = false;
            public void methodA() {
                System.out.println("Say Hello");
            } // method A
        } // class A
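The folding of constant expressions can be observed directly. The sketch below is illustrative only (the class and method names are hypothetical, not from the slides). It relies on the Java language rule that a compile-time constant String expression is interned, so the folded concatenation is the same object as the literal, while a concatenation with a non-constant operand is built at run time as a new object.

```java
// Illustrative sketch; class and method names are hypothetical.
final class ConstantFoldingDemo {
    // Compile-time constant expression: the class file holds the single literal "Hello 90".
    static final String FOLDED = "Hello " + (9 * 10);

    // True when the folded constant is the very same interned object as the literal.
    static boolean foldedIsInterned() {
        return FOLDED == "Hello 90";
    }

    // A non-constant operand forces concatenation at run time, which creates
    // a new String object, so the reference comparison is false.
    static boolean runtimeConcatIsInterned(String prefix) {
        return (prefix + 90) == "Hello 90";
    }

    public static void main(String[] args) {
        System.out.println("folded constant interned:  " + foldedIsInterned());
        System.out.println("runtime concat interned:   " + runtimeConcatIsInterned("Hello "));
    }
}
```

The reference comparison with == is used here on purpose, to show object identity; ordinary code should compare strings with equals.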
Hotspot Client compiler
    Java option: -client
    Focused on simplicity and fast start-up.
    Three-phase compiler: HIR (high-level intermediate representation), LIR (low-level intermediate representation), machine code.
    It focuses on local code quality and does very few global optimizations, since those are often the most expensive in terms of compile time.
    It can inline any function that has no exception handlers or synchronization, and it supports deoptimization for debugging and inlining.
Hotspot Server compiler
    Java option: -server
    Focused on optimization.
    SSA (Static Single Assignment)-based IR.
Hotspot compiler Option
    -XX:MaxInlineSize=<size>
        Integer specifying the maximum number of bytecode instructions in a method that gets inlined.
    -XX:FreqInlineSize=<size>
        Integer specifying the maximum number of bytecode instructions in a frequently executed method that gets inlined.
    -Xint
        Interpreter only (no JIT compilation).
    -XX:+PrintCompilation
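To see these options at work, a small program with one hot method is enough. The following is a minimal sketch (class and method names are hypothetical): square is a tiny method called from a loop that runs often enough that the JIT is likely to compile it.

```java
// Illustrative sketch; class and method names are hypothetical.
final class HotLoopDemo {
    // Tiny method: a typical inlining candidate once its caller becomes hot.
    static int square(int x) {
        return x * x;
    }

    // Sums i*i for i in [0, n).
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // Repeat the call many times so the method is executed often enough to be compiled.
        for (int round = 0; round < 20_000; round++) {
            total += sumOfSquares(1_000);
        }
        System.out.println(total);
    }
}
```

Running it as `java -XX:+PrintCompilation HotLoopDemo` should list the methods as they are compiled, and `java -Xint HotLoopDemo` runs the same code in the interpreter only, which gives a rough feel for what the compiler contributes. The exact output format depends on the JVM version.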
Threading Model
Threading Model
    Java is a multithreaded programming language.
    Native thread model from JDK 1.2.
    Thread mapping (M:N and 1:1).
    Thread synchronization.
    Diagram labels: Java application; Java thread; JVM; operating system; thread handling; thread scheduling; lock management (synchronization).
Solaris M:N Thread Model
    Diagram: Java application, Java threads (JVM), Solaris threads and LWPs (Solaris OS), kernel threads (OS kernel).
Solaris M:N Thread Model
    Thread-based synchronization versus LWP-based synchronization

    JDK      Thread-based sync             LWP-based sync
    JDK1.2   Default                       N/A
    JDK1.3   Default                       -XX:+UseLWPSynchronization
    JDK1.4   -XX:-UseLWPSynchronization    Default
Solaris 1:1 Thread Model
    Diagram: Java application, Java threads (JVM), Solaris threads and LWPs (Solaris OS), kernel threads (OS kernel).
Solaris 1:1 Thread Model
    Bound threads versus alternate libthread

    JDK      Bound thread            Alternate libthread (*)
    JDK1.2   N/A                     export LD_LIBRARY_PATH=/usr/lib/lwp
    JDK1.3   -XX:+UseBoundThreads    export LD_LIBRARY_PATH=/usr/lib/lwp
    JDK1.4   -XX:+UseBoundThreads    export LD_LIBRARY_PATH=/usr/lib/lwp

    (*) In Solaris 9 the alternate libthread is the default; do not add /usr/lib/lwp to LD_LIBRARY_PATH.
JVM Performance Test on Solaris < Solaris 8 with JVM 1.3 >
    See the graph on the next page.

    Architecture  CPUs  Threads   Model                 % diff in throughput (against standard model)
    Sparc         30    400/2000  Standard              ---
    Sparc         30    400/2000  LWP Synchronization   215%/800%
    Sparc         30    400/2000  Bound Threads         -10%/-80%
    Sparc         30    400/2000  Alternate One-to-one  275%/900%
    Sparc         4     400/2000  Standard              ---
    Sparc         4     400/2000  LWP Synchronization   30%/60%
    Sparc         4     400/2000  Bound Threads         -5%/-45%
    Sparc         4     400/2000  Alternate One-to-one  30%/50%
    Sparc         2     400/2000  Standard              ---
    Sparc         2     400/2000  LWP Synchronization   0%/25%
    Sparc         2     400/2000  Bound Threads         -30%/-40%
    Sparc         2     400/2000  Alternate One-to-one  -10%/0%
    Intel         4     400/2000  Standard              ---
    Intel         4     400/2000  LWP Synchronization   25%/60%
    Intel         4     400/2000  Bound Threads         0%/-10%
    Intel         4     400/2000  Alternate One-to-one  20%/60%
    Intel         2     400/2000  Standard              ---
    Intel         2     400/2000  LWP Synchronization   15%/45%
    Intel         2     400/2000  Bound Threads         -10%/-15%
    Intel         2     400/2000  Alternate One-to-one  15%/35%
JVM Performance Test on Solaris Performance Test Result Graph
Memory Tuning Memory Model
Memory Tuning: Garbage Collection
    JVM memory layout
    Garbage collection model
    Server VM and client VM
    Garbage collection measurement and analysis
    Tuning garbage collection
Generational Garbage Collection
JVM Memory Layout
    New/Young: recently created objects (Eden plus two survivor spaces, SS1 and SS2)
    Old: long-lived objects
    Perm: JVM classes and methods
    Diagram labels: Eden, SS1, SS2 (New/Young); Old; Perm; "Used in Application"; "JVM"; "Total Heap Size".
Garbage Collection
    Collects unused Java objects and cleans up memory.
    Minor GC: collects memory in the New/Young generation.
    Major GC (Full GC): collects memory in the Old generation.
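On Java 5 and later, the generations described above can be listed from inside a running JVM through the standard java.lang.management API. The sketch below is only an illustration (the class name is hypothetical). Pool names vary with the collector and JVM version, and from Java 8 onward the permanent generation no longer exists and a Metaspace pool appears instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the class name is hypothetical.
final class MemoryPoolReport {
    // Returns one line per memory pool: name, type (heap or non-heap) and used size.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            lines.add(pool.getName() + " (" + pool.getType() + "): " + usedKb + "K used");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```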
Minor GC
    Minor collection of the New/Young generation
    Copy and scavenge
    Very fast
Minor GC (diagrams of Eden, the survivor spaces and Old; legend: new object, garbage, live object)
    1st minor GC: live objects are copied to the survivor area.
    2nd minor GC (diagram only).
    3rd minor GC: objects are moved to the old space when they become tenured.
Major GC
    Major collection of the Old generation
    Mark and compact
    Slow
    1st: goes through the entire heap, marking the reachable (live) objects; whatever is left unmarked is garbage.
    2nd: the live objects are compacted, which reclaims the space held by unreachable objects.
Major GC (diagrams of Eden, SS1 and SS2)
    Step 1: mark, which identifies the objects to be removed.
    Step 2: compact, which removes them.
Server option versus Client option
    -XX:NewRatio=2 (1.3), -Xmn128m (1.4), -XX:NewSize=<size>, -XX:MaxNewSize=<size>
GC Tuning Parameter: Memory Tuning Parameter
    Perm size: -XX:MaxPermSize=64m
    Total heap size: -ms512m -mx512m (the -X forms are -Xms512m -Xmx512m)
    New size:
        -XX:NewRatio=2 (old/new size)
        -XX:NewSize=128m
        -Xmn128m (JDK 1.4)
    Survivor size: -XX:SurvivorRatio=64 (eden/survivor)
    Heap ratio: -XX:MaxHeapFreeRatio=70 -XX:MinHeapFreeRatio=40
    Survivor ratio: -XX:TargetSurvivorRatio=50
Support for -XX Option
    Options that begin with -X are nonstandard (not guaranteed to be supported on all VM implementations), and are subject to change without notice in subsequent releases of the Java 2 SDK.
    Because the -XX options have specific system requirements for correct operation and may require privileged access to system configuration parameters, they are not recommended for casual use. These options are also subject to change without notice.
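How the two ratio parameters divide the heap is easier to see with numbers. The sketch below is a back-of-the-envelope calculation (class and method names are hypothetical). It assumes the usual reading of the ratios: NewRatio is old/young, so the young generation is heap/(NewRatio+1), and SurvivorRatio is eden/survivor with two survivor spaces, so each survivor space is young/(SurvivorRatio+2). A real JVM also rounds and aligns these sizes, so treat the results as estimates.

```java
// Illustrative sketch; names are hypothetical and the results are estimates only.
final class GenerationSizes {
    // NewRatio = old / young, so young = heap / (NewRatio + 1).
    static long youngKb(long heapKb, int newRatio) {
        return heapKb / (newRatio + 1);
    }

    // SurvivorRatio = eden / survivor, with two survivor spaces in the young generation.
    static long survivorKb(long youngKb, int survivorRatio) {
        return youngKb / (survivorRatio + 2);
    }

    // Eden is what remains of the young generation after both survivor spaces.
    static long edenKb(long youngKb, int survivorRatio) {
        return youngKb - 2 * survivorKb(youngKb, survivorRatio);
    }

    public static void main(String[] args) {
        long heapKb = 512L * 1024;          // -ms512m -mx512m
        long young = youngKb(heapKb, 2);    // -XX:NewRatio=2
        System.out.println("young    = " + young + "K");
        System.out.println("old      = " + (heapKb - young) + "K");
        System.out.println("survivor = " + survivorKb(young, 64) + "K each"); // -XX:SurvivorRatio=64
        System.out.println("eden     = " + edenKb(young, 64) + "K");
    }
}
```

With a 512 MB heap and NewRatio=2, roughly one third of the heap (about 170 MB) goes to the young generation. With SurvivorRatio=64 each survivor space is only about 1/66 of that.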
Garbage Collection Model
    New types of GC
        Default collector
        Parallel GC for the young generation (JDK 1.4)
        Concurrent GC for the old generation (JDK 1.4)
        Incremental low-pause collector (train GC)
Parallel GC
    Improves GC performance for the young generation (minor GC).
    Requires more than 4 CPUs and 256 MB of physical memory.
    Diagram: application threads and GC threads over time in the young generation, default GC versus parallel GC.
Parallel GC: two parallel collectors
    Low-pause: -XX:+UseParNewGC
        For near-real-time or pause-dependent applications.
        Works with the mark-and-compact collector or the concurrent old-area collector.
    Throughput: -XX:+UseParallelGC
        For enterprise or throughput-oriented applications.
        Works only with the mark-and-compact collector.
Parallel GC: throughput collector
    -XX:+UseParallelGC
    -XX:ParallelGCThreads=<desired number>
    -XX:+UseAdaptiveSizePolicy: adaptive resizing of the young generation
Parallel GC: throughput collector, AggressiveHeap
    Enabled by -XX:+AggressiveHeap
    Inspects machine resources and attempts to set various parameters to be optimal for long-running, memory-intensive jobs.
    Useful on machines with more than 4 CPUs and more than 256 MB of memory.
    Useful for server applications.
    Do not use with -ms and -mx.
    Example (HP Itanium, JVM 1.4.2):
        java -XX:+ServerApp -XX:+AggressiveHeap -Xmn3400m spec.jbb.JBBmain -propfile Test1
Concurrent GC
    Reduces the pause time for collecting the old generation (full GC).
    Enabled by -XX:+UseConcMarkSweepGC
    Diagram: application threads and GC threads over time in the old generation, default GC versus concurrent GC.
Incremental GC
    Enabled by -Xincgc (from JDK 1.3).
    Collects part of the old generation whenever the young generation is collected.
    Reduces the pause time for collecting the old generation.
    Disadvantages:
        Young generation GC occurs more frequently.
        More resources are needed.
    Do not use with -XX:+UseParallelGC or -XX:+UseParNewGC.
Incremental GC
    Default GC: a full GC of the old generation happens after many minor GCs of the young generation.
    Incremental GC: the old generation is collected during minor GCs.
Incremental GC: sample log (Sun JVM 1.4.1 on Windows)
    Options: -client -XX:+PrintGCDetails -Xincgc -ms32m -mx32m

    [GC [DefNew: 540K->35K(576K), 0.0053557 secs][Train: 3495K->3493K(32128K), 0.0043531 secs] 4036K->3529K(32704K), 0.0099856 secs]
    [GC [DefNew: 547K->64K(576K), 0.0048216 secs][Train: 3529K->3540K(32128K), 0.0058683 secs] 4041K->3604K(32704K), 0.0109779 secs]
    [GC [DefNew: 575K->64K(576K), 0.0164904 secs] 4116K->3670K(32704K), 0.0169019 secs]
    [GC [DefNew: 576K->64K(576K), 0.0057541 secs][Train: 3671K->3651K(32128K), 0.0051286 secs] 4182K->3715K(32704K), 0.0113042 secs]
    [GC [DefNew: 575K->56K(576K), 0.0114559 secs] 4227K->3745K(32704K), 0.0191390 secs]
    [Full GC [Train MSC: 3689K->3280K(32128K), 0.0909523 secs] 4038K->3378K(32704K), 0.0910213 secs]
    [GC [DefNew: 502K->64K(576K), 0.0173220 secs][Train: 3329K->3329K(32128K), 0.0066279 secs] 3782K->3393K(32704K), 0.0325125 secs

    Reading the log: "GC" lines are minor GCs and "Full GC" lines are full GCs; DefNew entries are the young generation GC, Train entries are the old generation GC done within a minor GC, and the number before "secs" is the time taken.
Choosing a collector
    Best pause: concurrent GC
    Better pause: incremental GC (train)
    Best throughput: parallel GC
    Better throughput: mark-compact
Garbage Collection Measurement
    -verbose:gc (all platforms; written -verbosegc in the older spelling)
    -XX:+PrintGCDetails (JDK 1.4)
    -Xverbosegc (HP)
Garbage Collection Measurement: -verbosegc output
    [GC 40549K->20909K(64768K), 0.0484179 secs]
    [GC 41197K->21405K(64768K), 0.0411095 secs]
    [GC 41693K->22995K(64768K), 0.0846190 secs]
    [GC 43283K->23672K(64768K), 0.0492838 secs]
    [Full GC 43960K->1749K(64768K), 0.1452965 secs]
    [GC 22037K->2810K(64768K), 0.0310949 secs]
    [GC 23098K->3657K(64768K), 0.0469624 secs]
    [GC 23945K->4847K(64768K), 0.0580108 secs]

    Fields, using the first line as an example: 40549K is the heap size before GC, 20909K the heap size after GC, (64768K) the total heap size, and 0.0484179 secs the GC time. Lines starting with "Full GC" are full collections.
GC Log analysis using AWK script
    gc.awk:

    BEGIN {
        printf("Minor\tMajor\tAlive\tFreed\n");
    }
    {
        # Minor collection: "[GC <before>K-><after>K(<total>K), <secs> secs]"
        if (substr($0, 1, 4) == "[GC ") {
            split($0, array, " ");
            printf("%s\t0.0\t", array[3])
            split(array[2], barray, "K")
            before = barray[1]
            after = substr(barray[2], 3)
            reclaim = before - after
            printf("%s\t%s\n", after, reclaim)
        }
        # Full collection: "[Full GC <before>K-><after>K(<total>K), <secs> secs]"
        if (substr($0, 1, 9) == "[Full GC ") {
            split($0, array, " ");
            printf("0.0\t%s\t", array[4])
            split(array[3], barray, "K")
            before = barray[1]
            after = substr(barray[2], 3)
            reclaim = before - after
            printf("%s\t%s\n", after, reclaim)
        }
        next;
    }

    Usage:
        % awk -f gc.awk gc.log

    Output for the gc.log shown on the previous slide:
        Minor       Major       Alive       Freed
        0.0484179   0.0         20909       19640
        0.0411095   0.0         21405       19792
        0.0846190   0.0         22995       18698
        0.0492838   0.0         23672       19611
        0.0         0.1452965   1749        42211
        0.0310949   0.0         2810        19227
        0.0469624   0.0         3657        19441
        0.0580108   0.0         4847        19098
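The same extraction can be done in Java with a regular expression, which may be convenient when the analysis is part of a larger tool. This is a minimal sketch and not part of the original slides: the class name and fields are hypothetical, and the pattern only covers the simple -verbosegc line format shown above, not the -XX:+PrintGCDetails format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; parses only the simple "[GC 40549K->20909K(64768K), 0.0484179 secs]" format.
final class GcLogLine {
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean full;     // true for "Full GC" lines
    final long beforeKb;    // heap in use before the collection
    final long afterKb;     // heap in use after the collection ("Alive" in the awk output)
    final double seconds;   // pause time

    private GcLogLine(boolean full, long beforeKb, long afterKb, double seconds) {
        this.full = full;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.seconds = seconds;
    }

    // Amount reclaimed by this collection ("Freed" in the awk output).
    long freedKb() {
        return beforeKb - afterKb;
    }

    // Returns null when the line does not match the simple format.
    static GcLogLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new GcLogLine(m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Double.parseDouble(m.group(5)));
    }

    public static void main(String[] args) {
        GcLogLine gc = parse("[Full GC 43960K->1749K(64768K), 0.1452965 secs]");
        System.out.println(gc.full + "\t" + gc.seconds + "\t" + gc.afterKb + "\t" + gc.freedKb());
    }
}
```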
GC Log analysis using AWK script < GC Time >
GC Log analysis using HPJtune ※  http://www.hp.com/products1/unix/java/java2/hpjtune/index.html
GC Log analysis using AWK script < GC Amount >
Garbage Collection Tuning
    Find the most important factor: low pause or high performance?
    Select an appropriate GC model (a new model carries risk!).
    Select "server" or "client".
    Find an appropriate heap size by reviewing the GC log.
    Find the ratio of young to old generation.
Garbage Collection Tuning
    Full GC is the most important factor in GC tuning: how frequent is it, and how long does it take?
        Short and frequent -> decrease the old space
        Long and infrequent -> increase the old space
        Short and infrequent -> decrease throughput by load balancing
    Fix the heap size
        Set "ms" and "mx" to the same value.
        This removes the overhead of shrinking and growing the heap.
    Don't
        Don't make the heap bigger than physical memory (swapping).
        Don't make the new generation bigger than half the heap.
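On Java 5 and later, "how frequently" and "how long" can also be read from a running JVM through the standard java.lang.management API, in addition to the GC log. The sketch below is only an illustration (the class name is hypothetical). The collector names it prints depend on which collector the JVM is using, and a count or time of -1 means the value is not available.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch; the class name is hypothetical.
final class GcCounters {
    // One line per collector: how many collections so far and their accumulated time.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": ")
              .append(gc.getCollectionCount())   // -1 if undefined for this collector
              .append(" collections, ")
              .append(gc.getCollectionTime())    // approximate accumulated time in ms, -1 if undefined
              .append(" ms total\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Sampling this report at two points in time and taking the difference gives the number of collections and the time spent in GC over that interval.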
Jmeter / Threads Histogram
Jmeter /Threads Group Histogram
Example
Example: old area before tuning
    Time axis: 2004-01-08 7:14 PM; 2004-01-09 around 8 AM; 2004-01-09 around 7 PM (Friday business hours in between); 2004-01-10 around 10 AM; 2004-01-10 around 6 PM
    Peak time: 52000~56000 sec (from 9 o'clock, for about 1 hour)
Example: GC time before tuning
    At peak time, old-generation GC takes 4~8 sec, which can make the application appear to hang.
Example: GC time after AP tuning
    Time axis: 12th 03:38 AM, 12th 05:58 PM, 13th 07:18 AM, 13th 09:38 PM, 14th 11:58 AM, 15th 01:18 AM, 15th 03:38 PM, 16th 05:58 AM, 16th 07:18 PM, 17th 08:38 AM, 17th 10:58 PM
    Marked periods: weekend; Monday, Tuesday, Thursday and Friday office hours
Example (second chart with the same time axis and marked periods, no caption)
Summary
JVM Tuning Summary
    1. Determine the JVM performance goal.
    2. Gather statistics on your application.
    3. Select the Hotspot compiler.
    4. Tune the heap.
    5. Check the threading model.
    6. Feed the results back.
More Tips
Thread dump
    Enabled by:
        Unix: "kill -3 [JAVA PID]"
        Windows: "Ctrl+Break"
    A snapshot of the Java application.
    Can be used to profile "hang-up" and "slow-down" situations.
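A similar snapshot can be taken from inside the application on Java 5 and later with Thread.getAllStackTraces(). The sketch below is an illustration only (the class name is hypothetical). Its output resembles a kill -3 dump but is not identical: for example, it does not show the native thread ids.

```java
import java.util.Map;

// Illustrative sketch; the class name is hypothetical.
final class ThreadDumpDemo {
    // Builds a text snapshot of every live thread and its current stack.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(thread.isDaemon() ? " daemon" : "")
              .append(" prio=").append(thread.getPriority())
              .append(" state=").append(thread.getState())
              .append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```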
Thread dump example
    Thread dump taken during a slowdown in the WAS:

    "ExecuteThread: '232' for queue: 'default'" daemon prio=5 tid=0x573ca630 nid=0xd2c waiting for monitor entry [0x5cebf000..0x5cebfdb8]
        at java.util.Hashtable.get(Hashtable.java:314)
        at java.util.ListResourceBundle.handleGetObject(ListResourceBundle.java:122)
        at java.util.ResourceBundle.getObject(ResourceBundle.java:371)
        at java.util.ResourceBundle.getObject(ResourceBundle.java:374)
        at java.text.DateFormatSymbols.initializeData(DateFormatSymbols.java:483)
        at java.text.DateFormatSymbols.<init>(DateFormatSymbols.java:99)
        at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:275)
        at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:264)
        at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:88)
        at XXX.uv.com.util.CmLog.setFileLog(CmLog.java:171)
        at XXX.uv.com.jsp.EjbJspBase.service(EjbJspBase.java:371)
        at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:265)
        at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:200)
        at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:2546)
        at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2260)
        at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139)
        at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)

    "ExecuteThread: '231' for queue: 'default'" daemon prio=5 tid=0x573f9a60 nid=0x13a8 waiting for monitor entry [0x5ce7f000..0x5ce7fdb8]
        at java.util.Hashtable.get(Hashtable.java:314)
        at java.text.DecimalFormatSymbols.initialize(DecimalFormatSymbols.java:333)
        at java.text.DecimalFormatSymbols.<init>(DecimalFormatSymbols.java:55)
        at java.text.NumberFormat.getInstance(NumberFormat.java:565)
        at java.text.NumberFormat.getInstance(NumberFormat.java:324)
        at java.text.SimpleDateFormat.initialize(SimpleDateFormat.java:327)
        at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:276)
        at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:264)
        at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:88)
        at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:67)
        at XXX.uv.com.datastu.DateTime.setCurrentTime(DateTime.java:190)
        at XXX.uv.com.jsp.EjbJspBase.service(EjbJspBase.java:239)
        at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:265)
        at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:200)
        at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:2546)
        at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2260)
        at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139)
        at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)
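Both threads in this dump are waiting for a monitor inside Hashtable.get while constructing a new SimpleDateFormat in CmDateTimeUtil.getCurrentTime. The slides do not say how the problem was fixed. One common mitigation, shown here purely as an assumption, is to stop constructing a formatter on every call and reuse one per thread. SimpleDateFormat is not thread-safe, so a single shared instance would not be a safe alternative. The class name and the date pattern below are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Illustrative sketch of a per-thread formatter cache; not the fix used by the slide author.
final class CachedDateFormat {
    // Each thread lazily creates its own formatter once and then reuses it.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT));

    static SimpleDateFormat formatter() {
        return FORMAT.get();
    }

    // No SimpleDateFormat construction on the hot path after the first call in a thread.
    static String currentTime() {
        return formatter().format(new Date());
    }

    public static void main(String[] args) {
        System.out.println(currentTime());
    }
}
```

The constructor, and with it the locale-data lookups seen in the stack traces, then runs once per thread instead of once per request. ThreadLocal.withInitial requires Java 8 or later.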
Profiling CPU usage/HP UX
    HP-UX: Glance + thread dump
    In HP Glance, press "G" for thread monitoring.
Profiling CPU usage/HP UX
    Glance thread monitoring: the CPU load of thread 15999 is 17.7%.
    Java thread dump, matched by lwp_id=15999:

    "Application Manager Thread" prio=8 tid=0x002a6c00 nid=62 lwp_id=15999 waiting on monitor [0x64bce000..0x64bce4b8]
        at java.lang.Thread.sleep(Native Method)
        at weblogic.management.mbeans.custom.ApplicationManager$ApplicationPoller.run(ApplicationManager.java:1137)

    Thread 15999 is working in weblogic.management.mbeans.custom.ApplicationManager (ApplicationManager.java:1137).
Other tools
    Profiling: profile with a Java option and analyze with HP Jmeter; JProbe
    Stress test: LoadRunner; MS Stress (free)
Related URL
    Java Thread: http://java.sun.com/docs/hotspot/threads/threads.htm
    Java Performance: http://java.sun.com/docs/hotspot/PerformanceFAQ.html
    Java Thread: http://www.javaworld.com/javaworld/jw-09-1998/jw-09-threads.html
    Pick up performance with generational GC: http://www.javaworld.com/javaworld/jw-01-2002/jw-0111-hotspotgc.html
    JVM 1.4 GC Tuning: http://java.sun.com/docs/hotspot/gc1.4.2/index.html
    HP Jmeter, Jtune, Jconfig: http://www.hp.com/products1/unix/java/developers/index.html
    SPECjvm98, SPECjAppServer2001/2002
Thank you

Jvm Performance Tunning

  • 1.
    JavaStudy Network DaehyubCho JVM [Java Virtual Machine] Performance Tuning
  • 2.
    AGENDA Basic conceptof JVM Tuning 1 Hotspot compiler 2 Threading Model 3 Memory Model 4
  • 3.
    Basic Concept ofJVM Tuning Basic concept of JVM Tuning
  • 4.
    Basic of performancetuning Decide what performance level is “good enough” Test & measurement Scenario based Stress Tool (Load Runner) Profiling Tool (J probe, etc) Profile application to find bottlenecks Tuning Application * Middleware [WAS] OS JVM Return to Step 2 [feedback]
  • 5.
    JVM Tuning Improveperformance about 10~20% Find appropriate parameter for your application Hotspot compile option Thread model option * GC and memory related option ** Changing parameter is very dangerous action Need more test and feed back Ref spec.org
  • 6.
  • 7.
    JVM Layout Hotspotfrom JDK 1.3 VM Client Compiler Server Compiler Runtime GC Interpreter Threading & Locking … . JVM Hotspot Compiler
  • 8.
    Hotspot compiler JIT(Just-In-Time Compiler) Compile byte code to native code Compile as rules of optimization (Not thinking) At execution/installation Compile byte code to native code Hotspot Compile byte code to native code ‘ Thinking’ to trying find where optimization can take place Adaptive Optimizing in runtime
  • 9.
    Hotspot Detection Hotspotdetection Method Inlining Dynamic Deoptimization
  • 10.
    Hotspot Detection andMethod Inlining Literal constants are folded String concatenation is sometimes folded Constant fields are inlined int foo = 9* 10;  int foo = 90; String foo = “Hello “ + (9*10);  String foo = “Hello 90”; public class A{ public static final VALUE=99; } public class B{ static int VALUE2=A.VALUE; } public class B{ static int VALUE2=99; }  When after compiling class B
  • 11.
    Hotspot detection /Method Inlining Dead code branches are eliminated public class A{ static final boolean DEBUG = false; public void methodA() if(DEBUG) System.out.println(“DEBUG MODE); System.out.println(“Say Hello”); }// method A }// class A ↓ public class A{ static final boolean DEBUG = false; public void methodA() System.out.println(“Say Hello”); }// method A }// class A
  • 12.
    Hotspot Client compilerJava Option : -client Focused on Simple & Fast start up 3 Phase compiler HIR (High Level Intermediate Representation) LIR (Low Level Intermediate Representation) Machine code It focuses on local code quality and does very few global optimizations since those are often the most expensive in terms of compile time It has for inlining any function that has no exception handlers or synchronization and also supports deoptimization for debugging and inlining
  • 13.
    Hotspot Server compilerJava Option : -server Focused on optimization SSA (Static Single Assignment)-based IR
  • 14.
    Hotspot compiler OptionHotspot compile option -XX:MaxInlineSize=<size> Integer specifying maximum number of bytecode instructions in a method which gets inlined. -XX:FreqInlineSize=<size> Integer specifying maximum number of bytecode instructions in a frequently executed method which gets inlined. -Xint Interpreter only (no JIT compilation) -XX:+PrintCompilation
  • 15.
  • 16.
    Threading Model ThreadModel Java is multi threaded programming language Native thread model from JDK 1.2 Thread mapping (M:N and 1:1) Thread synchronization Java Application Java Thread Operating System Thread Handling Thread Scheduling Lock Mgmt (synchronization) JVM
  • 17.
    Solaris M:N ThreadModel Java Application Java Thread JVM Solaris OS OS Kernel Solaris Thread LWP Kernel Thread
  • 18.
    Solaris M:N ThreadModel Solaris M:N Thread Model Thread based synchronization LWP based synchronization Default -XX:-UseLWPSynchronization JDK1.4 -XX:+UseLWPSynchronization Default JDK1.3 Default N/A JDK1.2 LWP based sync Thread based sync
  • 19.
    Solaris 1:1 ThreadModel Java Application Java Thread JVM Solaris OS OS Kernel Solaris Thread LWP Kernel Thread
  • 20.
    Solaris 1:1 ThreadModel Solaris 1:1 Thread Model Bound thread Alternate Libthread ※ In Solaris 9, alternate lib thread is default, do not add /usr/lib/lwp to LD_LIBRARY_PATH export LD_LIBRARY_PATH=/usr/lib/lwp -XX:+UseBoundThreads JDK1.4 export LD_LIBRARY_PATH=/usr/lib/lwp -XX:+UseBoundThreads JDK1.3 export LD_LIBRARY_PATH=/usr/lib/lwp N/A JDK1.2 Alternate Libthread* Bound Thread
  • 21.
    JVM Performance Teston Solaris < Solaris 8 with JVM 1.3 > See next page graph!! Architecture Cpus Threads Model %diff in throughput (against Standard Model) Sparc 30 400/2000 Standard --- Sparc 30 400/2000 LWP Synchronization 215%/800% Sparc 30 400/2000 Bound Threads -10%/-80% Sparc 30 400/2000 Alternate One-to-one 275%/900% Sparc 4 400/2000 Standard --- Sparc 4 400/2000 LWP Synchronization 30%/60% Sparc 4 400/2000 Bound Threads -5%/-45% Sparc 4 400/2000 Alternate One-to-one 30%/50% Sparc 2 400/2000 Standard --- Sparc 2 400/2000 LWP Synchronization 0%/25% Sparc 2 400/2000 Bound Threads -30%/-40% Sparc 2 400/2000 Alternate One-to-one -10%/0% Intel 4 400/2000 Standard --- Intel 4 400/2000 LWP Synchronization 25%/60% Intel 4 400/2000 Bound Threads 0%/-10% Intel 4 400/2000 Alternate One-to-one 20%/60% Intel 2 400/2000 Standard --- Intel 2 400/2000 LWP Synchronization 15%/45% Intel 2 400/2000 Bound Threads -10%/-15% Intel 2 400/2000 Alternate One-to-one 15%/35%
  • 22.
    JVM Performance Teston Solaris Performance Test Result Graph
  • 23.
  • 24.
    Memory Tuning GarbageCollection JVM Memory Layout Garbage Collection Model Server VM and Client VM Garbage Collection Measurement & Analysis Tuning Garbage Collection
  • 25.
  • 26.
    JVM Memory LayoutNew/Young – Recently created object Old – Long lived object Perm – JVM classes and methods Eden Old Perm New/Young Old Used in Application JVM Total Heap Size SS1 SS2
  • 27.
    Garbage Collection GarbageCollection Collecting unused java object Cleaning memory Minor GC Collection memory in New/Young generation Major GC (Full GC) Collection memory in Old generation
  • 28.
    Minor GC MinorCollection New/Young Generation Copy and Scavenge Very Fast
  • 29.
    Minor GC EdenSS1 SS1 Copy live objects to Survivor area New Object Garbage Lived Object 1 st Minor GC Old Old Old
  • 30.
    Minor GC 2nd Minor GC Old Old Old New Object Garbage Lived Object
  • 31.
    Minor GC OLD3 rd Minor GC Objects moved old space when they become tenured New Object Garbage Lived Object
  • 32.
    Major GC MajorCollection Old Generation Mark and compact Slow 1 st – goes through the entire heap , marking unreachable objects 2 nd – unreachable objects are compacted
  • 33.
    Major GC EdenSS1 SS2 Eden SS1 SS2 Mark the objects to be removed Eden SS1 SS2 Compact the objects to be removed
  • 34.
    Server option versusClient option -X:NewRatio=2 (1.3) , -Xmn128m(1.4), -XX:NewSize=<size> -XX:MaxNewSize=<size>
  • 35.
    GC Tuning ParameterMemory Tuning Parameter Perm Size : -XX:MaxPermSize=64m Total Heap Size : -ms512m –mx 512m New Size -XX:NewRatio=2  Old/New Size -XX:NewSize=128m -Xmn128m (JDK 1.4) Survivor Size : -XX:SurvivorRatio=64 (eden/survivor) Heap Ratio -XX:MaxHeapFreeRatio=70 -XX:MinHeapFreeRatio=40 Suvivor Ratio -XX:TargetSurvivorRatio=50
  • 36.
    Support for –XXOption Options that begin with -X are nonstandard (not guaranteed to be supported on all VM implementations), and are subject to change without notice in subsequent releases of the Java 2 SDK. Because the -XX options have specific system requirements for correct operation and may require privileged access to system configuration parameters, they are not recommended for casual use. These options are also subject to change without notice .
  • 37.
    Garbage Collection ModelNew type of GC Default Collector Parallel GC for young generation - JDK 1.4 Concurrent GC for old generation - JDK 1.4 Incremental Low Pause Collector (Train GC)
  • 38.
    Parallel GC ParallelGC Improve performance of GC For young generation (Minor GC) More than 4CPU and 256MB Physical memory required threads time gc threads Default GC Parallel GC Young Generation
  • 39.
    Parallel GC TwoParallel Collectors Low-pause : -XX:+UseParNewGC Near real-time or pause dependent application Works with Mark and compact collector Concurrent old area collector Throughput : -XX:+UseParallelGC Enterprise or throughput oriented application Works only with the mark and compact collector
  • 40.
    Parallel GC ThroughputCollector – XX:+UseParallelGC -XX:ParallelGCThreads=<desired number> -XX:+UseAdaptiveSizePolicy Adaptive resizing of the young generation
  • 41.
    Parallel GC ThroughputCollector AggressiveHeap Enabled By-XX:+AggresiveHeap Inspect machine resources and attempts to set various parameters to be optimal for long-running,memory-intensive jobs Useful in more than 4 CPU machine, more than 256M Useful in Server Application Do not use with –ms and –mx Example) HP Itanium 1.4.2 java -XX:+ServerApp -XX:+AggresiveHeap -Xmn3400m -spec.jbb.JBBmain -propfile Test1
  • 42.
    Concurrent GC ConcurrentGC Reduce pause time to collect Old Generation For old generation (Full GC) Enabled by - XX:+UseConcMarkSweepGC threads time gc threads Default GC Concurrent GC Old Generation
  • 43.
    Incremental GC IncrementalGC Enabled by –XIncgc (from JDK 1.3) Collect Old generation whenever collect young generation Reduce pause time for collect old generation Disadvantage More frequently young generation GC has occurred. More resource is needed Do not use with –XX:+UseParallelGC and –XX:+UseParNewGC
  • 44.
    Incremental GC IncrementalGC Minor GC After many time of Minor GC Full GC Minor GC Minor GC Old Generation is collected in Minor GC Default GC Incremental GC Young Generation Old Generation
  • 45.
    Incremental GC IncrementalGC -client –XX:+PrintGCDetails -Xincgc –ms32m –mx32m [GC [DefNew: 540K->35K(576K), 0.0053557 secs][Train: 3495K->3493K(32128K), 0.0043531 secs] 4036K->3529K(32704K), 0.0099856 secs] [GC [DefNew: 547K->64K(576K), 0.0048216 secs][Train: 3529K->3540K(32128K), 0.0058683 secs] 4041K->3604K(32704K), 0.0109779 secs] [GC [DefNew: 575K->64K(576K), 0.0164904 secs] 4116K->3670K(32704K), 0.0169019 secs] [GC [DefNew: 576K->64K(576K), 0.0057541 secs][Train: 3671K->3651K(32128K), 0.0051286 secs] 4182K->3715K(32704K), 0.0113042 secs] [GC [DefNew: 575K->56K(576K), 0.0114559 secs] 4227K->3745K(32704K), 0.0191390 secs] [ Full GC [Train MSC: 3689K->3280K(32128K), 0.0909523 secs] 4038K->3378K(32704K), 0.0910213 secs ] [GC [ DefNew: 502K->64K(576K), 0.0173220 secs ][Train: 3329K->3329K(32128K), 0.0066279 secs] 3782K->3393K(32704K), 0.0325125 secs Young Generation GC Old Generation GC in Minor GC Time Minor GC Full GC Sun JVM 1.4.1 in Windows OS
  • 46.
    Best Pause ConcurrentGC Best Throughput Parallel GC Better Pause Incremental GC(Train) Better throughput Mark-compact
  • 47.
    Garbage Collection Measurement -verbosegc (All Platform) -XX:+PrintGCDetails ( JDK 1.4) -Xverbosegc (HP)
  • 48.
    Garbage Collection Measurement-verbosegc [GC 40549K->20909K(64768K), 0.0484179 secs] [GC 41197K->21405K(64768K), 0.0411095 secs] [GC 41693K->22995K(64768K), 0.0846190 secs] [GC 43283K->23672K(64768K), 0.0492838 secs] [Full GC 43960K->1749K(64768K), 0.1452965 secs] [GC 22037K->2810K(64768K), 0.0310949 secs] [GC 23098K->3657K(64768K), 0.0469624 secs] [GC 23945K->4847K(64768K), 0.0580108 secs] Full GC Total Heap Size GC Time Heap size after GC Heap size before GC
  • 49.
    GC Log analysisusing AWK script Awk script BEGIN{ printf(&quot;Minor\tMajor\tAlive\tFree\n&quot;); } { if( substr($0,1,4) == &quot;[GC &quot;){ split($0,array,&quot; &quot;); printf(&quot;%s\t0.0\t&quot;,array[3]) split(array[2],barray,&quot;K&quot;) before=barray[1] after=substr(barray[2],3) reclaim=before-after printf(&quot;%s\t%s\n&quot;,after,reclaim) } if( substr($0,1,9) == &quot;[Full GC &quot;){ split($0,array,&quot; &quot;); printf(&quot;0.0\t%s\t&quot;,array[4]) split(array[3],barray,&quot;K&quot;) before = barray[1] after = substr(barray[2],3) reclaim = before - after printf(&quot;%s\t%s\n&quot;,after,reclaim) } next; } % awk –f gc.awk gc.log ※ Usage gc.awk Minor       Major       Alive       Freed 0.0484179   0.0         20909       19640 0.0411095   0.0         21405       19792 0.0846190   0.0         22995       18698 0.0492838   0.0         23672       19611 0.0         0.1452965   1749        42211 0.0310949   0.0         2810        19227 0.0469624   0.0         3657        19441 0.0580108   0.0         4847        19098 gc.log
  • 50.
    GC Log analysisusing AWK script < GC Time >
  • 51.
    GC Log analysisusing HPJtune ※ http://www.hp.com/products1/unix/java/java2/hpjtune/index.html
  • 52.
    GC Log analysisusing AWK script < GC Amount >
  • 53.
    Garbage Collection TuningGC Tuning Find Most Important factor Low pause? Or High performance? Select appropriate GC model (New Model has risk!!) Select “server” or “client” Find appropriate Heap size by reviewing GC log Find ratio of young and old generation
  • 54.
    Garbage Collection TuningGC Tuning Full GC  Most important factor in GC tuning How frequently ? How long ? Short and Frequently  decrease old space Long and Sometimes  increase old space Short and Sometimes  decrease throughput  by Load balancing Fix Heap size Set “ms” and “mx” as same Remove shrinking and growing overhead Don’t Don’t make heap size bigger than physical memory (SWAP) Don’t make new generation bigger than half the heap
  • 55.
  • 56.
  • 57.
  • 58.
    Example 2004-01-08 오후 7:14 2004-01-09 오전 8 시 전후 2004-01-09 오후 7 시 전후 금요일 업무시간 2004-01-10 오전 10 시 전후 2004-01-10 오후 6 시 전후 PEAK TIME 52000~56000 sec 9 시 ~ 1 시간 가량 Before Tuned Old Area
  • 59.
    Example Peak Time 시에 Old GC 시간이 4~8 sec 로 이로 인한 Hang 현상 유발이 가능함 Before Tuned GC Time
  • 60.
    Example 12 일03:38A 12 일 05:58P 13 일 07:18A 13 일 09:38P 14 일 11:58A 15 일 01:18A 15 일 03:38P 16 일 05:58A 16 일 07:18P 17 일 08:38A 17 일 10:58P Weekend Mon Office Our Tue Office Our Thur Office Our Fri Office Our After AP Tuned GC Time
  • 61.
    Example (chart)
    Time axis marks: 12th 03:38 AM; 12th 05:58 PM; 13th 07:18 AM; 13th 09:38 PM; 14th 11:58 AM; 15th 01:18 AM; 15th 03:38 PM; 16th 05:58 AM; 16th 07:18 PM; 17th 08:38 AM; 17th 10:58 PM
    Annotations: Weekend; Mon office hours; Tue office hours; Thu office hours; Fri office hours
  • 62.
  • 63.
    JVM Tuning Summary
        Determine the JVM performance goal
        Gather statistics on your application
        Select the hotspot compiler
        Tune the heap
        Check the threading model
        Feed back
  • 64.
  • 65.
    Thread dump
    Triggered by:
        Unix: "kill -3 [JAVA PID]"
        Windows: "Ctrl+Break"
    A snapshot of the running Java application
    Can be used to profile "hang-up" and "slow-down" problems
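    When sending a signal is not convenient, a similar snapshot can be taken from inside the application. The sketch below uses the standard `Thread.getAllStackTraces()` call (available since Java 5). The class name `InProcessThreadDump` is hypothetical. The output only approximates a real dump: it carries no native thread IDs and no monitor information.

```java
import java.util.Map;

// Hypothetical helper: builds a thread-dump-like text from inside the JVM.
final class InProcessThreadDump {
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header line: name, daemon flag, priority and current state.
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" prio=").append(t.getPriority())
              .append(" state=").append(t.getState())
              .append('\n');
            // One "at ..." line per stack frame, top of stack first.
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

    A dump taken this way runs on an application thread, so it may not work when the application is completely hung. The signal-based dump remains the more reliable option in that case.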
  • 66.
    Thread dump example
    Thread dump taken during a slowdown in the WAS:
        "ExecuteThread: '232' for queue: 'default'" daemon prio=5 tid=0x573ca630 nid=0xd2c waiting for monitor entry [0x5cebf000..0x5cebfdb8]
            at java.util.Hashtable.get(Hashtable.java:314)
            at java.util.ListResourceBundle.handleGetObject(ListResourceBundle.java:122)
            at java.util.ResourceBundle.getObject(ResourceBundle.java:371)
            at java.util.ResourceBundle.getObject(ResourceBundle.java:374)
            at java.text.DateFormatSymbols.initializeData(DateFormatSymbols.java:483)
            at java.text.DateFormatSymbols.<init>(DateFormatSymbols.java:99)
            at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:275)
            at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:264)
            at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:88)
            at XXX.uv.com.util.CmLog.setFileLog(CmLog.java:171)
            at XXX.uv.com.jsp.EjbJspBase.service(EjbJspBase.java:371)
            at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:265)
            at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:200)
            at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:2546)
            at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2260)
            at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139)
            at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)

        "ExecuteThread: '231' for queue: 'default'" daemon prio=5 tid=0x573f9a60 nid=0x13a8 waiting for monitor entry [0x5ce7f000..0x5ce7fdb8]
            at java.util.Hashtable.get(Hashtable.java:314)
            at java.text.DecimalFormatSymbols.initialize(DecimalFormatSymbols.java:333)
            at java.text.DecimalFormatSymbols.<init>(DecimalFormatSymbols.java:55)
            at java.text.NumberFormat.getInstance(NumberFormat.java:565)
            at java.text.NumberFormat.getInstance(NumberFormat.java:324)
            at java.text.SimpleDateFormat.initialize(SimpleDateFormat.java:327)
            at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:276)
            at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:264)
            at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:88)
            at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:67)
            at XXX.uv.com.datastu.DateTime.setCurrentTime(DateTime.java:190)
            at XXX.uv.com.jsp.EjbJspBase.service(EjbJspBase.java:239)
            at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:265)
            at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:200)
            at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:2546)
            at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2260)
            at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139)
            at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)
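    Both threads in the dump are blocked on a `Hashtable` monitor while constructing a new `SimpleDateFormat` on every request. One common way to reduce this kind of contention is to create the formatter once per thread and reuse it. The sketch below is only an illustration of that idea: the class name `CachedDateTimeUtil` and the date pattern are assumptions, since the source of `CmDateTimeUtil` is not shown here.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical replacement for a utility that built a SimpleDateFormat per call.
final class CachedDateTimeUtil {
    // SimpleDateFormat is not thread-safe, so each thread keeps its own instance.
    // The pattern is an assumed example, not taken from the original application.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            }
        };

    // Exposed only so that reuse can be observed in a test.
    static SimpleDateFormat formatterForCurrentThread() {
        return FORMAT.get();
    }

    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    static String getCurrentTime() {
        return format(new Date());
    }

    public static void main(String[] args) {
        System.out.println(getCurrentTime());
    }
}
```

    With this arrangement the constructor path seen in the dump runs once per worker thread instead of once per request, so the shared lock is taken far less often.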
  • 67.
    Profiling CPU usage / HP-UX
    HP-UX: Glance + thread dump
    In HP Glance, press "G" for thread monitoring
  • 68.
    Profiling CPU usage / HP-UX
    Java thread dump:
        "Application Manager Thread" prio=8 tid=0x002a6c00 nid=62 lwp_id=15999 waiting on monitor [0x64bce000..0x64bce4b8]
            at java.lang.Thread.sleep(Native Method)
            at weblogic.management.mbeans.custom.ApplicationManager$ApplicationPoller.run(ApplicationManager.java:1137)
    Glance thread monitoring: the CPU load of thread 15999 is 17.7%
    Matching lwp_id=15999 in the thread dump shows that thread 15999 is working in weblogic.management.mbeans.custom.ApplicationManager (ApplicationManager.java:1137)
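    Where Glance is not available, a comparable per-thread view can be approximated from inside the JVM with the standard `ThreadMXBean` (Java 5 and later). The sketch below is not the Glance procedure described above, and the class name `ThreadCpuReport` is hypothetical. It reports accumulated CPU time per Java thread ID, not a current load percentage, and the IDs are not the `lwp_id` values shown in the dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: lists accumulated CPU time for each live Java thread.
final class ThreadCpuReport {
    static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return "thread CPU time is not supported on this JVM\n";
        }
        StringBuilder sb = new StringBuilder();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread has already died
            long nanos = mx.getThreadCpuTime(id);    // -1 if unavailable or measurement disabled
            if (info == null || nanos < 0) {
                continue;
            }
            sb.append(String.format("%-40s id=%d cpu=%d ms%n",
                info.getThreadName(), id, nanos / 1000000L));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

    Calling `report()` twice a few seconds apart and comparing the values gives a rough idea of which threads are currently consuming CPU.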
  • 69.
    Other tools
    Profiling
        Profile with a Java option and analyze using HP Jmeter
        Jprobe
    Stress test
        Load Runner
        MS Stress (free)
  • 70.
    Related URL
        Java Thread: http://java.sun.com/docs/hotspot/threads/threads.htm
        Java Performance: http://java.sun.com/docs/hotspot/PerformanceFAQ.html
        Java Thread: http://www.javaworld.com/javaworld/jw-09-1998/jw-09-threads.html
        Pick up performance with generational GC: http://www.javaworld.com/javaworld/jw-01-2002/jw-0111-hotspotgc.html
        JVM 1.4 GC Tuning: http://java.sun.com/docs/hotspot/gc1.4.2/index.html
        HP Jmeter, Jtune, Jconfig: http://www.hp.com/products1/unix/java/developers/index.html
        SPECjvm98, SPECjAppServer2001/2002
  • 71.