JavaStudy Network Daehyub Cho JVM [Java Virtual Machine] Performance Tuning
AGENDA
1 Basic concept of JVM Tuning
2 Hotspot compiler
3 Threading Model
4 Memory Model
Basic Concept of JVM Tuning
Basic of performance tuning
Decide what performance level is “good enough”
Test & measurement
Scenario based
Stress tool (LoadRunner)
Profiling tool (JProbe, etc.)
Profile application to find bottlenecks
Tuning
Application *
Middleware [WAS]
OS
JVM
Return to Step 2 [feedback]
JVM Tuning
Can improve performance by about 10~20%
Find appropriate parameter for your application
Hotspot compile option
Thread model option *
GC and memory related option **
Changing parameters is risky
Needs more testing and feedback
Ref spec.org
Hotspot Compiler
JVM Layout
Hotspot from JDK 1.3
[Diagram: the VM contains the Client Compiler, Server Compiler, Runtime, GC, Interpreter, Threading & Locking, …]
JVM Hotspot Compiler
Hotspot compiler
JIT (Just-In-Time Compiler)
Compile byte code to native code
Compiles according to fixed optimization rules (no "thinking")
Compiles at execution or installation time
Hotspot
Compile byte code to native code
"Thinking": tries to find where optimization can take place
Adaptive optimization at runtime
Hotspot detection
Method Inlining
Dynamic Deoptimization
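The kind of code that hotspot detection and method inlining target can be sketched as follows. This is only an illustration: the class `InlineCandidate` and its methods are hypothetical names, and whether a given VM actually inlines `getValue()` depends on the VM version and its flags.

```java
// Hypothetical example: a tiny accessor called from a loop that runs often.
final class InlineCandidate {
    private final int value;

    InlineCandidate(int value) {
        this.value = value;
    }

    // Very small method body: a typical candidate for inlining.
    int getValue() {
        return value;
    }

    // Calls getValue() repeatedly, so this loop is the kind of code
    // an adaptive compiler may detect as hot.
    static long sum(InlineCandidate c, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += c.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        InlineCandidate c = new InlineCandidate(3);
        System.out.println(sum(c, 1_000_000));
    }
}
```

Running it with `-XX:+PrintCompilation` should show which methods the VM chose to compile, although the output format differs between VM versions.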
Hotspot Detection and Method Inlining
Literal constants are folded
String concatenation is sometimes folded
Constant fields are inlined
int foo = 9 * 10;
↓
int foo = 90;

String foo = "Hello " + (9 * 10);
↓
String foo = "Hello 90";

public class A { public static final int VALUE = 99; }
public class B { static int VALUE2 = A.VALUE; }
↓ (after class B is compiled)
public class B { static int VALUE2 = 99; }
Hotspot detection / Method Inlining
Dead code branches are eliminated
public class A {
    static final boolean DEBUG = false;
    public void methodA() {
        if (DEBUG)
            System.out.println("DEBUG MODE");
        System.out.println("Say Hello");
    } // methodA
} // class A
↓
public class A {
    static final boolean DEBUG = false;
    public void methodA() {
        System.out.println("Say Hello");
    } // methodA
} // class A
Hotspot Client compiler
Java Option : -client
Focused on Simple & Fast start up
3 Phase compiler
HIR (High Level Intermediate Representation)
LIR (Low Level Intermediate Representation)
Machine code
It focuses on local code quality and does very few global optimizations since those are often the most expensive in terms of compile time
It can inline any method that has no exception handlers or synchronization, and it also supports deoptimization for debugging and inlining
Hotspot Server compiler
Java Option : -server
Focused on optimization
SSA (Static Single Assignment)-based IR
Hotspot compiler option
-XX:MaxInlineSize=<size>
Integer specifying maximum number of bytecode instructions in a method which gets inlined.
-XX:FreqInlineSize=<size>
Integer specifying maximum number of bytecode instructions in a frequently executed method which gets inlined.
Options that begin with -X are nonstandard (not guaranteed to be supported on all VM implementations), and are subject to change without notice in subsequent releases of the Java 2 SDK.
Because the -XX options have specific system requirements for correct operation and may require privileged access to system configuration parameters, they are not recommended for casual use. These options are also subject to change without notice.
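Since -X and -XX options can change between releases, it helps to confirm which ones a running VM actually received. A minimal sketch, assuming Java 5 or later (for `java.lang.management`); the class name `ShowVmOptions` and the helper `nonStandard` are hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ShowVmOptions {
    // Keeps only the nonstandard options (-X... and -XX:...) from an argument list.
    static List<String> nonStandard(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-X")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // These are the arguments passed to the JVM itself, not to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("VM: " + System.getProperty("java.vm.name"));
        for (String arg : nonStandard(jvmArgs)) {
            System.out.println(arg);
        }
    }
}
```

Started as, for example, `java -server -XX:MaxInlineSize=<size> ShowVmOptions`, it should list the -XX option it was given; the value itself is whatever you pass.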
Garbage Collection Model
New type of GC
Default Collector
Parallel GC for young generation - JDK 1.4
Concurrent GC for old generation - JDK 1.4
Incremental Low Pause Collector (Train GC)
Parallel GC
Improve performance of GC
For young generation (Minor GC)
Requires more than 4 CPUs and 256 MB of physical memory
[Figure: application threads and GC threads over time in the young generation, Default GC vs. Parallel GC]
Parallel GC
Two Parallel Collectors
Low-pause : -XX:+UseParNewGC
Near real-time or pause dependent application
Works with
Mark and compact collector
Concurrent old area collector
Throughput : -XX:+UseParallelGC
Enterprise or throughput oriented application
Works only with the mark and compact collector
Parallel GC
Throughput Collector
-XX:+UseParallelGC
-XX:ParallelGCThreads=<desired number>
-XX:+UseAdaptiveSizePolicy
Adaptive resizing of the young generation
Parallel GC
Throughput Collector
AggressiveHeap
Enabled by -XX:+AggressiveHeap
Inspects machine resources and attempts to set various parameters to be optimal for long-running, memory-intensive jobs
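After choosing a collector flag, it is worth checking which collectors the VM really started with. A rough sketch, assuming Java 5 or later; the class name `ShowCollectors` is hypothetical, and the collector names it reports vary by VM vendor and version:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ShowCollectors {
    // One line per collector: its name, how often it ran, and total time spent.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Comparing the output with and without `-XX:+UseParallelGC` is a quick way to see whether the option took effect on your VM.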
[Figure: application threads and GC threads over time in the old generation, Default GC vs. Concurrent GC]
Incremental GC
Enabled by -Xincgc (from JDK 1.3)
Collects the old generation incrementally, whenever the young generation is collected
Reduces pause time for old generation collection
Disadvantages
Young generation GC occurs more frequently
More resources are needed
Do not use with -XX:+UseParallelGC or -XX:+UseParNewGC
Incremental GC
[Figure: young and old generation over time. Default GC: several minor GCs, then a Full GC after many minor GCs. Incremental GC: the old generation is collected during each minor GC.]
Incremental GC
-client -XX:+PrintGCDetails -Xincgc -ms32m -mx32m
[GC [DefNew: 540K->35K(576K), 0.0053557 secs][Train: 3495K->3493K(32128K), 0.0043531 secs] 4036K->3529K(32704K), 0.0099856 secs]
[GC [DefNew: 547K->64K(576K), 0.0048216 secs][Train: 3529K->3540K(32128K), 0.0058683 secs] 4041K->3604K(32704K), 0.0109779 secs]
[GC [DefNew: 575K->64K(576K), 0.0164904 secs] 4116K->3670K(32704K), 0.0169019 secs]
[GC [DefNew: 576K->64K(576K), 0.0057541 secs][Train: 3671K->3651K(32128K), 0.0051286 secs] 4182K->3715K(32704K), 0.0113042 secs]
[GC [DefNew: 575K->56K(576K), 0.0114559 secs] 4227K->3745K(32704K), 0.0191390 secs]
[Full GC [Train MSC: 3689K->3280K(32128K), 0.0909523 secs] 4038K->3378K(32704K), 0.0910213 secs]
[GC [DefNew: 502K->64K(576K), 0.0173220 secs][Train: 3329K->3329K(32128K), 0.0066279 secs] 3782K->3393K(32704K), 0.0325125 secs]
Legend: "GC" entries are minor GCs and "Full GC" is a full GC. DefNew is the young generation GC, Train is the old generation GC performed within a minor GC, and the value in secs is the GC time. (Sun JVM 1.4.1 on Windows)
Best pause: Concurrent GC
Better pause: Incremental GC (Train)
Best throughput: Parallel GC
Better throughput: Mark-compact
Garbage Collection Measurement
-verbosegc (all platforms)
-XX:+PrintGCDetails (JDK 1.4)
-Xverbosegc (HP)
Garbage Collection Measurement
-verbosegc
[GC 40549K->20909K(64768K), 0.0484179 secs]
[GC 41197K->21405K(64768K), 0.0411095 secs]
[GC 41693K->22995K(64768K), 0.0846190 secs]
[GC 43283K->23672K(64768K), 0.0492838 secs]
[Full GC 43960K->1749K(64768K), 0.1452965 secs]
[GC 22037K->2810K(64768K), 0.0310949 secs]
[GC 23098K->3657K(64768K), 0.0469624 secs]
[GC 23945K->4847K(64768K), 0.0580108 secs]
Legend: heap size before GC -> heap size after GC (total heap size), GC time. "Full GC" marks a full collection.
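The fields of a -verbosegc entry can also be extracted programmatically. A minimal sketch in Java, assuming the simple entry format shown in this log (it does not handle -XX:+PrintGCDetails output); the class `VerboseGcLine` is a hypothetical name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VerboseGcLine {
    // Matches entries such as "[GC 40549K->20909K(64768K), 0.0484179 secs]".
    private static final Pattern ENTRY = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean fullGc;   // true for "Full GC" entries
    final long beforeK;     // heap size before GC
    final long afterK;      // heap size after GC
    final long totalK;      // total heap size
    final double seconds;   // GC time

    private VerboseGcLine(boolean fullGc, long beforeK, long afterK, long totalK, double seconds) {
        this.fullGc = fullGc;
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.totalK = totalK;
        this.seconds = seconds;
    }

    // Returns null when the text is not a simple -verbosegc entry.
    static VerboseGcLine parse(String text) {
        Matcher m = ENTRY.matcher(text.trim());
        if (!m.matches()) {
            return null;
        }
        return new VerboseGcLine(
                m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5)));
    }

    long freedK() {
        return beforeK - afterK;
    }

    public static void main(String[] args) {
        String[] log = {
            "[GC 43283K->23672K(64768K), 0.0492838 secs]",
            "[Full GC 43960K->1749K(64768K), 0.1452965 secs]",
            "[GC 22037K->2810K(64768K), 0.0310949 secs]",
        };
        long freed = 0;
        double pause = 0;
        int fullCount = 0;
        for (String text : log) {
            VerboseGcLine e = parse(text);
            if (e == null) {
                continue; // skip anything that is not a GC entry
            }
            freed += e.freedK();
            pause += e.seconds;
            if (e.fullGc) {
                fullCount++;
            }
        }
        System.out.println("freed=" + freed + "K, pause=" + pause + "s, fullGCs=" + fullCount);
    }
}
```

Summing the freed amount and GC time per entry gives the same kind of totals that a log analysis script would report.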
GC Log analysis using HPJtune ※ http://www.hp.com/products1/unix/java/java2/hpjtune/index.html
GC Log analysis using AWK script < GC Amount >
Garbage Collection Tuning
GC Tuning
Find Most Important factor
Low pause? Or High performance?
Select appropriate GC model (New Model has risk!!)
Select “server” or “client”
Find appropriate Heap size by reviewing GC log
Find ratio of young and old generation
Garbage Collection Tuning
GC Tuning
Full GC: the most important factor in GC tuning
How frequently? How long?
Decreasing old space makes Full GC short and frequent
Increasing old space makes Full GC long and occasional
Short and occasional Full GC: reduce the load each JVM handles through load balancing
Fix Heap size
Set "ms" and "mx" to the same value
Removes the overhead of shrinking and growing the heap
Don’t
Don't make heap size bigger than physical memory (it causes swapping)
Don’t make new generation bigger than half the heap
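The heap limits that the VM actually applied can be checked from inside the application. A small sketch using only `java.lang.Runtime`; the class name `HeapSettings` is hypothetical, and the reported values only roughly match the "ms" and "mx" settings because the VM may exclude some internal spaces:

```java
final class HeapSettings {
    // Converts a byte count to whole megabytes.
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // upper bound, roughly the "mx" setting
        long committed = rt.totalMemory(); // heap currently reserved by the VM
        long free = rt.freeMemory();       // unused part of the committed heap

        System.out.println("max       = " + toMb(max) + " MB");
        System.out.println("committed = " + toMb(committed) + " MB");
        System.out.println("used      = " + toMb(committed - free) + " MB");

        if (committed < max) {
            // The heap can still grow, so "ms" and "mx" probably differ.
            System.out.println("heap is not fixed: it may still grow");
        }
    }
}
```

With "ms" and "mx" set to the same value, the committed and max figures should be close to each other from startup.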
JMeter / Threads Histogram
JMeter / Threads Group Histogram
Example
Example: old area before tuning. [Chart: x-axis marks 2004-01-08 7:14 PM, 2004-01-09 around 8 AM, 2004-01-09 around 7 PM, 2004-01-10 around 10 AM, 2004-01-10 around 6 PM; Friday business hours are marked. Peak time: 52000~56000 sec, about one hour starting at 9 o'clock.]
Example: GC time before tuning. At peak time, old GC takes 4~8 sec, which can cause the application to hang.
Example: GC time after application (AP) tuning. [Chart: x-axis runs from the 12th 03:38 AM to the 17th 10:58 PM; marked regions: weekend, Mon office hours, Tue office hours, Thu office hours, Fri office hours.]
Summary
JVM Tuning Summary
Determine JVM performance goal
Gather statistics on your application
Select hotspot compiler
Tune the heap
Check threading model
Feedback
More Tips
Thread dump
Enabled by
Unix: "kill -3 [JAVA PID]"
Windows: "Ctrl+Break"
Snapshot of a running Java application
Can be used to diagnose hang-ups and slow-downs
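A similar snapshot can be taken from inside the application when sending a signal is not convenient. A minimal sketch, assuming Java 5 or later for `Thread.getAllStackTraces()`; the class `ThreadSnapshot` is a hypothetical name, and its output only resembles a real thread dump (it has no tid, nid, or monitor details):

```java
import java.util.Map;

final class ThreadSnapshot {
    // Builds a text snapshot of every live thread and its current stack.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" prio=").append(t.getPriority())
              .append(" state=").append(t.getState())
              .append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking several snapshots a few seconds apart and comparing them shows which threads stay in the same frame, which is usually where a hang-up or slow-down sits.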
Thread dump example
Thread dump when slowdown in WAS
"ExecuteThread: '232' for queue: 'default'" daemon prio=5 tid=0x573ca630 nid=0xd2c waiting for monitor entry [0x5cebf000..0x5cebfdb8]
    at java.util.Hashtable.get(Hashtable.java:314)
    at java.util.ListResourceBundle.handleGetObject(ListResourceBundle.java:122)
    at java.util.ResourceBundle.getObject(ResourceBundle.java:371)
    at java.util.ResourceBundle.getObject(ResourceBundle.java:374)
    at java.text.DateFormatSymbols.initializeData(DateFormatSymbols.java:483)
    at java.text.DateFormatSymbols.<init>(DateFormatSymbols.java:99)
    at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:275)
    at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:264)
    at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:88)
    at XXX.uv.com.util.CmLog.setFileLog(CmLog.java:171)
    at XXX.uv.com.jsp.EjbJspBase.service(EjbJspBase.java:371)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:265)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:200)
    at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:2546)
    at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2260)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)

"ExecuteThread: '231' for queue: 'default'" daemon prio=5 tid=0x573f9a60 nid=0x13a8 waiting for monitor entry [0x5ce7f000..0x5ce7fdb8]
    at java.util.Hashtable.get(Hashtable.java:314)
    at java.text.DecimalFormatSymbols.initialize(DecimalFormatSymbols.java:333)
    at java.text.DecimalFormatSymbols.<init>(DecimalFormatSymbols.java:55)
    at java.text.NumberFormat.getInstance(NumberFormat.java:565)
    at java.text.NumberFormat.getInstance(NumberFormat.java:324)
    at java.text.SimpleDateFormat.initialize(SimpleDateFormat.java:327)
    at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:276)
    at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:264)
    at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:88)
    at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:67)
    at XXX.uv.com.datastu.DateTime.setCurrentTime(DateTime.java:190)
    at XXX.uv.com.jsp.EjbJspBase.service(EjbJspBase.java:239)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:265)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:200)
    at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:2546)
    at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2260)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)
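Both threads in this dump are waiting for a monitor in Hashtable.get while a new SimpleDateFormat is being constructed from CmDateTimeUtil.getCurrentTime. One possible way to reduce that kind of contention, not necessarily the fix applied in this case, is to build the formatter once per thread and reuse it. The sketch below assumes Java 8 or later; the class `CachedDateFormat` and the date pattern are hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

final class CachedDateFormat {
    // SimpleDateFormat is not thread-safe, so each thread keeps its own instance.
    // The instance is constructed once per thread instead of once per call.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    // Returns the formatter that belongs to the calling thread.
    static SimpleDateFormat current() {
        return FORMAT.get();
    }

    static String format(Date date) {
        return current().format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date()));
    }
}
```

In a server with a fixed pool of execute threads, this limits the formatter construction (and the synchronized lookups behind it) to one per thread.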
Profiling CPU usage/HP UX
HP UX : Glance + Thread Dump
HP Glance Press “G” Thread monitoring
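On platforms without Glance, a comparable per-thread CPU view can be approximated from inside the VM. This is a different technique from the Glance plus thread dump approach on this slide: a sketch using `ThreadMXBean`, assuming Java 5 or later and a VM that supports thread CPU time measurement. The class `ThreadCpuReport` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadCpuReport {
    // Lists the accumulated CPU time of each live thread, in milliseconds.
    static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return "thread CPU time is not available on this VM\n";
        }
        StringBuilder sb = new StringBuilder();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
            if (info == null || cpuNanos < 0) {
                continue;
            }
            sb.append(info.getThreadName())
              .append(": ")
              .append(cpuNanos / 1_000_000L)
              .append(" ms CPU\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

The thread names it prints can then be matched against the names in a thread dump to see what the busiest thread is executing.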
Profiling CPU usage/HP UX
"Application Manager Thread" prio=8 tid=0x002a6c00 nid=62 lwp_id=15999 waiting on monitor [0x64bce000..0x64bce4b8]
    at java.lang.Thread.sleep(Native Method)
    at weblogic.management.mbeans.custom.ApplicationManager$ApplicationPoller.run(ApplicationManager.java:1137)
Glance thread monitoring: CPU load of thread 15999 is 17.7%
Java thread dump: thread 15999 is working on weblogic.management.mbeans.custom.ApplicationManager (ApplicationManager.java:1137)