6. @aragozin#Devoxx #WhyJavaSlow
What exactly does "slow" mean – clarify your KPIs
Business transaction ≠ Page
Business transaction ≠ SQL transaction
Page ≠ HTTP request
HTTP request ≠ SQL transaction
Your system is a black box
Do you think you know how it works?
You do not!
Profiling produces three types of data
Lies – incorrectly interpreted data
Darn lies – incorrectly measured data
Statistics – data which will help to fix your system
7. @aragozin#Devoxx #WhyJavaSlow
What types of bottlenecks can we find in Java?
CPU bound
Single core bottleneck
CPU starvation
Thread contention
Memory and GC related
Frequent young GC
Abnormally long GC pauses
Frequent full GC
8. @aragozin#Devoxx #WhyJavaSlow
Single core bottleneck
A certain single thread consumes 100% CPU
CPU starvation
A number of threads compete for physical cores
Thread neither sleeps nor waits, but its CPU usage is far below 100%
Thread contention
Thread spends considerable time in synchronization (acquiring, waiting)
Frequent young GC
Frequent young GCs consume a noticeable share of the CPU budget
Caused by intensive memory allocation in application code
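One way to check whether allocation pressure is driving frequent collections is to read the collector counters the JVM already exposes. The sketch below uses the standard `GarbageCollectorMXBean` API; the class name `GcActivityProbe`, the `churn` workload and the iteration count are illustrative assumptions, not part of the talk.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcActivityProbe {

    /** Sum of collection counts over all collectors (young and old). */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Sum of approximate accumulated collection time, in milliseconds. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Hypothetical workload: allocates many short-lived arrays and returns a checksum. */
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[1024];
            garbage[i % garbage.length] = (byte) i;
            checksum += garbage[i % garbage.length];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long countBefore = totalGcCount();
        long timeBefore = totalGcTimeMillis();
        long checksum = churn(5_000_000);
        System.out.println("checksum: " + checksum);
        System.out.println("GC runs during churn: " + (totalGcCount() - countBefore));
        System.out.println("GC time during churn, ms: " + (totalGcTimeMillis() - timeBefore));
    }
}
```

The counters are cumulative since JVM start, so comparing two readings around a suspect code path gives a rough picture. The JIT may optimize some allocations away, so the numbers should be treated as indicative only.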
9. @aragozin#Devoxx #WhyJavaSlow
Thread CPU usage
Number of CPU cycles spent on this thread (translated into time or percentage)
Calculated by OS (User + Kernel)
A single thread can consume 100% at most
Java thread in RUNNABLE state
Not BLOCKED, WAITING, SLEEPING or PARKED
Thread in blocking socket read is RUNNABLE
Java RUNNABLE
may not be runnable from the OS perspective
may be runnable but not on CPU
may actually be running on CPU (counted to thread CPU usage)
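The difference between a thread's Java state and its actual CPU consumption can be observed from inside the JVM. This is a minimal sketch using the standard `ThreadMXBean` API; the class name `ThreadCpuProbe`, the two demo threads and the 500 ms window are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class ThreadCpuProbe {

    /** CPU time (user + kernel) consumed by the thread so far, in nanoseconds; -1 if unavailable. */
    static long cpuNanos(Thread t) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return -1;
        }
        return mx.getThreadCpuTime(t.getId());
    }

    /** Share of one core used over a wall-clock window, in percent. */
    static double cpuPercent(long cpuNanosDelta, long wallNanosDelta) {
        return 100.0 * cpuNanosDelta / wallNanosDelta;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) { /* burn CPU */ }
        }, "spinner");
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // woken up by main, just exit
            }
        }, "sleeper");
        spinner.setDaemon(true);
        sleeper.setDaemon(true);
        spinner.start();
        sleeper.start();

        long wallStart = System.nanoTime();
        long spinStart = cpuNanos(spinner);
        long sleepStart = cpuNanos(sleeper);
        Thread.sleep(500); // observation window
        long wall = System.nanoTime() - wallStart;

        // State is what Java reports; CPU percent is what the OS actually accounted.
        System.out.println(spinner.getName() + " state=" + spinner.getState()
                + " cpu%=" + cpuPercent(cpuNanos(spinner) - spinStart, wall));
        System.out.println(sleeper.getName() + " state=" + sleeper.getState()
                + " cpu%=" + cpuPercent(cpuNanos(sleeper) - sleepStart, wall));

        spinner.interrupt();
        sleeper.interrupt();
    }
}
```

A thread that reports RUNNABLE but shows a low CPU percentage is a hint that it is blocked in native code (for example a socket read) or is not getting scheduled.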
17. @aragozin#Devoxx #WhyJavaSlow
Tracking CPU hot spots using thread sampling
PRO
Low overhead
No upfront configuration
Can identify CPU hot spots, IO hot spots, contention
Data are comparable across runs
CON
Limited precision
No real method execution time measured
Safe point bias
Destabilization is limited to static program structure – methods
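The idea behind thread sampling can be sketched in a few lines: take periodic stack snapshots and count which method is on top. This is only an illustration, not how any particular profiler is implemented; the class name `StackSampler`, the `busyWork` workload and the sampling parameters are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class StackSampler {

    /** Takes up to `samples` stack snapshots of `target`, `intervalMillis` apart, and counts top frames. */
    static Map<String, Integer> sampleTopFrames(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace(); // empty if the thread is not alive
            if (stack.length > 0) {
                String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(frame, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop sampling early, keep the flag
                break;
            }
        }
        return hits;
    }

    static volatile double sink; // keeps the JIT from discarding the work below

    /** Hypothetical CPU-bound workload that runs until interrupted. */
    static void busyWork() {
        double x = 0;
        while (!Thread.currentThread().isInterrupted()) {
            x += Math.sqrt(x + 1);
            sink = x;
        }
    }

    public static void main(String[] args) {
        Thread worker = new Thread(StackSampler::busyWork, "worker");
        worker.setDaemon(true);
        worker.start();

        Map<String, Integer> hits = sampleTopFrames(worker, 100, 10);
        worker.interrupt();

        // Print hot spots: frames seen most often are listed first.
        hits.entrySet().stream()
                .sorted((a, b) -> b.getValue() - a.getValue())
                .forEach(e -> System.out.println(e.getValue() + " samples  " + e.getKey()));
    }
}
```

The sketch also shows the limits listed above: it yields sample counts rather than measured execution times, and `getStackTrace()` observes the thread only at points where the JVM can stop it.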
18. @aragozin#Devoxx #WhyJavaSlow
Instrumentation profiling
Modifies byte code at runtime
Modifies behavior of program
Affects JIT compilation
Probe can access method arguments
When should/can instrumentation be used?
You have narrowed down the problem area
You need more context to find the root cause
19. @aragozin#Devoxx #WhyJavaSlow
BTrace
Open Source
Byte code instrumentation
Probes are coded in Java
CLI and API
BTrace script
@Property
Profiler prof = Profiling.newProfiler();

@OnMethod(clazz = "org.jboss.seam.Component",
          method = "/(inject|disinject|outject)/")
void entryByMethod2(@ProbeClassName String className,
                    @ProbeMethodName String methodName,
                    @Self Object component) {
    if (component != null) {
        Field nameField = field(classOf(component), "name", true);
        if (nameField != null) {
            String name = (String)get(nameField, component);
            Profiling.recordEntry(prof, concat("org.jboss.seam.Component.",
                concat(methodName, concat(":", name))));
        }
    }
}

@OnMethod(clazz = "org.jboss.seam.Component",
          method = "/(inject|disinject|outject)/",
          location = @Location(value = Kind.RETURN))
void exitByMthd2(@ProbeClassName String className,
                 @ProbeMethodName String methodName,
                 @Self Object component,
                 @Duration long duration) {
    if (component != null) {
        Field nameField = field(classOf(component), "name", true);
        if (nameField != null) {
            String name = (String)get(nameField, component);
            Profiling.recordExit(prof, concat("org.jboss.seam.Component.",
                concat(methodName, concat(":", name))), duration);
        }
    }
}
https://github.com/jbachorik/btrace2
20. @aragozin#Devoxx #WhyJavaSlow
Is garbage collection the one to blame?
Enable GC logs to see the whole picture
-XX:+PrintGCDetails
-XX:+PrintReferenceGC
Common problems
JVM does not have enough memory
Intensive allocation of short-lived objects by the application
Reference problems
Large object resurrection by finalizers
Too many references
21. @aragozin#Devoxx #WhyJavaSlow
-XX:InitialTenuringThreshold=8
Initial value for tenuring threshold (number of collections
before object will be promoted to old space)
-XX:+UseTLAB Use thread local allocation blocks in eden
-XX:MaxTenuringThreshold=15
Max value for tenuring threshold
-XX:PretenureSizeThreshold=2m Max object size
allowed to be allocated in young space (large objects will be
allocated directly in old space). Thread local allocation
bypasses this check, so if TLAB is large enough object
exceeding the size threshold may still be allocated in young space.
-XX:+AlwaysTenure Promote all objects surviving
young collection immediately to tenured space
(equivalent of -XX:MaxTenuringThreshold=0)
-XX:+NeverTenure Objects from young space
will never get promoted to tenured space unless survivor
space is not enough to keep them
-XX:+ResizeTLAB Let JVM resize TLABs per thread
-XX:TLABSize=1m Initial size of thread’s TLAB
-XX:MinTLABSize=64k Min size of TLAB
-XX:+UseCMSInitiatingOccupancyOnly
Use the predefined occupancy as the only criterion for starting
a CMS collection (disables adaptive behaviour)
-XX:CMSInitiatingOccupancyFraction=70
Percentage of CMS generation occupancy at which to start a CMS cycle.
A negative value means that CMSTriggerRatio is used.
-XX:CMSBootstrapOccupancy=50
Percentage of CMS generation occupancy at which to initiate
CMS collection for bootstrapping collection stats.
-XX:CMSTriggerRatio=70
Percentage of MinHeapFreeRatio in CMS generation that is
allocated before a CMS collection cycle commences.
-XX:CMSWaitDuration=30000
Once CMS collection is triggered, it will wait for next young
collection to perform initial mark right after. This parameter
specifies how long CMS can wait for young collection
-XX:+CMSScavengeBeforeRemark
Force young collection before remark phase
-XX:CMSScheduleRemarkEdenSizeThreshold=2m
If Eden usage is below this value, don't try to schedule remark
-XX:CMSScheduleRemarkEdenPenetration=20
Eden occupancy % at which to try and schedule remark pause
-XX:CMSScheduleRemarkSamplingRatio=4
Start sampling Eden top at least before young generation occupancy
reaches 1/<ratio> of the size at which we plan to schedule remark
-XX:+CMSIncrementalMode
Enable incremental CMS mode. Incremental mode was meant for
servers with a small number of CPUs, but may be used on multicore
servers to benefit from more conservative initiation strategy.
-XX:+CMSClassUnloadingEnabled
If not enabled, CMS will not clean permanent space. You
may need to enable it for containers such as JEE or OSGi.
-XX:ConcGCThreads=2
Number of parallel threads used for concurrent phase.
-XX:ParallelGCThreads=16
Number of parallel threads used for stop-the-world phases.
-XX:+DisableExplicitGC
JVM will ignore application calls to System.gc()
-XX:+ExplicitGCInvokesConcurrent
Let System.gc() trigger concurrent collection instead of full GC
-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
Same as above but also triggers permanent space collection.
-XX:PrintCMSStatistics=1
Print additional CMS statistics. Very verbose if n=2.
-XX:+PrintCMSInitiationStatistics
Print CMS initiation details
-XX:+CMSDumpAtPromotionFailure
Dump useful information about the state of the CMS old
generation upon a promotion failure
-XX:+CMSPrintChunksInDump (with option above)
Add more detailed information about the free chunks
-XX:+CMSPrintObjectsInDump (with option above)
Add more detailed information about the allocated objects
by Alexey Ragozin – http://blog.ragozin.info
HotSpot JVM options cheatsheet
Young space tenuring
Thread local allocation
Parallel processing
-XX:CMSOldPLABMin=16 -XX:CMSOldPLABMax=1024
Min and max size of CMS gen PLAB caches per worker per block size
CMS initiating options
CMS Stop-the-World pauses tuning
Misc CMS options
CMS Diagnostic options
- Options for “deterministic” CMS, they disable some heuristics and
require careful validation
Concurrent Mark Sweep (CMS)
All concrete numbers in JVM options in this card are for illustrative purposes only!
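Because the numbers on the card are only examples, it can help to check what a running JVM actually uses. The sketch below reads flag values through `HotSpotDiagnosticMXBean`, which is available on HotSpot-based JVMs; the class name `EffectiveFlags` and the choice of flags to print are assumptions, and flags unknown to a given JVM version are reported as "n/a".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class EffectiveFlags {

    /** Returns the current value of a HotSpot flag, or "n/a" if this JVM does not know it. */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hotspot == null) {
            return "n/a"; // diagnostic bean not available on this JVM
        }
        try {
            return hotspot.getVMOption(name).getValue();
        } catch (IllegalArgumentException unknownFlag) {
            return "n/a"; // flag does not exist in this JVM version
        }
    }

    public static void main(String[] args) {
        // Flag names are written without the -XX: prefix.
        String[] flags = {
            "MaxHeapSize", "MaxTenuringThreshold", "UseTLAB",
            "ParallelGCThreads", "CMSInitiatingOccupancyFraction"
        };
        for (String flag : flags) {
            System.out.println("-XX:" + flag + " = " + flagValue(flag));
        }
    }
}
```

CMS-specific flags print "n/a" on JDK versions where CMS has been removed, which is a quick way to spot options that no longer apply.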
-XX:+ParallelRefProcEnabled Enable parallel
processing of references during GC pause
-XX:SoftRefLRUPolicyMSPerMB=1000 Factor
for calculating soft reference TTL based on free heap size
-XX:OnOutOfMemoryError=… Command to be executed
in case of out of memory.
E.g. “kill -9 %p” on Unix or “taskkill /F /PID %p” on Windows.
-XX:G1HeapRegionSize=32m Size of heap region
-XX:MaxGCPauseMillis=500 Target GC pause duration.
G1 is not deterministic, so there is no guarantee that GC pauses will satisfy this limit.
-XX:G1ReservePercent=10 Percentage of heap to keep free.
Reserved memory is used as last resort to avoid promotion failure.
-XX:G1ConfidencePercent=50 Confidence level
for MMU/pause prediction
-XX:G1HeapWastePercent=10 If garbage level is below
threshold, G1 will not attempt to reclaim memory further
-XX:G1MixedGCCountTarget=8 Target number of mixed
collections after a marking cycle
-XX:InitiatingHeapOccupancyPercent=45
Percentage of (entire) heap occupancy to trigger concurrent GC
Garbage First (G1)
-XX:+CMSParallelRemarkEnabled
Whether parallel remark is enabled (enabled by default)
-XX:+CMSParallelSurvivorRemarkEnabled
Whether parallel remark of survivor space is enabled,
effective only with option above (enabled by default)
-XX:+CMSConcurrentMTEnabled
Use multiple threads for concurrent phases.
CMS Concurrency options
-XX:+CMSParallelInitialMarkEnabled
Whether parallel initial mark is enabled (enabled by default)
-XX:CMSTriggerInterval=60000 Periodically triggers
CMS collection. Useful for deterministic object finalization.
GC options cheat sheet download at
http://blog.ragozin.info/2016/10/hotspot-jvm-garbage-collection-options.html
Young collector | Old collector | JVM flags
Serial (DefNew) | Serial Mark Sweep Compact | -XX:+UseSerialGC
Parallel scavenge (PSYoungGen) | Serial Mark Sweep Compact (PSOldGen) | -XX:+UseParallelGC
Parallel scavenge (PSYoungGen) | Parallel Mark Sweep Compact (ParOldGen) | -XX:+UseParallelOldGC
Parallel (ParNew) | Serial Mark Sweep Compact | -XX:+UseParNewGC
Serial (DefNew) | Concurrent Mark Sweep | -XX:-UseParNewGC [1] -XX:+UseConcMarkSweepGC
Parallel (ParNew) | Concurrent Mark Sweep | -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
Garbage First (G1) | Garbage First (G1) | -XX:+UseG1GC
[1] - Note the minus before UseParNewGC, which explicitly disables parallel mode
-verbose:gc or -XX:+PrintGC Print basic GC info
-XX:+PrintGCDetails Print more detailed GC info
-XX:+PrintGCTimeStamps Print timestamps for each GC
event (seconds count from start of JVM)
-XX:+PrintGCDateStamps Print date stamps at garbage
collection events: 2011-09-08T14:20:29.557+0400: [GC...
-Xloggc:<file> Redirects GC output to a file instead of console
-XX:+PrintTLAB Print TLAB allocation statistics
-XX:+PrintReferenceGC Print times for special
(weak, JNI, etc) reference processing during STW pause
-XX:+PrintJNIGCStalls Reports if GC is waiting for
native code to unpin object in memory
-XX:+PrintClassHistogramAfterFullGC
Prints class histogram after full GC
-XX:+PrintClassHistogramBeforeFullGC
Prints class histogram before full GC
-XX:+UseGCLogFileRotation Enable GC log rotation
-XX:GCLogFileSize=512m Size threshold for GC log file
-XX:NumberOfGCLogFiles=5 Number of GC log files
-XX:+PrintGCCause Add cause of GC in log
-XX:+PrintHeapAtGC Print heap details on GC
-XX:+PrintAdaptiveSizePolicy
Print young space sizing decisions
-XX:+PrintHeapAtSIGBREAK Print heap details on signal
-XX:+PrintPromotionFailure
Print additional information for promotion failure
-XX:+PrintPLAB Print survivor PLAB details
-XX:+PrintOldPLAB Print old space PLAB details
-XX:+PrintGCTaskTimeStamps Print timestamps for
individual GC worker thread tasks (very verbose)
-XX:+PrintGCApplicationStoppedTime
Print summary after each JVM safepoint (including non-GC)
-XX:+PrintGCApplicationConcurrentTime
Print the time the application ran between safepoints
GC Log rotation
More logging options
-XX:+PrintTenuringDistribution Print detailed
demography of young space after each collection
GC log detail options
-Xms256m or -XX:InitialHeapSize=256m
Initial size of JVM heap (young + old)
-Xmx2g or -XX:MaxHeapSize=2g
Max size of JVM heap (young + old)
-XX:NewSize=64m
-XX:MaxNewSize=64m
Absolute (initial and max) size of
young space (Eden + 2 Survivors)
-XX:NewRatio=3 Alternative way to specify size
of young space. Sets ratio of young vs old space
(e.g. -XX:NewRatio=2 means that young space will be 2 times
smaller than old space, i.e. 1/3 of heap size).
-XX:SurvivorRatio=15 Sets size of single survivor space
relative to Eden space size
(e.g. -XX:NewSize=64m -XX:SurvivorRatio=6 means that each
Survivor space will be 8m and Eden will be 48m).
-XX:MetaspaceSize=512m
-XX:MaxMetaspaceSize=1g
Initial and max size of
JVM’s metaspace
-Xss256k (size in bytes) or
-XX:ThreadStackSize=256 (size in Kbytes)
Thread stack size
-XX:MaxDirectMemorySize=2g Maximum amount
of memory available for NIO off-heap byte buffers
- Highly recommended option
Memory sizing options
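The NewRatio and SurvivorRatio arithmetic above can be written out as a small calculation. This is a sketch of the ratios only; the class and method names are hypothetical, and a real JVM additionally aligns and may adapt these sizes, so actual values can differ slightly.

```java
class YoungSpaceMath {

    static final long MB = 1024L * 1024L;

    /** Young space size for a heap size and -XX:NewRatio (old space = ratio * young space). */
    static long youngFromNewRatio(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** Size of ONE survivor space for -XX:SurvivorRatio (Eden = ratio * survivor, two survivors). */
    static long survivorSize(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden size: whatever is left of young space after the two survivor spaces. */
    static long edenSize(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorSize(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        // -XX:NewSize=64m -XX:SurvivorRatio=6
        long young = 64 * MB;
        System.out.println("survivor MB: " + survivorSize(young, 6) / MB); // 8
        System.out.println("eden MB: " + edenSize(young, 6) / MB);         // 48

        // -Xmx3g -XX:NewRatio=2: young space is 1/3 of the heap
        System.out.println("young MB: " + youngFromNewRatio(3072 * MB, 2) / MB); // 1024
    }
}
```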
Available combinations of garbage collection algorithms in HotSpot JVM
[Diagram: Java process memory. JVM memory consists of the Java heap (sized by -Xms/-Xmx; Young Gen with Eden, Survivor 0 and Survivor 1, plus Old Gen) and non-heap areas (thread stacks, metaspace, compressed class space, code cache, NIO direct buffers, other JVM memory). Non-JVM memory is used by native libraries.]
-XX:CompressedClassSpaceSize=1g Memory reserved
for compressed class space (64bit only)
-XX:InitialCodeCacheSize=256m
-XX:ReservedCodeCacheSize=512m
Initial size and max
size of code cache area
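The heap and non-heap areas described in this card can be listed from inside a running JVM through the standard `MemoryPoolMXBean` API. The sketch below is illustrative; the class name `MemoryPools` is an assumption, and pool names vary with the JVM version and the selected collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {

    /** Names of the memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            System.out.println(pool.getType() + "  " + pool.getName()
                    + "  used=" + usage.getUsed() + "  max=" + usage.getMax());
        }
    }
}
```

Thread stacks and NIO direct buffers are not reported as memory pools, so this listing covers only part of the process memory picture.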