Performance tuning jvm
This presentation was given to the system administration team to give them an idea of how GC works and what to look for when there is a bottleneck or other trouble.

Performance tuning jvm Presentation Transcript

  • 1.
    • Performance Tuning JVM - a practical approach
    • Prem Kuppumani
    • August 2011
  • 2. JAVA JVM BASICS
    • JDK vs. JRE
      • JRE runs the executables. Small footprint. Recommended in production
      • JDK =JRE + javac + tools + debuggers + dev libraries.
      • JRE main components: JVM + Java API
      • JVM components: class loader + bytecode verifier + GC + security manager + execution engine + JIT code generator
  • 3. JAVA object
    • What is an object?
      • Object gives properties and behavior.
      • unique properties or state or data + behavior (method) + reusable benefit.
  • 4. Fundamentals and Terminology
    • The GC's task is to find unreachable objects and reclaim their memory.
      • LIVE, GARBAGE and ROOT
      • Garbage is anything not reachable from the application roots (local variables on the stack, thread stacks, registers, static objects, static fields and class variable refs).
      • Anything not visited is unreachable and is therefore GARBAGE.
        • Advantage: More reliable, no intentional memory leak.
        • Disadvantage: Stops and Pauses. Consumes resources.
  • 5. GC Algorithms.
    • Different methods and algorithms and technical terms.
      • Mark & Sweep, Mark & Compact, Copying.
        • Mark & Sweep GC
          • Mark does depth first search (DFS) from every root, marks all live objects.
          • Sweep phase: each object that is not marked has its memory reclaimed.
        • Mark & Compact
          • Additionally this does compaction.
          • Avoids fragmentation.
          • These algorithms are improved in 3 ways: concurrency, parallelization and generational collection.
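The mark and sweep phases described on this slide can be sketched on a toy object graph. This is only an illustration under stated assumptions: the `Node` class, the list standing in for the heap, and the `demo` method are hypothetical, and HotSpot's real collectors work on raw memory, not on Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy mark-and-sweep over a hand-built object graph (illustrative only). */
public class MarkSweepSketch {
    /** Hypothetical heap cell: a name, outgoing references, and a mark bit. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark phase: iterative depth-first search from every root. */
    static void mark(List<Node> roots) {
        Deque<Node> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.marked) continue;          // already visited
            n.marked = true;
            for (Node ref : n.refs) stack.push(ref);
        }
    }

    /** Sweep phase: remove unmarked nodes, reset marks on survivors. Returns the number reclaimed. */
    static int sweep(List<Node> heap) {
        int reclaimed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;            // ready for the next cycle
            } else {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }

    /** Four objects: a -> b is reachable from the root; c <-> d is an unreachable cycle. */
    static int demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Arrays.asList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("reclaimed " + demo() + " objects");
    }
}
```

Note that the cycle between `c` and `d` is still reclaimed: reachability from the roots, not reference counts, decides what is garbage.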
  • 6. Generational GC
        • Copying GC
          • Faster than M&S because only one phase.
        • Generational GC.
          • Young (short lived) and old (long-lived) objects in separate locations.
          • Most (80% to 90%) instantiated objects are short-lived, and there are few references from long-lived objects to short-lived objects.
  • 7. Minor and Major GC
    • Minor Garbage Collection (scavenge)
      • When eden space is filled gc is invoked. Frequent.
    • Major Garbage Collection.
      • When tenured space is filled Full GC is invoked. Mark & Sweep method. Infrequent.
    • Different generations:
      • Young: Eden and Survivor spaces (S0 & S1), plus virtual space
      • Tenured: Old, plus virtual space
      • Permanent, plus virtual space
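One way to see these generations on a running JVM is to list its memory pools through the standard `java.lang.management` API. A minimal sketch follows; the class and method names are made up for illustration, and the pool names printed depend on the JVM version and the collector in use, so they may not match the slide's terms exactly.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Lists the JVM's memory pools (eden, survivor, old, and so on) with their current usage. */
public class MemoryPoolsSketch {
    /** Prints one line per valid pool and returns how many pools were printed. */
    static int printPools() {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null) continue;                       // pool is no longer valid
            long maxK = u.getMax() < 0 ? -1 : u.getMax() / 1024;  // -1 means "undefined"
            System.out.printf("%-30s %-16s used=%dK committed=%dK max=%dK%n",
                    pool.getName(), pool.getType(), u.getUsed() / 1024,
                    u.getCommitted() / 1024, maxK);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        printPools();
    }
}
```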
  • 8. JVM GC Tuning
    • Why performance tuning?
        • Wide and diverse range of apps from applets to web services on large servers.
        • There are multiple garbage collectors designed for different requirements.
      • Ergonomics
        • Introduced in java 5.0
        • Automatic choosing of GC algorithm.
        • little or no tuning of command line options needed, by choosing GC, heap size and runtime compiler.
  • 9. JVM GC Tuning
    • Generations
      • Primitive GCs examine every live object.
      • Generational collection exploits several empirically observed properties of applications to minimize the work required to reclaim memory space.
      • The most important is the weak generational hypothesis, which says most objects are short-lived.
    • Performance Considerations
      • Throughput - % of time not spent in GC.
      • Pauses – times when application not responding.
    • Sizing the Generations
      • Total Heap: -Xms=-Xmx or not?
      • Young gen: -XX:NewRatio=3, or -XX:NewSize and -XX:MaxNewSize
      • Survivor space: -XX:SurvivorRatio=6; use -XX:+PrintTenuringDistribution
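The arithmetic behind the two ratio flags can be worked through with a short sketch. It assumes that NewRatio is the tenured-to-young ratio and SurvivorRatio is the ratio of eden to one survivor space; the 1024 MB heap is an illustrative value, and a real JVM also rounds sizes to its own alignment.

```java
/** Worked arithmetic for -XX:NewRatio and -XX:SurvivorRatio (illustrative; ignores JVM alignment). */
public class GenerationSizingSketch {
    /** Young generation size when tenured : young = newRatio : 1. */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** Size of ONE survivor space when eden : survivor = survivorRatio : 1 and there are two survivors. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden is what remains of the young generation after both survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 1024 * mb;                       // assumed heap size, for illustration
        long young = youngBytes(heap, 3);            // NewRatio=3
        System.out.println("young    = " + young / mb + " MB");
        System.out.println("survivor = " + survivorBytes(young, 6) / mb + " MB each");
        System.out.println("eden     = " + edenBytes(young, 6) / mb + " MB");
    }
}
```

With these inputs the young generation is a quarter of the heap (256 MB), each survivor space is one eighth of the young generation (32 MB), and eden takes the remaining 192 MB.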
  • 10. JVM GC Tuning
    • Different Collector options and choosing the right one.
      • Give JVM a chance, adjust the heap size to improve.
        • -XX:+UseSerialGC Single threaded, relatively efficient and small data sets.
        • -XX:+UseParallelGC (throughput collector) multithreaded, med. to large data sets.
        • -XX:+UseParallelOldGC parallel compaction in old space. Better scalable.
        • -XX:+UseConcMarkSweepGC (low pause), comparatively less throughput, chances of fragmentation. On one or two cores, use incremental mode.
        • -XX:+UseTrainGC: train collector, low pause. No longer in development.
        • G1: introduced recently in 1.6. It uses page (region) densities: it picks sparse pages and collects them, and it moves popular objects, i.e. objects referenced by many other objects. The goal is to need 0 flags.
    • Parallel Collector.
      • Characteristics: throughput, generational, multithreaded. Sync overhead.
      • -XX:ParallelGCThreads=<N> . Too many threads may cause fragmentation.
      • Ergonomics: auto-tunes toward these goals, in the following order
        • Max GC pause time: -XX:MaxGCPauseMillis=<N>
        • Throughput: -XX:GCTimeRatio=<N>, which targets a GC time fraction of 1/(1+<N>)
        • Heap size -Xmx<N>
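The 1/(1+N) formula for the throughput goal is easy to misread, so here is the arithmetic as a small sketch. The values of N used below are illustrative only.

```java
/** Worked arithmetic for -XX:GCTimeRatio=<N>: the target fraction of time spent in GC is 1/(1+N). */
public class GcTimeRatioSketch {
    /** Target fraction of total time that may be spent in garbage collection. */
    static double gcTimeFraction(int n) {
        return 1.0 / (1 + n);
    }

    /** Corresponding throughput goal: fraction of time left for the application. */
    static double throughputGoal(int n) {
        return 1.0 - gcTimeFraction(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {9, 19, 99}) {      // illustrative values of N
            System.out.printf("GCTimeRatio=%d -> GC %.1f%%, application %.1f%%%n",
                    n, 100 * gcTimeFraction(n), 100 * throughputGoal(n));
        }
    }
}
```

For example, N=19 asks for at most 5% of total time in GC, and N=99 asks for at most 1%.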
  • 11. JVM GC Tuning
      • Young and Old gen adjustments.
        • -XX:YoungGenerationSizeIncrement=<Y> -XX:TenuredGenerationSizeIncrement=<T> -XX:AdaptiveSizeDecrementScaleFactor=<D>
        • If the growth increment is X%, the shrinking decrement is X/D%.
        • For the max pause time goal, only one generation is shrunk at a time.
        • For the throughput goal, the sizes of both generations are increased.
    • Parallel Compaction.
      • Is done with marking phase, summary phase and compaction phase .
      • Objects are not moved around in dense prefix region.
    • The Concurrent Collector.
      • Characteristics: low pause, generational, multithreaded.
      • Uses separate GC threads to trace live objects concurrently.
      • 1st phase: initial mark. A single thread marks the first level of objects reachable from the roots. STW.
      • 2nd phase: concurrent mark. Drills deeper and takes longer; multithreaded trace, single-threaded retrace. No STW.
      • 3rd phase: remark. Retraces the bulk of the live objects that changed during concurrent marking; multithreaded. STW.
      • 4th phase: concurrent sweep. The app keeps running. Single threaded. No compaction. No STW.
      • Multiple pointers for free memory categorized by size. Keeps count of the requested lengths to determine popular sized objects.
      • Minor collections can interleave with on-going major collection. STW.
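For the generation size adjustments at the top of this slide, the growth and shrink steps can be worked through as follows. This is a sketch of the X% and X/D% rule only; the increment of 20 and scale factor of 4 are assumed values for illustration, and the real adaptive policy takes more inputs than this.

```java
/** Worked arithmetic for the grow-by-X% and shrink-by-X/D% rule (illustrative only). */
public class AdaptiveSizingSketch {
    /** New size after growing by incrementPercent. */
    static long grow(long size, int incrementPercent) {
        return size + size * incrementPercent / 100;
    }

    /** New size after shrinking by incrementPercent / decrementScaleFactor percent. */
    static long shrink(long size, int incrementPercent, int decrementScaleFactor) {
        return size - size * incrementPercent / 100 / decrementScaleFactor;
    }

    public static void main(String[] args) {
        long sizeK = 100 * 1024;                    // assumed 100 MB generation, in KB
        System.out.println("grown:  " + grow(sizeK, 20) + "K");       // +20%
        System.out.println("shrunk: " + shrink(sizeK, 20, 4) + "K");  // -5%
    }
}
```

Growing is deliberately faster than shrinking here: with X=20 and D=4 a generation grows by 20% but shrinks by only 5% per adjustment.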
  • 12. JVM GC Tuning
    • Cont …
      • Tradeoff is processing time, which will otherwise be used by application.
      • Concurrent mode failure: inability to complete concurrent collection.
      • Floating Garbage: new Garbage that happens while collector is in action.
      • Tuning options for CMS:
        • -XX:CMSInitiatingOccupancyFraction=<N>
      • Scheduling Pauses: Concurrent collector attempts to schedule a remark pause between the previous and next young gen pauses.
      • Incremental mode: divides the work done concurrently by the collector into small chunks of time which are scheduled between young generation collections.
      • Duty cycle and auto pacing: the duty cycle controls the amount of work the collector is allowed to do. Auto pacing adjusts it based on collected stats.
        • -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
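To make -XX:CMSInitiatingOccupancyFraction concrete, the sketch below computes the tenured occupancy at which a concurrent cycle would be started. It is a simplification: the fraction of 70 is an assumed value, the capacity is borrowed from the sample GC log later in this deck, and the JVM may also start cycles based on its own heuristics.

```java
/** Simplified view of when a CMS cycle would start for a given occupancy fraction (illustrative). */
public class CmsInitiatingSketch {
    /** Tenured occupancy (in KB) at which a concurrent collection would be initiated. */
    static long thresholdK(long tenuredCapacityK, int fractionPercent) {
        return tenuredCapacityK * fractionPercent / 100;
    }

    /** True once tenured usage has reached the initiating threshold. */
    static boolean shouldStart(long tenuredUsedK, long tenuredCapacityK, int fractionPercent) {
        return tenuredUsedK >= thresholdK(tenuredCapacityK, fractionPercent);
    }

    public static void main(String[] args) {
        long capacityK = 3481600;   // tenured capacity seen in the sample log
        int fraction = 70;          // assumed value for illustration
        System.out.println("start at " + thresholdK(capacityK, fraction) + "K");
        System.out.println("start now? " + shouldStart(199434, capacityK, fraction));
    }
}
```

Setting the fraction too high leaves little headroom for promotions while the cycle runs, which is how a concurrent mode failure can arise.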
  • 13. JVM OS related tuning
    • Other tuning areas.
      • Network tcp tuning
        • net.core.rmem_max = 33554432 net.core.wmem_max = 33554432
        • ifconfig ethX mtu 9000 (test first!)
      • OS memory tuning: -XX:+UseLargePages -XX:LargePageSizeInBytes=<x>m
      • Filesystem tuning: noatime, nodiratime
      • I/O scheduler: noop or deadline
      • CPU affinity and OS stack size.
    • Reading a GC log.
      • Turn on -verbose:gc -XX:+PrintGCDetails
      • Look at the live example. Next slide for explanation.
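Before reading a log it helps to confirm which options the JVM was actually started with. A minimal sketch using the standard `RuntimeMXBean.getInputArguments()` call is shown below; the class name and the filtering rule (keep anything starting with `-X` or `-verbose:gc`) are assumptions made for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints the tuning-related options the current JVM was started with. */
public class JvmFlagsSketch {
    /** Keeps -X..., -XX:... and -verbose:gc arguments; drops everything else (e.g. -D properties). */
    static List<String> gcRelatedArgs(List<String> args) {
        List<String> out = new ArrayList<>();
        for (String a : args) {
            if (a.startsWith("-X") || a.startsWith("-verbose:gc")) {
                out.add(a);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> all = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String a : gcRelatedArgs(all)) {
            System.out.println(a);
        }
    }
}
```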
  • 14. GC Log
    • GC log reading.
      • Sample: 2011-08-15T14:03:59.324-0400: 13.572: [GC [1 CMS-initial-mark: 199434K(3481600K)] 203546K(3686336K), 0.0027280 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
      • 2011-08-15T14:03:59.327-0400: 13.575: [CMS-concurrent-mark-start]
      • 2011-08-15T14:03:59.508-0400: 13.757: [CMS-concurrent-mark: 0.112/0.181 secs] [Times: user=0.69 sys=0.22, real=0.18 secs]
      • 2011-08-15T14:03:59.508-0400: 13.757: [CMS-concurrent-preclean-start]
      • 2011-08-15T14:03:59.522-0400: 13.771: [CMS-concurrent-preclean: 0.014/0.014 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
      • 2011-08-15T14:03:59.522-0400: 13.771: [CMS-concurrent-abortable-preclean-start]
      • Total time for which application threads were stopped: 0.0004990 seconds
      • 2011-08-15T14:04:00.234-0400: 14.483: [CMS-concurrent-abortable-preclean: 0.113/0.712 secs] [Times: user=1.36 sys=0.20, real=0.71 secs]
      • 2011-08-15T14:04:00.236-0400: 14.484: [GC[YG occupancy: 106943 K (204736 K)]14.484: [Rescan (parallel) , 0.0269510 secs]14.511: [weak refs processing, 0.0858930 secs]14.597: [class unloading, 0.0061610 secs]14.604: [scrub symbol & string tables, 0.0060430 secs] [1 CMS-remark: 199434K(3481600K)] 306377K(3686336K), 0.1282460 secs] [Times: user=0.29 sys=0.00, real=0.13 secs]
      • Total time for which application threads were stopped: 0.1296990 seconds
      • 2011-08-15T14:04:00.365-0400: 14.613: [CMS-concurrent-sweep-start]
      • Total time for which application threads were stopped: 0.0013380 seconds
      • 2011-08-15T14:04:00.610-0400: 14.858: [CMS-concurrent-sweep: 0.221/0.245 secs] [Times: user=0.63 sys=0.06, real=0.24 secs]
      • 2011-08-15T14:04:00.610-0400: 14.858: [CMS-concurrent-reset-start]
      • 2011-08-15T14:04:00.689-0400: 14.937: [CMS-concurrent-reset: 0.079/0.079 secs] [Times: user=0.09 sys=0.08, real=0.08 secs]
      • Total time for which application threads were stopped: 0.0008140 seconds
      • 2011-08-15T14:04:00.872-0400: 15.120: [GC [1 CMS-initial-mark: 191277K(3481600K)] 388680K(3686336K), 0.2749030 secs] [Times: user=0.27 sys=0.00, real=0.28 secs]
      • Total time for which application threads were stopped: 0.2757330 seconds
      • 2011-08-15T14:04:01.147-0400: 15.395: [CMS-concurrent-mark-start]
    • Slide callouts: STW marks the CMS-initial-mark and CMS-remark entries. Related flags: CMSScheduleRemarkEdenSizeThreshold, CMSScheduleRemarkEdenPenetration.
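A quick way to quantify the pauses in a log like this is to add up the "Total time for which application threads were stopped" lines. The sketch below does that with a regular expression over the five such lines from the sample above; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sums the application-stopped times reported in a GC log (illustrative log scraper). */
public class StoppedTimeSketch {
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

    /** Returns the total stopped time in seconds found anywhere in the given log text. */
    static double totalStoppedSeconds(String log) {
        Matcher m = STOPPED.matcher(log);
        double total = 0;
        while (m.find()) {
            total += Double.parseDouble(m.group(1));
        }
        return total;
    }

    /** The five "stopped" lines from the sample log on this slide. */
    static String sample() {
        return String.join("\n",
                "Total time for which application threads were stopped: 0.0004990 seconds",
                "Total time for which application threads were stopped: 0.1296990 seconds",
                "Total time for which application threads were stopped: 0.0013380 seconds",
                "Total time for which application threads were stopped: 0.0008140 seconds",
                "Total time for which application threads were stopped: 0.2757330 seconds");
    }

    public static void main(String[] args) {
        System.out.printf("stopped for %.7f seconds in total%n", totalStoppedSeconds(sample()));
    }
}
```

In this sample almost all of the stopped time comes from two entries, the remark pause (about 0.13 s) and the second initial mark (about 0.28 s).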
  • 15. Diagnostic approach
    • Tenure distribution
      • -XX:+PrintTenuringDistribution -XX:TargetSurvivorRatio=<x> -XX:MaxTenuringThreshold=<x>
      • The threshold is the number of times an object is copied before it is tenured.
        • Desired survivor size = (survivor_capacity * TargetSurvivorRatio) / 100 * sizeof(a pointer)
      • Example: 1125.353: [GC 1125.353: [ParNew
      • Desired survivor size 86232268 bytes, new threshold 6 (max 15)
      • - age 1: 50754696 bytes, 50754696 total
      • - age 2: 12147696 bytes, 62902392 total
      • - age 3: 12295552 bytes, 75197944 total
      • - age 4: 6537136 bytes, 81735080 total
      • - age 5: 2435944 bytes, 84171024 total
      • - age 6: 3013488 bytes, 87184512 total
      • - age 7: 627368 bytes, 87811880 total
      • - age 8: 999536 bytes, 88811416 total
      • - age 9: 924656 bytes, 89736072 total
      • - age 10: 1811480 bytes, 91547552 total
      • : 554848K->89528K(561792K), 0.5317388 secs] 607743K->146164K(1217152K) icms_dc=18 , 0.5326526 secs]
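The "new threshold 6" in this example can be reproduced from the age table. The sketch below assumes a simplified rule: walk the ages in order, keep a running total, and stop at the first age whose cumulative total exceeds the desired survivor size, capped at the maximum threshold. That rule is consistent with the numbers in the log, but it is a reading of the output, not a copy of the JVM's code.

```java
/** Reproduces the tenuring threshold from a -XX:+PrintTenuringDistribution age table (simplified). */
public class TenuringThresholdSketch {
    /**
     * bytesByAge[0] is age 1, bytesByAge[1] is age 2, and so on.
     * Returns the first age at which the running total exceeds desiredSurvivorBytes,
     * capped at maxTenuringThreshold.
     */
    static int threshold(long[] bytesByAge, long desiredSurvivorBytes, int maxTenuringThreshold) {
        long total = 0;
        int age = 1;
        for (long bytes : bytesByAge) {
            total += bytes;
            if (total > desiredSurvivorBytes) break;
            age++;
        }
        return Math.min(age, maxTenuringThreshold);
    }

    /** Age table and desired survivor size taken from the log on this slide. */
    static int sample() {
        long[] ages = {50754696L, 12147696L, 12295552L, 6537136L, 2435944L,
                       3013488L, 627368L, 999536L, 924656L, 1811480L};
        return threshold(ages, 86232268L, 15);
    }

    public static void main(String[] args) {
        System.out.println("new threshold " + sample());
    }
}
```

The running total first exceeds 86232268 bytes at age 6 (87184512 bytes), which matches the threshold printed in the log.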
  • 16. Diagnostic approach (cont…)
    • Sizing the young generation: increase or decrease -Xmn<x>m. Example (courtesy www.oracle.com), before sizing newgen:
    • [GC [DefNew: 4032K->64K(4032K), 0.0429742 secs] 9350K->7748K(32704K), 0.0431096 secs]
    • [GC [DefNew: 4032K->64K(4032K), 0.0403446 secs] 11716K->10121K(32704K), 0.0404867 secs]
    • [GC [DefNew: 4032K->64K(4032K), 0.0443969 secs] 14089K->12562K(32704K), 0.0445251 secs]
    • ========================================================
    • After sizing newgen:
    • [GC [DefNew: 8128K->64K(8128K), 0.0453670 secs] 13000K->7427K(32704K), 0.0454906 secs]
    • [GC [DefNew: 8128K->64K(8128K), 0.0388632 secs] 15491K->9663K(32704K), 0.0390013 secs]
    • [GC [DefNew: 8128K->64K(8128K), 0.0388610 secs] 17727K->11829K(32704K), 0.0389919 secs]
    • ========================================================
    • Gone overboard:
    • [GC [DefNew: 16000K->16000K(16192K), 0.0000574 secs][Tenured: 2973K->2704K(16384K), 0.1012650 secs] 18973K->2704K(32576K), 0.1015066 secs]
    • [GC [DefNew: 16000K->16000K(16192K), 0.0000518 secs][Tenured: 2704K->2535K(16384K), 0.0931034 secs] 18704K->2535K(32576K), 0.0933519 secs]
    • [GC [DefNew: 16000K->16000K(16192K), 0.0000498 secs][Tenured: 2535K->2319K(16384K), 0.0860925 secs] 18535K->2319K(32576K), 0.0863350 secs]
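    • Slide callouts: in each entry, the DefNew figures describe the young generation and the figures after the closing bracket describe the entire heap.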
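The young and whole-heap figures in each minor collection line can be combined to estimate how much was promoted to the tenured generation: the space freed in the young generation minus the space freed in the whole heap. The sketch below parses the DefNew lines above with a regular expression; the class name and the -1 return value for non-matching lines are choices made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Estimates the amount promoted to tenured in a DefNew minor collection log line (illustrative). */
public class PromotionSketch {
    // Captures: young before, young after, heap before, heap after (all in KB).
    private static final Pattern MINOR = Pattern.compile(
            "\\[DefNew: (\\d+)K->(\\d+)K\\(\\d+K\\), [0-9.]+ secs\\] (\\d+)K->(\\d+)K\\(\\d+K\\)");

    /** Returns promoted KB, or -1 if the line is not a plain minor collection entry. */
    static long promotedK(String line) {
        Matcher m = MINOR.matcher(line);
        if (!m.find()) return -1;
        long youngFreed = Long.parseLong(m.group(1)) - Long.parseLong(m.group(2));
        long heapFreed = Long.parseLong(m.group(3)) - Long.parseLong(m.group(4));
        return youngFreed - heapFreed;   // what left young but stayed in the heap
    }

    public static void main(String[] args) {
        String before = "[GC [DefNew: 4032K->64K(4032K), 0.0429742 secs] 9350K->7748K(32704K), 0.0431096 secs]";
        String after = "[GC [DefNew: 8128K->64K(8128K), 0.0453670 secs] 13000K->7427K(32704K), 0.0454906 secs]";
        System.out.println("before sizing: promoted " + promotedK(before) + "K");
        System.out.println("after sizing:  promoted " + promotedK(after) + "K");
    }
}
```

Lines that also contain a `[Tenured: ...]` section, as in the "gone overboard" case, do not match the pattern and are reported as -1.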
  • 17. Diagnostic approach (cont…)
    • How to determine if the OLD gen is big or small?
    • Example (courtesy www.oracle.com ):
    • -----------------------------------------------------------------------------------------------------------------------------------------------------
    • For the 32 MB heap, major collections happen 10 s to 11 s apart.
    • 111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.0000505 secs]111.042: [Tenured: 18154K->2311K(24576K), 0.1290354 secs] 26282K->2311K(32704K), 0.1293306 secs]
    • 122.463: [GC 122.463: [DefNew: 8128K->8128K(8128K), 0.0000560 secs]122.463: [Tenured: 18630K->2366K(24576K), 0.1322560 secs] 26758K->2366K(32704K), 0.1325284 secs]
    • 133.896: [GC 133.897: [DefNew: 8128K->8128K(8128K), 0.0000443 secs]133.897: [Tenured: 18240K->2573K(24576K), 0.1340199 secs] 26368K->2573K(32704K), 0.1343218 secs]
    • 144.112: [GC 144.112: [DefNew: 8128K->8128K(8128K), 0.0000544 secs]144.112: [Tenured: 16564K->2304K(24576K), 0.1246831 secs] 24692K->2304K(32704K), 0.1249602 secs]
    • -----------------------------------------------------------------------------------------------------------------------------------------------------
    • For the 64 Mbyte heap the major collections are occurring about every 30 seconds.
    • 90.597: [GC 90.597: [DefNew: 8128K->8128K(8128K), 0.0000542 secs]90.597: [Tenured: 49841K->5141K(57344K), 0.2129882 secs] 57969K->5141K(65472K), 0.2133274 secs]
    • 120.899: [GC 120.899: [DefNew: 8128K->8128K(8128K), 0.0000550 secs]120.899: [Tenured: 50384K->2430K(57344K), 0.2216590 secs] 58512K->2430K(65472K), 0.2219384 secs]
    • 153.968: [GC 153.968: [DefNew: 8128K->8128K(8128K), 0.0000511 secs]153.968: [Tenured: 51164K->2309K(57344K), 0.2193906 secs] 59292K->2309K(65472K), 0.2196372 secs]
    • Conclusion: a bigger heap gives better throughput; a smaller heap gives lower pause times.
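The conclusion can be checked with a little arithmetic on the two logs: divide a major collection's pause by the time since the previous one. The sketch below uses the second entry of each log and counts major collections only, so it understates the total GC overhead.

```java
/** Compares major-GC overhead for the 32 MB and 64 MB heap logs (major collections only). */
public class GcOverheadSketch {
    /** Fraction of wall-clock time spent in a pause that recurs every intervalSeconds. */
    static double overhead(double pauseSeconds, double intervalSeconds) {
        return pauseSeconds / intervalSeconds;
    }

    /** 32 MB heap: collection at 122.463 s, previous at 111.042 s, pause 0.1325284 s. */
    static double smallHeap() {
        return overhead(0.1325284, 122.463 - 111.042);
    }

    /** 64 MB heap: collection at 120.899 s, previous at 90.597 s, pause 0.2219384 s. */
    static double largeHeap() {
        return overhead(0.2219384, 120.899 - 90.597);
    }

    public static void main(String[] args) {
        System.out.printf("32 MB heap: %.2f%% of time in major GC%n", 100 * smallHeap());
        System.out.printf("64 MB heap: %.2f%% of time in major GC%n", 100 * largeHeap());
    }
}
```

The larger heap spends a smaller share of time in major GC (better throughput), but each individual pause is longer (about 0.22 s against about 0.13 s).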
  • 18. Diagnostic approach (cont…)
    • Now make the YOUNG gen bigger by increasing the heap to 256 MB with a 64 MB young gen.
    • [GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K), 0.0459067 secs]
    • [GC [DefNew: 64575K->64575K(64576K), 0.0000573 secs][Tenured: 132673K->5437K(196608K), 0.4959855 secs] 197249K->5437K(261184K), 0.4962533 secs]
    • [GC [DefNew: 63616K->959K(64576K), 0.0360258 secs] 69053K->7600K(261184K), 0.0361663 secs]
    • After tuning, if the minor GC pauses are high, try -XX:+UseParallelGC followed by -XX:+UseAdaptiveSizePolicy. Alternatively, try -XX:+UseParNewGC.
    • If you want to address scalability, use -XX:+UseParallelOldGC.
    • After tuning, if the major GC pauses are high, try -XX:+UseConcMarkSweepGC with and without -XX:+UseParNewGC.
    • To reduce the pause times further (especially on 1 or 2 core boxes), try adding i-CMS: -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
  • 19. Tuning guidelines
    • Dos:
      • Start with a benchmark or baseline.
      • Find and tune only the bottlenecks.
      • Change one variable at a time and run a test to record.
    • Don’ts:
      • Don’t tune without baselining or benchmarking.
    • Performance trade-off
      • Tuning one parameter may cause another bottleneck
    • Performance metrics
      • Throughput – % of total time not spent in garbage collection
      • Overhead – % of time spent in GC.
      • Pause time – duration for which the app is not responding during GC
      • GC Frequency – how often GC is initiated.
      • Footprint – heap size
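Several of these metrics can be read from inside a running JVM through the standard `java.lang.management` API. The sketch below derives an approximate throughput figure from the collectors' accumulated collection time and the JVM uptime. Treat the result as a rough indicator: the reported collection time is approximate, and for a concurrent collector it may include work done while the application was still running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Derives rough GC frequency, overhead and throughput figures from the JVM's own counters. */
public class GcMetricsSketch {
    /** Throughput = fraction of uptime not spent in GC. */
    static double throughput(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) return 1.0;
        return 1.0 - (double) gcMillis / uptimeMillis;
    }

    /** Sum of accumulated collection time over all collectors, skipping undefined (-1) values. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("approx. throughput: %.2f%%%n", 100 * throughput(totalGcMillis(), uptime));
    }
}
```

Sampling these counters before and after a benchmark run gives a baseline to compare against when one variable is changed at a time.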
  • 20. Coarse tuning shortcuts and tips
    • Serial GC suitable for small data sets.
    • Throughput and low pause meant for medium to large data sets.
    • General rule for sizing.
      • Allocate 20% to 35% for young space.
      • Stateless needs more new gen space.
      • Stateful needs more tenured space.
    • If you see Full GC (tenured space) happening too frequently, adjust -Xmn to a smaller value.
    • Adjust the heap space – bigger for throughput – smaller for low pause.
  • 21. Brain dump Tuning Tips
    • Sizing
    • -Xmx == -Xms or not ?
    • young Gen: use -Xmn for more controlled and expected and predictable performance
    • Choose a GC
    • Serial - new gen and old gen both use the serial algorithm.
    • Parallel GC (default) - parallel scavenging + serial old gen algorithm.
    • UseParallelOldGC: parallel scavenge + parallel old
    • UseConcMarkSweepGC: parallel new gen, CMS old gen, serial old as fallback
    • G1: introduced recently in 1.6. It uses page (region) densities: it picks sparse pages and collects them, and it moves popular objects, i.e. objects referenced by many other objects. The goal is to need 0 flags.
    • How to read GC logs.
    • When you see "Full GC", it's STW.
    • Initial mark and Rescan/WeakRef/Remark trigger STW
    • Promotion failures and concurrent mode failures (CMF)
    • Tuning CMS
    • Avoid promoting too frequently, to avoid fragmentation.
    • Use MaxTenuringThreshold; avoid a situation where objects never tenure.
    • Size the generations
    • GC times are a function of the live set; minimize it.
    • Old gen should host long lived state comfortably.
    • Avoid the CMS initiating heuristic: -XX:+UseCMSInitiatingOccupancyOnly
    • Use Concurrent
    • GC Threads
    • Parallelize on multicore processors.
    • -XX:ParallelGCThreads=6
    • Strategy A: Tune minor GCs & let application data die in eden
    • Fragmentation
    • Performance degrades over time
  • 22. Summary & Refs & Resources
    • Remember, whatever option we introduce in JVM tuning is only a suggestion; the JVM is not guaranteed to follow it.
    • Some tools for evaluation
      • jmap (Solaris and Linux only) prints memory related stats for running jvm or core file.
      • jstat information on performance and resource consumptions of running application. Particularly for heap sizing and garbage collection.
      • HPROF: Heap Profiler presents CPU usage, heap stats and dump states of monitors and threads. Useful for analyzing performance, lock contention and memory leaks.
      • HAT: Heap Analysis Tool for debugging unintentional object retention .
    • The above presentation explains the way I understood GC and if there is any correction to it, please send email to [email_address]
    • Refs and Resources.
      • https://java.sun.com/j2se/reference/whitepapers
      • http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html