Scaling Up Java™ Applications On Windows® Servers
Juarez Junior
Systems Architect – Unisys Global Outsourcing
Presentation Scope.
  We will focus on
   • Systems using 8 or more 32-bit Intel CPUs
   • Systems running
      - Microsoft Windows 2000
      - Microsoft Windows Server 2003
   • HotSpot™-based Java virtual machines
      - Sun JVM 1.4.x
      - Unisys JVM 1.4.1
   • Command line options
  Goal: Identify what you can do to boost the performance of Java
  applications without rewriting them.
Agenda.
  Java and Scalability
  Cache Invalidation
  Process and Thread Affinity
  Java Heap and Heap Sizing
  Conclusions
Definition of Scalability.
    The ability to reliably increase performance by
    increasing the number of resources
•   Scale out: increase the number of systems available to process the
    workload, all systems operating semi-independently.
•   Scale up: increase the number of CPUs and other system resources to
    process the workload, all in one system (single OS instance).
Scaling Up Java.
  [Figure: scale factor vs. number of CPUs (1 to 31), untuned vs. tuned]
Cache Invalidation.
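  Consider two threads in the same application accessing the same data
   • The data is loaded into the cache of each processor
   • One thread updates the data
       - The copy in the other processor's cache becomes invalid
  [Figure: two CPUs, each with its own cache holding a copy of the same data from memory; an update on one CPU invalidates the other copy]
  Now consider many threads running on many processors
   • One update can invalidate the data in 31 other caches (e.g., on a 32-CPU system)
  A similar problem arises with a shared cache
   • External cache shared among 4 processors
  False sharing
   • Threads have independent data that resides in adjacent memory
   • Caches are loaded in units of a cache line (16-128 bytes), so an update
     to one thread's data can invalidate another thread's neighboring data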
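False sharing can be sketched in Java with two threads that each increment only their own counter. When the two counters are adjacent in memory they very likely sit in the same cache line, so every update invalidates the other processor's copy; spacing them apart avoids that. This is a minimal sketch, assuming 64-byte cache lines (the `PAD` constant is a hypothetical value chosen for that assumption), and it uses newer Java syntax (lambdas, `java.util.concurrent`) than the 1.4 JVMs covered here.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class FalseSharingDemo {
    // Assumption: 64-byte cache lines; 8 longs * 8 bytes = 64 bytes between padded slots.
    static final int PAD = 8;

    /**
     * Two threads each increment only their own counter.
     * stride 1   -> counters are adjacent and very likely share a cache line (false sharing)
     * stride PAD -> counters are at least 64 bytes apart, so they never share a 64-byte line
     */
    static long[] run(int stride, long iterations) {
        AtomicLongArray counters = new AtomicLongArray(2 * stride);
        Thread[] workers = new Thread[2];
        for (int t = 0; t < workers.length; t++) {
            final int slot = t * stride;
            workers[t] = new Thread(() -> {
                for (long i = 0; i < iterations; i++) {
                    // Atomic read-modify-write: the CPU must own the line exclusively,
                    // which invalidates any copy held in the other CPU's cache.
                    counters.incrementAndGet(slot);
                }
            });
        }
        for (Thread w : workers) w.start();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] { counters.get(0), counters.get(stride) };
    }

    static long timeMillis(int stride, long iterations) {
        long start = System.nanoTime();
        run(stride, iterations);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        System.out.println("adjacent counters: " + timeMillis(1, n) + " ms");
        System.out.println("padded counters:   " + timeMillis(PAD, n) + " ms");
    }
}
```

On a multi-CPU machine the padded run is typically faster, but the exact timings depend on hardware, cache line size, and JIT behavior, so treat the numbers as indicative only.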
Process and Thread Affinity.
  Restrict processes/threads to run on specific processors
   • Keep threads within a processor group that shares an external cache
  Process affinity
   • Useful if threads share many common objects
       - J2EE-based application servers
  Thread affinity
   • Useful if threads are independent
   • Useful in scale-up scenarios
  [Figure: CPUs divided into groups assigned to App1, App2, App3, and the OS]
Thread Affinity.
  [Figure: scale factor vs. number of CPUs (1 to 31), with and without thread affinity]
Process Affinity.




                    Demo
Process Affinity For Application Servers.
  Assign 2-4 CPUs to each application server instance
   • Cluster the instances
   • Load balance among them
  Issue:
   FACT: Affinity is set by process name or id
   FACT: Process id is assigned at runtime
   FACT: Process name for every instance is java.exe
   PROBLEM: How do you assign affinity for each instance?
   SOLUTION: Make copies of java.exe (e.g., java01.exe)
               - assign affinity to each copy
               - modify the batch files to launch the copies
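On Windows, a process affinity is a bitmask in which bit n allows the process to run on CPU n (the form taken by the Win32 `SetProcessAffinityMask` call). The following minimal sketch computes one mask per copied executable, giving each instance a contiguous block of CPUs. The class and method names are hypothetical, and keeping blocks contiguous assumes that neighboring CPU numbers share an external cache, which depends on the hardware and should be checked.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits CPUs into contiguous groups, one per app-server instance. */
public class AffinityMasks {
    /**
     * Returns one affinity mask per instance; bit n set means the instance may run on CPU n.
     * Instance i gets CPUs [i * cpusPerInstance, (i + 1) * cpusPerInstance).
     * Leftover CPUs (if totalCpus is not a multiple) stay unassigned, e.g. for the OS.
     */
    static List<Long> masks(int totalCpus, int cpusPerInstance) {
        if (cpusPerInstance <= 0 || cpusPerInstance > totalCpus || totalCpus > 64) {
            throw new IllegalArgumentException("need 0 < cpusPerInstance <= totalCpus <= 64");
        }
        // A run of cpusPerInstance one-bits, e.g. 4 -> 0b1111.
        long block = (cpusPerInstance == 64) ? -1L : (1L << cpusPerInstance) - 1;
        List<Long> result = new ArrayList<>();
        for (int first = 0; first + cpusPerInstance <= totalCpus; first += cpusPerInstance) {
            result.add(block << first);
        }
        return result;
    }

    public static void main(String[] args) {
        // 16 CPUs, 4 CPUs per instance -> java01.exe .. java04.exe
        List<Long> m = masks(16, 4);
        for (int i = 0; i < m.size(); i++) {
            System.out.printf("java%02d.exe  mask=0x%X%n", i + 1, m.get(i));
        }
    }
}
```

For 16 CPUs and 4 CPUs per instance this yields the masks 0xF, 0xF0, 0xF00, and 0xF000, one for each copy of java.exe.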
Java Heap.
  Young Generation
   • Eden
       - Where new objects are created
   • Survivor spaces
       - Where the garbage collector copies live objects that survive a young collection

  Old Generation
   • Where tenured (long-lived) objects are promoted
  References
   • http://java.sun.com/docs/hotspot/gc/ (1.3.1)
   • http://java.sun.com/docs/hotspot/gc1.4.2/
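To see these heap regions on a running JVM, the standard `java.lang.management` API can list the heap memory pools. This is a minimal sketch that requires a newer JVM than the 1.4.x versions covered here (the API arrived in Java 5, and the code uses Java 8 syntax). Pool names depend on the collector, for example "PS Eden Space", "PS Survivor Space", and "PS Old Gen" under the parallel collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    /** Returns one "name used committed" line for every heap (not non-heap) memory pool. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;  // skip code cache, metaspace, ...
            MemoryUsage u = pool.getUsage();                   // null if the pool is no longer valid
            if (u == null) continue;
            lines.add(String.format("%-25s used=%,d KB committed=%,d KB",
                    pool.getName(), u.getUsed() / 1024, u.getCommitted() / 1024));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```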
Young Generation Sizing.
  Goal: 1000 business transactions in 12 minutes
  Heap set to: -Xms1024m -Xmx1024m
  Result: 18-minute run, 472 seconds (about 7.9 minutes) in garbage collection
Young Generation Sizing.
-Xms1024m -Xmx1024m -XX:NewSize=300m -XX:MaxNewSize=300m
Result: 12-minute run, 25 seconds in garbage collection
 • 70 collections instead of 8000 (no major/full collections)
 • Average collection freed 240MB instead of just 1MB
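The effect of the larger young generation can be expressed as GC overhead (share of the run spent collecting) and average time per collection. The following sketch just does that arithmetic on the two runs' figures; the class and method names are hypothetical.

```java
public class GcOverhead {
    /** Percentage of wall-clock run time spent in garbage collection. */
    static double gcPercent(double gcSeconds, double runMinutes) {
        return 100.0 * gcSeconds / (runMinutes * 60.0);
    }

    /** Average time per collection, in milliseconds. */
    static double avgPauseMillis(double gcSeconds, int collections) {
        return gcSeconds * 1000.0 / collections;
    }

    public static void main(String[] args) {
        // Default young generation: 18-minute run, 472 s of GC, about 8000 collections.
        System.out.printf("default young gen: %.1f%% in GC, %.0f ms per collection%n",
                gcPercent(472, 18), avgPauseMillis(472, 8000));
        // -XX:NewSize=300m: 12-minute run, 25 s of GC, 70 collections.
        System.out.printf("300 MB young gen:  %.1f%% in GC, %.0f ms per collection%n",
                gcPercent(25, 12), avgPauseMillis(25, 70));
    }
}
```

With these figures, GC overhead drops from about 43.7% to about 3.5% of the run, while the average collection grows from about 59 ms to about 357 ms. Fewer but longer pauses is the usual trade-off of a larger young generation, which matters if response-time limits are tight.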
Survivor Ratio.
  -XX:SurvivorRatio=ratio
   • eden-size = survivor-size * ratio
   • eden-size + 2 * survivor-size = young-gen-size
  Example:
   • -XX:NewSize=200M -XX:SurvivorRatio=8
   • Eden: 160MB, each survivor space: 20MB
  Use -XX:+PrintHeapAtGC to monitor
   • Heap after GC invocations=11:
     new generation   total 18432K, used 1604K
     eden space 16384K,    0% used
     from space   2048K, 78% used
     to   space   2048K,   0% used
  Aim for 90-95% usage of the "from" survivor space after collection
   • Typically, with large young generations, use a larger ratio (e.g. 32)
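The two survivor-ratio equations can be solved for the sizes: survivor-size = young-gen-size / (ratio + 2), and Eden gets the rest. This minimal sketch works in kilobytes and ignores the alignment rounding a real JVM applies; the class and method names are hypothetical. As a cross-check, a 20 MB (20480K) young generation with ratio 8 gives 16384K of Eden and 2048K per survivor space, matching the sample output above, where "total 18432K" counts Eden plus one survivor space.

```java
public class SurvivorSizing {
    /**
     * Splits a young generation using young = eden + 2 * survivor and eden = ratio * survivor.
     * Returns {edenKB, survivorKB}; any remainder from integer division goes to Eden.
     */
    static long[] split(long youngKB, int ratio) {
        long survivorKB = youngKB / (ratio + 2);
        long edenKB = youngKB - 2 * survivorKB;
        return new long[] { edenKB, survivorKB };
    }

    public static void main(String[] args) {
        long[] a = split(200 * 1024, 8);    // -XX:NewSize=200M -XX:SurvivorRatio=8
        System.out.println("ratio 8:  eden=" + a[0] + "K survivor=" + a[1] + "K");
        long[] b = split(300 * 1024, 32);   // larger young generation, larger ratio
        System.out.println("ratio 32: eden=" + b[0] + "K survivor=" + b[1] + "K");
    }
}
```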
Thread-Local Allocation Blocks.
 Each thread gets its own piece of Eden
 -XX:+UseTLAB
 -XX:TLABSize=sizek
  • Common sizes: 64, 128, 256
  • tlab-size * #-threads < eden-size
 Use parallel collection
  • -XX:+UseParallelGC
 Align TLABs to cache line size
 [Figure: Eden divided into per-thread blocks for Threads 1-4, each thread running on its own CPU]
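The idea behind TLABs can be illustrated with a small simulation: threads claim whole blocks of Eden with a single atomic operation, then allocate inside their own block by bumping a private pointer with no synchronization. This is a simplified model, not HotSpot's implementation; the classes, the 64-byte object size, and the use of plain offsets as "addresses" are all assumptions for illustration, and it uses Java 8 syntax.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TlabSketch {
    /** Simulated Eden: one shared bump pointer that all threads contend on. */
    static final class Eden {
        private final long size;
        private final AtomicLong top = new AtomicLong();

        Eden(long size) { this.size = size; }

        /** Atomically carves out a chunk; returns its start offset, or -1 if Eden is full. */
        long claim(long chunk) {
            while (true) {
                long cur = top.get();
                if (cur + chunk > size) return -1;   // a real JVM would start a young collection here
                if (top.compareAndSet(cur, cur + chunk)) return cur;
            }
        }
    }

    /** One per thread: allocation is a plain pointer bump inside [cur, end). */
    static final class Tlab {
        private final Eden eden;
        private final long tlabSize;
        private long cur, end;
        int refills;   // how often this thread had to touch the shared pointer

        Tlab(Eden eden, long tlabSize) { this.eden = eden; this.tlabSize = tlabSize; }

        /** Allocates bytes (assumed 0 < bytes <= tlabSize); returns an offset or -1 if Eden is exhausted. */
        long allocate(long bytes) {
            if (cur + bytes > end) {                 // current TLAB used up: claim a fresh one
                long start = eden.claim(tlabSize);
                if (start < 0) return -1;
                cur = start;
                end = start + tlabSize;
                refills++;
            }
            long addr = cur;
            cur += bytes;
            return addr;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Eden eden = new Eden(160L * 1024 * 1024);    // 160 MB Eden, as in the survivor ratio example
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                Tlab tlab = new Tlab(eden, 256 * 1024);   // 256 KB blocks, like -XX:TLABSize=256k
                long objects = 0;
                while (tlab.allocate(64) >= 0) objects++;
                System.out.println("thread " + id + ": " + objects + " objects, "
                        + tlab.refills + " TLAB refills");
            });
        }
        for (Thread th : threads) th.start();
        for (Thread th : threads) th.join();
    }
}
```

Each 256 KB block serves 4096 allocations of 64 bytes, so the shared atomic pointer is touched once per 4096 allocations instead of on every one, which is where the reduction in contention comes from.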
Pop Quiz
  State 5 things you can do to improve the
  performance of a Java application
   • Set process affinity
   • Use thread affinity, if appropriate
   • Set the young generation size (1/3 to 1/4 of the heap size)
       - -XX:NewSize=sizem -XX:MaxNewSize=sizem
       - Try a heap of 800MB with a young generation of 200MB
   • Set the survivor ratio
       - -XX:SurvivorRatio=ratio
   • Use thread-local allocation blocks
       - -XX:+UseTLAB -XX:TLABSize=sizek
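The heap-related answers can be combined into one HotSpot option list, checking the TLAB rule (tlab-size * #-threads < eden-size) along the way. This is a minimal sketch with hypothetical class, method, and parameter names; it only builds the option strings listed above and does not launch a JVM. Affinity is set outside the JVM, so it is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that turns the heap-tuning answers into a HotSpot option list. */
public class TuningFlags {
    static List<String> build(int heapMB, int youngDivisor, int survivorRatio,
                              int tlabKB, int threads) {
        int youngMB = heapMB / youngDivisor;   // divisor 3 or 4 -> 1/3 to 1/4 of the heap
        // eden = young * ratio / (ratio + 2), from eden + 2 * survivor = young
        long edenKB = (long) youngMB * 1024 * survivorRatio / (survivorRatio + 2);
        if ((long) tlabKB * threads >= edenKB) {
            throw new IllegalArgumentException("tlab-size * #-threads must be < eden-size");
        }
        List<String> flags = new ArrayList<>();
        flags.add("-Xms" + heapMB + "m");
        flags.add("-Xmx" + heapMB + "m");
        flags.add("-XX:NewSize=" + youngMB + "m");
        flags.add("-XX:MaxNewSize=" + youngMB + "m");
        flags.add("-XX:SurvivorRatio=" + survivorRatio);
        flags.add("-XX:+UseTLAB");
        flags.add("-XX:TLABSize=" + tlabKB + "k");
        return flags;
    }

    public static void main(String[] args) {
        // 800 MB heap, 200 MB young generation, ratio 8, 256 KB TLABs, 64 threads.
        System.out.println(String.join(" ", build(800, 4, 8, 256, 64)));
    }
}
```

With these inputs Eden is 160 MB and 64 threads * 256 KB = 16 MB, so the TLAB rule holds comfortably.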
Questions?
