  1. Daryl Maier (IBM Canada Lab), Anil Kumar (Intel Corporation)
     1st October, 2012
     Java Application Design Practices to Avoid When Dealing with Sub-100 ms SLAs
     © 2012 IBM Corporation
  2. Important Disclaimers
     THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
     WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS-IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESSED OR IMPLIED.
     ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE, OR INFRASTRUCTURE DIFFERENCES.
     ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
     IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
     IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
     NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
       – CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS
  3. Introduction to the speakers
     Daryl Maier
       – 12 years' experience developing and deploying Java SDKs at IBM Canada Lab
       – Recent work focus:
           • x86 Java just-in-time compiler development and performance
           • Java benchmarking
       – Contact: maier@ca.ibm.com
     Anil Kumar
       – 10 years' experience in server Java performance, ensuring the best customer experience on all Intel Architecture based platforms
       – Contact: anil.kumar@intel.com
  4. Credits
     The contents of this presentation were jointly produced with Elena Sayapina, Java Performance / Intel.
     Intel and IBM collaborate to ensure the best user experience across all Intel Architecture based platforms.
  5. What this talk is about…
     Learn what contributes to higher transactional response times within a Java application
     How to measure response time
     Java application design practices that lead to lower response times
     How to tune the environment in which your application runs for better response time
     How to determine if you can achieve an even better response time
     Lots of practical examples
  6. Service Level Agreements
     SLA == Service Level Agreement
       – A commitment to provide a service that meets a prescribed level of performance
       – Can be informal or contractually obligated
     [Diagram: example SLA dimensions: CPU, Response Time, Availability, Storage, Concurrent Users, ?]
  7. Response time
     Measure of time needed to complete a transaction in response to a request to do work
     Lower response times generally have positive effects
     Different perceptions of response time: user interface, real time event, service level commitments, …
     Isn’t improving response time simply a matter of increasing throughput? Not necessarily…
  8. How do you measure response time?
     Be sure what you’re measuring is the response time you’re interested in
     [Diagram: Requests, Request Queue, Transaction Queue, Executor Thread Pool (Transactions A, B, C), Response Queue, Responses]
  9. How do you measure response time?
     Be sure what you’re measuring is the response time you’re interested in
     [Diagram: same pipeline as the previous slide]
     Measuring response time from request made to response received?
  10. How do you measure response time?
      Be sure what you’re measuring is the response time you’re interested in
      [Diagram: same pipeline as the previous slide]
      Measuring response time from transaction submitted to response received?
  11. How do you measure response time?
      Be sure what you’re measuring is the response time you’re interested in
      [Diagram: same pipeline as the previous slide]
      Measure time to complete the transaction?
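The three measurement points on these slides can be told apart by stamping a transaction at submission, at the start of execution, and at completion. The following is a minimal sketch, assuming a plain `ExecutorService` stands in for the executor thread pool in the diagram; the class and field names (`ResponseTimeProbe`, `Timings`) are hypothetical and not taken from the presentation or from SPECjbb2012.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ResponseTimeProbe {
    /** System.nanoTime() stamps captured at each stage of one transaction. */
    static final class Timings {
        final long submittedNanos;
        volatile long startedNanos;
        volatile long finishedNanos;

        Timings(long submittedNanos) { this.submittedNanos = submittedNanos; }

        long queueWaitNanos() { return startedNanos - submittedNanos; } // time spent waiting in the queue
        long serviceNanos()   { return finishedNanos - startedNanos; }  // time to complete the transaction
        long totalNanos()     { return finishedNanos - submittedNanos; } // submitted to completed
    }

    /** Submits one transaction and blocks until it completes, recording all three stamps. */
    static Timings timeOne(ExecutorService pool, Runnable transaction) {
        final Timings t = new Timings(System.nanoTime()); // stamp before enqueueing
        Future<?> done = pool.submit(() -> {
            t.startedNanos = System.nanoTime();
            try {
                transaction.run();
            } finally {
                t.finishedNanos = System.nanoTime();
            }
        });
        try {
            done.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        return t;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Timings t = timeOne(pool, () -> Math.sqrt(12345.0));
        pool.shutdown();
        System.out.println("queue wait ns: " + t.queueWaitNanos());
        System.out.println("service ns:    " + t.serviceNanos());
        System.out.println("total ns:      " + t.totalNanos());
    }
}
```

Under load, the queue wait usually grows long before the service time does, which is why the choice of measurement point matters.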
  12. How do you measure response time?
      Make sure your timing measurement isn’t part of the response time!
      Be aware of accuracy and precision of Java timing methods
        – System.nanoTime()
        – System.currentTimeMillis()
        – …and don’t use too many timers!
      Beware of clock skew in virtual environments
        – May need to keep time on an external system
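One way to get a feel for timer precision and cost on a given machine is to probe `System.nanoTime()` directly. This is a rough sketch, not a rigorous benchmark: the class name `TimerGranularity` is hypothetical, and the numbers it prints depend on the JVM, the OS, and the hardware clock source.

```java
class TimerGranularity {
    /** Smallest non-zero step observed between consecutive System.nanoTime() readings. */
    static long smallestObservedStepNanos(int samples) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long t0 = System.nanoTime();
            long t1;
            do {
                t1 = System.nanoTime();
            } while (t1 == t0); // spin until the clock visibly advances
            min = Math.min(min, t1 - t0);
        }
        return min;
    }

    /** Rough average cost of one System.nanoTime() call, in nanoseconds. */
    static double averageCallCostNanos(int calls) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += System.nanoTime(); // keep the call from being optimized away
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.print(""); // use the sink so it is not dead code
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        System.out.println("smallest step (ns): " + smallestObservedStepNanos(10_000));
        System.out.println("avg call cost (ns): " + averageCallCostNanos(1_000_000));
    }
}
```

If the per-call cost is a noticeable fraction of a transaction's service time, timing every transaction distorts the response time being measured.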
  13. How do you measure response time?
      [Chart: sample of transaction response times for an injection rate (IR) of 3000 ops/sec]
      Most long transactions are above the 95th percentile.
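Because the long transactions sit in the tail, a median or mean hides them; percentiles expose them. The sketch below uses the nearest-rank definition of a percentile, which is one common choice and not necessarily the one used by SPECjbb2012. The class name `LatencyPercentiles` and the sample values are made up for illustration.

```java
import java.util.Arrays;

class LatencyPercentiles {
    /** Nearest-rank percentile: the smallest sample such that pct% of samples are <= it. */
    static long percentile(long[] samples, double pct) {
        if (samples.length == 0 || pct <= 0 || pct > 100) {
            throw new IllegalArgumentException("need samples and 0 < pct <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical response times in milliseconds: four fast transactions, one slow outlier.
        long[] millis = {10, 20, 30, 40, 1000};
        System.out.println("median: " + percentile(millis, 50));
        System.out.println("99th:   " + percentile(millis, 99));
    }
}
```

With these made-up samples the median is untouched by the outlier while the 99th percentile is dominated by it, which mirrors the pattern reported in the case studies.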
  14. Influences on response time are not localized
      [Diagram: stack of Application, Framework, Java VM, Operating System, Hardware]
      You must design and tune the entire stack in order to achieve your response time targets
  15. SPECjbb2012
      Next generation Java business logic benchmark from SPEC
      Business model is a supermarket supply chain: headquarters, supermarkets, suppliers
      Scalable, self-injecting workload with multiple supported configurations
      Customer relevant technologies: security, XML, JDK 7 features
      Metrics: max-jOPs (throughput) and critical-jOPs (response time)
      Will be used for case studies in this presentation
  16. Application design influences response time
      [Diagram: stack of Application, Framework, Java VM, Operating System, Hardware]
      Application layer:
        • design for scalability
        • eliminate serial bottlenecks
        • use appropriate JCL packages
        • avoid needless synchronization
        • avoid excessive object allocations
        • cache data locally
        • use non-blocking I/O
        • be careful with logging and tracing
  17. Design for scalability
      Scalability: the ability to increase throughput as more resources are applied
      Prepare your application to run on modern multi-core architectures
      Create more parallelism in your application and eliminate serial bottlenecks
        – Change algorithms
      Organize your application into parallel tasks
        – Leverage TaskExecutor framework for high-level tasks
        – Consider ForkJoin in Java 7 for fine-grained task decomposition
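As an illustration of fine-grained task decomposition with ForkJoin, here is a minimal sketch that sums an array by splitting it recursively. The class name `ParallelSum` and the `THRESHOLD` value are hypothetical; the right cut-off depends on the workload and should be measured. It uses `ForkJoinPool.commonPool()`, which needs Java 8 or later; on Java 7 a dedicated `ForkJoinPool` instance would be used instead.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // hypothetical cut-off; tune by measurement

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i]; // small enough: do the work serially
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                       // run the left half asynchronously
        long rightSum = right.compute();   // compute the right half in this thread
        return left.join() + rightSum;     // wait for the left half and combine
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        System.out.println(sum(data));
    }
}
```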
  18. Use the java/util/concurrent package
      j/u/c introduced in Java 5, additional features in Java 6/7
      Contains building blocks for developing scalable applications
        – Uses state-of-the-art, non-blocking synchronization algorithms
        – More variety in locking operations (Lock interface, multiple Conditions)
        – Atomic variables (atomic math ops such as increment, test-and-set)
        – Concurrent collections
        – Coarse and fine-grained task management
      Use j/u/c classes as base classes for new data structures
      Optimized by modern JVMs
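To make the atomic variables and concurrent collections concrete, the sketch below keeps per-type transaction counters without any `synchronized` block. The class name `TransactionCounters` and the key `"purchase"` are hypothetical. The `putIfAbsent` idiom is used so the structure of the code would also work on the Java 5 to 7 releases the slide refers to; only the lambda in the demo helper needs Java 8.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class TransactionCounters {
    private final ConcurrentHashMap<String, AtomicLong> counts =
            new ConcurrentHashMap<String, AtomicLong>();

    /** Atomically increments the counter for a transaction type and returns the new value. */
    long increment(String type) {
        AtomicLong counter = counts.get(type);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            counter = counts.putIfAbsent(type, fresh); // another thread may have won the race
            if (counter == null) {
                counter = fresh;
            }
        }
        return counter.incrementAndGet();
    }

    long get(String type) {
        AtomicLong counter = counts.get(type);
        return counter == null ? 0L : counter.get();
    }

    /** Demo helper: several threads bump the same counter concurrently. */
    static long countConcurrently(int threads, final int perThread) {
        final TransactionCounters counters = new TransactionCounters();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counters.increment("purchase");
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters.get("purchase");
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 100_000));
    }
}
```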
  19. Avoid unnecessary Java synchronization
      Synchronization is required for correctness, so it can’t always be avoided
      Built-in Java synchronization is coarse grained and can inhibit scalability
        – Useful when true mutual exclusion is the goal
        – JVMs can help
      Strongly consider using j/u/c for finer-grained locking
        – Building blocks for scalable locking
      Eliminate contended locks
      Use volatile fields when appropriate
        – No locking
        – May be suitable for single writer, multiple-reader (e.g., time stamps)
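The single-writer, multiple-reader time stamp case can be sketched with one `volatile` field and no lock. The class name `LastUpdateStamp` is hypothetical. Note the assumption that exactly one thread ever writes: `volatile` guarantees visibility of the latest write, not atomicity of read-modify-write sequences, so this pattern does not extend to counters updated by several threads.

```java
import java.util.concurrent.TimeUnit;

class LastUpdateStamp {
    // Written by exactly one thread, read by many; volatile makes each write visible to readers.
    private volatile long lastUpdateMillis;

    void publish(long millis) { lastUpdateMillis = millis; } // single writer only
    long read() { return lastUpdateMillis; }                  // lock-free read

    /** Demo helper: a writer thread publishes a value and the caller polls until it sees it. */
    static long publishAndObserve(final long value) {
        final LastUpdateStamp stamp = new LastUpdateStamp();
        Thread writer = new Thread(() -> stamp.publish(value));
        writer.start();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        long seen;
        while ((seen = stamp.read()) != value && System.nanoTime() - deadline < 0) {
            Thread.yield(); // no lock is taken while waiting
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(publishAndObserve(System.currentTimeMillis()));
    }
}
```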
  20. Avoid excessive object allocations
      Understand the effect of object creation on the heap and the strain on garbage collection
      Consider hoisting allocations from loops
      Consider using weak/soft references when appropriate
        – Useful for caches, object metadata, or easily rematerializable data
      Be aware of immutable classes that implicitly return new objects
        – e.g., BigDecimal, Integer
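Two of these points, immutable classes that return new objects and hoisting allocations out of loops, can be shown side by side. This is an illustrative sketch with hypothetical names (`AllocationExamples`, the `"item:"` label); whether a primitive representation such as cents in a `long` is acceptable depends on the application's precision requirements.

```java
import java.math.BigDecimal;

class AllocationExamples {
    /** Every add() returns a new BigDecimal, so this loop allocates once per element. */
    static BigDecimal sum(BigDecimal[] prices) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal price : prices) {
            total = total.add(price); // BigDecimal is immutable: the result must be reassigned
        }
        return total;
    }

    /** The same total kept in a primitive (cents) allocates nothing inside the loop. */
    static long sumCents(long[] cents) {
        long total = 0;
        for (long c : cents) {
            total += c;
        }
        return total;
    }

    /** One StringBuilder hoisted out of the loop and reset, instead of one per iteration. */
    static int totalLabelLength(String[] names) {
        StringBuilder sb = new StringBuilder(64); // allocated once, outside the loop
        int total = 0;
        for (String name : names) {
            sb.setLength(0); // reuse the buffer
            sb.append("item:").append(name);
            total += sb.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new BigDecimal[] {new BigDecimal("1.99"), new BigDecimal("2.51")}));
        System.out.println(sumCents(new long[] {199, 251}));
        System.out.println(totalLabelLength(new String[] {"a", "bc"}));
    }
}
```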
  21. Case study: SPECjbb2012
      Example of design choices around receipt storage in the benchmark
        • Some impact on throughput
        • No impact on median response time
        • Significant impact on 99th-percentile response time
  22. Case study: SPECjbb2012
      Example of design choices where background tasks become heavier
        – Increase in the background task of Data Mining (DM)
            • Some impact on throughput
            • No impact on median response time
            • Significant impact on 99th-percentile response time
  23. Reduce data access latency
      Often a problem in client/server systems
      Cache data locally to avoid remote communication
        – Particularly effective with data unlikely to change
      Pitfall: caching more data reduces remote access latency, but accumulating too much strains garbage collection
        – An example of where local benefits to throughput have broader negative effects
      Use Java NIO (Java SE 1.4) and NIO2 (Java SE 7)
        – Can leverage high performance features
      Carefully consider non-blocking, unbounded data structures (e.g., ConcurrentLinkedQueue)
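One way to keep a local cache from growing until it strains the garbage collector is to bound it. The following sketch builds a small least-recently-used cache from `LinkedHashMap`; the class name `BoundedCache` and the capacity are hypothetical. It is wrapped in `Collections.synchronizedMap` for simplicity, which is itself a single coarse lock, so a heavily contended cache would want a different design.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedCache {
    /** Returns a thread-safe map that evicts the least recently used entry beyond maxEntries. */
    static <K, V> Map<K, V> create(final int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = BoundedCache.<String, Integer>create(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // touch "a" so that "b" becomes the eldest entry
        cache.put("c", 3);  // exceeds the bound and evicts "b"
        System.out.println(cache.keySet());
    }
}
```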
  24. Case study: SPECjbb2012
      Performance effects of caching supermarket data over not caching it
        • Throughput reduces by half
        • Minor impact on median response time
        • Some impact on 99th-percentile response time
  25. Application frameworks
      [Diagram: stack of Application, Framework, Java VM, Operating System, Hardware]
      Framework layer:
        • application containers (e.g., application servers, Eclipse)
        • 3rd party packages (e.g., Apache Commons, Grizzly)
        • understand thread management and local caching policies
  26. Java virtual machine tuning
      [Diagram: stack of Application, Framework, Java VM, Operating System, Hardware]
      Java VM layer:
        • garbage collection
        • heap tuning
        • 64-bit addressing
  27. Java virtual machine architecture
      [Diagram: layered JVM architecture, color-coded as User Code, Java Platform API, VM-aware, and Core VM]
        – User code: Java application code, user natives, debugger, profilers
        – Java API (e.g., Java6/Java7): JSE6 classes, Harmony classes, JVMTI
        – Java Runtime Environment (e.g., J9 R26): GC / JIT / class library natives, Java Native Interface (JNI), core VM (interpreter, verifier, stack walker), trace & dump engines, port library (files, sockets, memory), thread library
        – Operating systems / architectures: AIX (PPC-32, PPC-64), Linux (x86-32, x86-64, PPC-32, PPC-64, zArch-31, zArch-64), Windows (x86-32, x86-64), z/OS (zArch-31, zArch-64)
  28. Garbage collection
      Determine the best garbage collection policy to use for your application
        – Often a response time vs. throughput tradeoff
      Most GC policies involve a “stop-the-world” phase that works against response times
        – “throughput” policies tend to incur longer pauses but fewer interruptions
        – “concurrent” policies lower average pause times by completing some tasks concurrently
        – “balanced” policies carve heap into regions to improve parallelism and reduce pauses
      Tune your heap parameters
      Use -verbose:gc to correlate GC events with application events
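Besides reading `-verbose:gc` output, an application can sample cumulative GC activity from inside the JVM through the standard `java.lang.management` API and log it next to its own events. This is a minimal sketch with a hypothetical class name (`GcSnapshot`); collector names and the number of collectors reported vary by JVM and GC policy.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcSnapshot {
    /** Total number of collections so far, summed over all collectors that report a count. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds, summed over all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
        System.out.println("total GC ms:       " + totalCollectionMillis());
    }
}
```

Sampling these totals before and after a slow transaction gives a quick hint as to whether a collection happened while it was running.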
  29. Case study: SPECjbb2012
      Example showing the effect of different GC policies and heap tunings
        • Small throughput reduction from ConMarkSweep
        • No impact on median response time
        • ConMarkSweep 99th-percentile response time higher but consistent
  30. 64-bit addressing
      Heap addressability beyond 32 bits (> 3.5 GB)
        – Common for applications with large in-memory working set (e.g., databases, object caches)
      64-bit addressing is a less efficient representation than 32-bit
        – Cache & TLB effects stress hardware
      Solution: build a 64-bit JVM with near 32-bit efficiency
        – Use 32-bit values (offsets) to represent object fields
        – With scaling, between 4 GB and 32 GB can be addressed
      Enable with -XX:+UseCompressedOops (HotSpot) or -Xcompressedrefs (J9)
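The 4 GB to 32 GB range follows from simple arithmetic: a 32-bit value can name 2^32 distinct positions, and if each position is scaled by the object alignment (a left shift of 0 to 3 bits, i.e. 1 to 8 bytes), the reachable range is 2^32 times that alignment. The sketch below illustrates the arithmetic only; the class name, the heap base address, and the encode/decode helpers are hypothetical and do not describe how any particular JVM implements compressed references.

```java
class CompressedRefs {
    /** Bytes addressable by a 32-bit offset scaled by 2^shiftBits. */
    static long addressableBytes(int shiftBits) {
        return (1L << 32) << shiftBits;
    }

    /** Compress a 64-bit address into a 32-bit scaled offset from a heap base (illustrative). */
    static int encode(long heapBase, long address, int shiftBits) {
        return (int) ((address - heapBase) >>> shiftBits);
    }

    /** Expand a 32-bit scaled offset back to a 64-bit address (illustrative). */
    static long decode(long heapBase, int ref, int shiftBits) {
        return heapBase + ((ref & 0xFFFFFFFFL) << shiftBits); // treat the int as unsigned
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        System.out.println("shift 0: " + addressableBytes(0) / gb + " GB");
        System.out.println("shift 3: " + addressableBytes(3) / gb + " GB");

        long base = 0x100000000L;               // hypothetical heap base
        long address = base + 8L * 123_456_789; // an 8-byte-aligned address inside the heap
        int ref = encode(base, address, 3);
        System.out.println(decode(base, ref, 3) == address);
    }
}
```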
  31. Operating system tuning
      [Diagram: stack of Application, Framework, Java VM, Operating System, Hardware]
      Operating System layer:
        • large pages
        • thread scheduling
  32. Large data and code pages
      The OS paging architecture maps virtual memory addresses in fixed-size “pages” that are in turn mapped to physical memory
        – Translation Lookaside Buffers (TLBs) cache these mappings
        – Using larger page sizes increases TLB effectiveness
      Large pages must be enabled by the OS
        – BUT require enough physical pages to be allocated together to be most effective
      Modern JVMs place both heap and compiled code in large pages
      Enable with -Xlp (J9) or -XX:+UseLargePages (HotSpot)
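The benefit of larger pages can be estimated with one multiplication: the amount of memory a TLB can cover without a miss (its "reach") is the number of entries times the page size. The sketch below uses a hypothetical 512-entry TLB purely for illustration; real entry counts differ by processor and TLB level and are not taken from the presentation.

```java
class TlbReach {
    /** Memory covered by a TLB with the given number of entries and page size. */
    static long reachBytes(int entries, long pageSizeBytes) {
        return entries * pageSizeBytes;
    }

    public static void main(String[] args) {
        int entries = 512;            // hypothetical TLB size, for illustration only
        long smallPage = 4L * 1024;   // 4 KB pages
        long largePage = 2L << 20;    // 2 MB large pages

        System.out.println("4 KB pages: " + reachBytes(entries, smallPage) / (1L << 20) + " MB");
        System.out.println("2 MB pages: " + reachBytes(entries, largePage) / (1L << 20) + " MB");
    }
}
```

Under these assumptions the same TLB covers 2 MB with small pages and 1 GB with large pages, which is why a multi-gigabyte Java heap tends to see fewer TLB misses once large pages are enabled.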
  33. Case study: SPECjbb2012
      Example showing the effect of large pages
        • Increase throughput by ~13%
        • No impact on median response time
        • Helps in keeping 99th-percentile response time lower at higher load
  34. Thread scheduling
      Context switches
        – Voluntary (e.g., preemption during locking)
        – Involuntary (e.g., too many active threads)
      Watch for thread migration
  35. Hardware tuning
      [Diagram: stack of Application, Framework, Java VM, Operating System, Hardware]
      Hardware layer:
        • power management
        • BIOS settings
  36. Hardware tuning
      Power management
      Insufficient resources
        – Physical memory, amount and latency
        – I/O storage latency
            • RAID
            • SSDs
        – Network I/O bandwidth
      Tune your BIOS settings carefully
        – Hyperthreading
        – Prefetching
        – Power management
  37. Know your Intel® Xeon® Processor Family
  38. Know your Intel® Xeon® Processor SKU
  39. Case study: SPECjbb2012
      Example showing the effect of 8 cores vs. 4 cores
        – Assumes the application leverages the parallelism of multiple cores
            • Increases throughput by ~100%
            • No impact on median response time
            • 8 cores deliver much lower 99th-percentile response time
  40. Leveraging your hardware topology
      Understand the underlying hardware topology to reduce latency and increase throughput
      For NUMA, affinitize JVMs to core/memory subsets to improve performance
        – Improve NUMA performance
        – Optimize the cache hierarchy of the underlying processors
            • Increases throughput by ~12%
            • No impact on median response time
            • Much lower 99th-percentile response time
  41. Evaluating your response time
      Even though you may be achieving an acceptable SLA, are there tell-tale signs that you could be achieving even better?
        – Lack of multi-threadedness in your application
        – Lock contention
        – Low CPU utilization
        – Excessive time (>10%) being spent in OS kernel
      Tooling to help diagnose response time issues
        – IBM HealthCenter
            • What is my JVM doing? Is everything ok?
            • Why is my application running slowly? Why is it not scaling?
            • Am I using the right options?
        – Garbage Collector and Memory Visualizer
            • Online analysis of heap usage, pause times, many others
        – Memory Analyzer
            • Offline tool providing insight into Java heaps
  42. Questions?
  43. References
      Get Products and Technologies
        – IBM Java Runtimes and SDKs:
            • https://www.ibm.com/developerworks/java/jdk/
        – IBM Monitoring and Diagnostic Tools for Java:
            • https://www.ibm.com/developerworks/java/jdk/tools/
        – SPEC benchmarking
            • http://www.spec.org
      Learn
        – IBM Java InfoCenter:
            • http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp
      Discuss
        – IBM Java Runtimes and SDKs Forum:
            • http://www.ibm.com/developerworks/forums/forum.jspa?forumID=367&start=0
  44. Copyright and Trademarks
      © IBM Corporation 2012. All Rights Reserved.
      IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., and registered in many jurisdictions worldwide.
      Other product and service names might be trademarks of IBM or other companies.
      A current list of IBM trademarks is available on the Web: see the IBM “Copyright and trademark information” page at URL: www.ibm.com/legal/copytrade.shtml
  45. SPECjbb2012 architecture
      [Diagram: Single Application Set and Multi-Application Set configurations, showing Ctrl, TxI and BE components arranged in groups]
      Controller (Ctrl)
        – Controls and evaluates the runs
      Transaction Injector (TxI)
        – Issues “Requests” at a given rate
        – Measures response time by sending probe requests
      Backend SUT (BE)
        – Some % of transactions go across BEs exercising inter-JVM process communication
  46. SPECjbb2012 architecture
      [Diagram: Group 1 (Backend 1) and Group 2 (Backend 2), each containing HQ, SM 1, SM 2, SP 1 and SP 2]
      SM: Supermarket
      HQ: Headquarters
      SP: Supplier
  47. Be aware of the impact of logging and tracing
      Tracing and logging events from your application can have hidden costs
        – I/O latency
        – Storage requirements
        – Overhead of the tests that guard tracing code
        – Impact on JIT compilation
      Do try to correlate application tracing information with events in other system or JVM logs
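A common way to limit tracing cost is to guard the trace call so that the message is only built when the level is enabled. The sketch below uses `java.util.logging` from the JDK; the class name `GuardedTracing`, the `describe` helper, and the `expensiveCalls` counter are hypothetical and exist only to make the effect observable. As the slide notes, the guard is not free either: the level check still runs on every call and adds code that the JIT compiler has to handle.

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedTracing {
    static final Logger LOG = Logger.getLogger("GuardedTracing");

    static int expensiveCalls = 0; // counts how often the costly message was actually built

    /** Stands in for an expensive formatting step (hypothetical). */
    static String describe(int[] basket) {
        expensiveCalls++;
        return Arrays.toString(basket);
    }

    static void process(int[] basket) {
        // Without the guard, describe() would run and allocate even when FINE is disabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("processing " + describe(basket));
        }
        // ... the real transaction work would go here ...
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);
        process(new int[] {1, 2, 3});
        System.out.println("message built with FINE disabled: " + expensiveCalls);

        LOG.setLevel(Level.FINE);
        process(new int[] {1, 2, 3});
        System.out.println("message built with FINE enabled:  " + expensiveCalls);
    }
}
```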
