Dyna Trace Whitepaper Performance


Published in: Technology, Education

  1. 1. Performance Management and Diagnostics in Distributed Java and .NET Applications >> Rapidly resolve performance problems across the software application lifecycle
  2. 2. Contents
EXECUTIVE SUMMARY
INTRODUCTION
APPLICATION PERFORMANCE IN HETEROGENEOUS MULTI-SERVER CLUSTERED ENVIRONMENTS
    Symptoms and Causes of Performance Problems
    Fixing Performance Problems
TRADITIONAL TOOLS FOR APPLICATION PERFORMANCE MANAGEMENT
    Developer Tools
    Administrator Tools
    Need a Better Solution
PERFORMANCE MANAGEMENT IN APPLICATION LIFE-CYCLE
    Application Performance Management Solution Requirements
DYNATRACE DIAGNOSTICS
    Efficient Diagnostics
    Out-of-the-box, Extensible Diagnostics
COMPARE YOURSELF
CONCLUSION
  3. 3. Executive Summary
Today’s complex, mission-critical applications run in heterogeneous multi-server environments. When these applications falter, business productivity grinds to a halt and users are inconvenienced: costs rise and profits fall.
Modern technologies such as Ajax, Java and .NET, and approaches such as SOA, EAI and MDA, enable engineers to create and deploy applications rapidly. However, development tools generally do not help engineers understand the application’s performance characteristics or avoid performance problems. Consequently, performance problems are discovered late in the application life-cycle and have to be corrected at considerable time and expense.
In load-test and production environments, application performance management solutions typically consist of server monitors. When performance problems occur, such monitors provide alerts, but they do not look deeply enough inside the transaction execution to supply the information needed to diagnose the root cause. Because of their large overheads, development tools cannot be used in such environments to troubleshoot the problems. As a result, IT personnel can spend hours or days trying to reproduce and analyze these problems. Often limited by the available information, they ameliorate the situation by adding resources or tuning at the server and system layer, without resolving the underlying design or programming issue.
To eliminate wasted time and expense, IT organizations need a new class of application performance management solutions to monitor and diagnose performance problems. These solutions must provide detailed, transaction-specific diagnostic information for single and multi-server transactions. They should support the requirements of system administrators, performance analysts, testers and developers throughout the application life-cycle.
In contrast to traditional monitoring tools, which are designed to detect the symptoms of performance problems by measuring aggregate statistics at the server level, dynaTrace Diagnostics® has been expressly designed not only to detect but also to diagnose the root cause of performance problems:
• dynaTrace Diagnostics collects the necessary contextual behavior data during transaction execution to construct the transaction’s execution path, known as the PurePath®.
• PurePath maps the transaction’s precise execution path, containing relevant sequence, timing, resource usage and contextual information for each method or step the transaction executes.
• If the transaction is executed on multiple servers, whether running on the same or different machines, dynaTrace Diagnostics precisely measures and reveals the PurePath through all of these servers.
• To minimize overhead and impact on application performance, dynaTrace Diagnostics’ embedded, dynamic, lightweight agents offload the data they collect to a central Diagnostics Server for efficient real-time or off-line analysis.
dynaTrace Diagnostics’ unique design enables IT personnel to:
• Prevent performance problems by gaining a better understanding of the dynamic behavior of applications during development, and
• Reduce time to repair by reconstructing the problem transaction quickly from captured data to identify its root cause, enabling repair in minutes, not hours or days.
Performance Management and Diagnostics in Distributed Java and .NET Applications 1
  4. 4. Introduction
Today, a large number of mission-critical business processes are supported by performance-sensitive applications. Developers can rapidly create such applications without writing a lot of “infrastructure” code by using frameworks such as Java EE, .NET, Ajax and Atlas. These applications can scale quickly by accessing objects and services located on other servers through built-in remoting capabilities, which allows deployment in a variety of distributed, multi-server, clustered configurations. SOA and EAI drive this trend further by leveraging existing applications and services in distributed environments.
Callout: Performance problems are common in mission-critical Java and .NET applications.
While such frameworks speed development, they also hide inner workings that can contribute significantly to resource consumption, especially if such capabilities are misused. Consequently, mission-critical applications are often deployed with latent performance issues that surface later in production. Industry surveys reveal that:
• Among companies with $1B or more in revenues, nearly 85% experienced incidents of performance degradation [1],
• 40% of unplanned downtime is due to application failures, and
• The cost of downtime of mission-critical applications averages over $100,000/hour [2].
Callout: Problem resolution takes too much time and resources.
Industry surveys also show that:
• IT groups spend 24% of their time resolving application slow-downs [3], and
• 80% of unplanned downtime can be mitigated by application development and operations working together [4].
Clearly, IT personnel spend too much time reacting to performance problems. Current tools are ill-suited for resolving application performance bottlenecks: development tools are inappropriate in production environments for many reasons, including high overhead, while monitoring tools detect problems but do not provide the detailed diagnostic information necessary to resolve them.
To reduce the time needed to resolve such problems, IT personnel need a common, easy-to-use, low-overhead measurement and analysis system that efficiently collects the necessary, sufficiently detailed diagnostic data and speeds up root cause analysis. In this paper, we first develop the requirements for such a system and then introduce dynaTrace Diagnostics, which has been expressly designed to detect and diagnose performance problems throughout the application life-cycle, from development through production, at very low overhead.
[1] Jean-Pierre Garbani, “Best Practices in Problem Management”, Forrester Research, June 23, 2004.
[2] Theresa Lanowitz, “Delivering Business Value Through Software Quality”, Gartner Symposium IT Expo 2004, October 17-22, 2004.
[3] Referring to an Applied Research survey commissioned by Symantec.
[4] Theresa Lanowitz, “Delivering Business Value Through Software Quality”, Gartner Symposium IT Expo 2004, October 17-22, 2004.
  5. 5. Application Performance in Heterogeneous Multi-Server Clustered Environments
Callout: Different transactions can follow different paths in a distributed system.
As discussed, today’s mission-critical applications run in heterogeneous multi-server environments. Figure 1 details an example application’s typical transaction flow, which starts at the user’s Web browser, traverses the Java SE/EE servers for authentication and Web page rendering, executes business logic in the .NET servers, accesses mainframe databases and integrates external systems through Web services.
Figure 1: A transaction in a multi-server, heterogeneous, clustered application environment. [Diagram tiers: presentation tier (Java SE/EE server), business tier (.NET server), data tier (mainframe RDBMS), external Web services.]
The arrows along the red lines indicate the transaction’s high-level execution path. The detailed execution path enumerates the processing steps (method calls, servlet invocations, etc.) and their context through the various components, in the sequence of the transaction’s execution.
Callout: A transaction can suffer performance problems anywhere in its execution path.
Various performance problems can occur during this execution. These problems can lead to a variety of symptoms, some during the transaction’s execution and some well after.
  6. 6. Causes of Performance Problems
Callout: Addressing symptoms of performance problems does not really address the root cause.
An application may present a variety of symptoms of performance problems due to a number of different causes. The causes include non-optimal use of pre-existing software frameworks and/or their built-in remoting capabilities, other design errors, coding errors, resource contention and inappropriate configuration settings. As illustrated in Figure 2, these causes can occur anywhere along a transaction’s execution path.
Figure 2: Typical sources of performance problems in distributed applications
Performance Implications of Frameworks
Callout: Framework and library code can consume significant resources inadvertently.
Developers create modern applications rapidly by leveraging frameworks or pre-existing infrastructure software libraries. When a transaction executes, a significant amount of library code therefore runs as part of the transaction’s execution path. Application performance depends not only on the code that developers write specifically for the application but also on the facilities used from underlying libraries, and on the hidden interactions among them. Developers therefore need to understand the dynamic behavior of the underlying code and choose the right set of capabilities from the framework.
Performance Implications of Distributed Deployment or Remoting
Callout: Remoting can lead to performance problems.
To handle large transaction volumes or improve scalability, frameworks allow multi-tier software developed in a single application server environment to be easily deployed in multi-server distributed configurations. However, when two application tiers communicate across server and/or machine boundaries, performance can be significantly affected.
Such degradation depends on the serialization or data marshalling costs and network latencies, which in turn depend on the number of remote calls and the data transferred per call (Figure 3). If the application is not well designed for remoting, code running on one tier can remotely access objects resident on other tiers automatically, resulting in an unexpectedly large number of remote calls or data transfers. The performance effect of such poor design is generally not apparent during development because developers typically work with single-server configurations, and even when they work with multi-server distributed configurations, they test the software at low loads. Therefore, to eliminate latent performance problems due to remoting, developers need to understand the effect of remoting by examining the dynamic interactions of the components.
  7. 7. Figure 3: Latency introduced by remoting. [Diagram: a call passes from the client application through the stub, (de)serialization, (un)marshalling, transport and TCP/IP layers, across the network, and up through the matching layers to the dispatcher and the server application. The diagram labels conversion latency, network latency and total latency.]
Fixing Performance Problems
Callout: Finding and fixing the root cause of the problem is generally not possible without knowing the transaction’s actual execution path.
Table 1 enumerates a number of symptoms of performance problems and their probable causes. Clearly, there can be many causes for each symptom, so the symptom of a performance problem does not explicitly reveal its cause. Often what is thought of as a cause is really a symptom, and one may need to drill down recursively to find the root cause.
To find the root cause, it is important, and in many cases imperative, to identify the individual transaction(s) experiencing the performance problems and their execution path in the environment in which they are executing. This data must be sufficient for diagnosing the problem properly and efficiently. Without such information, problem diagnosis amounts to shooting in the dark, and it is easy to jump to the wrong conclusions.
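The dependence of remoting cost on the number of remote calls and the data transferred per call (Figure 3) can be illustrated with a back-of-the-envelope model. The following is a minimal sketch, not a measurement and not part of dynaTrace Diagnostics: the class name `RemotingCostModel`, the linear formula and every number in it are assumptions chosen only to show why a chatty design is hard to notice at low load yet expensive in aggregate.

```java
/** Illustrative cost model for remote calls; every figure here is assumed, not measured. */
final class RemotingCostModel {
    private final double roundTripMillis;     // assumed network latency per remote call
    private final double marshalMillisPerKb;  // assumed (de)serialization cost per KB

    RemotingCostModel(double roundTripMillis, double marshalMillisPerKb) {
        this.roundTripMillis = roundTripMillis;
        this.marshalMillisPerKb = marshalMillisPerKb;
    }

    /** Total latency = calls * (network round trip + conversion cost of one payload). */
    double totalMillis(int calls, double payloadKbPerCall) {
        return calls * (roundTripMillis + marshalMillisPerKb * payloadKbPerCall);
    }

    public static void main(String[] args) {
        RemotingCostModel model = new RemotingCostModel(2.0, 0.5);
        // Chatty design: 100 fine-grained remote calls of 1 KB each.
        double chatty = model.totalMillis(100, 1.0);
        // Coarse-grained design: one remote call returning the same 100 KB at once.
        double coarse = model.totalMillis(1, 100.0);
        System.out.println("chatty design: " + chatty + " ms");
        System.out.println("coarse design: " + coarse + " ms");
    }
}
```

Under these assumed figures both designs move the same amount of data, but the chatty one pays the network round trip 100 times. Real costs are not this linear; the point is only that call count and payload size are the two quantities worth measuring.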
  8. 8. Table 1: Performance problem symptoms and typical causes
Symptom: High response time for specific transactions or most transactions
Sample causes:
• Excessive resource consumption by transaction
• Too much synchronization wait time
• Too much time to get inside the connection or server pool
• Improper settings such as pool size
• Excessive delay for external web-services
• Undersized system
Symptom: Erratic transaction response time
Sample causes:
• Excessive garbage collection
• High resource utilization
• Erratic response of external web-services
Symptom: Application failures
Sample causes:
• Programming errors: improper error condition handling or time outs
• Data specific problems
• Memory exhaustion
• Memory leaks
• Socket exhaustion
• File handle exhaustion
Symptom: High CPU utilization
Sample causes:
• Poor/inefficient algorithms
• Poor design choices consuming significant time in underlying layers
• Poor implementation: redundant work
• Undersized system
• Improper transaction routing
Symptom: High memory utilization or too frequent garbage collection
Sample causes:
• Memory leaks
• Objects persist for unnecessarily long time
• Pool size too large
• Undersized system
• Lots of short-lived objects
Symptom: High network utilization between servers
Sample causes:
• Too many remoting calls
• Too much data transfer per call: poor design; lack of cohesion
Symptom: High IO rate
Sample causes:
• Too many SQL calls: improper database or query design
• Poor/inefficient algorithms
• Insufficient cache
• Pool size too large for configuration, leading to thrashing
Symptom: Too high synchronization delays
Sample causes:
• Poor algorithm design: not enough parallelism
• Excessive execution time for sub-transactions
• Locks being held for too long
Symptom: Excessive resource consumption by transaction
Sample causes:
• Poor algorithms
• Poor design choices consuming significant time in underlying layers
• Poor implementation: redundant work
• Too many remote calls
• Too much data transfer for remote calls
• Objects held for too long
• Poor SQL query and/or database design
Symptom: Long pool queue or utilization
Sample causes:
• Too much resource consumption by transactions
• Large transaction execution time for other reasons, including too much remoting, large synchronization delays or delays for external services or database queries
• Incorrectly sized pool
  9. 9. Traditional Tools for Application Performance Management
As noted earlier, developers need to understand the dynamic behavior and performance characteristics of their design choices in order to build a well-performing application. When a transaction experiences performance problems, fixing them properly requires that IT personnel diagnose the problem and identify the exact causes and locations of the deficiencies. Proper tools are needed for this job, so we discuss the effectiveness of traditional tools in preventing and eliminating performance problems. Traditional performance problem detection and resolution tools fall into the following broad categories:
• Developer tools, including debuggers, loggers and other forms of custom instrumentation, and code profilers, and
• Administrator tools, which primarily include server monitors and system utilities.
Developer Tools
Callout: Debuggers, loggers and code profilers do not directly support analysis of distributed applications and do not work in production systems.
Debuggers are an integral part of the developer’s tool kit. They enable developers to go through specific execution steps at a controlled pace and to focus on a small area of interest at a time, but they do not significantly enhance the overall understanding of the interactions among components and layers. Debuggers are not suited for use in production or load-test environments because, for example, (a) they stall the application, making performance measurements impossible, (b) they create high overhead, (c) they require users to be expert programmers with access to source code, and (d) one can look at only one thread at a time, which causes timeouts (e.g. XA transaction timeout or servlet timeout).
Creating custom instrumentation, such as loggers or custom output routines, requires access to the source code and advance knowledge of what needs to be monitored to solve the problem.
While it may sometimes be appropriate to instrument the code during the development phase, it is generally not practical for solving problems found during later phases for many reasons, including: (a) source code may need to be changed to add the necessary instrumentation, (b) since custom instrumentation is often written as in-line code, it can change the behavior of the code and, as a result, either mask existing problems or introduce new performance problems, (c) analyzing the output produced by such instrumentation is generally too laborious and time-consuming, and (d) correlating log messages across transactions often requires too much effort or may even be impossible.
Code profilers are useful in development for understanding which pieces of code consume the most CPU and for doing some statistical code optimization. However, they lack support for distributed and heterogeneous application environments and provide limited insight into dynamic application behavior, in particular because profiler output is statistical (averages and percent distributions) and lacks the context information required to reconstruct or even understand performance problems. This prevents developers from using profilers to diagnose and resolve transaction performance problems discovered in later life-cycle phases.
  10. 10. Further, since code profilers introduce large overheads, sometimes as large as 10x to 10,000x, they cannot be run in real load-test or production environments.
Administrator Tools
Callout: Monitoring tools do not provide sufficient detail to eliminate the root cause of performance problems and force one to alleviate the symptoms by tuning server configuration.
Monitors and system utilities provide overall usage and performance statistics on the server at reasonable overhead. Looking at this class of data, a server administrator can potentially guess at the problem and tune the application or server configuration [5]. Even when the tuning action provides performance relief, it may not address the root cause of the problem and may shift the bottleneck elsewhere in the system. In addition, while the aggregate data provided by a monitor may help a skilled administrator alleviate consistently or regularly recurring symptoms, it does not help with resolving problems that appear sporadically.
When monitoring utilities do provide specific information about transactions, the information is generally limited and cumbersome to obtain. For example, some monitors require the user to specify the transactions to be monitored in depth. Others provide intermediate information such as servlet response time, but no contextual information about what the servlets are actually executing. Even those monitors that attempt to provide some execution context do not provide enough information to determine the root cause. And some limit the user to monitoring only certain transactions under certain conditions, ruling out true application-wide analysis.
Callout: Monitors do not provide sufficient data to reconstruct the problem scenario, limiting the ability to reproduce the problem, which results in a long time to repair.
The situation degrades when dealing with multi-server transactions, because traditional monitors measure the individual transaction’s behavior within a single server and cannot track the end-to-end execution of the transaction across multiple servers [6]. Hence, an engineer has to infer transaction behavior, such as transaction routing, using only the aggregate statistical information (average, max, min, etc.) available at the server level. Consequently, the engineer can know the likelihood that a transaction is executed on a certain server but cannot identify the exact conditions, interactions and execution paths that led to the problem, which limits the ability to reconstruct the problem scenario. This lack of visibility into application behavior forces the engineer to identify the root cause through trial and error, resulting in long and cumbersome repair times.
The Need For a Better Solution
Clearly, traditional tools are insufficient for understanding the performance implications of software design and the application behavior in production systems. They are, therefore, inadequate for proactively reducing the risk of performance problems or for rapidly resolving performance problems when they do occur.
[5] For example, increasing the JDBC connection pool size or the heap size.
[6] Such monitors can monitor transactions of a multi-tier application only when all tiers run in the same server. They are unable to monitor multi-tier transactions when the tiers run on more than one server, irrespective of whether the servers run on the same physical machine or in a distributed environment.
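The limitation of server-level aggregates described above can be shown with a small, made-up data set. This is a minimal sketch under stated assumptions: the class `AggregateVsPerTransaction`, the transaction ids, the response times and the 500 ms threshold are all hypothetical and are not taken from any monitor or from dynaTrace Diagnostics.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: contrasts a server-level aggregate with per-transaction data. */
final class AggregateVsPerTransaction {

    /** Average response time in ms, the kind of aggregate a server monitor reports. */
    static double averageMillis(Map<String, Long> responseTimes) {
        return responseTimes.values().stream()
                .mapToLong(Long::longValue)
                .average()
                .orElse(0.0);
    }

    /** Ids of the individual transactions whose response time exceeds the threshold. */
    static List<String> violations(Map<String, Long> responseTimes, long thresholdMillis) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Long> entry : responseTimes.entrySet()) {
            if (entry.getValue() > thresholdMillis) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    /** Hypothetical sample: nine fast transactions and one slow one. */
    static Map<String, Long> sample() {
        Map<String, Long> times = new LinkedHashMap<>();
        for (int i = 1; i <= 9; i++) {
            times.put("tx-" + i, 100L);
        }
        times.put("tx-10", 3000L);
        return times;
    }

    public static void main(String[] args) {
        Map<String, Long> times = sample();
        long thresholdMillis = 500; // assumed service-level threshold
        System.out.println("average ms: " + averageMillis(times));
        System.out.println("over threshold: " + violations(times, thresholdMillis));
    }
}
```

With this sample the average stays below the assumed threshold even though one transaction took three seconds. A maximum would reveal that something was slow, but only the per-transaction record says which transaction it was.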
  11. 11. Performance Management in the Application Life-Cycle
Callout: Effective application performance management needs a life-cycle approach.
Developers tend to work in single-server environments, and performance problems often go unnoticed until later in the life-cycle, when:
• QA finishes functional testing and starts verifying performance characteristics
• Performance analysts start performance and longevity testing to develop multi-server configuration guidelines
• Operations deploys the application into production
• Customers complain or abandon transactions after the system has gone live
Due to the complexity of performance problems, it often takes a painfully long time before the root cause is identified. A contentious situation such as the one depicted in Figure 4 often results before the problem is resolved.
Figure 4: A consequence when performance problems are hard to diagnose. [Cartoon of an argument between the teams involved; the speech bubbles include:]
• “We got performance alert at 2AM for the checkout transaction.”
• “I cannot reproduce the problem.”
• “This problem has been there for 1 month! Fix it now!”
• “It works in my environment.”
• “What more do you want? I gave you everything that the system gave me.”
• “Come back when you can reproduce it or get more details.”
• “If only QA/Ops will hire someone who can program!”
• “Only if developers had a clue about what the real environment is like.”
To prevent such troublesome situations and to deliver a high-performance system, IT must pay attention to performance issues throughout the application life-cycle.
  12. 12. Figure 5 outlines the performance-related roles and responsibilities of different players and the information flow among them.
Figure 5: Performance roles, responsibilities and information flow during application life-cycle. [Diagram; the recoverable content is listed below.]
Phases: Development; Quality Assurance; Staging, Deployment; Production
Players: Architects, Developers, Testers, System Architects, Performance Analysts, Operations
Performance engineering responsibilities:
• Design for performance
• Specify performance management tools
• Understand dynamic application behavior
• Reconstruct and fix problems
• Discover and identify performance issues
• Report quality trends for every application component
• Isolate and document performance issues
• Tune performance under real-world conditions
• Optimize configuration for scalability
• Alert on and document performance issues in real time, 24x7
• Recognize performance trends
• Triage performance issues
• Minimize downtime
Performance information:
• Monitoring and diagnostic tools
• Recommended settings and thresholds
• Expected performance characteristics, potential bottlenecks and key performance indicators
• Performance and behavior changes between versions
• Performance and scalability reports
• Dynamic components interaction characteristics
• Data for offline analysis and problem reconstruction
Application performance management solution:
• Instrument and measure transaction behavior.
• Monitor fulfillment of Service Level Agreements.
• Reveal software’s dynamic behavior and performance implications.
• Detect, diagnose and resolve application performance problems throughout the application life-cycle.
  13. 13. Application Performance Management Solution Requirements
Callout: Effective APM requires a common tool that all IT personnel, including developers, testers, system architects, performance analysts, administrators and operators, can use effectively.
As one considers Figure 5 above and the implications of modern software development frameworks and remoting discussed earlier, it becomes evident that:
During development:
• Developers need tools to analyze the dynamic behavior of the application through its underlying layers in order to understand the performance implications of design alternatives.
During post-development (QA, Staging or Pre-deployment, and Operational):
• Non-development IT personnel, particularly Operations, need a tool that automatically detects performance anomalies such as SLA violations and captures all necessary and relevant information for problem reconstruction. This tool should require neither programming skills nor access to the source code.
• Non-development IT personnel need an easy-to-use tool for doing high-level triage and providing full, in-depth, code-level diagnostic information to system architects and development.
• Developers need detailed data with complete transaction context and step-by-step execution details from production/load-test environments for off-line diagnosis, so that they can reconstruct what happened rather than make repeated, laborious attempts to reproduce problems.
• Organizations need tight cooperation and efficient, productive communication between all stakeholders responsible for application performance. To reinforce this point, note that IT personnel from different groups interact extensively throughout the application life-cycle on performance matters (Figure 5), and that non-developers typically identify performance problems while developers resolve them.
Traditional application performance management tools do not meet the requirements for rapidly identifying, diagnosing and resolving performance problems throughout the application life-cycle. A next-generation solution is needed. dynaTrace Diagnostics® is such a solution.
  14. 14. dynaTrace Diagnostics
dynaTrace Diagnostics® is an application performance management solution that fulfills the measurement and diagnosis requirements identified in this paper. Specifically designed to support the entire application life-cycle, dynaTrace’s PurePath® technology captures essential information for all transactions during their execution across multiple servers in heterogeneous distributed environments at very low overhead (Figure 6). This enables IT personnel to:
• Understand the dynamic behavior of the software so that, where possible, performance problems can be prevented, and
• Detect and diagnose performance problems so they can be quickly resolved whenever they occur.
Callout: A transaction’s PurePath identifies the code executed by the transaction, the execution context and the server on which it was executed.
In addition to high-level performance indicators, dynaTrace Diagnostics maps out the precise execution path, the PurePath, of each individual transaction from its entry at the first monitored server, through all other servers where it is processed, across system, technology and component boundaries.
PurePath uses KnowledgeSensors™ to capture all performance and relevant context information with minimum performance overhead.
Figure 6: dynaTrace Diagnostics visualizes traces of individual transactions in distributed heterogeneous environments
KnowledgeSensors mark a transaction’s progress along its execution path and identify all transaction entry points (e.g., Java Servlet invocations) and method calls, as well as their sequence and nesting. For each transaction, the KnowledgeSensors record performance information such as method call sequence, arguments, return values, exceptions, log messages, elapsed time and resource utilization statistics such as CPU usage, IO usage, network traffic, objects created, SQL calls, remote calls and synchronization delays; see Figure 7 below.
  15. 15. Callout: PurePath allows one to reconstruct the problem scenario without trial and error.
dynaTrace Diagnostics records the PurePath for all transactions at very low overhead and sends it to the Diagnostics Server for analysis. This ensures that IT:
• Has a complete record of execution for all transactions, giving 100% monitoring coverage so that no issues are missed,
• Has a record of each transaction’s execution across application tiers, servers and machines, so that every potential issue can be analyzed,
• Can actually see the transaction’s execution path, rather than guessing at it by trying to follow its execution from one server to the next,
• Can understand the dynamic behavior of the software,
• Can determine the root cause of the problem experienced by a specific transaction the first time it occurs, without having to pre-specify which transactions to monitor and then wait for the problem to recur,
• Can recreate problem scenarios, including problem transactions, from recorded data and pinpoint the exact cause of performance problems quickly, avoiding the traditional, expensive trial-and-error approach,
• Can diagnose problems in near real time or afterwards, and
• Can diagnose problems off-line without loading the production systems.
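The idea of recording method calls together with their sequence, nesting and elapsed time can be sketched in a few lines of Java. The paper does not describe how PurePath or KnowledgeSensors are implemented, so the following is only an illustration of a per-transaction call tree: the class `TraceRecorder`, its `enter`/`exit` methods and the sample method names are hypothetical, and a real agent would also have to handle threads, multiple servers and overhead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;

/** Illustrative single-threaded call-tree recorder; not dynaTrace's implementation. */
final class TraceRecorder {

    /** One recorded step: a method name, its timing and its nested calls. */
    static final class Node {
        final String method;
        final long startNanos;
        long endNanos;
        final List<Node> children = new ArrayList<>();

        Node(String method, long startNanos) {
            this.method = method;
            this.startNanos = startNanos;
        }

        long elapsedNanos() {
            return endNanos - startNanos;
        }
    }

    private final LongSupplier clock;                      // injectable so tests can use a fake clock
    private final Deque<Node> stack = new ArrayDeque<>();  // currently open calls, innermost first
    private Node root;

    TraceRecorder(LongSupplier clock) {
        this.clock = clock;
    }

    /** Marks entry into a method; nesting follows from the calls still open. */
    void enter(String method) {
        Node node = new Node(method, clock.getAsLong());
        if (stack.isEmpty()) {
            root = node; // first call of the transaction becomes the entry point
        } else {
            stack.peek().children.add(node);
        }
        stack.push(node);
    }

    /** Marks exit from the innermost open method and stamps its end time. */
    void exit() {
        stack.pop().endNanos = clock.getAsLong();
    }

    Node root() {
        return root;
    }

    /** Renders the tree with two spaces of indentation per nesting level. */
    static String render(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.method).append(' ').append(node.elapsedNanos()).append(" ns\n");
        for (Node child : node.children) {
            sb.append(render(child, depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TraceRecorder trace = new TraceRecorder(System::nanoTime);
        trace.enter("CheckoutServlet.doPost");   // hypothetical entry point
        trace.enter("OrderService.placeOrder");  // hypothetical nested call
        trace.exit();
        trace.exit();
        System.out.print(render(trace.root(), 0));
    }
}
```

Because the recorded tree is plain data, it can be shipped elsewhere and inspected later, which is the property the text relies on when it speaks of off-line analysis and reconstructing a problem instead of reproducing it.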
Figure 7: A transaction's PurePath shows where it experiences performance problems

Efficient Diagnostics

dynaTrace Diagnostics enables engineers to diagnose problems efficiently. Given PurePath information, they need not spend time trying to reproduce the problem. They can analyze the problem by performing either:

Sidebar: dynaTrace Diagnostics allows you to diagnose problems efficiently, whether they are visible to the users or not.

• Outside-In diagnosis, beginning with an incident of a user-visible performance problem, such as a slow-responding transaction or user-visible exception message, and drilling down until the root cause is identified, or
  16. 16. • Inside-Out diagnosis, beginning with an internal measure of the performance problem, such as an exception message or a method running very slowly, then identifying the associated transactions and drilling down through their PurePaths to identify the root cause.

Sidebar: Drill down through a transaction’s PurePath to determine the root cause of its performance problems.

When certain transactions do not meet service level or performance requirements, dynaTrace Diagnostics’ intuitive console allows IT personnel to drill down through the transaction’s PurePath to identify the root cause(s) of performance problems (Figure 8) such as:
• Execution steps (e.g., method calls and servlet executions) that consume too many resources or run slowly,
• Excessively called methods or servlets, even if in framework software,
• Code that makes an excessive number of SQL calls or long running SQL calls,
• Excessive waits for resources such as execution threads or connections,
• Threads and locks causing synchronization delays,
• Components that make excessive remote calls,
• Remote calls doing excessive data transfer,
• Remote method or web-services calls taking too much time,
• Code where memory leaks occur, and
• Code where a large number of short-lived objects are created and destroyed.

Figure 8: dynaTrace Diagnostics PurePath allows engineers to reconstruct the problem by viewing the exact call sequence, including performance metrics and detailed context information, and to unearth the root cause of the problems.
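The drill-down described above can be pictured as a walk over a recorded call tree. The sketch below is a hypothetical illustration, not the product's data model: the `Call` class, the `hotspot` and `count` helpers, and the sample method names and timings are assumptions made for this example. It finds the call with the largest self time (elapsed time minus the time of its nested calls) and counts calls that match a name prefix, which is one simple way to spot an excessive number of SQL calls. It assumes nested calls run sequentially, so self time never becomes negative.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not dynaTrace's data model. */
final class DrillDownSketch {

    /** A recorded call with its elapsed time and nested calls. */
    static final class Call {
        final String method;
        final long elapsedMillis;
        final List<Call> children;

        Call(String method, long elapsedMillis, Call... children) {
            this.method = method;
            this.elapsedMillis = elapsedMillis;
            this.children = Arrays.asList(children);
        }

        /** Time spent in this method itself, excluding nested calls. */
        long selfMillis() {
            long nested = 0;
            for (Call child : children) {
                nested += child.elapsedMillis;
            }
            return elapsedMillis - nested;
        }
    }

    /** Returns the call with the largest self time in the subtree. */
    static Call hotspot(Call node) {
        Call best = node;
        for (Call child : node.children) {
            Call candidate = hotspot(child);
            if (candidate.selfMillis() > best.selfMillis()) {
                best = candidate;
            }
        }
        return best;
    }

    /** Counts calls in the subtree whose method name starts with the prefix. */
    static int count(Call node, String prefix) {
        int matches = node.method.startsWith(prefix) ? 1 : 0;
        for (Call child : node.children) {
            matches += count(child, prefix);
        }
        return matches;
    }

    /** A made-up transaction: one servlet request issuing three queries. */
    static Call demo() {
        return new Call("OrderServlet.doGet", 900,
                new Call("OrderService.load", 850,
                        new Call("Statement.executeQuery", 200),
                        new Call("Statement.executeQuery", 250),
                        new Call("Statement.executeQuery", 300)),
                new Call("View.render", 30));
    }

    public static void main(String[] args) {
        Call root = demo();
        Call slowest = hotspot(root);
        System.out.println("Hotspot: " + slowest.method + " self=" + slowest.selfMillis() + " ms");
        System.out.println("SQL calls: " + count(root, "Statement."));
    }
}
```

Ranking by self time rather than total elapsed time keeps the entry point from always appearing as the slowest step, since its total time includes everything it calls.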
  17. 17. When internal problems such as massive memory consumption or server crashes are encountered, SLA violations are detected, or a comparative analysis of historical data reveals potential performance issues (Figure 9), IT personnel use dynaTrace Diagnostics to identify transactions with a high contribution to the symptoms. By drilling down into these transactions’ PurePaths, they can better understand the context that leads to such high contributions. By using this context information and the ability to recreate transactions, they can quickly identify the root causes.

Figure 9: dynaTrace Diagnostics dashboard highlights problems and allows trend analysis and historical comparisons, such as between different application versions

Sidebar: dynaTrace Diagnostics helps resolve performance problems before customers experience or report them.

For example, consider two situations which may not yet be visible to the users: an exception message and a slow running method. With dynaTrace Diagnostics:

When an exception message is found in the logs, one can identify:
• The transaction and its parameters that led to the exception,
• The actual method call that generated the exception message,
• The parameters passed to this method call as well as to all of its predecessors, including the parameters input by the user, and thus,
• The root cause, such as user error, insufficient error handling in code, other logic errors, or system conditions such as out of disk space.

When one identifies a slow running method, one can quickly determine:
• Whether the method runs slowly all the time or just from time to time,
  18. 18. • Transactions that execute this method,
• Transactions that execute and bog down in this method,
• The break-down of such transactions' execution time in this method into the time taken by underlying method calls and queries, and
• The core method that needs to be corrected to achieve higher performance.

Sidebar: dynaTrace Diagnostics helps address the problems in the right priority, before they affect many users.

Note that since dynaTrace Diagnostics maps the performance of individual transactions, rather than just the aggregate performance of all transactions or a class of transactions, it allows IT personnel to:
• Determine a transaction’s business value by looking at its parameters, allowing them to prioritize different incidents and focus energies on the most valuable issues.
• Address performance issues in their infancy, when they show up for a few transactions, before they affect a large number of users and have a negative impact on business.

Out-of-the-box, Extensible Diagnostics

Sidebar: dynaTrace Diagnostics allows one to monitor the entire software stack, from the custom application code to the run-time environment of the virtual machine, for custom and packaged application software.

dynaTrace Diagnostics comes with an array of ready-to-use KnowledgeSensors for a variety of commercial and open source:
• run-time virtual machine environments
• database access layers
• application platforms and servers
• remoting libraries
• web services stacks
• messaging libraries and frameworks
covering the entire software stack (Figure 10).
Software layers covered by KnowledgeSensors (layer: Java | .NET):
• Custom Application (custom KnowledgeSensors): Java application | .NET application in C# or other languages
• Frameworks: Spring, Toplink, Struts | Atlas Ajax
• Messaging: IBM WebSphere MQ, IBM CICS Transaction Server, BEA T3 (RMI, JMS) | ADO.NET, ASP.NET
• WebService Stacks: IBM WebSphere, BEA WebLogic, AXIS, Web Methods, Glue | .NET/WCF
• Remoting: RMI (IIOP, JRMP, HTTP(s), T3), Visibroker, IIOP/ORBS | .NET (WCF)
• Application Server: Sun Java AS, IBM WebSphere, BEA WebLogic, JBoss, Apache Tomcat, Oracle AS, SAP Netweaver | Microsoft Windows AS
• Application Platform: J2EE, JSEE | .NET
• Database Access Layer: SQL, JDBC, Hibernate | SQL, ADO.NET
• Runtime Environment: Sun JVM, IBM JVM, BEA JRockit | Microsoft CLR
All layers below Custom Application are covered by pre-built KnowledgeSensors shipped with dynaTrace Diagnostics.

Figure 10: KnowledgeSensors capture transaction execution through all software layers (For up-to-date list, please visit
  19. 19. These pre-built KnowledgeSensors encapsulate deep knowledge, enabling IT personnel to manage performance in their environments right out of the box, without any effort spent on customization.

For more detail on custom applications, such as a policy quotation system for an insurance company, developers can easily define and package KnowledgeSensors for their own applications using dynaTrace Diagnostics’ point-and-click interface, then ‘hot’ deploy them to the target environment.

For packaged applications such as SAP ERP, application developers can easily define and package KnowledgeSensors for those applications. These packages can be shipped either with the application or separately. Alternatively, IT personnel at the licensed organizations, or third parties, can define KnowledgeSensors for that application without needing access to the application’s source code, package them, and then deploy them on their own.

Compare For Yourself

Earlier, we asserted that dynaTrace Diagnostics is the only solution available on the market that meets the requirements for efficient performance problem detection and analysis and that can be used throughout the application life-cycle by all members of the IT team. We invite you to scan Table 2 to compare other products that you may be familiar with against dynaTrace Diagnostics. We are confident that you will agree that dynaTrace Diagnostics fits the bill perfectly while other solutions fall significantly short.
  20. 20. Compare Yourself

Table columns: Key Diagnostics Capabilities | dynaTrace Diagnostics | other vendor

Diagnosis Depth Requirements
• Capture necessary data for each individual transaction, and not just average transaction measurements, in load testing and 24x7 production environments, enabling diagnosis of the business-critical outlier transactions.
• Capture all performance and contextual data that is required for reconstructing a performance problem, thus eliminating the need to reproduce it, and quickly identify the code where the performance problem occurs. Such data should include method response times, remoting performance and payload metrics, synchronization metrics, method and Web request arguments, log messages and exceptions.
• Reveal the relationships among events such as exceptions, log messages, input metrics, SQL executions and performance threshold violations by associating them with transactions to identify the root cause.
• Analyze transaction metrics in the context of server resource metrics to determine whether the performance problem is caused by configuration issues or programming issues.
• Diagnose memory leaks, even in production environments.
• Precisely trace execution of each transaction across multiple servers (logical or physical) and clients to understand its impact on each server and application component, as well as to understand the implications of remoting when designing high performance distributed applications using SOA, Web-Services, etc.

Application Life-Cycle Requirements
• Provide real-time data, down to the code level, for each and every individual transaction: to Operations for high-level problem triage, and to performance analysts and system architects for live root-cause analysis.
• Provide offline code-level diagnosis capabilities that enable developers and architects to interactively diagnose all individual transactions for reconstructing, isolating and resolving the performance problem, eliminating the need to reproduce the problem.
• Capture necessary performance data in QA and production environments and transfer the information to engineering for analysis, potentially on another system, eliminating the need to have developers on site to debug performance problems or to spend a significant amount of time reproducing the problem.
• Provide automated performance comparison reports, down to individual transactions and code level, among subsequent diagnosis sessions for evaluating the success of performance tuning activities, comparing different application versions and configurations and understanding the root cause of the differences.
• Enable engineers, architects and performance analysts to define measurement granularity, so that they get from QA and operations exactly what they need.
• Store and maintain the performance data for long-term historical and trend analysis.
• Integrate with IDEs, automated build and test systems, load testing tools, issue-tracking systems and enterprise management systems to enhance the productivity of IT personnel throughout the application life-cycle.

Deployment and Operational Requirements
• Configuration-free agents for automated, centralized deployment.
• Centralized management of agents with automated and real-time remote configuration updates to quickly and easily adapt the depth and granularity of captured diagnostics data on the fly, without having to restart the application.
• Auto-discover application components for out-of-the-box diagnosis results and intuitive customization.
• Continuous measurement and diagnosis in load testing and 24x7 production environments through lightweight agent technology at negligible CPU overhead and flat memory usage of a few megabytes.
• Monitor service levels at individual transaction level and alert on violations.
• Automatically capture the history of all transactions, including deep diagnostics data, for off-line root-cause analysis to eliminate the need for problem reproduction.
• Map transactions to requests, users and application functionality to prioritize problem resolution based on business impact.

User Interface and Usability Requirements
• Simple, intuitive yet comprehensive and responsive user interface that does not require detailed programming knowledge but still provides information that programming experts can use.
• Uses nomenclature and presents statistics that are relevant to and usable by all members of the IT team, whether they are developers, testers, system architects or server administrators.
• Serves as the common solution to be used by developers and non-developers for capturing, storing and analyzing performance data throughout the application life-cycle, reducing time to repair.

Table 2: Comparison tool for evaluating application performance management solutions
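Several rows of Table 2 rest on the difference between average measurements and individual transactions. The following sketch, with made-up response times and a hypothetical 500 ms service level, shows how an average can stay within the limit while one transaction clearly violates it. The class and method names are assumptions for illustration and do not correspond to any product API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; the numbers are invented sample data. */
final class ServiceLevelSketch {

    /** Mean response time over all transactions. */
    static double averageMillis(long[] responseTimes) {
        long sum = 0;
        for (long t : responseTimes) {
            sum += t;
        }
        return (double) sum / responseTimes.length;
    }

    /** Indexes of the individual transactions that exceed the limit. */
    static List<Integer> violations(long[] responseTimes, long limitMillis) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < responseTimes.length; i++) {
            if (responseTimes[i] > limitMillis) {
                result.add(i);
            }
        }
        return result;
    }

    /** Ten sample transactions; one of them is an outlier. */
    static long[] demoTimes() {
        return new long[] {120, 130, 110, 125, 2400, 115, 118, 122, 119, 121};
    }

    public static void main(String[] args) {
        long limitMillis = 500; // assumed service level for this example
        long[] times = demoTimes();
        System.out.println("Average (ms): " + averageMillis(times));
        System.out.println("Average within limit: " + (averageMillis(times) <= limitMillis));
        System.out.println("Violating transactions: " + violations(times, limitMillis));
    }
}
```

A check on the average alone would report this sample as healthy, which is why the table asks for service levels to be monitored per transaction.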
  21. 21. Conclusion

Consistently delivering high application performance in today’s complex multi-server heterogeneous distributed environments is a daunting task. The ability to diagnose and resolve performance problems rapidly is critical to achieve this goal. Therefore, architects and managers should think beyond traditional performance management paradigms and establish effective systems and processes throughout the entire application life-cycle. The ability to
• Capture, at production-safe overhead, the detailed execution information for each transaction, during its execution,
• Reliably reconstruct problem scenarios from the captured information, and
• Quickly analyze this information to determine true root cause
are keys to fixing application performance problems quickly and easily.

With its innovative PurePath instrumentation technology, low-overhead dynamic monitoring, intuitive user interface featuring end-to-end visualization and analysis capabilities, and integration with IDEs and enterprise management frameworks, dynaTrace Diagnostics truly represents the next generation of solutions explicitly designed for use by all IT personnel throughout the application life-cycle. dynaTrace Diagnostics goes far beyond monitoring and enables IT to take productive and efficient action to fix performance problems.
dynaTrace Diagnostics enables IT to:
• Study applications’ dynamic behavior during development to eliminate redundant calls, inefficient objects and algorithms, and to tune caches and configurations,
• Fix the root cause of performance problems rather than mitigate or hide them by system tuning actions alone,
• Unearth poorly performing transactions even when overall averages are within acceptable range and allow engineers to take corrective action before problems explode on a large scale,
• Focus their energies on addressing troublesome transactions or hotspots, rather than trying unnecessarily to reduce overall average response times at considerable expense,
• Identify and focus on business-critical applications or transactions rather than working harder to improve the performance of all transactions using the same server,
• Perform their investigation offline without having to pre-define what data needs to be saved and without having to spend a lot of time reproducing the problem,
• Give the recorded data to the engineers for analysis, allowing everyone to focus on their primary duties, and thus,
• Bridge the communication gap between system administrators, testers, performance analysts and developers.

Consequently, dynaTrace Diagnostics proactively averts performance problems and reduces time-to-repair. Its life-cycle-centric design enables IT personnel to work together efficiently and effectively to deliver high performance consistently in complex heterogeneous multi-server clustered systems.

We invite you to learn more at
  22. 22. Headquarter EMEA: dynaTrace software GmbH, Freistädter Str. 313, 4040 Linz, Austria/Europe, T +43 (732) 908208, F +43 (732) 210100.008
Headquarter North America: dynaTrace software Inc, West Street 200, Waltham, MA 02451, USA, T +1 (339) 9330317, F +1 (781) 2075365
E:
All rights reserved. dynaTrace software is a registered trademark of dynaTrace software GmbH. All other marks and names mentioned herein may be trademarks of other respective companies. (070522)