  • The case study is a Windows 2000 client-server application executing in multiple locations throughout the country, and serves to illustrate a methodology for application performance testing in a test laboratory. Some of the considerations for evaluating different hardware and software options are reviewed and an approach is offered for comparing these options. The study continues through production implementation, where the results achieved in production are compared with those measured in the laboratory. The case study highlights the development of test scripts, validation of the baseline and ensuring test repeatability. Metrics to be analyzed for Windows and SQL Server applications are also reviewed as part of the study. The graphs and charts also serve to illustrate an approach that can be used to compare test results and production changes.  
  • If you know what to expect in terms of performance, you can plan accordingly, you can minimize risk, you know where you are going, and you can sleep at night.
  • A facility can be a place or a process that can be readily employed. We will discuss both options. By the way, this is not a novel concept in IT. We have always done stress tests, functional tests, user acceptance tests, comparative tests, etc. This is both user acceptance and comparative rolled into one.
  • Key transactions: Claims/Eligibility etc. What resources? What tiers?
  • Do we have cycles, such as Christmas for retail, or, in the health insurance business, an increase in activity during the winter when more people are sick, or the period after open enrollment?
  • A test environment might include one or more labs, and a lab might include one or more locations. The lab and test network should be designed for testing and be isolated from the corporate network.
  • If you are going ahead with a test lab concept, you need to follow some basic steps.
  • Load testing is a way to stress test a system. By creating lots of load on a system, you can see how it will react at peak levels. Without load testing, you never know how your system will react at peak levels or with many simultaneous users. You expose yourself to the risk of having deployed a system with hardware and software that is too slow or unresponsive. This type of mistake could cost millions of dollars in hardware, network and software redesign, customers could be lost and users lose faith in your ability to provide the systems they need.
  • Give some background on how these 10 things were identified, the approach we took to investigate ways to address these concerns, etc. Specific recommendations/solutions will be presented in the status section of the presentation.
  • There are some operational differences between the centers, since most of the larger centers limit the number and type of applications processed during critical timeframes (e.g., when packages must be shipped out of the building). As such, we concentrated on the key critical systems that were to execute during the most critical timeframe and we reviewed production systems to quantify the specific conditions that needed to be simulated.   We collected system, process and SQL performance measurement data from a sample of the larger sites in order to determine the following – From the workflow, we were able to construct a script using both in-house and commercial scripting tools. The script included workflow information for the critical business functions such as package scanning, name and address verification as well as systems functions including database replication.  
  • The analysis of the production systems indicated that there was insufficient memory available for all the databases in the main instance. There were no significant performance issues on the application server. As such, the configuration options to be tested were focused specifically on improvements that could be made to the performance of the SQL database server. Unfortunately, the OS installed on the database and applications servers is Windows 2000 Standard Edition, which only supports 4 GB of memory. The company did not plan to upgrade the OS release until early 2006 after they certify Windows 2003 Enterprise Edition for the application
  • The plan was to test out the various hardware solutions in the laboratory and to then make a recommendation for production deployment to improve performance for the larger centers. Prior to production implementation, a “beta” of the solution was to be deployed at a few centers. Measuring performance in the field would be critical to determining the efficacy of the solution in terms of its ability to meet the SLA requirements. A secondary benefit is that it would also provide valuable information for future testing efforts so that necessary modifications to the test conditions could be made.   The two solutions that were initially deployed included the 5X solution and the separate server for the critical database. The 5X solution was deployed first as it was the easiest to implement. The systems group was unable to deploy the 3X+2Y solution quickly since it required procedural changes and additional certification. The sections below present the testing analysis and results in the lab, and then compare those results with the production deployment.
  • The tests evaluated the impact of disk configuration changes on performance. Multiple baseline tests were conducted until production conditions were mirrored as closely as possible. In addition, the baseline test was repeated 2-3 times in order to measure test repeatability and reliability. The baseline results were reproducible to within 5% when executing in the same environment.
  • Note: the separate UOWIS server had 109K reads and 758 KB/second of writes. The reason that the overall work was so similar for reads and writes is that there were still writes to the ad-hoc database.
  • After a series of tests is completed, a report detailing the findings should be prepared. A template for reporting should include an executive summary and a detailed analysis section. The executive summary highlights in non-technical language the test objectives, scope, and results; recommends the next steps to be taken; and details any action items. The results section should clearly identify whether service levels were met and provide detailed analysis of where delays in the application were found. A decomposition of end-to-end response time, highlighting the time spent in each server, should be presented. The analysis section should quantify the methodology employed, the tools utilized, and the results and their impact. The results should clearly identify whether application, architectural design, hardware, or database changes will be necessary to meet performance objectives. The report should point out the next steps required and whether additional testing will be necessary. Sufficient detailed performance data should be provided to support your recommendations and action items. Readers of your report will want to understand what steps were taken to ensure the reliability and validity of the results. This means documenting the lab configuration and testing methodology. The hardware, software, network, and OS that were used in the lab tests must be clearly itemized. Other items that need to be incorporated into your detailed report include the following: a list of administrative tools (standard Windows Server tools, third-party, and custom-built), and a list of the upgrades, such as service packs, drivers, and basic input/output system (BIOS), which must be installed on the OS.
  • The case study has served to illustrate the testing methodology discussed earlier in the paper. Before any testing can begin, it is critical to establish a valid, repeatable baseline. Scripts that are developed must accurately represent the systems, applications, database, and architecture. Once the baseline is established and validated, the what-if testing can proceed. Various hardware/software options can then be evaluated in the lab, and the ultimate comparison can be made by later testing the chosen option in production. The testing process is an iterative one; the case study showed that the results achieved in production were similar to, if not better than, those achieved in the lab. But had the results been contradictory, it would indicate that the load testing and lab environment were not representative of production and that the scripts would need to be modified. The validation process exercised during the early testing in the lab avoided this problem. By validating the baseline load tests, we were able to proceed with confidence. Ultimately, the proof was that similar results were achieved in production to those measured in the lab. It is usually impossible to test everything, so it is important to focus on those tests that have the most promise for success. It may not be feasible to implement "the most optimal performance improvement", and the tests will probably need to be adjusted to obtain political support. The tests will also need to be implementable within a reasonable timetable. Test prioritization is important because time is always of the essence. One needs to glean the most from each test case and learn from the mistakes made. Repeatability is most critical, especially in establishing a valid baseline. The key focal points come from learning about test design: the mantras are repeatability, tes
1. Tales from the Lab: Experiences and Methodology
   Demand Technology User Group, December 5, 2005
   Ellen Friedman, SRM Associates, Ltd

2. Testing in the Lab
- Experiences of a consultant
  - Taming the Wild West
    - Bringing order to chaos
  - How?
    - Methodology: capacity planning, SPE, load testing
    - Discipline
      - Checklists/procedures
    - What happens when procedures aren't followed
      - Detective work

3. Agenda
- Introduction
- Software Performance Engineering and benefits of testing
- Back to basics:
  - Workload characterization/forecasting, capacity planning
- Building the test labs
- Testing considerations
  - Scripts and test execution
- Some examples
- Documenting the test plan and reporting results
- Summary

4. Software Performance Engineering
- Performance engineering is the process by which new applications (software) are tested and tuned with the intent of realizing the required performance.
- Benefits:
  - Identify problems early on in the application life cycle
  - Manage risk
    - Facilitates the identification and correction of bottlenecks to
      - Minimize end-to-end response time
      - Maximize application performance

5. Should we bother to test?
WE CAN'T PLAN FOR WHAT WE DON'T KNOW
6. What do we need to achieve?
- Scalability
  - Predictable scaling of software/hardware architecture
  - Do we have capacity to meet resource requirements?
    - How many users will the system handle before we need to upgrade or add web servers/app servers?
- Stability
  - Ability to achieve results under unexpected loads and conditions
- Performance vs. Cost
  - Achieving the SLA while minimizing cost

7. Testing throughout the application lifecycle
Cost of fixing a problem late in development is extremely expensive ($$$$$$$)

8. What is a Performance Test Lab?
A "facility" to proactively assess the satisfactory delivery of service to users prior to system implementation or roll-out. A test-drive capability.

9. Lab: What is it Good For?
- Before you deploy the application, create an environment that simulates the production environment
- Use this environment to reflect the conditions of the target production environment
10. Testing process (flow diagram)
- Evaluate system: SLAs, workload characterization, volumes
- Develop scripts: obtain tools, methodology, build scripts
- Test strategy: testing plan
- Execute baseline tests: run the tests in the lab and obtain a baseline
- Validate baseline: ensure that test scripts adequately represent the production environment
- Run controlled benchmarks
- Analyze results
- Report findings
11. Evaluate System: Workload Characterization
- Identify critical business functions
- Define corresponding system workloads/transactions
  - Map business workloads to system transactions
  - Identify the flow of transactions through the system
  - Identify current and expected future volume
  - Determine resource requirements for business-based workloads at all architectural tiers
    - Web server, applications server, database server

12. Evaluate System: Workload Forecasting
- Define key volume indicators for the system
- What are the drivers for volume and/or resource usage for the system?
- Examples:
  - Banking: checks processed
  - Insurance: claims processed
  - Financial: trades processed
  - Shipping: packages processed

13. Workload Forecasting: Historical Review
- Does the business have a set peak?
  - December for retail and shipping
  - Peak/average ratio? 20% or 30% higher?
- Volume vs. resource usage
  - Larger centers require greater computing resources
  - Need to determine scaling of hardware/software resources as a function of volume
14. Volume vs. Response Time
(Chart; volume scale: x1000 PPH)
15. Service Level Considerations
- e-Business system: tracking system for package inquiries: WHERE IS MY PACKAGE?
  - Call center handles real-time customer inquiries
    - SLA: a caller cannot be put on hold for more than 3 minutes
    - 90% of all calls should be cleared on first contact
    - Responsiveness to customer needs
  - Web interface for customers
    - Page load time and query resolution < 6-8 seconds

16. Lab can be used throughout the Application Lifecycle
- Testing throughout the application life cycle
  - Planning
  - Design/coding
  - Development/testing/UAT
  - Production deployment
  - Post-production change management
  - Optimization (performance and volume testing)
- Labs reduce risk to your production environment
  - Solid testing leads to cleaner implementations!

17. How many Labs? Where to put them?
- Labs are locations for testing in various technical, business, or political contexts. The following factors influence the decisions you make about your test environment:
  - Your testing methodology
  - Features and components you will test
  - People, money, location:
    - Personnel who will perform the testing
    - Size, location, and structure of your application project teams
    - Size of your budget
    - Availability of physical space
    - Location of testers
    - Use of the labs after deployment

18. Types of Labs and their Purpose
- Application unit testing
  - Hardware or software incompatibilities
  - Design flaws
  - Performance issues
- Systems integration testing lab
  - User Acceptance Testing (UAT)
  - Application compatibility
  - Operational or deployment inefficiencies
  - Windows 2003 features
  - Network infrastructure compatibility
  - Interoperability with other network operating systems
  - Hardware compatibility
  - Tools (OS, third-party, or custom)
- Volume testing lab
  - Performance and capacity planning
  - Baseline traffic patterns
    - Traffic volumes without user activity
- Certification lab
  - Installation and configuration documentation
  - Administrative procedures and documentation
  - Production rollout (processes, scripts, and files; back-out plans)

19. Testing Concepts 101
- Define the problem: test objectives
  - Limit the scope
- Establish metrics & analysis methodology
  - Tools/analysis
- Establish the environment
  - Design the test bed
    - Simulate the key business functions
    - Develop scripts and their frequency of execution

20. Testing Process 101
- Ensure that the lab mimics production (H/W, S/W, workload/business functions being tested)
- Test measurement tools and develop analysis tools
  - ARM the application
    - Instrumentation to provide end-to-end response time
    - Instrumentation to provide business metrics to correlate
- Execute controlled tests
  - Single-variable manipulation
    - Ensure repeatability
- Analyze data & repeat if required (e.g., tune the system)
- Extrapolate
- Document test set-up and results

21. Developing the Script
- Meet with the business team and applications team to understand the workload.
  - What is typical? What is most resource-intensive?
  - Determine the appropriate mix of work (see the sketch below)
    - Typical navigation and screen flow
    - % of time each screen is accessed by a user
    - Number of users to test with, number of different accounts to use (other factors impacting the representativeness of the test)
    - Include cases to test resource-intensive activities and functions
    - Include cases where a user may abandon the session because response time is too long
    - Test for time-outs
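The workload mix above can be wired into a load script in a few lines. The following Python sketch is not from the original deck; the function names and mix percentages are hypothetical placeholders for whatever mix the business and applications teams agree on.

import random
from collections import Counter

# Sketch (not from the deck): pick one business function per script iteration
# according to a weighted workload mix. Names and weights are hypothetical.
WORKLOAD_MIX = {
    "verify_name_address": 0.40,
    "scan_package":        0.35,
    "print_smart_label":   0.15,
    "track_package":       0.10,
}

def next_business_function(rng=random):
    """Choose the next function to drive, in proportion to the agreed mix."""
    functions, weights = zip(*WORKLOAD_MIX.items())
    return rng.choices(functions, weights=weights, k=1)[0]

# Quick check that the realized mix over 10,000 iterations matches the target.
print(Counter(next_business_function() for _ in range(10_000)))

A tally like this is also a cheap way to confirm, before a long run, that the script will exercise screens in roughly the percentages gathered from the business team.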
22. Load Testing Parameters
- Simulating volume and distribution of the arrival rate (see the arrival-schedule sketch below)
  - Hourly volume: the distribution is not uniform; the arrival rate is "bursty"
    - Web sessions are only about 3 minutes long
      - When is traffic heaviest?
      - How long does the user spend at the site?
      - Need to vary the number of users started over the hour / user think time
  - Package shipping example: different from a web site, more predictable
    - Arrival rate: highest in the first hour
    - Limited by the capacity of the site to load the packages, speed of the belts, etc.
    - Package scanning: some automated, but still has human involvement
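As a companion to the arrival-rate discussion, here is a small Python sketch (not from the deck) of one way to generate a bursty, front-loaded arrival schedule for virtual users; the per-minute rates are invented for illustration.

import random

def arrival_schedule(per_minute_rates, seed=42):
    """Return sorted arrival times (seconds) for one test hour.

    per_minute_rates gives the expected arrivals for each minute; weighting the
    early minutes mimics a package flow that is heaviest in the first hour.
    """
    random.seed(seed)
    arrivals = []
    for minute, rate in enumerate(per_minute_rates):
        t = minute * 60.0
        while True:
            # Exponential inter-arrival gaps give a Poisson-like burstiness.
            t += random.expovariate(rate / 60.0) if rate > 0 else 61.0
            if t >= (minute + 1) * 60.0:
                break
            arrivals.append(t)
    return sorted(arrivals)

# Hypothetical front-loaded profile: 10/min for 15 min, then 6/min, then 3/min.
profile = [10] * 15 + [6] * 15 + [3] * 30
schedule = arrival_schedule(profile)
print(f"{len(schedule)} simulated arrivals; first at {schedule[0]:.1f}s")

The same idea extends to think time: drawing user delays from a distribution rather than a constant keeps the load from arriving in artificial lock-step.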
23. How long should the test run? Need to reach steady state!
(Chart: X-drive read bytes/second over time; note the reduction in read bytes/sec over time. The test run here is four hours.)
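Deciding when a run has gone long enough can be automated crudely. The sketch below (not from the deck) flags steady state when the mean of a counter over the most recent window is close to the mean of the window before it; the 10% tolerance and 30-sample window are arbitrary choices, not values from the case study.

def reached_steady_state(samples, window=30, tolerance=0.10):
    """True when the mean of the last `window` samples is within `tolerance`
    (as a fraction) of the mean of the window before it."""
    if len(samples) < 2 * window:
        return False
    prev = samples[-2 * window:-window]
    last = samples[-window:]
    prev_mean = sum(prev) / window
    last_mean = sum(last) / window
    if prev_mean == 0:
        return last_mean == 0
    return abs(last_mean - prev_mean) / prev_mean <= tolerance

# e.g., X-drive read bytes/sec exported from Perfmon, one value per interval
readings = [5.2e6, 4.9e6, 4.8e6, 4.7e6]
print("steady state" if reached_steady_state(readings) else "keep running")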
24. Creating the Test Environment in the Lab
- Creating the data/database
  - Copy the database from production and subset it
  - Manually key/edit some of the data
  - Create an image copy of the system for use in each run
- Verifying the test conditions
  - Utilize ghost imaging or software such as PowerQuest or LiveState to save the database and system state between test runs
    - May need to also verify configuration settings that aren't saved in the image copy
  - Make sure that you are simulating the correct conditions (end of day / beginning of day / normal production flow)
- Scripting the key business functions
  - Vary the test data as part of scripting
    - Vary users/accounts/pathing

25. What type of staff do we need?
- Programmers
- Korn shell programmers
- Mercury mavens?

26. Establish Metrics & Analysis Methodology
- Based on the testing objectives, what data do we need to collect and measure?
  - CPU, memory, I/O, network, response time
- What tools do we need for measurement?
  - Do not over-measure
    - Don't risk over-sampling and incurring high overhead
  - Create a template to use for comparison between test runs

27. Build a Template for Comparison
- Before vs. after comparison of test cases (see the comparison sketch below)
- Collect the performance data (metrics):
  - CPU: processor metrics
    - System, user, and total processor utilization
  - Memory:
    - Available bytes, page reads/second, page-ins/second, virtual/real bytes
  - Network:
    - Bytes sent/received, packets sent/received per NIC
  - Disk:
    - Reads and writes/second, read and write bytes/second, seconds/read, seconds/write, disk utilization
  - Process: SQL Server (2 instances)
    - CPU
    - Working set size
    - Read/write bytes per second
  - Database (SQL):
    - Database reads/writes per instance, stored procedure timings
    - Log bytes flushed per database
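A comparison template does not need to be elaborate. This Python sketch (not from the deck) takes two sets of averaged metrics, a baseline run and a candidate run, and prints the percent change for every metric present in both; the metric names and values shown are illustrative stand-ins for the Perfmon and SQL counters listed on the slide.

def compare_runs(baseline, candidate):
    """Yield (metric, baseline value, candidate value, % change) rows."""
    for metric in sorted(set(baseline) & set(candidate)):
        b, c = baseline[metric], candidate[metric]
        pct = (c - b) / b * 100.0 if b else float("inf")
        yield metric, b, c, pct

baseline  = {"CPU % total": 38.0, "Disk reads/sec (X:)": 410.0, "Sec/Read (X:)": 0.011}
candidate = {"CPU % total": 37.0, "Disk reads/sec (X:)": 230.0, "Sec/Read (X:)": 0.007}

print(f"{'Metric':<22}{'Base':>10}{'Test':>10}{'% chg':>9}")
for metric, b, c, pct in compare_runs(baseline, candidate):
    print(f"{metric:<22}{b:>10.3f}{c:>10.3f}{pct:>8.1f}%")

Filling the same template after every run is what makes before/after and Green-vs-Red style comparisons, like the ones later in the deck, quick to produce.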
28. Case Study
- Packaging/shipping system
  - Many centers throughout the country
  - Same applications
  - Same hardware
- Testing in the lab is required to identify bottlenecks and optimize performance
  - SLA not being met in some larger centers
  - Suspect: database performance

29. Case Study: Configuration Architecture
- Database server:
  - Runs 2 instances of SQL Server (Main, Reporting)
  - Databases are configured on the X drives
  - TempDB and logs are configured on the D drive

30. Scanning the Package on the Belt
- If the SLA is not met, packages aren't processed automatically
- Additional manual work is required to handle exceptions

31. Case Study: Hardware
(Diagram: application server and database servers, Database 1 and Database 2)
- Database server, DB #1
  - G3 (2.4 GHz) with 4 GB memory
  - RAID 10 configuration
    - Internal: 1 C/D, logically partitioned
    - External (10 slots): 2 X drives (mirrored), 2 Y drives (mirrored)
- Application server
  - G3 (2.4 GHz) with 3 GB memory
  - 2 internal drives (C/D)
- Database server, DB #2
  - G3 (2.4 GHz) with 4 GB memory
  - Internal: 1 C/D, logically partitioned; 2 X mirrored drives

32. Case Study: Software and OS
- Windows 2000
- SQL Server 2000
  - 2 database instances
    - Reporting
    - Main instance: multiple databases
      - Replication of the Main instance to the Reporting instance on the same server
      - Main instance and Reporting instance share the same drives

33. Case Study: When do we test in the Lab?
- Hardware changes
- OS changes
- Software patch-level changes to the main suite of applications
- Major application changes
- Changes to other applications which coexist with the primary application suite

34. Checklists and Forms
- Test objectives
- Application groups must identify:
  - The specific application version to be tested, as well as those of other co-dependent applications
  - Database set-up to process the data
  - Special data
  - Workstation set-up
- Volume: induction rate/flow (arrival rate)
- Workflow and percentages
  - Scripts/percentage/flow rate

35. Case Study: Hardware Checklist
36. Sign-offs on Procedures/Pre-flight
- Who?
  - Applications team
  - Lab group
  - Systems groups
    - Network
    - Distributed systems
    - Database
    - Performance

37. Script Development: Collected Data from Production Systems
- Applications to include for testing and to be used to determine resource profiles for key transactions and business functions
- Volumes to test with
- Database conditions, including database size and database state requirements (e.g., end-of-day conditions)
- Application workflow, based on operational characteristics in the various centers
  - Job and queue dependencies
  - Requirements for specific data feeds to include

38. Case Study: Developing a Script
- Major business functions for labeling and shipping:
  - Verifying the name and address of the item to be shipped
    - Interfaces to another system and uses algorithms for parsing names/addresses
  - Route planning: interfaces with OR systems to optimize routing
  - Scanning the package information (local operation)
    - Determining the type of shipment (freight/letter/overnight small package) for shipping the item, and the appropriate route
  - Sorting the packages according to type of shipment
  - Printing the "smart labels"
    - How/where to load the package
  - Tracking the package

39. Case Study: Performance Testing in the Lab
- Production analysis indicated:
  - Insufficient memory to support database storage requirements
    - Resulting in increased I/O processing
  - Options:
    - Add memory
      - Not feasible; requires an OS upgrade to address more than 4 GB of storage with Windows 2000 Standard Edition
    - Make the I/O faster: faster drives or more drives
      - Spread the I/O across multiple drives (external disk storage is expandable, with up to 10 slots available)
      - Separate the database usage across 2 sets of physical drives
    - Split the database across multiple servers (2 database servers)
      - Easier upgrade than an OS change
    - Change the database design (expected in 1Q2006; testing now)

40. Planning: Testing out the Configuration Options
- Test out each of the options and provide a recommendation
- SLA: 99% of packages must complete their processing in under 500 milliseconds (see the percentile sketch below)
- Each option was evaluated based on its relative ability to satisfy the SLA criteria.
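The 99%-under-500 ms criterion is easy to verify mechanically once per-package timings are collected. A minimal Python sketch, with made-up sample data rather than measured results:

def percentile(values, p):
    """Nearest-rank percentile; p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, int(round(p / 100.0 * len(ordered))))
    return ordered[rank - 1]

# Hypothetical per-package processing times in milliseconds.
times_ms = [120, 180, 240, 310, 420, 480, 495, 510, 530, 700]
p99 = percentile(times_ms, 99)
within = sum(t < 500 for t in times_ms) / len(times_ms)
print(f"p99 = {p99} ms; {within:.1%} under 500 ms; SLA met: {within >= 0.99}")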
41. Validating the Baseline: Taming the West!
"If you can't measure it, you can't manage it!" (CMG slogan)

42. Case Study: What are we measuring?
- End-to-end response time (percentiles, average)
- SQL stored procedure timings (percentiles, average)
  - SQL Trace information summarized for each stored procedure over a period of time (see the roll-up sketch below)
- Perfmon: system, process, SQL (average, max)
  - CPU, memory, disk
  - Process: memory, disk, processor
  - SQL: database activity, checkpoints, buffer hits, etc.
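Summarizing trace output per stored procedure can be scripted. The Python sketch below (not from the deck) rolls up a SQL Trace export saved as CSV into per-procedure counts, averages, and 95th percentiles; the column names "ObjectName" and "Duration" and the file name are assumptions about how the trace was exported, not details from the case study.

import csv
from collections import defaultdict

def summarize_trace(path):
    """Print count, average, and p95 duration for each stored procedure."""
    durations = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            durations[row["ObjectName"]].append(float(row["Duration"]))
    for proc, vals in sorted(durations.items()):
        vals.sort()
        p95 = vals[max(0, int(round(0.95 * len(vals))) - 1)]
        avg = sum(vals) / len(vals)
        print(f"{proc:<30} n={len(vals):>6} avg={avg:>9.1f} p95={p95:>9.1f}")

# summarize_trace("sql_trace_export.csv")  # hypothetical export file name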
43. Validating the Baseline
- Data from two production systems was obtained to produce:
  - A test database from multiple application systems
    - Database states were obtained, system inter-dependencies were satisfied, and application configuration files were put in place
- The baseline test was executed: multiple iterations
  - Performance measurements from two other systems were collected and compared against the baseline execution
  - Results were compared
- The database and scripts were modified to better reflect production conditions

44. Story: Creating a New Environment
- A series of performance tests were conducted in the Green Environment to evaluate I/O performance
  - To be reviewed in the presentation on Thursday 12-8
- The Green Environment was required for another project, so we moved to a new "Red Environment"
  - Data created from a different source (2 different production environments)
    - Simulating high volume
- What happened?
  - Different page densities
    - Different distribution of package delivery dates
  - Different database size for the critical database
    - Red was much fatter!

45. Analysis to Evaluate the New Baseline
- Compare I/O activity for Green and Red
  - Metrics:
    - End-to-end response time
    - SQL stored procedure timings
    - SQL activity
      - Database page reads/writes, overall and for each database (X drive, containing the databases)
      - Log bytes flushed per second for each database (D drive, logs)
      - SQL read and write bytes/second (overall, so it includes both database I/O and log activity)
    - Disk activity
      - Overall D/X drive read/write bytes/second

46. Comparing Overall Response Time: Red vs. Green and Separate Server
- The Green and Red tests with 2 mirrored pairs of X drives are the baselines
- Results of the baselines should be comparable!

47. Comparison of Green and Red Environments (X drive: database)
- Read activity 16% higher
- Write activity 38% higher
48. Comparison of Green and Red Environments (D drive: TempDB/logs)
- Read activity 1% higher
- Write activity 13% higher
- I/O activity is approximately the same on the D drive

49. Comparison of I/O Load, SQL Activity: Green vs. Red
- Increase in reads in Red is due to the Main instance
- Increase in writes in Red is caused by both instances

50. I/O Load Change: Main Instance, Separate Server vs. Baseline
- Read activity is reduced by 43% with the separate server

51. Differences between Red and Green
- D drive activity is approximately the same
  - TempDB and logging
- X drive activity is increased in the Red environment
- Most of the differences are due to an increase in reads on the X drive for the Main instance
  - Implies that the database was much "fatter"
  - Confirm this by reviewing page reads/page writes per database from SQL statistics
  - Review database sizes (unfortunately we didn't have this data, so we inferred it from I/O data and SQL trace data)
  - SQL trace data showed more page reads for key databases

52. Red Environment: Comparing Three Days
- Background
  - Several large databases:
    - Main: UOWIS, PAS
    - Reporting: Adhoc, UW1, Distribution
  - 4-1: Replication turned off for the UW1 database
  - 4-4: Replication on for the UW1 database
  - 4-8: Separate server for UOWIS; replication turned on for UW1
- Expectations
  - 4-1 will perform better than 4-4; turning replication off should reduce I/O significantly
    - Expect a significant reduction in Reporting database I/O
  - 4-8: the separate server will separate out the critical database
    - Expect the same amount of work performed as on 4-4, but a reduction in read activity for UOWIS because the data will now be in memory

53. Reviewing Log Write Activity
- Note: no log bytes (no replication) of the UW1 database on 4-4

54. Red: Comparing Three Days, Database Disk Activity
- Note: the 4-8 UOWIS results are for the separate server
- Increase in work performed on 4-8 vs. 4-4
55. Comparing Database Reads/Writes: Main Instance

56. Comparing Database Reads/Writes: Reporting Instance
- Total page reads for the Reporting instance should remain constant
- Why did it increase on 4-8?

57. Where are the differences on the two days?
- Note: differences in stored procedure total reads (logical) for the Data Cap Summary and Belt Summary reports (not main functionality)

58. What have we uncovered about test differences?
- Processor usage is approximately the same
- The amount of write activity per instance is the same
  - Reviewed log bytes flushed for each instance
- The Reporting instance performed more I/O (more reads)
  - Additional report jobs were executed on 4-8 and not on 4-4
  - Reports run 4 times per hour (every 15 minutes), causing a burst in I/O activity
    - When the UOWIS database is on the same server (sharing the same drives as other Main instance and Reporting instance work), response time is higher
- Response time is directly related to physical reads and physical disk read performance
- Spreading the I/O across more drives and/or providing more memory for the critical database instance improves performance

59. Testing Summary
- Need to create and follow a test plan which outlines:
  - All pre-flight procedures
  - Confirming that the environment is ready to go
  - Validating baselines
  - Running tests in an organized fashion, following the plan
- Do a sanity check!
  - Do the results make sense?
  - Otherwise, search for the truth; don't bury the results

60. Measurement Summary
- The nature of performance data is that it is long-tailed
  - Averages aren't representative
  - Get percentiles
- Need to understand the variability of the tests conducted (see the variability sketch below)
  - Run the same test multiple times to obtain a baseline
    - Helps you iron out your procedures
    - Gives a measure of the variability of the test case, so you can determine whether the change you are testing is significant
      - If the variability between your base test runs is small, that is good: you have repeatability
      - If the variability is large, you need to make sure that any change you make shows an even greater change
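One way to make the repeatability argument concrete: measure the run-to-run noise of repeated baseline runs and only believe changes that are clearly larger than it. The Python sketch below is not from the deck; the numbers and the 2x margin are illustrative, not measured results.

from statistics import mean, stdev

def variability(baseline_runs):
    """Coefficient of variation (stdev / mean) of a metric across baseline runs."""
    return stdev(baseline_runs) / mean(baseline_runs)

def change_is_significant(baseline_runs, new_value, margin=2.0):
    """Treat a change as real only if it exceeds `margin` times the baseline noise."""
    base = mean(baseline_runs)
    noise = variability(baseline_runs)
    delta = abs(new_value - base) / base
    return delta > margin * noise, delta, noise

baseline_p95_ms = [465.0, 472.0, 458.0]   # hypothetical repeated baseline runs
significant, delta, noise = change_is_significant(baseline_p95_ms, 395.0)
print(f"baseline noise {noise:.1%}, observed change {delta:.1%}, significant: {significant}")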
61. Reporting the Test Results: Template
- Executive summary
  - Graphs of results, e.g., end-to-end response time
    - Scalability of the solution
  - Overall findings
- Background
  - Hardware/OS/applications
  - Scripts
- Analysis of results
  - System and application performance
    - Decomposition of response time
      - Web tier, application, database
        - Drill down again for details as necessary, e.g., database metrics
- Next steps

62. Summary
- Can't always simulate everything; do the best you can.
- Implement the change in production and go back to the lab to understand why it matched or didn't
- When you discover a problem:
  - Apply what you've learned
  - Make the necessary changes to procedures, documentation, and methodology in the lab, and recommend changes for outside the lab
- Improve the process; don't just bury or hide the flaws!
  - Result: better testing and smoother implementations

63. Questions?
- Contact info:
  - Ellen Friedman
  - SRM Associates, Ltd
  - [email_address]
  - 516-433-1817
- Part II, to be presented at the CMG Conference
  - Thursday, 9:15-10:15
    - Session 512
    - Measuring Performance in the Lab: A Windows Case Study
