How we define Performance
• Response Time: The time it takes for an application to respond to a user request.
• Throughput: The number of requests processed in a given time period, e.g. requests/minute.
• Availability: The status of our systems. Are they up and running for us and our users?
• Accuracy: Is the response what the user actually expected? Are there any errors?
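A minimal Python sketch of how the four definitions could be computed. It assumes a hypothetical `Request` record with start/end timestamps and an `ok` flag marking whether the response was correct, plus a list of health-check probe results for availability. None of these names come from a specific monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Request:  # hypothetical request record
    start: float  # seconds since the start of the observation window
    end: float    # seconds since the start of the observation window
    ok: bool      # False if the response was an error or not what the user expected


def response_time(req: Request) -> float:
    """Seconds between receiving the request and sending the response."""
    return req.end - req.start


def throughput_per_minute(requests: list[Request], window_seconds: float) -> float:
    """Requests completed inside the window, normalized to requests/minute."""
    completed = sum(1 for r in requests if r.end <= window_seconds)
    return completed * 60.0 / window_seconds


def availability(probe_results: list[bool]) -> float:
    """Fraction of health-check probes that found the system up and running."""
    return sum(probe_results) / len(probe_results)


def error_rate(requests: list[Request]) -> float:
    """Fraction of requests with an incorrect response (the accuracy view)."""
    return sum(1 for r in requests if not r.ok) / len(requests)


reqs = [Request(0.0, 0.5, True), Request(10.0, 11.0, True), Request(30.0, 31.5, False)]
print(response_time(reqs[1]))                   # 1.0
print(throughput_per_minute(reqs, 60.0))        # 3.0
print(availability([True, True, True, False]))  # 0.75
```

Throughput is counted per completed request here; counting started requests instead is an equally valid choice as long as it is applied consistently.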
Performance Pyramid
[Figure: layers Business, Application, Container, System Performance]
Types of Measurement
• Cyclic measurements
  – Are collected at regular time intervals
  – Are time-based
  – e.g. JMX metrics, CPU, memory
• Event-based measurements
  – Are collected as a request occurs
  – Are transactional
  – e.g. response times, CPU consumption per request
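A minimal Python sketch contrasting the two kinds of measurement. `read_metric` is a hypothetical stand-in for whatever source is polled (a JMX attribute, a CPU or memory gauge), and `handle_search` is a hypothetical request handler. `time.process_time()` measures the CPU time of the whole process, so it only approximates per-request CPU consumption when requests are handled one at a time.

```python
import time
from functools import wraps
from typing import Callable


def sample_cyclically(read_metric: Callable[[], float], interval_s: float,
                      samples: int) -> list[tuple[float, float]]:
    """Cyclic measurement: read a metric at fixed intervals, regardless of traffic."""
    series = []
    for i in range(samples):
        series.append((time.time(), read_metric()))
        if i < samples - 1:
            time.sleep(interval_s)
    return series


events: list[dict] = []


def measure_request(func):
    """Event-based measurement: record one data point per handled request."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()  # process-wide CPU time (see note above)
        try:
            return func(*args, **kwargs)
        finally:
            events.append({
                "name": func.__name__,
                "response_time_s": time.perf_counter() - wall_start,
                "cpu_s": time.process_time() - cpu_start,
            })
    return wrapper


@measure_request
def handle_search(query: str) -> int:  # hypothetical request handler
    return sum(len(query) for _ in range(10_000))


series = sample_cyclically(lambda: 42.0, interval_s=0.01, samples=3)  # stand-in gauge
handle_search("shoes")
print(len(series), events[0]["name"])  # 3 handle_search
```

The cyclic series has the same number of points whether or not traffic arrives, while `events` grows only when requests occur, which is why event-based data can be attributed to individual transactions.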
How can this happen?
A: Our response time is 2.3 seconds.
B: Our response time is 1.5 seconds.
C: Our response time is 6 seconds.
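Measurement Aggregation
• Minimum and Maximum: The best and the worst request. Beware of outliers.
• Average: Sum/count. Used in many cases; how representative it is depends on the distribution of the actual values.
• Median: What 50 percent of our users see (half of the requests are at or below this value).
• Percentile: What n percent of our users see (n percent of the requests are at or below this value).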
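The "How can this happen?" puzzle resolves once A, B and C are read as different aggregations of the same data. The Python sketch below uses ten hypothetical response times, invented for illustration and not taken from any real system, for which the average is 2.3 s, the median 1.5 s and the maximum 6 s. The percentile uses the nearest-rank definition, which is one of several common conventions.

```python
import math
import statistics

# Ten hypothetical response times in seconds (illustrative, not measured data).
response_times = [1.0, 1.0, 1.2, 1.3, 1.5, 1.5, 1.8, 2.2, 5.5, 6.0]


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


print("min    ", min(response_times))                        # 1.0
print("max    ", max(response_times))                        # 6.0 -> statement C
print("average", round(statistics.mean(response_times), 2))  # 2.3 -> statement A
print("median ", statistics.median(response_times))          # 1.5 -> statement B
print("p90    ", percentile(response_times, 90))             # 5.5
print("p95    ", percentile(response_times, 95))             # 6.0
```

Two slow outliers pull the average well above what most users experience, while the median ignores them entirely; high percentiles are where those slow requests become visible.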
How do we manage performance and solve problems?
Types of Problems
• Data-Driven Problems: Problems that occur for specific users or scenarios and depend on (input) parameters, e.g. a search query.
• Load-Driven Problems: Problems that depend on the current system load. They usually occur at higher load and in most cases depend on resources, e.g. response time increasing with more users.
• Environment-Driven Problems: Problems caused by factors outside the application, e.g. hardware or network connectivity.
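One simple way to test whether a slowdown looks load-driven is to check whether response time rises together with load. The Python sketch below computes a Pearson correlation between concurrent users and response time; the `looks_load_driven` helper, its 0.8 threshold and the sample series are all hypothetical. Correlation is only a hint: a weak correlation suggests looking at input parameters (data-driven) or at infrastructure (environment-driven) instead.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no linear relationship can be observed
    return cov / math.sqrt(var_x * var_y)


def looks_load_driven(concurrent_users: list[float], response_times: list[float],
                      threshold: float = 0.8) -> bool:
    """Hypothetical heuristic: response time strongly tracks the load."""
    return pearson(concurrent_users, response_times) >= threshold


users = [10, 20, 40, 80, 160]          # hypothetical load levels
slow = [0.4, 0.5, 0.7, 1.2, 2.5]       # response time grows with load
flat = [0.9, 0.4, 1.1, 0.5, 0.8]       # response time unrelated to load
print(looks_load_driven(users, slow))  # True
print(looks_load_driven(users, flat))  # False
```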
Main Tasks of Operations
• Monitoring: Collect all relevant KPIs and check them against SLAs and baselines.
• Alerting/Incident Management: Inform operations about problems.
• Impact Analysis: Analyze the impact of issues. Who is affected?
• Isolation: What is the cause of a problem, and whom do I need to talk to?
• Diagnosis: Why is there a problem, and how can we fix it?
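Monitoring checks a KPI against two different references: a fixed SLA limit and a baseline of normal behavior. The Python sketch below raises an alert for each; the `check_kpi` function, the `Alert` record and the baseline rule (mean plus three standard deviations of past values) are hypothetical choices for illustration, not a specific tool's behavior.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Alert:  # hypothetical alert record handed to incident management
    kpi: str
    value: float
    reason: str


def check_kpi(name: str, value: float, sla_max: float, history: list[float],
              k: float = 3.0) -> list[Alert]:
    """Compare a KPI value against an SLA limit and a baseline learned from history."""
    alerts = []
    if value > sla_max:
        alerts.append(Alert(name, value, f"SLA violated (limit {sla_max})"))
    if len(history) >= 2:
        upper = statistics.mean(history) + k * statistics.stdev(history)
        if value > upper:
            alerts.append(Alert(name, value, f"above baseline (upper bound {upper:.2f})"))
    return alerts


history = [1.0, 1.1, 0.9, 1.0, 1.05]  # hypothetical past p95 response times (s)
for alert in check_kpi("response_time_p95_s", 1.8, sla_max=2.0, history=history):
    print(alert.reason)  # above baseline (upper bound 1.23)
```

A value of 1.8 s is still within the 2 s SLA but far outside normal behavior, so only the baseline check fires; this is the kind of early warning that SLA checks alone would miss.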
Key Performance Indicators/Metrics
• Visitors and Requests: How many people are using our site?
• Response Times: How fast do we serve requests?
• Errors and Failed Transactions: How many problems do we see, and how often do they affect functionality?
• Availability: Can our systems currently be reached?
• Utilization Metrics: Do we have enough resources, and are we using them efficiently?
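A minimal Python sketch that condenses one monitoring window of a hypothetical access log into several of these KPIs. It assumes that HTTP status codes of 500 and above count as failed transactions (real setups may also count some 4xx responses) and that each worker serves one request at a time, so utilization is busy time divided by available worker time.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:  # hypothetical access-log record
    visitor_id: str
    duration_s: float
    status: int  # HTTP status code


def kpi_summary(entries: list[LogEntry], window_s: float, workers: int) -> dict[str, float]:
    """Summarize one monitoring window into visitor, request, error and utilization KPIs."""
    requests = len(entries)
    visitors = len({e.visitor_id for e in entries})  # unique people, not requests
    failed = sum(1 for e in entries if e.status >= 500)
    busy_s = sum(e.duration_s for e in entries)
    return {
        "visitors": visitors,
        "requests": requests,
        "failed_ratio": failed / requests if requests else 0.0,
        # Share of available worker time spent serving requests.
        "utilization": busy_s / (window_s * workers),
    }


log = [LogEntry("u1", 0.5, 200), LogEntry("u1", 1.5, 200),
       LogEntry("u2", 2.0, 503), LogEntry("u3", 1.0, 200)]
summary = kpi_summary(log, window_s=10.0, workers=2)
print(summary)  # {'visitors': 3, 'requests': 4, 'failed_ratio': 0.25, 'utilization': 0.25}
```

Counting visitors and requests separately matters: one visitor can issue many requests, so the two numbers answer different capacity questions.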