Bottlenecks exposed

  1. Web Application Bottlenecks Exposed: The Most Frequently Found Performance Problems – and How to Nail Them!
     Dan Downing, VP Testing Services, MENTORA
     Atlanta • Boston • DC • San Jose • 404.250.6515 • www.mentora.com
     Copyright Mentora 2001
  2. Objectives
     • Identify common website performance bottlenecks:
       – Source (what component they occur on)
       – Symptom (how you know there's a problem)
       – Causes (what creates the problem)
       – Measurements (how to nail it)
       – Cures (how to make it go away)
     • Illustrate with examples of B2C, B2B, and B2E cases
     Audience: performance engineers and load testing experts with intermediate experience
  3. Terms & Concepts
     • Application Performance Testing: A repeatable methodology for volume-simulation of real-world applications in a customer's environment to yield performance results that can be implemented to deliver efficient utilization of computing resources.
     • Scalability: The demonstrated ability (or lack thereof) of a system (or component) to yield the same response time of a business process irrespective of the magnitude of the load applied to the system.
     • Bottleneck: A hardware component, process, or piece of software in the system-under-test that is causing performance degradation and low scalability under load.
     • Resource Utilization: The quantification of a shared computing resource being consumed by an application process or component.
     • Symptom: The outwardly visible but unquantifiable effect of a performance bottleneck.
     • Cause: The specific and measurable factor yielding one or more symptoms.
     • Cure: The specific action applied to the Cause that will measurably improve the visible symptom.
     • Measurement: A numeric value of a performance-affecting factor that can be quantified by a monitoring tool and related to a specific component of the system-under-test.
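
The Scalability definition above can be made concrete with a small calculation. This is a minimal sketch, not part of the original deck: the function name, the sample numbers, and the idea of normalizing against the lowest load are all assumptions chosen for illustration.

```python
from statistics import mean

def scalability_ratios(response_times_by_load):
    """Map each load level to (mean response time) / (mean response time at the lowest load).

    A perfectly scalable business process stays near 1.0 at every load level.
    """
    baseline_load = min(response_times_by_load)
    baseline = mean(response_times_by_load[baseline_load])
    return {
        load: mean(samples) / baseline
        for load, samples in sorted(response_times_by_load.items())
    }

# Hypothetical samples: seconds per business process at 10, 100 and 500 virtual users.
samples = {
    10: [1.0, 1.2, 1.1],
    100: [1.1, 1.3, 1.2],
    500: [4.0, 4.8, 4.4],
}
ratios = scalability_ratios(samples)
for load, ratio in ratios.items():
    print(f"{load:>4} users: {ratio:.2f}x baseline response time")
```

A ratio that climbs steeply between two load levels (here, between 100 and 500 users) marks the range in which to start looking for a bottleneck.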
  4. Symptoms
     • "It's too slow"
       – As perceived from slow browser response by functional testers
       – As measured by poor scalability during first low-load test
       – As experienced (too late!) through low productivity of real production users
     • "It's broken"
       – Page 'never returns' after button press
       – Web server errors (404, 500…)
       – Application error messages in application logs
     Symptoms are usually very unspecific!
  5. 3-Tier Environment
     • Network
       – Firewall, load balancer, routers, network interface cards, cabling between all components
     • Web Server Tier
       – One or more (usually many) low-capacity computers that receive, route, and display results of HTTP requests from visitors' browsers
     • Application Server Tier
       – One or more (often 2) medium-to-high-capacity computers that receive the HTTP request, apply business logic to it, and return the results to the web server
     • Database Server Tier
       – One or more (usually one with a redundant stand-by) high-capacity computers that run the database software and access the database (often on large disk arrays) to service user data requests
     [Diagram: Web Server (Sun E220), App Server (Sun E420), DB Server (Sun E4500, Oracle)]
  6. Performance Bottleneck Sources
     What in your experience* do you find as the relative distribution of bottlenecks?

     | How often? | Network | Web Server |
     |------------|---------|------------|
     | >30%       | 30%     | 16%        |
     | 21-30%     | 16%     | 12%        |
     | 11-20%     | 25%     | 40%        |
     | <10%       | 27%     | 29%        |

     | How often? | App Server | DB Server |
     |------------|------------|-----------|
     | >60%       | 7%         | 9%        |
     | 41-60%     | 21%        | 29%       |
     | 21-40%     | 48%        | 32%       |
     | 11-20%     | 11%        | 21%       |
     | <10%       | 11%        | 7%        |

     * Poll results of 56 Mercury Conference '01 attendees of intermediate to advanced experience.
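  7. Performance Bottleneck Sources
     In my experience, it's the application! (~80% of the time)

     | Tier       | Share of bottlenecks | Highest-voted poll range (share of votes) |
     |------------|----------------------|-------------------------------------------|
     | Network    | 8%                   | >30% (30%)                                |
     | Web Server | 12%                  | 11-20% (40%)                              |
     | App Server | 35%                  | 21-40% (48%)                              |
     | DB Server  | 45%                  | 21-40% (32%)                              |

     – The % distribution is a SWAG based on experience testing dozens of apps.
     – Most of the application code resides on the App Server and DB Server tiers (35% + 45% = 80%).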
  8. Database (Simple) Anatomy
     [Diagram] Labeled components:
     • App Server (e.g. Sun 420): DB Connection Pool
     • DB Server (e.g. Sun 4500, quad CPU, 2 GB memory): Client Comm Buffer, Query Parser, Query Optimizer, Query Plan Storage, Query Executor, Metadata Cache, Write Buffer, Data Cache, Shared Memory
     • Disk Array (e.g. Sun A10000): Data, Log, BI
     • Flow labels: SQL, Data
  9. Key DB Server Measurements (Measurement: Impact/Range)
     • Server CPU: Shows raw horsepower consumption on the server; should average 70-80%; else add CPUs!
     • Server Memory: Memory available should stay constant and average below 70-80%; else add memory.
     • Server Page Faults/s: Should be low and constant; else yields virtual memory disk I/O, which indicates insufficient memory allocated to DB processes.
     • Cache Hit Ratio: Should be high (90-95% range); else data cache is sized too low and there is too much physical I/O.
     • Deadlocks: Should be zero at target loads; if not, indicates a transaction model design problem.
     • Table scan blocks/sec: Should be low for normal transactions (can be high for reporting functions); else indicates that indexes are missing or poorly designed.
     • Parse-to-execute ratio: Should be low (<20%); else could indicate an under-sized query cache, old/no optimizer statistics, or a flawed query model in an app server function.
     • Transactions/second: A general indicator of DB load handling; should be compared run-to-run.
     • SQL*Net bytes rcvd/sent from/to client: A measure of the data-intensiveness of queries; read bytes should be <50% of sent bytes; else indicates that complex application queries should become stored procedures.
     • Open cursors: A measure of the number of open client queries; should be low, or could be an indicator of an inefficient query model.
     • Physical reads/writes: Correlates with cache-hit ratio; should decrease run-to-run as the cache is tuned.
     • Server I/O: Should be balanced across all drives; else indicates a 'DB hot spot' on large, high-access tables, which need to be striped across multiple drives; should average 20% below the disk I/O saturation level.
     • DB Memory: Should be ~80% of available user memory on the server, and should average < 75%; else, add!
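
Several of the ranges above are simple enough to check mechanically after each test run. The following is a minimal sketch under stated assumptions: the dictionary keys and the function name are hypothetical (they are not LoadRunner or Oracle identifiers), and only the four thresholds that the slide states numerically are encoded.

```python
def check_db_metrics(m):
    """Return warnings for one run's DB measurements.

    `m` is a plain dict with hypothetical keys; `executes` is assumed to be > 0.
    """
    findings = []
    # Cache hit ratio should be in the 90-95% range.
    if m["cache_hit_ratio"] < 0.90:
        findings.append("cache hit ratio below 90%: data cache may be too small")
    # Parse-to-execute ratio should be below 20%.
    if m["parses"] / m["executes"] >= 0.20:
        findings.append("parse-to-execute ratio >= 20%: check query cache, statistics, query model")
    # Deadlocks should be zero at target load.
    if m["deadlocks"] > 0:
        findings.append("deadlocks non-zero: review transaction model design")
    # Bytes read from the client should be under 50% of bytes sent to it.
    if m["bytes_from_client"] >= 0.5 * m["bytes_to_client"]:
        findings.append("bytes read >= 50% of bytes sent: consider stored procedures")
    return findings

# Hypothetical run that stays inside every range.
healthy = {
    "cache_hit_ratio": 0.94,
    "parses": 150,
    "executes": 1000,
    "deadlocks": 0,
    "bytes_from_client": 2_000_000,
    "bytes_to_client": 9_000_000,
}
print(check_db_metrics(healthy))
```

Keeping the thresholds in one place makes run-to-run comparison easier, which is what the slide recommends for measurements such as transactions/second.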
  10. DB Server Causes & Cures
     • Cause: Inefficient SQL statement
       – Measurement: Slow page (>10 sec) that ties to a specific function, and thus to an SQL query; high DB CPU or I/O
       – Cure: Analyze query plan, optimize query
     • Cause: Inefficient SQL query model
       – Measurement: Many slow pages; high 'bytes recvd' by DB server; low DB CPU; or many slow queries
       – Cure: Convert client SQL to stored procedures, or optimize slow queries
     • Cause: Inefficient DB configuration
       – Measurement: Low correlation between DB and server resource utilization; unbalanced I/O
       – Cure: Reconfigure DB (add memory, write processes, threads, …)
     • Cause: Overuse of row-at-a-time processing
       – Measurement: High open cursors; high bytes sent from client
       – Cure: Tune query prepares in app server / code
     • Cause: Missing/ineffective indexes
       – Measurement: High table scan blocks; slow function
       – Cure: Find/add/fix table indexes
     • Cause: Query plan cache too small
       – Measurement: High parsed-to-executed queries ratio
       – Cure: Raise size of query plan cache
     • Cause: Inefficient concurrency model
       – Measurement: High blocked transactions, high table locks
       – Cure: Review/fix transaction logic; modify DB locking strategy
     • Cause: Data cache too small
       – Measurement: Low cache-hit ratio, high physical reads
       – Cure: Increase cache size
     • Cause: Out-of-date statistics
       – Measurement: High table scan blocks; many slow functions
       – Cure: Rerun optimizer statistics
     • Cause: Deadlocks
       – Measurement: Deadlocks non-zero / errors in error log
       – Cure: Fix application transaction code
     • Cause: Other
       – Measurement: Inefficient access method; too many DB connections; small comm buffers; …
       – Cure: Pinpoint and correct!
  11. Database Server Causes
     • Inefficient SQL statement: 24%
     • Inefficient SQL query model: 17%
     • Inefficient DB configuration: 14%
     • High row-at-a-time logic: 12%
     • Missing indexes: 9%
     • Inefficient concurrency model: 7%
     • Query cache too small: 7%
     • Data cache too small: 5%
     • Other: 5%
     ~60% of the time it's bad SQL or bad indexes!
  12. Example: B2B Supply Chain Management
     Environment: Web Server Sun E220 (Apache); App Server Sun E420 (WebLogic); DB Server Sun E420 (Oracle)
     • Symptom:
       – Transactions that return list data run very slowly; they don't scale
     • Measurement (using LR Oracle Monitor):
       – High table scan blocks
       – Low index fast full scans
     • Cure:
       – Add additional indexes
       – Design indexes so queries can be resolved from the indexed columns without accessing the base table
       – Enable fast scan Oracle parameter
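
The second cure, an index that answers the query without touching the base table, can be demonstrated on a small scale. This sketch uses SQLite through Python's standard `sqlite3` module as a stand-in for Oracle, purely so it runs anywhere; the table, columns, and index name are hypothetical, and Oracle's plan output and tuning parameters differ.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the human-readable plan details SQLite reports for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments ("
    "id INTEGER PRIMARY KEY, supplier_id INTEGER, status TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO shipments (supplier_id, status, notes) VALUES (?, ?, ?)",
    [(i % 50, "open" if i % 3 else "closed", "x" * 100) for i in range(5000)],
)

# A 'list data' query of the kind that ran slowly in the example.
list_query = "SELECT supplier_id, status FROM shipments WHERE supplier_id = 7"

# Without an index, the whole table is scanned.
plan_before = query_plan(conn, list_query)

# Index both the filter column and the returned column so the query
# can be resolved from the index alone.
conn.execute(
    "CREATE INDEX idx_shipments_supplier_status ON shipments (supplier_id, status)"
)
plan_after = query_plan(conn, list_query)

print("before:", plan_before)
print("after: ", plan_after)
```

The plan text changes from a table scan to a search that uses a covering index, which is the same before/after signal the example reads from the table-scan and index-scan counters.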
  13. LR Oracle Monitor
     [Screenshot: LoadRunner Oracle Monitor] Table scan blocks average = 12; Index fast full scans = 0
  14. App Server (Simple) Anatomy
     [Diagram] Labeled components:
     • Web Server: passes client requests in, receives HTML pages back
     • App Server (usually two; e.g. Sun 420, dual CPU, 1 GB memory): Connection Mgr, Presentation Manager, Presentation Logic, Business Logic, Object Cache, Security Mgr, Transaction Mgr, DB Conn. Mgr, Messaging Mgr, Communic. Mgr
     • DB Server
     • Flow labels: SQL, Data
  15. Key App Server Measurements (Measurement: Impact/Range)
     • Server CPU: Shows raw horsepower consumption on the server; should average 70-80%; else add CPUs!
     • Server Page Faults/s: Should be low and constant; else yields virtual memory disk I/O, which indicates insufficient memory allocated to App Server processes.
     • Cache Hit Ratios: Should be high (90% range); else data/object caches are sized too low and there is too much physical I/O.
     • App Server memory: Memory should rise as active sessions grow, should shrink in the garbage collection cycle, and should stabilize at target load at 70% average; else possible memory leak, or add memory.
     • Active/Total Sessions: Should rise as load increases, stabilize at target load, and approximate the vendor target per instance; else, decrease inactive session keep-alive time.
     • SSL transactions/sec: Should be a relatively low ratio vs. non-secure transactions (<15%?); else, eating up CPU and bandwidth.
     • Requests/second: A general indicator of app server load as evidenced by web server request volume; should be compared run-to-run and should track with the load applied.
     • Active/Total DB Pool Connections: Active sessions should rise with load and stabilize at less than Total; if it does not stabilize, indicates insufficient processing power to keep up with the DB; if it maxes out, too few connections.
     • Server Memory: Should track App Server memory and stabilize at target load at 70% average; else possible memory leak, or add memory.
     • Application log: Should contain few or no error messages and few warnings; else indicates application problems.
     • Load balancing: All app server instances should be doing a similar amount of work; else indicates a load balancing problem.
  16. App Server Metrics & Cures
     • Cause: Memory leak
       – Measurement: Memory utilization rises steadily, doesn't recover
       – Cure: Find and fix the faulty application code that leaks memory
     • Cause: Inefficient garbage collection
       – Measurement: Spikes in transaction times
       – Cure: Tune app server load balancing
     • Cause: Sub-optimal session model
       – Measurement: Steadily rising active sessions
       – Cure: Tune session keep-alive setting
     • Cause: Poorly configured App Server
       – Measurement: Low correlation between App and HW resource utilization; overall poor performance
       – Cure: Validate proper JVM-to-app server match; increase data & object caches; add HW memory
     • Cause: Insufficient hardware resources
       – Measurement: High CPU, memory, I/O utilization
       – Cure: Add CPUs, memory; decrease no. of App Server instances
     • Cause: Poorly configured DB connection pool
       – Measurement: Steadily rising active connections, high CPU utilization
       – Cure: Raise DB connections; lower no. of App Server instances
     • Cause: Inefficiently coded transaction
       – Measurement: Slow specific business function
       – Cure: Pinpoint & diagnose longest-running business processes
     • Cause: Inefficient security model
       – Measurement: High calls on port 7002
       – Cure: Review/relax app security
     • Cause: Inefficient object access method
       – Measurement: Slow object creation
       – Cure: Change object access method
     • Cause: Other
       – Measurement: Low OS resources; erratic transaction performance
       – Cure: Pinpoint and correct!
  17. App Server Causes
     • Memory leak: 15%
     • Inefficient garbage collection: 12%
     • Sub-optimal session model: 12%
     • Poorly configured App Server: 12%
     • Inefficiently coded transaction: 11%
     • Insufficient hardware resources: 10%
     • Poorly configured DB connection pool: 9%
     • Inefficient object access method: 5%
     • Inefficient DB access architecture: 4%
     • Other: 10%
     60% of the time: object caching, SQL, DB connection pool; 20% of the time: inefficient application server
  18. Example: B2C Large Retail Web Store
     Environment: Web Server Sun E420 (Apache); App Server Sun E420 (ATG Dynamo); DB Server Sun E4500 (Oracle)
     • Symptom:
       – App server memory leak
     • Measurement:
       – Steadily increasing, non-recovering memory usage in Dynamo console
       – Memory exhausted and app server dies over 8-hour run
     • Solution:
       – Test individual functions
       – Isolate errant function not releasing memory
       – Fix code!
       – Re-test to validate fix (longevity test)
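
"Steadily increasing, non-recovering" memory can be told apart from a normal garbage-collection sawtooth by looking at the low points rather than the peaks. The sketch below is one simple heuristic, not the method used in the example: the function names, the window size, and the sample values are all assumptions, and a real longevity test would use far more samples.

```python
def gc_floors(samples, window):
    """Lowest memory reading in each consecutive window (a rough post-GC floor)."""
    return [
        min(samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]

def looks_like_leak(samples, window=4):
    """Flag a leak when every window's floor is higher than the previous one.

    Requires at least three windows so that a single upward step is not flagged.
    """
    floors = gc_floors(samples, window)
    return len(floors) >= 3 and all(b > a for a, b in zip(floors, floors[1:]))

# Hypothetical memory utilization (%) sampled at a fixed interval.
stable_sawtooth = [50, 60, 70, 52, 51, 61, 71, 50, 52, 62, 72, 51]
leaking = [50, 60, 70, 55, 60, 70, 80, 65, 70, 80, 90, 75]

print(looks_like_leak(stable_sawtooth))
print(looks_like_leak(leaking))
```

Running the same check per business function, as in the "test individual functions" step, narrows down which function is not releasing memory.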
  19. Web Server Metrics & Cures
     • Cause: Insufficient HW capacity
       – Measurement: High CPU, memory, I/O; timeout errors
       – Cure: Add CPUs, memory; add web servers; distribute content; add specialized servers (images, streaming media…)
     • Cause: Poorly configured server
       – Measurement: High I/O, high memory utilization, low throughput
       – Cure: Tune web server configuration
     • Cause: Unbalanced load across servers
       – Measurement: Uneven utilization across web servers
       – Cure: Review/revise load balancing policies
     • Cause: High SSL transactions
       – Measurement: Memory utilization >70%, low throughput; high port 443 calls
       – Cure: Review/relax secure transaction model
     • Cause: Inefficient transaction design
       – Measurement: High IP connections per active session
       – Cure: Reduce keep-alive time; correct transaction design
     • Cause: Broken links
       – Measurement: Broken link errors
       – Cure: Diagnose / fix application
     • Cause: Security too tight
       – Measurement: High firewall-to-web server traffic
       – Cure: Direct firewall and user traffic to different ports
     • Cause: Other
       – Measurement: Low OS resource utilization, overall poor throughput
       – Cure: Diagnose App, DB servers
  20. Web Server Causes
     • Insufficient HW capacity: 18%
     • Poorly configured server: 15%
     • Unbalanced load across servers: 15%
     • High SSL transactions: 13%
     • Inefficient transaction design: 11%
     • Broken links: 8%
     • Security too tight: 8%
     • Other: 12%
     Major contributor: secure transactions; often: load balancing; sometimes: high-resource specialized functions (external links, email, chat)
  21. Example: B2E Collaborating Communities
     Environment: Cisco Load Director; Web/App Server Dell 1550 (IIS/Visual Basic); DB Server Dell 2450 (SQL Server)
     • Symptom:
       – Slow overall performance
       – DB server low activity
     • Measurement:
       – Web/App server resources maxed out
       – Non-scalable transaction times
     • Solution:
       – Short-term: Move "Chat" function to dedicated server
       – Long-term: Re-architect system in Java, separate Web and App tiers, introduce dedicated server for chat and email functions
  22. Network Metrics & Cures
     • Cause: Load balancing ineffective
       – Measurement: Uneven load at web servers
       – Cure: Revise load balancing policy
     • Cause: Insufficient overall bandwidth
       – Measurement: Low, maxed throughput; high collision rate
       – Cure: Get hoster to raise bandwidth ceiling; increase system bandwidth; add NICs for failover functions
     • Cause: Security too tight
       – Measurement: High traffic between firewall & servers
       – Cure: Loosen security policies; redesign application security
     • Cause: Poorly configured/insufficient network interface cards
       – Measurement: Low throughput between servers
       – Cure: Tune NIC buffers; add 2nd NIC for failover heartbeat
     • Cause: Poor network architecture
       – Measurement: High latency values in network delay monitor; low throughput
       – Cure: Review/tune configuration of NICs, routers, other devices
     • Cause: Other
       – Measurement: ???
       – Cure: ???
  23. Network Causes
     • Load balancing ineffective: 22%
     • Poor network architecture: 20%
     • Security too tight: 15%
     • Insufficient overall bandwidth: 13%
     • Poorly configured/insufficient NICs: 10%
     • Other: 20%
     No single major cause; often the problem is load balancing, security, or network architecture.
  24. Example: B2C On-line Printing Services
     Environment: Cisco Load Director; Web Server Sun E420; App Server Sun E420; DB Server Sun E4500 (Oracle)
     • Symptom:
       – Low transaction performance scalability under load
       – High latency across load balancer
     • Measurement:
       – Unbalanced load on web server tier
     • Solution:
       – Replace load balancer (bad hardware)
       – Change load balancer policies from IP-based to server-load-based
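
"Unbalanced load on web server tier" can be reduced to a single number for run-to-run comparison. This is a minimal sketch, assuming per-server request counts are available from the web server logs or the load test tool; the function name, server names, and counts are hypothetical, and the coefficient of variation is just one reasonable choice of imbalance measure.

```python
from statistics import mean, pstdev

def load_imbalance(requests_per_server):
    """Coefficient of variation of per-server request counts.

    0.0 means every server handled the same number of requests;
    larger values mean the load balancer is favoring some servers.
    """
    counts = list(requests_per_server.values())
    return pstdev(counts) / mean(counts)

# Hypothetical request counts from two test runs.
balanced = {"web1": 100, "web2": 100, "web3": 100}
skewed = {"web1": 270, "web2": 20, "web3": 10}

print(f"balanced run: {load_imbalance(balanced):.2f}")
print(f"skewed run:   {load_imbalance(skewed):.2f}")
```

A load test driven from a handful of injector machines presents only a few source IP addresses, so an IP-based policy can produce exactly this kind of skew; comparing the number before and after a policy change shows whether the change helped.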
  25. Monitoring Tools
     • LoadRunner
       – Transaction performance monitor
       – Server resource monitor
       – Oracle, SQL Server, selected app server monitors
       – Network delay monitor
     • Database performance monitoring tools
       – Quest Oracle Instance Monitor, Embarcadero, BMC DB Patrol
     • App Server System Console (from app server vendor)
     • Java object monitoring tools
       – JProbe, Performasure (Sitraka)
     • Network Analyzer (aka network sniffer)
     • Operating system utilities
       – Unix: top, sar, vmstat, iostat
       – Windows 2000/NT: Perfmon
  26. Tool Example: WebLogic Console
     [Screenshot: WebLogic Console]
  27. Lessons Learned
     1. 80% of the time it is the application or system software, not the infrastructure!
     2. Make friends with your app server, DB server, and hardware monitoring tools!
     3. Application architect, DBA, and App Server experts are indispensable and must be involved during load tests!
     4. Arrive armed with the Top 10 Things to check for each component!
     5. Identify the measurements you need to be able to make.
     6. A Systems Engineer with networking, firewall, and load balancer expertise is very handy!
  28. Questions?
     ddowning@mentora.com
