GPD Service performance testing and root cause analysis is an engagement between Company-X and TechVoyant Private Limited to identify the problems Company-X is facing with the performance of its project reporting web service, the Global Delivery Dashboard.
Application Performance Analysis: Finding Bottlenecks in End-user Experience of Applications
Introduction
VISHWANATH RAMDAS
TECHVOYANT INFOTECH PRIVATE LTD
IT INFRASTRUCTURE DESIGN AND MANAGEMENT SERVICES
Application Performance depends on
Client Nodes & Configuration
Network Infrastructure
Devices | Links Capacity
Capacity Usage Patterns
Server nodes hardware
Capacity to Handle Transactions
Available Resources
CPU | Memory
OS Services
Web Server
Application Design
Application Code
Database Queries
Typical Stages in Performance Analysis
Understand the Problem
Design the Experiment
State the Null Hypothesis
Factor in all elements of the end-user experience.
Design experiments to eliminate variables.
Collect Data
Where do you measure
How do you measure
What do you measure
Analyze Data
Verify THE HYPOTHESIS
Analyze and contextualize information.
Arrive at causes and suggest remedies
Capture All the Elements – Fish Bone Diagram GPD
Key Challenges in Performance Analysis
How do you design the experiment to isolate the variables in the problem?
How do you probe and measure?
Most Infrastructure Elements are known
Context specific Measurement of the Elements is difficult
Which mathematical model do you use during analysis?
Large Intranet Web Application Case Study
Global Projects Delivery (GPD) – Context
The GPD server is based out of Mumbai
The server is accessed by users from multiple locations
Bangalore 3 locations
Mumbai 3 locations
Hyderabad
The server is critical to monitoring & managing project progress across all Development Centers in India
User base of 6000+ users
GPD Architecture Overview
[Architecture diagram. Labels shown: Report Browser; Web Layer / IIS; HTTP/HTT; ASP pages; COM Components; Services Layer; Messaging/Mail; Extraction; Business Processing; Validation; Graphs/Charts; Query/Report Building; MS Project Utility; Active Report Server; Microsoft Project; Pinnacle Graphic Server; Microsoft CDONTS; Data Access Layer; ADO; SQLServer]
Current Concern Area
GPD Server response has been erratic
User experience has been varying
Across locations, worsening in remote locations
Across time
Frequent transaction failure and disconnects
Probes: Onion Ring Formation for Differential Analysis
Based on the network and system design given by Company-X, we planned to deploy 12 probes over different layers of the infrastructure.
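The onion-ring idea can be illustrated with a short sketch: if probes sit in concentric layers around the server, the time each layer adds over the layer inside it points at where the delay accumulates. The layer names and timings below are hypothetical, chosen only to show the arithmetic; they are not measurements from this study.

```python
from statistics import median


def layer_deltas(samples_by_layer):
    """Return (layer, median_seconds, added_seconds) rows.

    samples_by_layer is a list of (layer_name, [response_seconds, ...])
    ordered from the probe nearest the server to the probe farthest away.
    """
    rows = []
    previous = 0.0
    for layer, samples in samples_by_layer:
        mid = median(samples)
        # Time this ring adds on top of the ring inside it.
        rows.append((layer, mid, mid - previous))
        previous = mid
    return rows


def worst_layer(samples_by_layer):
    """Name the layer that adds the most time over the layer inside it."""
    return max(layer_deltas(samples_by_layer), key=lambda row: row[2])[0]


# Hypothetical layers and timings, for illustration only.
example = [
    ("server LAN", [0.04, 0.05, 0.05]),
    ("Mumbai campus", [0.30, 0.35, 0.32]),
    ("WAN link", [3.9, 4.2, 4.0]),
    ("remote office", [4.3, 4.6, 4.4]),
]

for layer, mid, added in layer_deltas(example):
    print(f"{layer:15s} median={mid:.2f}s added={added:.2f}s")
print("largest contribution:", worst_layer(example))
```

Medians are used here rather than means so that a single slow sample does not dominate a layer; that choice is an assumption of this sketch.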
Test Implementation – Approach.
Set up Server
Set up probes at key locations on the network
Simulate user transactions
Assign Simulations & Schedule
Transact & Collect performance Data
[Diagram labels: Test Application; Probes; Simulation; Probing; Data Collect]
Typical Internet Transaction
DNS resolution: Where is the server? Client identifies server
Connect: Client connects to server with request (GET)
Server response: Server responds to the request with the initial byte of data. Includes web server, application server, database ..
Data transfer: Time to download data fully, including page layout, page objects (images, frames, which form requests) and page content. The requested data is transmitted to the client
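A minimal sketch of how the four phases of a transaction could be timed from a client, assuming plain HTTP on port 80 and a single document fetch. This is not the HP Internet Services tool used in the study; `probe` and `phase_shares` are hypothetical helper names, and the sketch does not follow frames or images the way a browser would.

```python
import socket
import time


def phase_shares(t_start, t_dns, t_connect, t_first_byte, t_done):
    """Turn five timestamps into per-phase durations and percentage shares."""
    durations = {
        "dns": t_dns - t_start,
        "connect": t_connect - t_dns,
        "server_response": t_first_byte - t_connect,
        "data_transfer": t_done - t_first_byte,
    }
    total = t_done - t_start
    shares = {name: 100.0 * d / total for name, d in durations.items()}
    return durations, shares


def probe(host, path="/", port=80, timeout=10.0):
    """Time one plain-HTTP GET and return the result of phase_shares()."""
    t_start = time.perf_counter()
    # DNS resolution: take the first address the resolver returns.
    sockaddr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
    t_dns = time.perf_counter()
    with socket.create_connection(sockaddr[:2], timeout=timeout) as sock:
        t_connect = time.perf_counter()
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        sock.recv(1)  # blocks until the first byte of the response arrives
        t_first_byte = time.perf_counter()
        while sock.recv(65536):  # drain the rest until the server closes
            pass
        t_done = time.perf_counter()
    return phase_shares(t_start, t_dns, t_connect, t_first_byte, t_done)


# Offline example with made-up timestamps (seconds since the start).
durations, shares = phase_shares(0.0, 0.01, 0.03, 0.10, 5.0)
print({name: round(value, 1) for name, value in shares.items()})
```

Calling `probe("example.com")` would run a live measurement; the offline call at the end only demonstrates the share calculation.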
Study was done in 3 STAGES
Study Web Server Logs for user access Patterns
Log files of the GPD Server
Identify most accessed pages within GPD.
Setup Experiment Server
Setup & Deploy HP Internet Services Tool to probe
DNS Time; Connect Time; Server response; Data transfer;
Set Up probes in critical customer locations & Collect Data for over 1 week
De-duplicate and massage data collected
Analyze & Report
Slice data collected across dimensions
Use differential data to triangulate on the root cause
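Slicing across dimensions can be sketched as a simple group-by: the same probe records are grouped by location, by hour, or by page, and a summary is compared across groups. The record fields (`location`, `hour`, `page`, `transfer_s`) and the values are hypothetical, not the study's data.

```python
from collections import defaultdict
from statistics import median


def slice_median(records, dimension, metric="transfer_s"):
    """Group probe records by one dimension and return the median metric."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[metric])
    return {key: median(values) for key, values in groups.items()}


# Hypothetical probe records, for illustration only.
records = [
    {"location": "Mumbai", "hour": 9, "page": "CommonList.asp", "transfer_s": 1.0},
    {"location": "Mumbai", "hour": 15, "page": "DailyActivity.asp", "transfer_s": 1.4},
    {"location": "Bangalore", "hour": 9, "page": "CommonList.asp", "transfer_s": 4.0},
    {"location": "Bangalore", "hour": 15, "page": "DailyActivity.asp", "transfer_s": 5.0},
    {"location": "Hyderabad", "hour": 9, "page": "DailyActivity.asp", "transfer_s": 6.0},
]

print(slice_median(records, "location"))
print(slice_median(records, "hour"))
```

A dimension whose groups differ widely is a candidate cause; one whose groups look alike can be set aside.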
Page-wise analysis: 8 pages constitute 80% of hits … selected as the Target Pages in the Test Environment
47.2% DailyActivity.asp
8.8% CommonList.asp
7.0% IBIssueList.asp
7.0% Introduction.asp
5.3% DeveloperDatabase.asp
4.1% CommonPage.asp
3.3% DailyActivityMatrix.asp
3.2% IBIssueAssgn.asp
1.4% ReportUIBld.asp
1.1% PMDashboard.asp
1.1% EmployeeSel.asp
1.0% PM/ProjectRES.asp
0.8% IB/IBQueryBld.asp
0.7% IBIsEntForRvw.asp
0.6% PM/DetailsDA.asp
7.5% others
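The page ranking can be reproduced as a small Pareto-style calculation. The sketch below ranks the per-page shares listed on this slide and accumulates them, which is one way to decide how many pages to carry into the test environment; the function name `cumulative_shares` is hypothetical and the "others" bucket is left out.

```python
def cumulative_shares(shares):
    """Rank pages by share of hits and attach the running cumulative share."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    running = 0.0
    table = []
    for page, share in ranked:
        running += share
        table.append((page, share, round(running, 1)))
    return table


# Share of hits per page in percent, taken from the slide ("others" omitted).
page_shares = {
    "DailyActivity.asp": 47.2,
    "CommonList.asp": 8.8,
    "IBIssueList.asp": 7.0,
    "Introduction.asp": 7.0,
    "DeveloperDatabase.asp": 5.3,
    "CommonPage.asp": 4.1,
    "DailyActivityMatrix.asp": 3.3,
    "IBIssueAssgn.asp": 3.2,
    "ReportUIBld.asp": 1.4,
    "PMDashboard.asp": 1.1,
    "EmployeeSel.asp": 1.1,
    "PM/ProjectRES.asp": 1.0,
    "IB/IBQueryBld.asp": 0.8,
    "IBIsEntForRvw.asp": 0.7,
    "PM/DetailsDA.asp": 0.6,
}

top8 = cumulative_shares(page_shares)[:8]
for page, share, cumulative in top8:
    print(f"{page:25s} {share:5.1f}% cumulative {cumulative:5.1f}%")
```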
Navigation bar forms 71% of access > large overhead to access relevant pages
¾ of transacted pages are the navigation bar.
Among the remaining ¼ transactions the following pages are key
Dailyactivity.asp
Commonlist.asp
ProjectResources.asp
TaskAssignment.asp
In English! These key pages are:
47.2% DailyActivity.asp: Task update
8.8% CommonList.asp: Project list for logged user
7.0% IBIssueList.asp: List of work requests received
7.0% Introduction.asp: Back-end page that runs with login
5.3% DeveloperDatabase.asp: ??
4.1% CommonPage.asp: Project task list for user
3.3% DailyActivityMatrix.asp: Task update for a period
3.2% IBIssueAssgn.asp: Assign work request
Time analysis of traffic: most of the volume happens at 3 points in the day
The design will consist of a probe schedule of 5 min of access within each 15 min interval, to address:
Morning Session peak
Post lunch session Peak
Evening Close out peak
This would be the same for all 8 probes across the network
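The schedule described above can be sketched as a generator of probing windows. The peak start and end times and the date are hypothetical, and placing the 5 minutes of access at the start of each 15 minute slot is an assumption of this sketch; the slide does not say where within the interval the access falls.

```python
from datetime import datetime, timedelta


def probe_windows(peak_start, peak_end, interval_min=15, active_min=5):
    """Yield (start, stop) windows: active_min of probing per interval_min slot."""
    slot = timedelta(minutes=interval_min)
    active = timedelta(minutes=active_min)
    current = peak_start
    # Only emit windows that finish inside the peak period.
    while current + active <= peak_end:
        yield current, current + active
        current += slot


# Hypothetical morning peak from 09:00 to 10:00 on an arbitrary date.
morning = list(
    probe_windows(datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 10, 0))
)
for start, stop in morning:
    print(start.strftime("%H:%M"), "to", stop.strftime("%H:%M"))
```

The same list of windows would be handed to every probe so that measurements from different locations cover identical time slices.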
Usage pattern on 2nd November
Large Usage early in the morning
A marginal peak in the evening
Server response is within control!
Maximum response time for any page was less than 8 seconds
96% + responses were within 0.05 seconds!
Very few page/transaction drops @ server!
2% of transactions failed
Failed transactions were mainly due to:
Server error (83%)
Page Not Found: 404 (15%)
98% of response time is in Data Transfer
DNS resolution (Where is the server? Client identifies server): ~0%
Connect (Client connects to server with request (GET)): ~0%
Server response (initial byte of data; includes web server, application server, database ..): 1.6%
Data transfer (time to download data fully, including page layout, page objects such as images and frames, and page content): 98%
Data transfer is 98% of response time regardless of location or time. Overall, the SERVER IS NOT THE BOTTLENECK
In the 2nd run key pages were smaller but download times didn't reduce!
Key pages are ~50% smaller
YET! Download (transfer) times are approximately the same
Network is still the bottleneck! (98.4%)
E.g. in CommonList & DailyActivity the transfer time increases 3 to 5 times with more requests.
Multiple requests are causing a slowdown in data transfer > APPLICATION DESIGN!
Pages with multiple requests result in data transfer at much lower rates.
E.g. Commonlist.asp
Project list of the user
Pages with multiple requests also do not perform better during lean hours like midnight!
2nd run: multiple requests continue to affect transfer throughput
Still 50% slower throughput in multiple-request pages.
How Does a typical page load?
Initially the layout is loaded (frames | tables)
Frame Content is downloaded as requests
Objects in each frame are requested from the server.
Pages with multiple frames download the data in multiple requests
That’s like a waiter bringing your dinner as individual items rather than with a tray!
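The waiter analogy can be put into a rough back-of-the-envelope model, assuming every request costs one network round trip and requests are issued one after another. This is a deliberate simplification (real browsers overlap some requests) and the page size, link speed and round-trip time below are hypothetical, not figures from the study.

```python
def page_load_seconds(total_bytes, requests, rtt_s, bandwidth_bps):
    """Estimate load time: one round trip per request plus raw transfer time."""
    transfer = total_bytes * 8 / bandwidth_bps
    return requests * rtt_s + transfer


def effective_throughput_bps(total_bytes, requests, rtt_s, bandwidth_bps):
    """Bits delivered per second of wall-clock load time."""
    seconds = page_load_seconds(total_bytes, requests, rtt_s, bandwidth_bps)
    return total_bytes * 8 / seconds


# Hypothetical figures: 100 KB page, 2 Mbit/s link, 200 ms round trip.
single = page_load_seconds(100_000, 1, 0.2, 2_000_000)
framed = page_load_seconds(100_000, 20, 0.2, 2_000_000)
framed_half_size = page_load_seconds(50_000, 20, 0.2, 2_000_000)

print(f"1 request, 100 KB:   {single:.1f} s")
print(f"20 requests, 100 KB: {framed:.1f} s")
print(f"20 requests, 50 KB:  {framed_half_size:.1f} s")
```

Under these assumptions the round trips dominate, so halving the page size barely changes the load time while cutting the number of requests changes it a great deal.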
Some insights
Frequently visited pages are large, with multiple frames
These pages have large footprints
Multiple requests sought by web/app server to generate a single page
Task update (dailyactivity.asp) is most accessed page.
Access is 6 to 10 times higher than for other pages
Multiple keystrokes & pages to reach the desired page
Leading to excessive data transfer over the network
Small relevant payload
Large redundant data overhead: rich with images & frames.
Overall 7% of transactions failed, mostly in login.asp ~ IIS web server?
The most sensitive pages to transaction failure
20% > Login.ASP > Is there a problem with IIS?
9% > Employee Select.ASP
8% > Common List.ASP > nothing conclusive! SQL ? COM+
Page failure was not related to page size!
Conclusions
There is a need to reduce data volume at source by
Changes to the presentation layer
Changes in sequence of pages
Remove redundant objects/images
De-link application logic from presentation and data
There is a need to re-engineer the network to handle more traffic
Relocating the server closer to majority users
Shifting the network hub closer to server
Removing bottlenecks across the network
Router, switch, firewall configuration
Expanding access pipes
QoS, shaping
Physically
[Diagram: volume of data versus capacity of pipe, current and improved]
Actions from Discussion
ACTION | IMPACT | EFFORT
NETWORK
Bring server closer to mass of users | HI: reduces load on the link (2/3 of users are in BLR & HYD >> free pipe by 50% traffic) | HI: need to relocate production server to BLR
Bring network hub close to server location | MED: fewer hops, fewer drops | HI: move hub to server location
Increase pipe allocation from current burst max of 2 MB to 3 MB | HI: more space, less latency | LO: efforts minimal but impact on other applications could be adverse
Introduce more bandwidth on to the pipe | HI: more space, less latency | HI: direct cost increase
Study and optimize data routing between clients and server | HI: identify specific device & design related issues | HI: consulting engagement
APPLICATION
Reduce # of requests by reducing frames; images (not as critical) | HI: potential to improve data transfer speeds by multiples | HI: change design of pages
Improve presentation flow; ensure users need fewer clicks to access important pages like Daily Activity | HI: reduce redundant hits to server | LO: reduce redundant hits to server
Remove images on the pages | MED: reduce multiple requests | NIL
Create separate portals for managers & users; users could have frameless simple pages | HI: reduces data volume | HI: need to create the extra access