
Metrics that Matter-Approaches To Managing High Performing Websites

Managing the technical quality of your site has become more complex and the number of metrics you collect has skyrocketed. Faced with hundreds of candidate metrics, how do you select those that are most meaningful? In this session you will learn which KPIs are key for successfully testing and managing your site. You will walk away with a holistic framework for managing site quality.

Published in: Technology, Design

  1. Metrics that Matter – Approaches to Managing High Performing Web Sites
      Ben Rushlo, Director, Keynote Professional Services
      June 22, 2009
  2. Agenda
      • User Centric System Approach
      • Performance Management Begins with Metrics
      • Metrics That Matter
      • Diagnostic Process
      • Keys to Improving Performance
      • Implementing a Total Site Quality Framework
  3. Personal Background
      • 9 years at Keynote – Keynote Consulting Practice
      • 5 years at Keynote – Director of Keynote Consulting Practice
      • Focus on mid/large enterprise sites
        • Wal-Mart
        • eBay
        • Honda
        • Ford
        • Schwab
      • Background in capacity planning
      • CIS/MIS degree
  4. User Centric System Approach
  5. Change Has Come
      • Single data center → Cloud hosting/services
      • HTML → ASP/JSP
      • JS → AJAX
      • Animated GIFs → Sites Completely in Flash
      • Content Driven → Transaction Driven → Experience Driven
      • US Market → Global Market
      • Single domains → 20 domains per page
      • Legacy systems → Outsourced web services
  6. Change Has Come
      • Your user has changed
        • Decreased tolerance, increased expectations
        • Utility/Always on
        • Integrated completely into our lives
        • When Larry King is using Twitter…
        • When outages are front page news…
  7. System Approach
      “A system is a dynamic and complex whole, interacting as a structured functional unit”
  8. Online Applications Are Complex Systems
      [Diagram labels: Online Application; User Experience; Application Code; Content Delivery Network; Front End Design; Third Party Web Services; Network/Servers/Infrastructure; Tracking/Ad Tags; ISPs; Cloud Services; Creative/Visual Content]
  9. Online Applications Are Complex Systems
      [Diagram labels: JSP; ASP; Application Code; DB Query; Java Environment; CSS Code; Java Script Code; Front End Design; Browser; AJAX/XML; Threading]
  10. Online Applications Are Complex Systems
  11. Online Applications Are Complex Systems
      • While we have undergone rapid change in the area of web site design/technology/architecture, has performance management changed with it?
      • Or are we still living in a client server focused paradigm?
      • Are we viewing the discrete and disconnected elements of the system and not the system?
        • CPU/Memory/IO etc.
        • Garbage collection rate/threads etc.
        • Locks/query time etc.
  12. Top Order Metrics
      • In any complex system, there is an overwhelming number of metrics (things to measure that describe elements of the system)
      • However, within any system there are key indicators of system health
        • Think of air speed, altitude
        • Think of GDP or consumer confidence
        • Think of blood pressure and weight
  13. Top Order Metrics
      • Top order metrics require a top down approach
      • It is virtually impossible to combine low level metrics upwards to understand system health
        • Except for extreme cases (100% CPU, server down etc.)
        • Most performance management issues are not so simple
      • Low level metrics are very useful once you have identified areas of focus/problem areas
  14. Top Order Metrics
      • Performance management must begin and end at the end user's perspective
      • The end user provides
        • A unifying approach to a very complex system
        • A key barometer of site/application success
        • A direct tie between business owners/goals and the work of the performance management team
  15. Performance Management Begins with Metrics
  16. Data Collection
      • Beginning with the user's perspective (unifying approach), how do we collect data?
        • Point in time?
        • Ongoing collection?
        • Data center or Internet?
        • Browser based?
        • Geographically distributed?
        • Connection speed?
        • How wide and how deep?
  17. Point In Time Tools
      • Point in Time Tools
        • User Feedback
        • YSlow
        • Google Page Speed
        • Firebug
        • HTTP Analyzer
        • HTTP Watch
        • KITE
      • Good for rules based/best practice analysis and point in time data collection
      • Free or almost free!
  18. What a Difference a Couple Thousand Data Points Make
      • Amazon Home Page
        • HTTP Analyzer Trace
        • 81 requests/responses
  19. What a Difference a Couple Thousand Data Points Make
      • Amazon – Profile
        • 15 slowest requests (Average and variability)
        • 2,000 data points in sample
      [Chart: horizontal bars per request, 0 to 2,000 ms; series: Average, 85th percentile, 95th percentile. Requests shown (some URLs truncated in the source):]
        http://www.amazon.com/
        http://www.amazon.com/gp/advertising/iframeproxy?dclick=amzn.us.gw.atf;sz%3D300x250;bn%3D507846;
        http://z-ecx.images-amazon.com/images/G/01/wma/clog/core2._V241266071_.js
        http://g-ecx.images-amazon.com/images/G/01/gno/images/orangeBlue/navPackedSprites_v8._V245110247_.png
        http://d3dtik4dz1nejo.cloudfront.net/70.html
        http://z-ecx.images-amazon.com/images/G/01/nav2/gamma/amazonJQ/amazonJQ-combined-core-20620._V223529337_.
        http://g-ecx.images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V42752373_.gif
        http://g-ecx.images-amazon.com/images/G/01/marketing/visa/321/CS2274_Amazon_Card_Images_79x80_Blue_r01._V
        http://m1.2mdn.net/viewad/1511700/new_dslr_300_022709.jpg
        http://g-ecx.images-amazon.com/images/G/01/gourmet/110/CC50_B0002R38XC._V235261631_.jpg
        http://g-ecx.images-amazon.com/images/G/01/ui/loadIndicators/loadIndicator-large._V248199609_.gif
        http://z-ecx.images-amazon.com/images/G/01/nav2/gamma/amazonJQ/amazonJQ-combined-coreCSS-59291._V24547874
        http://g-ecx.images-amazon.com/images/G/01/gift-cards/topnav/giftcard-envelope-gno._V250128993_.gif
        http://z-ecx.images-amazon.com/images/G/01/nav2/gamma/amazonShoveler/amazonShoveler-amazonShovelerCss-128
        http://g-ecx.images-amazon.com/images/G/01/img09/sports/50/summer_toppicks_50._V225610403_.gif
  20. Ongoing Measurement Approaches
      • Passive technology “watches” network traffic
        • Benefits:
          • Can “see” all users (huge sample, actual visitors)
          • Allows for “measurements” of pages that are difficult to measure in any other way (like a purchase confirmation)
        • Challenges:
          • Security issues
          • Hybrid hosted sites and third party content (can’t see what is happening with browser and external sources)
          • Not good for availability (a key PM activity)
          • Highly variable sample
  21. Ongoing Measurement Approaches
      • Tagging technology uses JS to instrument areas on the page with timers
        • Benefits:
          • Real user data
          • Large sample
          • End user perspective (can include client time)
        • Challenges:
          • Requires code changes (on each page)
          • Lacking in granularity
          • Ongoing management can be cumbersome and difficult
  22. Ongoing Measurement Approaches
      • Active technology uses synthetic transactions to “simulate” users on the site
        • Benefits:
          • Controlled and consistent environment (only variables originate from the site)
          • Repeatable
          • Large sample
        • Challenges:
          • Not every path can be scripted
          • Not every user configuration can be modeled
          • Choosing the “right” path can be difficult
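The active approach can be sketched as a loop that replays a scripted path and records timing and failures per step. This is only an illustration under stated assumptions: the `Step` callables and the `run_synthetic_path` name are hypothetical, and a real synthetic monitoring product would drive an actual browser from external locations rather than call local functions.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical: a step performs one scripted user action and raises on failure.
Step = Callable[[], None]


def run_synthetic_path(steps: List[Tuple[str, Step]], runs: int) -> Dict[str, dict]:
    """Replay the scripted path `runs` times, timing each step."""
    results = {name: {"seconds": [], "failures": 0} for name, _ in steps}
    for _ in range(runs):
        for name, action in steps:
            start = time.perf_counter()
            try:
                action()
            except Exception:
                results[name]["failures"] += 1
                break  # a failed step ends this transaction attempt
            results[name]["seconds"].append(time.perf_counter() - start)
    return results
```

Because every run follows the same path under the same conditions, differences between runs can be attributed to the site, which is the "controlled and consistent" benefit listed above.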
  23. Inside or Outside?
      • Where does the online application live?
        • No longer completely in the data center in most cases
        • Hybrid hosting, CDN, web services, third party content, third party tags etc.
        • Very incomplete view of performance/quality
      • Where does the user live?
        • No users access the site from the data center
        • Performance management cannot be done effectively within a LAN environment
        • Impact of external latency cannot be calculated
  24. Multiple Locations or Not?
      [Chart: download time in seconds (0.0 to 25.0) for five transaction steps, by measurement location. Step labels, in extracted order: QuickTax Home Page Online Get Started Validation Online Edition. Columns below are numbered in extracted order.]

      Location            Step 1   Step 2   Step 3   Step 4   Step 5
      Vancouver Telus       2.00     1.16     1.23     7.31     0.56
      Calgary Telus         2.41     1.50     1.42     8.80     0.56
      Toronto Bell          3.48     2.84     2.38    18.98     1.09
      Montreal Verizon      3.99     3.08     2.57    20.91     1.15
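The spread between locations can be reduced to one number per step: the slowest location's time divided by the fastest. The sketch below uses the download times from the chart above; the function name and the data layout are illustrative assumptions, and the steps are identified only by position because the step labels did not survive extraction cleanly.

```python
# Download times in seconds from the chart, one value per transaction step.
times_by_location = {
    "Vancouver Telus": [2.00, 1.16, 1.23, 7.31, 0.56],
    "Calgary Telus": [2.41, 1.50, 1.42, 8.80, 0.56],
    "Toronto Bell": [3.48, 2.84, 2.38, 18.98, 1.09],
    "Montreal Verizon": [3.99, 3.08, 2.57, 20.91, 1.15],
}


def geographic_ratios(times):
    """Return the slowest/fastest ratio for each step across all locations."""
    per_step = zip(*times.values())  # regroup values by step instead of by location
    return [round(max(step) / min(step), 2) for step in per_step]


ratios = geographic_ratios(times_by_location)
```

For this data every step lands at roughly 2X or worse, which is useful context for the "no more than 2X (fastest versus slowest)" geographic variability target given under Core PM Metrics.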
  25. Browser or Not?
      [Chart: download time in seconds (0 to 9) per site, split into Time On Network and Client Side Processing. Sites, in extracted order: UPS, Live, Travelocity, Wikipedia, Sprint, HotJobs, Career Builder, Disney, Fidelity, Yellow Pages, Google, AT&T, Orbitz, Merrill Lynch, MSN, eBay, Ask, CNN, Expedia, AOL, Bank Of America, Symantec, Facebook, Ticketmaster, NY Times, Apple, Hewlett-Packard, Amazon, CBS Sportsline, Verizon, Yahoo, USA Today, Dell, Walmart, Priceline.com, MSNBC, Weather.com, Charles Schwab, FedEx, Monster]
  26. Browser or Not?
      [Chart: seconds (0.0 to 4.0) per page, split into Download Time and Time In Browser. Page labels, in extracted order: Photo - Dealer Home Page TL Home Video Features Specs Results Gallery]
      Time In Browser: 1.36, 1.54, 1.70, 1.56, 1.89, 1.89
      Download Time:   1.41, 0.99, 0.46, 0.83, 0.37, 1.67
  27. Browser or Not?
  28. Browser Or Not?
      • The browser is “the” application engine
        • JS execution
        • Client side processing
        • Dynamic content
      • It is almost impossible to emulate the complexity of the browser
        • Threading model
        • Blocking/Asynchronous characteristics
        • Dynamic JS and CSS engine
        • Flash/Silverlight/Flex load/dynamic paths and execution
        • Render related issues
  29. Multiple Connection Speeds or Not?
      • Broadband
        • 3.0Mbps and above
        • DSL/Cable Home
        • Business
      • Midband
        • Below 1.5Mbps
        • Entry level DSL
      • Narrowband
        • 56Kbps
        • Dial-up
        • Consumer Satellite
  30. How Wide and How Deep?
      • On any site there are an extremely large number of pages that can be measured
        • Can’t measure everything
        • How do we choose?
      • User centric/business centric model
        • What are the most common and most critical paths that the user takes throughout the site?
        • What pages share similar architecture/design/dependencies?
        • What pages/functions will wake up the CEO if they fail?
      • Even very large and complex sites can typically be measured in two to five key business paths
  31. Metrics that Matter
  32. Context Is Everything
      • Imagine if we all made up our own “goals” for cholesterol
      • I consistently find performance people (CIOs to performance analysts) who just make up what they think are appropriate goals/targets for key metrics
        • 99.999%?
        • 97%?
      • A key component of any successful PM program is context, using appropriate goals/targets
      • Competitive data sets are a great way to get that context
      • Great point of connection with business owners/objectives
  33. Search – Rental Cars (seconds, chart axis 0 to 12)
      Expedia       10.14
      Dollar         7.49
      Travelocity    6.48
      Alamo          6.15
      Orbitz         5.77
      Median         4.58
      Thrifty        3.39
      Budget         2.83
      Hertz          2.32
      Avis           2.29
      Enterprise     2.21
      Source: Keynote Competitive Research – Rental Cars 2009
  34. Total Transaction Availability – Rental Cars (percentage, chart axis 90 to 100)
      Dollar        99.98
      Enterprise    99.93
      Alamo         99.76
      Orbitz        99.33
      Hertz         98.69
      Median        98.36
      Travelocity   98.03
      Avis          97.98
      Expedia       97.24
      Thrifty       96.53
      Budget        94.38
      Source: Keynote Competitive Research – Rental Cars 2009
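The "Median" rows in the two rental car charts are the reference point that gives each site its context. A short sketch of that calculation, using the figures from the charts; the function name and the `lower_is_better` flag are illustrative choices, not something the slides define.

```python
from statistics import median

search_seconds = {
    "Expedia": 10.14, "Dollar": 7.49, "Travelocity": 6.48, "Alamo": 6.15,
    "Orbitz": 5.77, "Thrifty": 3.39, "Budget": 2.83, "Hertz": 2.32,
    "Avis": 2.29, "Enterprise": 2.21,
}
availability_pct = {
    "Dollar": 99.98, "Enterprise": 99.93, "Alamo": 99.76, "Orbitz": 99.33,
    "Hertz": 98.69, "Travelocity": 98.03, "Avis": 97.98, "Expedia": 97.24,
    "Thrifty": 96.53, "Budget": 94.38,
}


def competitive_context(values, lower_is_better):
    """Return the industry median and the sites that beat it."""
    mid = round(median(values.values()), 2)
    if lower_is_better:
        better = [site for site, v in values.items() if v < mid]
    else:
        better = [site for site, v in values.items() if v > mid]
    return mid, better


search_median, fast_sites = competitive_context(search_seconds, lower_is_better=True)
avail_median, reliable_sites = competitive_context(availability_pct, lower_is_better=False)
```

With ten sites, the median is the mean of the fifth and sixth values, which reproduces the 4.58 second and 98.36% figures shown in the charts.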
  35. Averages Are the Muddy Middle
  36. Variability Is Very Important
      [Chart: Render Time Statistical Summary, seconds (0.0 to 12.0) per transaction step; series: Arithmetic Mean, Geometric Mean, Median, 85th Percentile, 95th Percentile. Step labels, in extracted order: Interval Login Click Exchange Search - Orlando Submit Search - Cancun International|Resort,]
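The five statistics named in the chart can be computed with Python's standard library. The sample values below are invented purely to show how a few slow loads pull the arithmetic mean away from the median while the percentiles expose the tail; they are not data from the presentation.

```python
from statistics import geometric_mean, mean, median, quantiles


def summarize(samples):
    """Return the five summary statistics used in the render time chart."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99,
    # so index 84 is the 85th percentile and index 94 is the 95th.
    pct = quantiles(samples, n=100, method="inclusive")
    return {
        "arithmetic_mean": mean(samples),
        "geometric_mean": geometric_mean(samples),
        "median": median(samples),
        "p85": pct[84],
        "p95": pct[94],
    }


# Hypothetical render times in seconds: mostly fast, with a slow tail.
render_times = [2.0] * 16 + [4.0, 6.0, 10.0, 12.0]
summary = summarize(render_times)
```

In this made-up sample the median stays at the typical value, the arithmetic mean is inflated by four slow loads, and the 95th percentile shows what the unluckiest users experienced.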
  37. Client Side Processing
      • Client side processing is virtually unexamined in most performance management programs
        • Not tracked by most tools
        • Only beginning to be discussed as part of performance management
        • Yet for many sites this is the key contributor to poor performance
  38. Core PM Metrics
      • To impact and improve user centric performance, focus on 9 core metrics:
        • Availability
        • Outages
        • Average Download Time - Geo Mean
        • Time in Client Versus Time In Generation/Backend
        • Variability - 85th and 95th percentiles
        • Geographic Variability
        • Hourly Variability (Load Handling)
        • Third Party Quality
        • Size/Element Count/Domains
  39. Core PM Metrics
      • Availability – 99.5% for multi-step transaction
      • Outages – 1 hour per month
      • Average Download Time – 1.5 to 2.5s (broadband)
      • Time in Client Versus Time In Generation/Backend – Less than 30% of page load
      • Variability - 85th and 95th percentiles – No more than 1.5X the median
      • Geographic Variability – No more than 2X (fastest versus slowest)
      • Hourly Variability (Load Handling) – Less than 20% peak versus off peak
      • Third Party Quality – Tags under 50 ms each (limited variability, good availability)
      • Size/Element Count/Domains – Depends!
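The targets above lend themselves to a simple pass/fail scorecard. The sketch below is one possible encoding, not the author's tooling: the `PageMeasurements` fields are hypothetical names, the download time target is read as "no slower than 2.5 s", and the 30% figure is assumed to apply to the client-side share of page load. Size/Element Count/Domains is omitted because the slide gives no fixed target for it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PageMeasurements:
    """Hypothetical container; field names are illustrative, not from the slides."""
    availability_pct: float               # multi-step transaction availability
    outage_hours_per_month: float
    download_seconds: float               # geometric mean on broadband
    client_share_pct: float               # assumed: share of page load spent in the client
    p85_over_median: float
    p95_over_median: float
    slowest_over_fastest_location: float
    peak_over_offpeak_pct: float          # slowdown at peak hours versus off peak
    slowest_tag_ms: float                 # slowest third-party tag


def scorecard(m: PageMeasurements) -> Dict[str, bool]:
    """Return True for each core metric that meets its target."""
    return {
        "Availability": m.availability_pct >= 99.5,
        "Outages": m.outage_hours_per_month <= 1.0,
        "Average Download Time": m.download_seconds <= 2.5,
        "Time in Client": m.client_share_pct < 30.0,
        "Variability": max(m.p85_over_median, m.p95_over_median) <= 1.5,
        "Geographic Variability": m.slowest_over_fastest_location <= 2.0,
        "Hourly Variability": m.peak_over_offpeak_pct < 20.0,
        "Third Party Quality": m.slowest_tag_ms < 50.0,
    }
```

A caller would fill `PageMeasurements` from whatever monitoring data is available and report the metrics whose value is `False`.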
  40. Health Scorecard Example
  41. Health Scorecard Example
  42. Health Scorecard Example
  43. Asynchronous/Blocking
  44. Page Usability Metric – Pre Render Delay Versus Post Render
  45. Diagnostic Process
  46. Diagnostic Process
      • Begin with standards and good “change based” alerting
        • Are the metrics out of threshold (based on context)?
        • Or have they changed from where they have been?
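The two alerting questions can be checked side by side: one against a fixed target, one against the metric's own recent history. This is a minimal sketch; the function name, the use of the median as the baseline, and the 25% change limit are all assumptions chosen for illustration.

```python
from statistics import median


def check_metric(history, current, threshold, max_change=0.25):
    """Return which alert conditions fire for a 'lower is better' metric.

    history    -- recent values of the metric (e.g. daily download times)
    current    -- the latest value
    threshold  -- fixed target derived from context (assumed input)
    max_change -- allowed relative increase over the baseline (assumed 25%)
    """
    baseline = median(history)
    alerts = []
    if current > threshold:
        alerts.append("threshold")
    if baseline > 0 and (current - baseline) / baseline > max_change:
        alerts.append("change")
    return alerts
```

For example, with a history around 2.0 seconds and a 3.0 second target, a reading of 2.8 seconds is still within threshold but trips the change alert, which is the case a threshold-only scheme would miss.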
  47. Diagnostic Process
      • Diagnose performance over time
        • Yesterday
        • Last week
        • Last month and Month-To-Date
  48. Diagnostic Process
      • Do you see consistent performance problems over time?
        • If so, the page needs to be profiled to determine
          • Content (CDN or web server quality)
          • Application
          • Front-end design (e.g. third party calls)
        • If so, has something changed?
          • New content? New requests?
      • Is there a time of day/hour/location pattern?
        • Capacity
        • Edge cache
        • ISP issue
  49. Where Performance Problems Lie
  50. Diagnostic Process
      • Errors
        • Categorize by type
          • Network
          • Server
          • Application
        • Tool should have actual (not simulated) screen capture
        • Tool should use a browser
          • Many errors (most) are custom application errors or malformed pages
          • A browser is much better at catching errors than an “HTTP Request/Response Tool” because it is more sensitive to dynamic, real world issues
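One way to picture the three error categories is a small classifier over the result of a single page measurement. The mapping below is an assumption made for illustration (no connection means network, HTTP 5xx means server, anything else that fails a content check means application); the slides do not define the categories this precisely, and the function and parameter names are hypothetical.

```python
from typing import Optional


def categorize_error(connected: bool, status: Optional[int],
                     body: str, required_text: str) -> Optional[str]:
    """Classify one page measurement; return None when the page is healthy."""
    if not connected or status is None:
        return "network"        # DNS failure, timeout, connection reset
    if status >= 500:
        return "server"         # the server answered but could not serve the page
    if status >= 400 or required_text not in body:
        return "application"    # error page, or a "successful" page missing expected content
    return None
```

The content check in the last branch reflects the point above: a page can return HTTP 200 and still be a custom application error, which a status-code-only tool would count as a success.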
  51. Keys to Improving Performance
  52. Overuse of Modular JS/CSS
      • Silo versus user “flow” based approach
        • JS and CSS have no strategy for minimizing separate and isolated files
        • Need to take into account “flow” of user throughout site
      • Combination of JS and CSS is key
        • Reduces roundtrips
        • Lessens impact of single threading on JS
        • Combination (or packing) of files more critical than minification
      Key: Combine JS/CSS. Think Paths not Pages.
  53. JS Placement
      [Chart annotations:]
      • JavaScript files load one file at a time
      • None of these images were downloaded to the browser until 2.4 seconds into a 2.8 second page load
      Key: Combine, Move Down External JS
  54. Roundtrips
      • Myth in front end design that page size/asset size is still significant
        • Reducing cookie “overhead”
        • GZip
        • Minification
        • Image optimization
        • Etc.
      • These are best practices but they cannot compare to the criticality of round trips
      • Network speed (latency) is much more critical than bandwidth (above 3.0Mbps)
      Key: Reduce roundtrips. CSS Sprite for static content
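A back-of-the-envelope model shows why roundtrips can dominate once bandwidth is adequate. Everything here is a deliberate simplification with made-up inputs: it assumes requests are issued in batches limited by the number of parallel connections, each batch costs one round trip, and transfer time is simply bytes over bandwidth. It ignores DNS, TCP slow start, server time and client-side processing.

```python
import math


def estimate_load_seconds(requests, parallel_connections, rtt_seconds,
                          page_bytes, bandwidth_bps):
    """Crude page load estimate: round-trip cost plus transfer cost."""
    rounds = math.ceil(requests / parallel_connections)
    latency_cost = rounds * rtt_seconds
    transfer_cost = page_bytes * 8 / bandwidth_bps
    return latency_cost + transfer_cost


# Hypothetical page: 80 requests, 4 parallel connections, 100 ms RTT, 400 KB, 3 Mbps.
baseline = estimate_load_seconds(80, 4, 0.1, 400_000, 3_000_000)
double_bandwidth = estimate_load_seconds(80, 4, 0.1, 400_000, 6_000_000)
half_requests = estimate_load_seconds(40, 4, 0.1, 400_000, 3_000_000)
```

Under these assumptions, halving the request count saves about a full second while doubling the bandwidth saves about half a second, which is the direction of the argument on this slide.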
  55. Third Party Tag Placement, Overuse and Quality
  56. Third Party Tag Placement and Quality
      [Chart: third party tag measurements over time, vertical axis 0 to 12,000 (units not shown), 10-Jun-09 to 17-Jun-09, with a “Site Launched” annotation]
      Key: Place Third Party Content in Footer and Track Quality
  57. Client Side Processing
      Key: Identify and Reduce Client Side Processing
  58. Cache Management
      [Chart: geometric mean download time in seconds (0.00 to 3.00), Home Page - New Visitor (400K) versus Home Page - Return Visitor (40K)]
      Key: Configure cache settings – Far Future Etc.
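"Far future" caching usually means sending static assets with response headers that let the browser reuse them for a long time. The sketch below only builds the two standard HTTP headers involved; the function name and the one-year lifetime are illustrative assumptions, and how the headers are attached depends on the web server or CDN in use.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_headers(now: datetime, days: int = 365) -> dict:
    """Build long-lived caching headers for a static asset.

    `now` must be a timezone-aware UTC datetime.
    """
    max_age = days * 24 * 60 * 60
    expires = now + timedelta(days=days)
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": format_datetime(expires, usegmt=True),  # HTTP date format
    }


headers = far_future_headers(datetime(2009, 6, 22, tzinfo=timezone.utc))
```

Assets served this way are typically given versioned file names so that a changed file gets a new URL instead of waiting for the cached copy to expire.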
  59. Slow and Variable Application Calls
  60. Slow and Variable Application Calls
      Key: Profile application call variability
  61. Other!!!!
      • Capacity issues
      • Persistent connections
      • Incorrectly sized content
      • Network retransmissions
      • Errors of every type
  62. Implementing Your Total Site Quality Framework
  63. Implementing Total Site Quality Framework
      • Begin with the user centric approach
      • Apply competitive context and business goals to create appropriate targets
      • Collect 9 core PM metrics
        • Use an ongoing, external, geographically distributed, browser based solution to collect data
        • Path based, key pages/function approach
      • Apply collected data against targets
      • Flag change/target exceeded
      • Perform diagnostic process
  64. www.fastwebrace.com
      Submit your fast or slow site by July 15th
  65. How to Reach Me
      ben.rushlo@keynote.com
      (623) 547-7068
      http://www.linkedin.com/in/benrushlo
