Metrics 101
What to watch
What we’ll cover

 Why collect metrics
 Understanding web latency
 How to target your findings
 Concrete steps to get start...
Part one
Why collect metrics?
http://www.flickr.com/photos/chidorian/12411641/
Downtime costs


 eBay offline ($90K/h)
                        22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999)
Downtime costs
                                          Amazon offline ($1M/h)
                                      Amazo...
Availability Downtime/year Loss @$50K/h
      90%      36.5 days   Can$43,800,000
     95%     18.25 days    Can$21,900,00...
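Each row of the availability table is plain arithmetic: the unavailable fraction of a year multiplied by the hourly cost. Here is a minimal Python sketch that assumes a 365-day year and the $50K/h rate above. The function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def annual_loss(availability_pct: float, cost_per_hour: float = 50_000) -> float:
    """Yearly cost of downtime at a flat hourly rate."""
    return downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    for pct in (90, 95, 99, 99.9, 99.99, 99.999):
        hours = downtime_hours(pct)
        print(f"{pct:>7}%  {hours:10.2f} h/year  ${annual_loss(pct):,.0f}")
```

At 90% availability this gives 876 hours (36.5 days) of downtime and a loss of $43,800,000, which matches the first row.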
Harris poll conducted by Tealeaf in 2008
You really don’t want web users to call you.
[Chart: vertical axis in dollars, $6 to $15 shown ...]
http://www.flickr.com/photos/pagedooley/2811157950/
If you don’t know the past
                                                                       you can’t know the futur...
“A plan so crazy, it just might work.”
http://www.flickr.com/photos/genewolf/147722350
http://www.flickr.com/photos/billselak/366692332/
Everything starts with a baseline.

 Know what’s worst.
 Prove you made it better.
The cycle of optimization
[Diagram: a cycle of Metrics & strategy, Collection, ..., Optimization; one build adds "Link to K..."]
http://www.flickr.com/photos/elsie/8229790/
Understanding your goals.




              http://www.flickr.com/photos/itsgreg/446061432/
[Diagram: Organic search, Campaigns, Ad n...]
[Diagram: Bad ... $ ...]
[Diagram: 1. Enterprise subscriber $, End user (employee) $ ...]
[Diagram: $, Media site, Enrolment, Targeted ...]
Why measure
                          Tactical, to find and fix
                          Strategic, to plan/trend




Part ...
Slow sites suck

Lower conversion rates
Less likely to attract a loyal following
Liable for damages
Liable for refunds or ...
Why the web is slow
A crash course in performance & availability.
[Diagram, built up over several slides: Client → Internet → Load balancer → Web server → App server → DB, labelled "Your website" (www.ex...). Later builds add a DNS lookup, IP addresses at both ends, and routers (R) across the Internet.]
Letter writing
[Diagram sequence, "Postal service": You (sender) mail the words of "This is a sentence" to Them (receiver) one at a time. The words are numbered 1 to 4, arrive out of order or go missing ("WTF?"), and the receiver asks, "Can you send #2..."]
How computers “connect”
[Diagram: Client → Internet → Load balancer → Web server → App server → DB, with IP at each end]
The HTTP “stack”
[Diagram: matching layers at both ends, bottom to top: IP, TCP, SSL, HTTP]
Getting a page by hand
Trying 67.205.65.12...
Connected to bitcurrent.com.
Escape character is '^]'.
GET /
<!DOCTYPE html ...
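The telnet session above sends a bare `GET /`, which is an HTTP/0.9-style request. Many current servers expect an HTTP/1.1 request line plus a `Host` header. Here is a minimal Python sketch of the same exchange, assuming plain HTTP on port 80. The helper names are hypothetical, and `fetch` opens a real network connection, so it is only defined here and never called.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request, as you might type it into telnet."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # mandatory in HTTP/1.1
        "Connection: close",   # ask the server to close the socket when done
        "",                    # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first response line, e.g. b'HTTP/1.1 200 OK', into its parts."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    parts = first_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the request over a raw TCP socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Against a live server, `parse_status_line(fetch("example.com"))` would return the HTTP version, status code, and reason phrase. The rest of the returned bytes are the headers and the HTML that telnet printed.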
[Diagram: the HTTP/SSL stack extended to three kinds of content: Static content, Dynamic content, Stored cont...]
[Diagram, built up over several slides: Browser and Server in a Data center ...]
“Page load time” isn’t simple
 Documents versus event models
 AJAX
 Mobility
 CDNs
 Third-party content
 Embedded objects ...
Connections to load
Connection   0 - www.bitcurrent.com (67.205.65.12)
Connection   1 - www.bitcurrent.com (67.205.65.12)
...
[Diagram: Browser, Server in a Data center, and an Analytics site ...]
What ultimately matters:
When can the user start using the application as
its designers intended?
Part of the problem
 You control           You’re blamed for
 Server latency        Page rendering
 Network latency for   ...
Why measure
                     Tactical, to find and fix
                     Strategic, to plan/trend

                  ...
Three tiers of data
 WAN accessibility: One test from many locations
   Can everybody get here?
 App functionality: Severa...
WAN accessibility
[Diagram: Client, Place A, Task B ...]
Analytics can tell you a lot.
App functionality
[Diagram: Client, Page A, Page B ...]
http://www.flickr.com/photos/tinfoilraccoon/197640807/
Places and Tasks.
Landing page:
View one story
[Diagram, built up over several slides, mapping the site into Places and Tasks: Task: Log in (Enter credentials, Ver...); Task: Create account (Pick ...); Place: View stories; Create acct. (Form uptime ...)]
Places
Efficiency matters
  How quickly, how many,
  productivity
  Learning curve OK
Leave when they’re bored
Collect “aha...
Tasks
Effectiveness matters
  Completion, abandonment
  Intuitiveness rules
Leave when they change their
mind or it breaks...
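Task effectiveness can be expressed as completion and per-step abandonment. Here is a minimal sketch that assumes you already have visitor counts for each step of a task such as "Create account". The step names and numbers are hypothetical.

```python
def funnel_report(steps: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Return (step name, abandonment rate) for each transition in a task.

    `steps` is an ordered list of (step name, visitors who reached it).
    Abandonment at a step is the share of the previous step's visitors
    who never reached this one.
    """
    report = []
    for (_, prev_count), (name, count) in zip(steps, steps[1:]):
        abandonment = 1 - count / prev_count if prev_count else 0.0
        report.append((name, abandonment))
    return report


def completion_rate(steps: list[tuple[str, int]]) -> float:
    """Share of visitors who started the task and reached the final step."""
    first, last = steps[0][1], steps[-1][1]
    return last / first if first else 0.0


if __name__ == "__main__":
    # Hypothetical counts for a "Create account" task
    signup = [("Form shown", 1000), ("Details entered", 600), ("Account created", 450)]
    print(f"Completion: {completion_rate(signup):.0%}")
    for name, rate in funnel_report(signup):
        print(f"  abandoned before {name}: {rate:.0%}")
```

Suppose 1,000 visitors see the form, 600 enter details, and 450 finish. Completion is then 45%, with 40% and 25% abandonment at the two steps.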
2 sides of the same coin
  Web analytics: What did ...
  End user monitoring: Could t...
For e-commerce sites




Can people buy things?
For media sites




 Are ads loading quickly and successfully clicked
 through?
 Is content loading fast enough for visito...
For collaboration sites




 Can visitors contribute (posting content, voting?)
 Is bad content being mitigated (trolling,...
For SaaS sites




Are your end users productive?
Are they making fewer mistakes?
Is the site working during customers’ bu...
Tiered tests
[Diagram: Client, Place A, Task B ...]
Testing the tiers
[Diagram: Client → Internet → Load balancer → Web server → App server → DB ...]
Why measure
                     Tactical, to find and fix
                     Strategic, to plan/trend

                  ...
Synthetic testing.
[Diagram: Client → Internet → Load balancer → Web server → App server → DB; one build adds a Management tool]
[Diagram, built up over several slides: a Test config, several Testing nodes, and the Website in a Data center; R...]
Three things to watch for


 Cached vs. uncached
 Scripts vs. puppetry
 Simultaneous vs. sequential
[Chart: Load time (seconds), Cached 3.157s vs. Uncached 13.349s]
Testing script
[Diagram, built up over several slides: a Testing script (Site: test.com, Page: index.html) is run by a Script interpreter, which sends an HTTP G... request, receives 200 OK, and reports "Test complete"]
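A scripted synthetic test boils down to three steps: request a page, check the status, and time it. Here is a minimal sketch that uses only Python's standard library. `run_check` and the throwaway local server are illustrative assumptions. A real test would point at your site and run from several testing nodes.

```python
import http.server
import threading
import time
import urllib.error
import urllib.request

# Ignore proxy settings so the check measures the target directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def run_check(url: str, timeout: float = 10.0) -> tuple[int, float]:
    """Fetch `url` once; return (HTTP status code, elapsed seconds)."""
    start = time.perf_counter()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()  # include the body download in the timing
            status = resp.status
    except urllib.error.HTTPError as err:  # a 4xx/5xx is still a test result
        status = err.code
    return status, time.perf_counter() - start


class _OkHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for test.com/index.html: always answers 200 OK."""

    def do_GET(self):
        body = b"<html><body>ok</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def start_local_server() -> http.server.HTTPServer:
    """Start a throwaway server on a free localhost port in a background thread."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_local_server()
    port = server.server_address[1]
    status, elapsed = run_check(f"http://127.0.0.1:{port}/index.html")
    print("Test complete" if status == 200 else "Test failed", status, f"{elapsed:.3f}s")
    server.shutdown()
    server.server_close()
```

This is the "script" style of test. It speaks HTTP directly and never renders the page, so it will not see time spent in the browser.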
[Diagram, built up over several slides: a Browser controller sends DOM actions ("click on button 4") to an Actual browser and reads back the DOM contents ...]
Simultaneous          Sequential
5 tests at 15:00      5 tests from 15:00 to 15:05
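The two approaches differ only in how the start times are spread. Simultaneous runs take a snapshot of one moment from every location. Sequential runs spread the probes across the window. Here is a minimal sketch that assumes five tests and a five-minute window, as on the slide. The `schedule` function is hypothetical.

```python
from datetime import datetime, timedelta


def schedule(start: datetime, n_tests: int, window: timedelta,
             sequential: bool) -> list[datetime]:
    """Start times for n_tests synthetic tests.

    Simultaneous: every test starts at `start`.
    Sequential: tests are spread over `window`, one per equal time slot.
    """
    if not sequential:
        return [start] * n_tests
    slot = window / n_tests
    return [start + i * slot for i in range(n_tests)]


if __name__ == "__main__":
    t0 = datetime(2010, 6, 1, 15, 0)  # arbitrary date; only the time matters
    for label, seq in (("Simultaneous", False), ("Sequential", True)):
        times = schedule(t0, 5, timedelta(minutes=5), seq)
        print(label, [t.strftime("%H:%M") for t in times])
```

With these inputs, the sequential schedule starts one test each minute from 15:00 to 15:04, so all five finish within the 15:00 to 15:05 window.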
Synthetic pros & cons
Pros                          Cons
Easy to set up                Brittle
Only way to test without   ...
Ultimately,
Synthetic testing shows you if the site’s working.
Real User Monitoring.
Synthetic isn’t enough
[Diagram, built up over several slides: Browser → Network tap → Load balancer → Web server; the tap observes the traffic of individual users (User A, ...)]
TopN, worstN

RUM tools are excellent for more qualitative data
  What’s most broken?
  What’s biggest?
  What’s slowest?
...
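TopN and worstN lists are simple aggregations over RUM records. Here is a minimal sketch that assumes each record is a hypothetical (page, latency in seconds, HTTP status) tuple. Real RUM tools capture far richer data.

```python
import heapq
from collections import defaultdict
from statistics import mean


def worst_n(records, n=3):
    """Return (slowest pages by mean latency, pages with most errors), n of each.

    `records` is an iterable of (page, latency_seconds, http_status) tuples,
    a simplified stand-in for what a RUM tool captures per request.
    """
    latencies = defaultdict(list)
    errors = defaultdict(int)
    for page, latency, status in records:
        latencies[page].append(latency)
        if status >= 400:  # count 4xx and 5xx responses as broken
            errors[page] += 1
    slowest = heapq.nlargest(n, latencies, key=lambda p: mean(latencies[p]))
    most_broken = heapq.nlargest(n, errors, key=errors.get)
    return slowest, most_broken
```

The two lists answer "What's slowest?" and "What's most broken?". A third ranking, such as by bytes transferred, could be added the same way for "What's biggest?".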
RUM pros & cons
Pros                           Cons
Directly correlated with       May require physical
clickstream, analy...
Ultimately
RUM shows you if the site’s working.
Why measure
                           Tactical, to find and fix
                           Strategic, to plan/trend

      ...
http://upload.wikimedia.org/wikipedia/commons/0/0e/Count-von-count.jpg
[Chart: histogram of Count (0 to 20) by Age (0 to 90); Average age = 10]
Average varies wildly, making it hard to threshold properly or see a real slow-down.
80th percentile only spikes once for a legitimate slow-down (20% of users affected).
Setting a useful threshold on percentiles gives fewer false positives and more real alerts.
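A few lines of code are enough to check this argument. A handful of very slow requests drags the average up, while a percentile threshold only trips when a meaningful share of users is affected. Here is a minimal sketch that uses the nearest-rank percentile, one of several common definitions. The latency samples and the 3-second threshold are made up for illustration.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples
    at or below it. Assumes `samples` is non-empty."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def breaches(samples, threshold, pct=80):
    """Return (average exceeds threshold, pct-th percentile exceeds threshold)."""
    return mean(samples) > threshold, percentile(samples, pct) > threshold


if __name__ == "__main__":
    one_outlier = [1.0] * 9 + [30.0]  # one very slow request
    slowdown = [1.0] * 7 + [6.0] * 3  # 30% of requests slow
    print("one outlier:", breaches(one_outlier, 3.0))
    print("real slowdown:", breaches(slowdown, 3.0))
```

The first case has one 30-second outlier among nine 1-second requests. Its average (3.9 s) crosses the threshold, but the 80th percentile does not. The second case has three of ten requests at 6 seconds. Its average (2.5 s) stays under the threshold, while the 80th percentile trips.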
[Chart: # of requests (0 to 200) against a 0 to 20 horizontal axis (Pa...); Average latency = 5s]
KISS
“It can scarcely be
                            denied that the supreme goal of
                          all theory is to...
“As simple as possible,
   but no simpler.”




  (FYI, this is irony.)
http://www.flickr.com/photos/evilerin/3540381299/
http://www.flickr.com/photos/golf_pictures/2538894627/
[Chart: Login and Checkout response times on a 1s to 12s axis, annotated "Average 4s", "95% 8s", and "Checkout Ave..."]
[Chart: Login samples against a <=4s target: 740 and 260 of Total samples 1000]
[Chart: Login: <=4s marked on a 1s to 12s axis]
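Rather than reporting an average, you can count how many login samples met the 4-second target. Here is a minimal sketch of that calculation. The sample latencies in the demo are hypothetical.

```python
def share_within(samples, target_seconds):
    """Return (count at or under target, count over target, fraction meeting target)."""
    within = sum(1 for s in samples if s <= target_seconds)
    over = len(samples) - within
    return within, over, within / len(samples) if samples else 0.0


if __name__ == "__main__":
    login_samples = [1.2, 3.8, 4.0, 5.5, 2.1, 9.3, 3.3, 4.4]  # hypothetical
    ok, late, share = share_within(login_samples, 4.0)
    print(f"{ok} within 4s, {late} over, {share:.0%} met the target")
```

For example, if 740 of 1,000 samples came in at or under 4 s, the fraction meeting the target would be 0.74.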
(Snore?)
Baselines


Establish an agreed-upon set of metrics, and always
compare to these baselines.
What does “normal” look like?
...
Why measure
                      Tactical, to find and fix
                      Strategic, to plan/trend

                ...
Your goal is to be clearly understood.
  How technical are they?
  How will they use it?
  To fix someth...
By timeframe
 Type of metric         Timeframe    Delivery       Detail
 Break/fix monitoring                Push alerts    Simple ...
 Daily reports
 Quarterly planning
By medium
Where will this wind up?
  Dashboard
  NOC screen
  Log file
  Someone’s spreadsheet
  Inbox
  ...
Why measure
                    Tactical, to find and fix
                    Strategic, to plan/trend

                  Wh...
So what should you do?
Some homework.
First

 Meet your analytics team
 Find out
   What are the key goals they’re monitoring?
   Where are visitors coming from...
Second


Pick the three processes, pages, or functions that
matter most to you
  Landing pages, or part of a conversion fu...
Third

 Set up monitoring of:
   Your site from many places (synthetic testing)
   Your top 3 core business processes (syn...
Fourth

Wait a week or two
  To establish a baseline
  To detect seasonal variance
  To show others and get buy-in
Fifth
 While you’re waiting, understand the elements of latency
 and how they affect your performance
   DNS
   SSL
   Networ...
Set a target threshold


 Now that you have an idea of what “normal” is, set a
 threshold
   ... but not just any threshol...
The login page                     Function

will have a total latency           Metric

of under 4 seconds               ...
$$\text{Apdex score} = \frac{\text{Satisfied requests} + \frac{\text{Tolerating requests}}{2}}{\text{Total requests}}$$
How Apdex works
 Frustrated: over 8 seconds
 Tolerating: 2-8 seconds
 Satisfied: 0-2 seconds
How Apdex works
                        Frustrated: 5 hits
  Total requests: 100



                        Tolerating: 30...
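Here is a minimal sketch of the Apdex calculation. It assumes the thresholds on the slides: satisfied up to T = 2 s, tolerating up to 4T = 8 s, and frustrated beyond that. Take the example counts, read the truncated tolerating count as 30, and assume the remaining 65 of the 100 requests were satisfied. The score is then (65 + 30/2) / 100 = 0.80. The function names are illustrative.

```python
def classify(latency, t=2.0):
    """Apdex zone for one response time: satisfied <= T, tolerating <= 4T."""
    if latency <= t:
        return "satisfied"
    if latency <= 4 * t:
        return "tolerating"
    return "frustrated"


def apdex_from_counts(satisfied, tolerating, total):
    """(satisfied + tolerating / 2) / total; returns 0.0 when there are no samples."""
    return (satisfied + tolerating / 2) / total if total else 0.0


def apdex(latencies, t=2.0):
    """Apdex score for a list of response times in seconds."""
    zones = [classify(x, t) for x in latencies]
    return apdex_from_counts(zones.count("satisfied"), zones.count("tolerating"), len(zones))
```

Frustrated requests add nothing to the numerator, so every frustrated hit lowers the score just by counting toward the total.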
Train your audience

 Visit key stakeholders and walk them through the
 report
 Get them used to the information
   In the...
Put monitoring into your
release cycle

 Talk to the development team
   Adding instrumentation
   Identifying new code fu...
Part eight
Some tools to check out
Paid
Synthetic
 Keynote Systems
 Gomez
 Webmetrics
 Alertsite
 Dotcom Monitor
 Pingdom
 ...and many others
RUM
Client-side
  AJAX (Gomez, Coradiant
  TrueSight Edge)                          Full
  Agent-based (Aternity)         ...
Analytics

 Omniture
 Webtrends
 Coremetrics
 Woopra
 etc. (lots of specialization)
Open Source
Firebug
getfirebug.com
Also: Webkit inspector, Google Page Speed
Google Analytics




analytics.google.com
webpagetest.org
Monitor.us

   (Free ain’t
   pretty, and
   pretty ain’t
   free, but it
     works.)



mon.itor.us
AJAX measurement libraries
Collecting from visitors:
  Jiffy (http://code.google.com/p/jiffy-web/)
    AJAX client sends m...
YSlow




        http://justtalkaboutweb.com/wp-content/uploads/2008/06/yslow.gif
                    http://events.stanf...
Sites
  Dashboard spy
  Juice analytics’ blog
  Dashboard Insight’s gallery
  Simple ...
Part nine
Planning for the future
AJAX




        As for your male and female
        slaves whom you may have:
       you may buy male and female
       s...
http://www.flickr.com/photos/farhannasir/4577508824/



                                      Mobility
http://www.flickr.com/photos/andrewparnell/2738598951/
GET /index.html HTTP/1.1
Host: www.stockprice.com
Cookie: sessionID=KDF74INED6
Accept: */*

<!DOCTYPE html PUBLIC "-//W3C//...
GET /index.html
Host: www.stockprice.com
Cookie: sessionID=KDF74INED6

AAPL:243.20
Web of                          Web of
documents                           events
(circa 1999)                    (circa 2...
Recap
What you need to go and do now.
Metrics must be

Relevant: related to a core business assumption
Actionable: the basis for a decision or improvement
Repro...
Visit your analytics team & read your business model
Pick three core business functions to watch
Start monitoring
  One pa...
Metric        Source             Target
Onload time   From many places   To the top landing page   Uncached, Cached
Ser...
Got one report?
[Chart: Unique page views (0 to 5,000) over periods 1 to 16; later builds add a second axis up to $10,000]
Thanks!
@seanpower
sean@httpd.org
@acroll
alistair@bitcurrent.com

www.watchingwebsites.com




                   (and go...
Slides from the Velocity 2010 presentation, "Metrics 101" by Alistair Croll and Sean Power, authors of Complete Web Monitoring (O'Reilly, 2010)


Transcript of "Metrics 101"

  1. Metrics 101 What to watch
  2. What we’ll cover Why collect metrics Understanding web latency How to target your findings Concrete steps to get started
  3. Part one Why collect metrics?
  4. http://www.flickr.com/photos/chidorian/12411641/
  5. Downtime costs
  6. Downtime costs eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999)
  7. Downtime costs eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999) Financial company down ($100K/h) 53.2% of finance companies lose over $100,000/hour (nextslm.org)
  8. Downtime costs Amazon offline ($1M/h) Amazon loses nearly $1M/hour if down (NYT, 2008) eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999) Financial company down ($100K/h) 53.2% of finance companies lose over $100,000/hour (nextslm.org)
  9. Downtime costs Amazon offline ($1M/h) Amazon loses nearly $1M/hour if down (NYT, 2008) Network downtime ($42K/h) 1 hour of network downtime costs $42,000 (Gartner, 2003) eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999) Financial company down ($100K/h) 53.2% of finance companies lose over $100,000/hour (nextslm.org)
  10. Downtime costs Amazon offline ($1M/h) Amazon loses nearly $1M/hour if down (NYT, 2008) Network downtime ($42K/h) 1 hour of network downtime costs $42,000 (Gartner, 2003) eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999) Financial company down ($100K/h) 53.2% of finance companies lose over $100,000/hour (nextslm.org) Let’s say $50K/h if you’re serious.
  11. 11. Availability   Downtime/year   Loss @$50K/h
          90%            36.5 days       $43,800,000
          95%            18.25 days      $21,900,000
          98%            7.30 days       $8,760,000
          99%            3.65 days       $4,380,000
          99.5%          1.83 days       $2,196,000
          99.8%          17.52 hours     $876,000
          99.9%          8.76 hours      $438,000
          99.95%         4.38 hours      $219,000
          99.99%         52.6 minutes    $43,833
          99.999%        5.26 minutes    $4,383
          99.9999%       31.5 seconds    $438
  12. 12. The same availability table, annotated: at 99.99% (52.6 minutes), downtime is less than an hour a year.
  13. 13. The same availability table, annotated: at 99.99%, less than an hour a year; at 99.9999% (31.5 seconds), less than a minute a year.
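The availability table is straight arithmetic: downtime = (1 - availability) × 8,760 hours, and loss = downtime × hourly cost. A minimal Python sketch of that calculation, assuming a 365-day year and the slide's flat $50K/h figure (both are parameters you would replace with your own numbers):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a 365-day year


def downtime_hours(availability_pct: float) -> float:
    """Hours per year a service is down at a given availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def annual_loss(availability_pct: float, cost_per_hour: float = 50_000) -> float:
    """Yearly cost of downtime, assuming every hour of outage costs the same."""
    return downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
        print(f"{pct:>8}%  {downtime_hours(pct):10.4f} h/year  ${annual_loss(pct):>14,.0f}")
```

For 99.99% this gives 0.876 hours (about 52.6 minutes) and about $43,800; the slide's $43,833 and $4,383 come from multiplying the already-rounded 52.6 and 5.26 minutes.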
  14. 14. Harris poll conducted by Tealeaf in 2008
  15. 15. You really don’t want web users to call you. (Chart: cost per contact for web self-service, IVR, email and live phone; legend: Low, Average, High.) Cost estimates: BiT Group White Paper: “Web Self-Service Lowers Call Center Costs and Improves Customer Service”
  16. 16. You really don’t want web users to call you. Web self-service: $0.24. Cost estimates: BiT Group White Paper: “Web Self-Service Lowers Call Center Costs and Improves Customer Service”
  17. 17. You really don’t want web users to call you. Web self-service: $0.24; IVR: $0.45. Cost estimates: BiT Group White Paper: “Web Self-Service Lowers Call Center Costs and Improves Customer Service”
  18. 18. You really don’t want web users to call you. Web self-service: $0.24; IVR: $0.45; Email: $3.00. Cost estimates: BiT Group White Paper: “Web Self-Service Lowers Call Center Costs and Improves Customer Service”
  19. 19. You really don’t want web users to call you. Web self-service: $0.24; IVR: $0.45; Email: $3.00; Live phone: $5.50. Cost estimates: BiT Group White Paper: “Web Self-Service Lowers Call Center Costs and Improves Customer Service”
  20. 20. http://www.flickr.com/photos/pagedooley/2811157950/
  21. 21. If you don’t know the past, you can’t know the future. If you don’t know the future, you can’t budget for it. Photo by Alan Cleaver from his Flickr Freestock set. Thanks, Alan! http://www.flickr.com/photos/alancleaver/2638883650/
  22. 22. “A plan so crazy, it just might work.”
  23. 23. http://www.flickr.com/photos/genewolf/147722350
  24. 24. http://www.flickr.com/photos/billselak/366692332/
  25. 25. Everything starts with a baseline.
  26. 26. Everything starts with a baseline. Know what’s worst.
  27. 27. Everything starts with a baseline. Know what’s worst. Prove you made it better.
  28. 28. The cycle of optimization Metrics & strategy
  29. 29. The cycle of optimization Metrics & strategy Collection
  30. 30. The cycle of optimization Metrics & strategy Collection Reporting
  31. 31. The cycle of optimization Metrics & strategy Collection Reporting Institutionalizing the results
  32. 32. The cycle of optimization Metrics & strategy Collection Reporting Link to KPI/ROI Institutionalizing the results
  33. 33. The cycle of optimization Metrics & strategy Collection Reporting Link to KPI/ROI Optimization & change Institutionalizing the results
  34. 34. The cycle of optimization Metrics & strategy Collection Reporting Link to KPI/ROI Optimization & change Institutionalizing the results
  35. 35. http://www.flickr.com/photos/elsie/8229790/
  36. 36. Understanding your goals. http://www.flickr.com/photos/itsgreg/446061432/
  37. 37. Diagram, advertiser site: organic search, ad network, campaigns; visitor; offer; upselling; abandonment; purchase steps; conversion; enrolment; reach (mailing, alerts, promotions); disengagement. Impact on site: positive or negative.
  38. 38. Diagram, collaboration site: social network, invitation link, search results; visitor; good content and bad content; content creation; moderation; spam & trolls; engagement; viral spread; social graph; disengagement. Impact on site: positive or negative.
  39. 39. Diagram, SaaS site: enterprise subscriber; end user (employee); performance, usability and productivity (good or bad); helpdesk escalation and support costs; SLA violation and refund; renewal, upsell, reference; churn. Impact on site: positive or negative.
  40. 40. Diagram, media site: ad network; visitor; enrolment; targeted embedded ad; advertiser site; departure. Impact on site: positive or negative.
  41. 41. Why measure Tactical, to find and fix Strategic, to plan/trend Part two The elements of web latency
  42. 42. Slow sites suck
  43. 43. Slow sites suck Lower conversion rates
  44. 44. Slow sites suck Lower conversion rates Less likely to attract a loyal following
  45. 45. Slow sites suck Lower conversion rates Less likely to attract a loyal following Liable for damages
  46. 46. Slow sites suck Lower conversion rates Less likely to attract a loyal following Liable for damages Liable for refunds or service credits
  47. 47. Slow sites suck Lower conversion rates Less likely to attract a loyal following Liable for damages Liable for refunds or service credits Customers find other channels that cost more
  48. 48. Why the web is slow A crash course in performance & availability.
  49. 49. Diagram: Client → Internet → Load balancer → Web server → App server → DB (www.example.com)
  50. 50. Your website. Diagram: Client → Internet → Load balancer → Web server → App server → DB (www.example.com)
  51. 51. DNS. Diagram: DNS lookup of “www.example.com”; Client → Internet → Load balancer → Web server → App server → DB
  52. 52. DNS: DNS lookup. Diagram: DNS lookup of “www.example.com”; Client → Internet → Load balancer → Web server → App server → DB
  53. 53. DNS: DNS lookup. Diagram: DNS lookup of “www.example.com”; Client → Internet → Load balancer → Web server → App server → DB
  54. 54. IP at both ends. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  55. 55. IP at both ends: Internet routing. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  56. 56. IP at both ends: Internet routing through routers (R). Diagram: Client → Internet → Load balancer → Web server → App server → DB
  57. 57. IP at both ends, routers (R) in between: TCP session. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  58. 58. IP at both ends, routers (R) in between: TCP session. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  59. 59. Letter writing Postal service
  60. 60. You Them (sender) (receiver)
  61. 61. This is a sentence You Them (sender) (receiver)
  62. 62. This is a sentence You Them (sender) (receiver)
  63. 63. You Them (sender) (receiver)
  64. 64. You Them (sender) (receiver)
  65. 65. sentence This is a You Them (sender) (receiver)
  66. 66. You Them (sender) (receiver)
  67. 67. This is a sentence You Them (sender) (receiver)
  68. 68. This is a sentence 3 2 1 4 You Them (sender) (receiver)
  69. 69. You Them (sender) (receiver)
  70. 70. This is a sentence You Them (sender) (receiver)
  71. 71. This is sentence 2 1 You Them (sender) (receiver)
  72. 72. This is sentence 2 1 4 You Them (sender) (receiver)
  73. 73. This WTF? is sentence 2 1 4 You Them (sender) (receiver)
  74. 74. sentence a This 4 3 1
  75. 75. sentence a This 4 3 1 “Can you send #2 again?”
  76. 76. sentence a This 4 3 1 “Can you send #2 again?” is “Sure. Here you go.” 2
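The letter-writing analogy is how TCP keeps a stream intact: each piece carries a sequence number, the receiver puts pieces back in order, and gaps get resent. A toy Python sketch that numbers words instead of bytes; real TCP numbers bytes and triggers retransmission through acknowledgements and timeouts rather than an explicit "send #2 again" message, so treat this as an illustration of the idea only:

```python
def reassemble(received: dict[int, str], expected: int) -> tuple[str, list[int]]:
    """Rebuild a message from numbered pieces; report which numbers never arrived."""
    missing = [seq for seq in range(1, expected + 1) if seq not in received]
    text = " ".join(received[seq] for seq in sorted(received))
    return text, missing


# Pieces 1, 3 and 4 arrive out of order; piece 2 ("is") was lost in transit.
arrived = {4: "sentence", 3: "a", 1: "This"}
partial, gaps = reassemble(arrived, expected=4)
print(partial, gaps)  # This a sentence [2]

# "Can you send #2 again?" ... "Sure. Here you go."
arrived[2] = "is"
print(reassemble(arrived, expected=4))  # ('This is a sentence', [])
```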
  77. 77. How computers “connect”
  78. 78. IP at both ends. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  79. 79. The HTTP “stack”: IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  80. 80. The HTTP “stack”: TCP over IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  81. 81. The HTTP “stack”: SSL over TCP over IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  82. 82. The HTTP “stack”: HTTP over SSL over TCP over IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  83. 83. Getting a page by hand
  84. 84. Getting a page by hand Trying 67.205.65.12... Connected to bitcurrent.com. Escape character is '^]'.
  85. 85. Getting a page by hand Trying 67.205.65.12... Connected to bitcurrent.com. Escape character is '^]'. GET / <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head profile="http://gmpg.org/xfn/11"> <script type="text/javascript" src="http://www.bitcurrent.com/wp-content/themes/grid_focus_public/js/perftracker.js"></script> <script> </body> </html> Connection closed by foreign host.
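The telnet session above can be reproduced in a few lines of Python. This sketch opens the TCP connection itself and types the request; unlike the slide's bare `GET /`, it sends an HTTP/1.0 request line with a Host header, which most servers today expect (an assumption about the target server). It does not handle HTTPS.

```python
import socket


def get_by_hand(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send a raw HTTP/1.0 GET over a plain TCP socket and return the full response."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0: the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `get_by_hand("www.example.com")` returns the status line, headers and HTML in one `bytes` object; split on the first blank line (`b"\r\n\r\n"`) to separate headers from body.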
  86. 86. Static content: GET www.example.com/image.gif returns image.gif. Stack: HTTP over SSL over TCP over IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  87. 87. Static content: GET www.example.com/image.gif returns image.gif. Stack: HTTP over SSL over TCP over IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  88. 88. Static content, dynamic content: GET www.example.com/dynamic.jsp returns dynamic.jsp. Stack: HTTP over SSL over TCP over IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  89. 89. Static content, dynamic content: GET www.example.com/dynamic.jsp returns dynamic.jsp. Stack: HTTP over SSL over TCP over IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  90. 90. Static content, dynamic content, stored data (database): POST www.example.com/data.cgi. Stack: HTTP over SSL over TCP over IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  91. 91. Static content, dynamic content, stored data (database): POST www.example.com/data.cgi. Stack: HTTP over SSL over TCP over IP. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  92. 92. Browser Data center Server
  93. 93. Browser Data center Server
  94. 94. Browser Data center Server TCP SYN (“let’s talk”) TCP SYN ACK (“Agreed: let’s talk”) TCP ACK (“OK, we’re talking”)
  95. 95. Browser Data center Server TCP SYN (“let’s talk”) TCP SYN ACK (“Agreed: let’s talk”) TCP ACK (“OK, we’re talking”) SSL (“Someone might be listening!”) SSL (“Here’s a decoder ring”)
  96. 96. Browser Data center Server TCP SYN (“let’s talk”) TCP SYN ACK (“Agreed: let’s talk”) TCP ACK (“OK, we’re talking”) SSL (“Someone might be listening!”) SSL (“Here’s a decoder ring”) HTTP GET / (“Can I have your home page?”) (thinks a bit) HTTP 200 OK (“Sure!”) [index.html] (“Here it is!”) (Renders furiously) Bump, bump. [img js css] (“Have this too!”)
  97. 97. Browser Data center Server TCP SYN (“let’s talk”) TCP SYN ACK (“Agreed: let’s talk”) TCP ACK (“OK, we’re talking”) SSL (“Someone might be listening!”) SSL (“Here’s a decoder ring”) HTTP GET / (“Can I have your home page?”) (thinks a bit) HTTP 200 OK (“Sure!”) [index.html] (“Here it is!”) (Renders furiously) Bump, bump. [img js css] (“Have this too!”) TCP FIN (“Thanks! I’m done now.”) TCP FIN ACK (“You’re welcome. Have a nice day.”)
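The exchange above breaks page retrieval into phases you can time separately. A rough Python sketch for plain HTTP only, assuming a single request for the base page: it times the DNS lookup, the TCP handshake, the wait for the first byte (request transit plus server think time) and the rest of the download. It skips SSL, rendering and the follow-up requests for images, scripts and stylesheets, and the DNS answer may come from a local resolver cache.

```python
import socket
import time


def time_phases(host: str, port: int = 80, path: str = "/") -> dict[str, float]:
    """Seconds spent in each phase of one plain-HTTP fetch of a single URL."""
    t_start = time.perf_counter()
    # DNS: turn the name into an address (the answer may come from a local cache)
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    t_dns = time.perf_counter()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(10)
        sock.connect(address)  # TCP SYN / SYN ACK / ACK
        t_connect = time.perf_counter()
        sock.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
        sock.recv(1)  # blocks until the server's first byte arrives ("thinks a bit")
        t_first_byte = time.perf_counter()
        while sock.recv(4096):  # read until the server closes the connection
            pass
        t_done = time.perf_counter()
    return {
        "dns": t_dns - t_start,
        "connect": t_connect - t_dns,
        "first_byte": t_first_byte - t_connect,
        "download": t_done - t_first_byte,
        "total": t_done - t_start,
    }
```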
  98. 98. “Page load time” isn’t simple Documents versus event models AJAX Mobility CDNs Third-party content Embedded objects and plug-ins
  99. 99. Connections to load Connection 0 - www.bitcurrent.com (67.205.65.12) Connection 1 - www.bitcurrent.com (67.205.65.12) Connection 2 - 4qinvite.4q.iperceptions.com (64.18.71.70) Connection 3 - static.slideshare.net (66.114.49.24) Connection 4 - static.slideshare.net (66.114.49.24) Connection 5 - www.feedburner.com (66.150.96.123) Connection 6 - static.getclicky.com (204.13.8.18) Connection 7 - cetrk.com (208.67.183.100) Connection 8 - in.getclicky.com (204.13.8.18) Connection 9 - crazyegg.com (208.67.180.236) Connection 10 - www.google-analytics.com (72.14.223.147) Connection 11 - www.apture.com (67.192.46.19) Connection 12 - static.apture.com (67.192.46.25) Connection 13 - s.clicktale.net (66.114.49.24) Connection 14 - www.clicktale.net (75.125.82.70)
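Counting the list above makes the point concrete: one page load touched many hosts, most of them third parties whose speed you don't control. A small Python sketch over the slide's own connection list; treating every host outside bitcurrent.com as third party is a simplifying assumption.

```python
from collections import Counter

# (hostname, IP address) for each connection listed on the slide
CONNECTIONS = [
    ("www.bitcurrent.com", "67.205.65.12"),
    ("www.bitcurrent.com", "67.205.65.12"),
    ("4qinvite.4q.iperceptions.com", "64.18.71.70"),
    ("static.slideshare.net", "66.114.49.24"),
    ("static.slideshare.net", "66.114.49.24"),
    ("www.feedburner.com", "66.150.96.123"),
    ("static.getclicky.com", "204.13.8.18"),
    ("cetrk.com", "208.67.183.100"),
    ("in.getclicky.com", "204.13.8.18"),
    ("crazyegg.com", "208.67.180.236"),
    ("www.google-analytics.com", "72.14.223.147"),
    ("www.apture.com", "67.192.46.19"),
    ("static.apture.com", "67.192.46.25"),
    ("s.clicktale.net", "66.114.49.24"),
    ("www.clicktale.net", "75.125.82.70"),
]


def summarize(connections: list[tuple[str, str]], own_domain: str) -> dict:
    """Count connections, distinct hosts and IPs, and hosts outside your own domain."""
    hosts = Counter(host for host, _ in connections)
    ips = Counter(ip for _, ip in connections)
    third_party = sorted(
        h for h in hosts if h != own_domain and not h.endswith("." + own_domain)
    )
    return {
        "connections": len(connections),
        "unique_hosts": len(hosts),
        "unique_ips": len(ips),
        "third_party_hosts": third_party,
        "shared_ips": {ip: n for ip, n in ips.items() if n > 1},  # one IP, several connections
    }


print(summarize(CONNECTIONS, "bitcurrent.com"))
```

On this list it reports 15 connections to 13 hosts on 11 IP addresses, 12 of those hosts third-party; 66.114.49.24 serves both static.slideshare.net and s.clicktale.net.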
  100. 100. Analytics site Server Data center Browser Server Mashup Server site
  101. 101. Analytics site Server Data center Browser Server Snore. Mashup Server site
  102. 102. What ultimately matters: When can the user start using the application as its designers intended?
  103. 103. Part of the problem. You control: server latency; network latency for known content and network parameters. You’re blamed for: page rendering; total network latency; user environment.
  104. 104. Part of the problem. You control: server latency; network latency for known content and network parameters. You’re blamed for: page rendering; total network latency; user environment. You need diagnostic metrics so you can fix it.
  105. 105. Part of the problem. You control: server latency; network latency for known content and network parameters. You’re blamed for: page rendering; total network latency; user environment. You need diagnostic metrics so you can fix it. You need escalation metrics so you can prove it and make it someone else’s problem.
  106. 106. Why measure Tactical, to find and fix Strategic, to plan/trend What to measure: How long until a user can use the app as you intended? Part three Where to measure
  107. 107. Three tiers of data WAN accessibility: One test from many locations Can everybody get here? App functionality: Several tests of key processes Is my business model working correctly? Tiered tests: Frequent metrics of each tier Is network, service, CPU, data I/O to blame?
  108. 108. WAN accessibility: Place A, Task B, Goal C ... Diagram: Client → Internet → Load balancer → Web server → App server → DB
  109. 109. Analytics can tell you a lot.
  110. 110. App functionality: Page A, Page B, Event C. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  111. 111. http://www.flickr.com/photos/tinfoilraccoon/197640807/
  112. 112. Places and Tasks.
  113. 113. Landing page: View one story
  114. 114. Landing page: View one story Task: Log in Enter credentials Verify Recovery
  115. 115. Landing page: View one story Task: Log in Enter credentials Verify Recovery Task: Forward a story Enter recipients Enter message Send
  116. 116. Landing page: View one story. Task: Create account (Pick name, Check if free, Set Password, CAPTCHA, Send mail, Get confirm). Task: Log in (Enter credentials, Verify, Recovery). Task: Forward a story (Enter recipients, Enter message, Send).
  117. 117. Landing page: View one story. Task: Create account (Pick name, Check if free, Set Password, CAPTCHA, Send mail, Get confirm). Task: Log in (Enter credentials, Verify, Recovery). Task: Forward a story (Enter recipients, Enter message, Send). Task: Submit a new story (Enter URL, Describe, Deduplicate, Post it).
  118. 118. Landing page: View one story. Task: Create account (Pick name, Check if free, Set Password, CAPTCHA, Send mail, Get confirm). Task: Log in (Enter credentials, Verify, Recovery). Task: Forward a story (Enter recipients, Enter message, Send). Task: Submit a new story (Enter URL, Describe, Deduplicate, Post it). Place: View stories (Vote up, Vote down, Next 25, Last 25).
  119. 119. Landing page: View one story. Task: Create account (Pick name, Check if free, Set Password, CAPTCHA, Send mail, Get confirm). Task: Log in (Enter credentials, Verify, Recovery). Task: Forward a story (Enter recipients, Enter message, Send). Task: Submit a new story (Enter URL, Describe, Deduplicate, Post it). Place: View stories (Vote up, Vote down, Next 25, Last 25). Place: Read poster comments (Vote up, Vote down, Next 25, Last 25).
  120. 120. Landing page: View one story. Task: Create account (Pick name, Check if free, Set Password, CAPTCHA, Send mail, Get confirm). Task: Log in (Enter credentials, Verify, Recovery). Task: Forward a story (Enter recipients, Enter message, Send). Task: Submit a new story (Enter URL, Describe, Deduplicate, Post it). Place: View stories (Vote up, Vote down, Next 25, Last 25). Place: Read poster comments (Vote up, Vote down, Next 25, Last 25). Place: My account (Change address, Change PW, My comments, See karma).
  121. 121. Landing page: View one story. Create acct. Task: Log in. Place: View stories. Place: Read poster comments. Task: Forward a story. Task: Submit a new story. Place: My account.
  122. 122. Landing page: View one story. Create acct. (Form uptime, # started, Bad form, # CAPTCHA, Mail uptime, Mail bounced, Confirm & return, Return 3x). Task: Log in. Place: View stories. Place: Read poster comments. Task: Forward a story. Task: Submit a new story. Place: My account.
  123. 123. Landing page: View one story. Create acct. Task: Log in. Place: View stories (Stories/visit, # up/down, Time/story, Top stories, Refresh time, Views/page). Place: Read poster comments. Task: Forward a story. Task: Submit a new story. Place: My account.
  124. 124. Landing page: View one story. Create acct. Task: Log in. Place: View stories. Place: Read poster comments. Task: Forward a story. Task: Submit a new story. Place: My account.
  125. 125. Places Efficiency matters How quickly, how many, productivity Learning curve OK Leave when they’re bored Collect “aha” feedback A/B test content for pages/session, exits
  126. 126. Tasks Effectiveness matters Completion, abandonment Intuitiveness rules Leave when they change their mind or it breaks Collect “motivation” feedback A/B test layouts for conversion
  127. 127. 2 sides of the same coin. Web analytics: What did visitors do? End-user monitoring: Could they do it?
  128. 128. For e-commerce sites Can people buy things?
  129. 129. For media sites Are ads loading quickly and successfully clicked through? Is content loading fast enough for visitors?
  130. 130. For collaboration sites Can visitors contribute (posting content, voting)? Is bad content being mitigated (trolling, spam)?
  131. 131. For SaaS sites Are your end users productive? Are they making fewer mistakes? Is the site working during customers’ business hours?
  132. 132. Tiered tests: Place A, Task B, Goal C. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  133. 133. Testing the tiers. Diagram: Client → Internet → Load balancer → Web server → App server → DB. Tests: request an uncached object; do some heavy computing (or track CPU); search a dataset for a string (or watch query time); request a big object.
  135. 135. Why measure Tactical, to find and fix Strategic, to plan/trend What to measure: How long until a user can use the app as you intended? Part four Where to measure: How to measure WAN, from everywhere Core app functionality performance data Tiers of components
  136. 136. Synthetic testing.
  137. 137. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  138. 138. Management tool. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  139. 139. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  140. 140. Diagram: Client → Internet → Load balancer → Web server → App server → DB
  141. 141. Diagram: test config; three testing nodes; website in a data center
  142. 142. Diagram: test config; three testing nodes; website in a data center
  143. 143. Diagram: test config; three testing nodes; website in a data center
  144. 144. Diagram: test config; three testing nodes; website in a data center; reporting service
  145. 145. Three things to watch for Cached vs. uncached Scripts vs. puppetry Simultaneous vs. sequential
  146. 146. Chart: load time (seconds), cached vs. uncached
  147. 147. Chart: load time (seconds), cached vs. uncached. Cached: 3.157s
  148. 148. Chart: load time (seconds), cached vs. uncached. Cached: 3.157s; uncached: 13.349s
  149. 149. Testing script Script interpreter
  150. 150. Testing script Site: test.com Page: index.html Script interpreter
  151. 151. Testing script Site: test.com Page: index.html Script interpreter HTTP GET www.test.com/index.html
  152. 152. Testing script Site: test.com Page: index.html Script interpreter 200 OK index.html image.gif stylesheet.css etc...
  153. 153. Testing script Site: test.com Test complete Page: index.html Script interpreter
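A script-driven test like the one above requests a page and checks the answer without rendering it. A minimal Python sketch of such a check; the 5-second budget is an assumed threshold, and like the script interpreter it does not fetch image.gif, stylesheet.css or other embedded objects, nor run JavaScript:

```python
import time
import urllib.error
import urllib.request


def synthetic_check(url: str, max_seconds: float = 5.0) -> dict:
    """Fetch one URL like a script-driven test: no rendering, no JavaScript, no sub-resources."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=max_seconds * 2) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        status, body = err.code, b""
    except OSError as err:  # DNS failure, refused connection, timeout (URLError is an OSError)
        return {"url": url, "ok": False, "error": str(err)}
    elapsed = time.perf_counter() - start
    return {
        "url": url,
        "status": status,
        "seconds": elapsed,
        "bytes": len(body),
        # pass only if the page came back OK and within the time budget
        "ok": status == 200 and elapsed <= max_seconds,
    }
```

A browser-puppetry test would go further: drive a real browser, click through the DOM, and inspect the rendered page for text such as an error message.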
  154. 154. Browser controller Actual browser
  155. 155. Browser controller DOM actions (“click on button 4”) Actual browser
  156. 156. Browser controller DOM actions (“click on button 4”) Actual browser HTTP GET www.test.com/index.html
  157. 157. Browser controller DOM actions (“click on button 4”) Actual browser 200 OK index.html image.gif stylesheet.css etc...
  158. 158. Browser controller, Actual browser: DOM actions (“click on button 4”); DOM contents (“DIV contains ‘error’”)
  159. 159. Simultaneous: 5 tests at 15:00
  160. 160. Simultaneous: 5 tests at 15:00. Sequential: 5 tests from 15:00 to 15:05.
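The two scheduling modes differ only in start times. Simultaneous runs give a snapshot of one moment from every node; sequential runs sample more moments and avoid hitting the site with every test at once. A Python sketch, assuming evenly spaced runs for the sequential case (real testing services may randomize or rotate nodes); the date is arbitrary:

```python
from datetime import datetime, timedelta


def schedule(start: datetime, tests: int, window: timedelta, sequential: bool) -> list[datetime]:
    """Start times for one round of tests from several testing nodes."""
    if tests <= 0:
        return []
    if not sequential:
        return [start] * tests  # every node fires at once
    step = window / tests  # spread the runs evenly across the window
    return [start + i * step for i in range(tests)]


t0 = datetime(2010, 1, 1, 15, 0)
print([t.strftime("%H:%M") for t in schedule(t0, 5, timedelta(minutes=5), sequential=False)])
# ['15:00', '15:00', '15:00', '15:00', '15:00']
print([t.strftime("%H:%M") for t in schedule(t0, 5, timedelta(minutes=5), sequential=True)])
# ['15:00', '15:01', '15:02', '15:03', '15:04']
```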
  161. 161. Synthetic pros & cons. Pros: Easy to set up; Only way to test without actual visitor traffic; Can compare to competitors; Easy baseline establishment; Detects a problem before visitors see it; Consistent data over time. Cons: Brittle; Detects macro outages, not user events; Good geographic & network coverage costs money, generates load; No measurement of traffic volume; Places load on the site under test.
  162. 162. Ultimately, synthetic testing shows you if the site’s working.
  163. 163. Real User Monitoring.
  164. 164. Synthetic isn’t enough
  165. 165. Synthetic isn’t enough
  166. 166. Browser Web server
  167. 167. Browser → Load balancer → Web server
  168. 168. Browser → Load balancer → Web server, with a network tap
  169. 169. Browser → Load balancer → Web server, with a network tap
  170. 170. Browser → Load balancer → Web server, with a network tap
  171. 171. Browser → Load balancer → Web server, with a network tap
  172. 172. Browser → Load balancer → Web server, with a network tap
  173. 173. Browser → Load balancer → Web server, with a network tap
  174. 174. Browser → Load balancer → Web server, with a network tap; User A
  175. 175. Browser → Load balancer → Web server, with a network tap; User A, User B, User C
  176. 176. Browser → Load balancer → Web server, with a network tap; User A, User B, User C; visit history (P1, P2, P3)
  177. 177. Browser → Load balancer → Web server, with a network tap; User A, User B, User C; visit history (P1, P2, P3); aggregate reports
  178. 178. Browser → Load balancer → Web server, with a network tap; User A, User B, User C; visit history (P1, P2, P3); aggregate reports; alerts (!)
  179. 179. TopN, worstN RUM tools are excellent for more qualitative data What’s most broken? What’s biggest? What’s slowest? What’s most inconsistent?
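RUM data is a stream of measured requests, so TopN/worstN lists are a group-and-sort. A minimal Python sketch over hypothetical (page, seconds) samples; "most broken" and "biggest by size" would need status codes and byte counts, which this sketch leaves out:

```python
import statistics
from collections import defaultdict


def top_n(samples: list[tuple[str, float]], n: int = 3) -> dict[str, list[str]]:
    """Rank pages by request count, mean load time and load-time spread."""
    by_page: dict[str, list[float]] = defaultdict(list)
    for page, seconds in samples:
        by_page[page].append(seconds)

    def ranked(key) -> list[str]:
        return sorted(by_page, key=key, reverse=True)[:n]

    return {
        "busiest": ranked(lambda p: len(by_page[p])),
        "slowest": ranked(lambda p: statistics.fmean(by_page[p])),
        "most_inconsistent": ranked(lambda p: statistics.pstdev(by_page[p])),
    }


samples = [("/home", 1.0), ("/home", 1.0), ("/home", 1.0),
           ("/search", 7.0), ("/search", 7.0),
           ("/checkout", 1.0), ("/checkout", 11.0)]
print(top_n(samples, n=1))
# {'busiest': ['/home'], 'slowest': ['/search'], 'most_inconsistent': ['/checkout']}
```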
  180. 180. RUM pros & cons. Pros: Directly correlated with clickstream, analytics; Watches everything, not just the things you know about; Can be used to reproduce problems; Measures traffic as well as performance. Cons: May require physical installation; Can be a privacy risk; Doesn’t work if there’s no traffic; Need to filter out your own visits, crawlers, etc.
  181. 181. Ultimately RUM shows you if the site’s working.
  182. 182. Why measure Tactical, to find and fix Strategic, to plan/trend What to measure: How long until a user can use the app as you intended? Part five Where to measure: Getting the math right WAN, from everywhere Core app functionality Tiers of components How to measure it: Synth, to ensure it’s working RUM, to see where it’s broken
  183. 183. http://upload.wikimedia.org/wikipedia/commons/0/0e/Count-von-count.jpg
  184. 184. Chart: age, 0 to 90
  185. 185. Average age = 10. Chart: age, 0 to 90
  186. 186. Average age = 10. Chart: count (0 to 20) by age (0 to 90)
  187. 187. Average age = 10. Chart: count (0 to 20) by age (0 to 90)
  188. 188. Average age = 10. Chart: count (0 to 20) by age (0 to 90)
  189. 189. Average varies wildly, making it hard to threshold properly or see a real slow-down.
  190. 190. 80th percentile only spikes once for a legitimate slow-down (20% of users affected)
  191. 191. Setting a useful threshold on percentiles gives fewer false positives and more real alerts
  192. 192. Chart: # of requests (0 to 200) by page load time (0 to 20 seconds)
  193. 193. Average latency = 5s. Chart: # of requests (0 to 200) by page load time (0 to 20 seconds)
  194. 194. Average latency = 5s; 95th percentile latency = 19s. Chart: # of requests (0 to 200) by page load time (0 to 20 seconds)
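Why the average misleads is easy to reproduce. A Python sketch using the nearest-rank percentile (one of several percentile definitions; interpolating methods give slightly different values) on made-up samples where one request in ten is very slow:

```python
import math
import statistics


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical page loads: 90 fast (2s) and 10 slow (20s)
loads = [2.0] * 90 + [20.0] * 10
print(statistics.fmean(loads))  # 3.8: looks healthy
print(percentile(loads, 95))    # 20.0: the slow tail the average hides
```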
  195. 195. KISS
  196. 196. “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.” http://media.photobucket.com/image/einstein/derekabril/einstein_010.png
  197. 197. “As simple as possible, but no simpler.” (FYI, this is irony.)
  198. 198. http://www.flickr.com/photos/evilerin/3540381299/ http://www.flickr.com/photos/golf_pictures/2538894627/
  199. 199. Login, Checkout, Invite. Charts: distribution of times for each, 1s to 12s
  200. 200. Login: average 4s. Checkout: average 6s. Invite: average 9s. Charts: distribution of times for each, 1s to 12s
  201. 201. Login: average 4s, 95th percentile 8s. Checkout: average 6s, 95th percentile 10s. Invite: average 9s, 95th percentile 12s. Charts: distribution of times for each, 1s to 12s
  202. 202. Login: average 4s, 95th percentile 8s, mode 2s. Checkout: average 6s, 95th percentile 10s, mode 5s. Invite: average 9s, 95th percentile 12s, mode 1s. Charts: distribution of times for each, 1s to 12s
  203. 203. Login: average 4s, 95th percentile 8s, mode 2s. Checkout: average 6s, 95th percentile 10s, mode 5s. Invite: average 9s, 95th percentile 12s, mode 1s. Aggregate? Average 6s, 95th percentile 12s, mode 5s. Charts: distribution of times for each, 1s to 12s
  204. 204. Login (target <=4s): 740 requests inside the target, 260 outside. Chart: distribution of times, 1s to 12s
  205. 205. Login (<=4s): total samples 1000; below threshold 740 (260 above); 74% below target threshold
  206. 206. Login (<=4s): total samples 1000; below threshold 740 (260 above); 74% below target threshold. Checkout (<=5s): total samples 1000; below threshold 370 (630 above); 37% below target threshold. Invite (<=8s): total samples 1000; below threshold 610 (390 above); 61% below target threshold.
  207. 207. Login (<=4s): total samples 1000; below threshold 740 (260 above); 74% below target threshold. Checkout (<=5s): total samples 1000; below threshold 370 (630 above); 37% below target threshold. Invite (<=8s): total samples 1000; below threshold 610 (390 above); 61% below target threshold. Aggregate? Total samples 3000; below threshold 1720; 57% below target threshold.
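The percent-below-threshold metric is a count, not an average, which is why it can be pooled across processes. A Python sketch: `within_target` counts raw samples against a per-process target (the targets are the slide's; the sample lists would come from your own monitoring), and the pooled figure reproduces the slide's 1720/3000:

```python
def within_target(samples: list[float], target_seconds: float) -> tuple[int, int]:
    """(requests at or under the target, total requests) for one process."""
    return sum(1 for s in samples if s <= target_seconds), len(samples)


# Counts from the slide: (inside target, total) per process
RESULTS = {"Login (<=4s)": (740, 1000), "Checkout (<=5s)": (370, 1000), "Invite (<=8s)": (610, 1000)}

for name, (inside, total) in RESULTS.items():
    print(f"{name}: {inside / total:.0%}")  # 74%, 37%, 61%

pooled_inside = sum(i for i, _ in RESULTS.values())
pooled_total = sum(t for _, t in RESULTS.values())
print(f"Aggregate: {pooled_inside}/{pooled_total} = {pooled_inside / pooled_total:.0%}")  # 1720/3000 = 57%
```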
  208. 208. Login (<=4s): total samples 1000; below threshold 740 (260 above). Checkout (<=5s): total samples 400; below threshold 148 (252 above). Invite (<=8s): total samples 600; below threshold 366 (234 above). Total samples 2000; below threshold 1254; 63% below target threshold.
  209. 209. Login (<=4s): total samples 1000; below threshold 740 (260 above); weight 1. Checkout (<=5s): total samples 400; below threshold 148 (252 above); weight 5. Invite (<=8s): total samples 600; below threshold 366 (234 above); weight 2.
  210. 210. Login <=4s, Checkout <=5s, Invite <=8s. Total requests inside target: Login page 740/1000; Checkout page 148/400; Invite process 366/600.
  211. 211. Login <=4s, Checkout <=5s, Invite <=8s. Total requests inside target, weight, weighted: Login page 740/1000, 1, 740/1000; Checkout page 148/400, 5, 740/2000; Invite process 366/600, 2, 732/1200.
  212. 212. Login <=4s, Checkout <=5s, Invite <=8s. Total requests inside target, weight, weighted: Login page 740/1000, 1, 740/1000; Checkout page 148/400, 5, 740/2000; Invite process 366/600, 2, 732/1200. Total score 2212/4200 = 53%.
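The weighted score multiplies both the numerator and the denominator by each process's weight, so a slow checkout (weight 5) drags the total down more than a slow login. A Python sketch reproducing the slide's arithmetic; the weights are business judgments, not something the code can derive:

```python
def weighted_score(results: dict[str, tuple[int, int, int]]) -> tuple[int, int]:
    """Sum of (inside target x weight) over sum of (total x weight) across processes."""
    inside = sum(i * w for i, _, w in results.values())
    total = sum(t * w for _, t, w in results.values())
    return inside, total


# (inside target, total requests, weight) from the slide
PROCESSES = {
    "Login page": (740, 1000, 1),
    "Checkout page": (148, 400, 5),
    "Invite process": (366, 600, 2),
}

inside, total = weighted_score(PROCESSES)
print(f"{inside}/{total} = {inside / total:.0%}")  # 2212/4200 = 53%
```

Setting every weight to 1 gives back the unweighted 1254/2000 (63%).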
  213. 213. (Snore?)
  214. 214. Chart: traffic and latency.
  215. 215. Chart: traffic and latency. 71% correlation between traffic and latency. If you have traffic predictions, and latency is correlated with traffic, you may be able to estimate future performance from the business plan.* *It’s seldom this simple.
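A sketch of the slide's idea in Python: measure how strongly latency tracks traffic, then fit a straight line so a traffic forecast can be turned into a latency estimate. This assumes the "71% correlation" is a Pearson coefficient of about 0.71 (the slide doesn't say) and that the relationship stays linear at higher traffic, which, as the slide warns, is seldom true:

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length (n >= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if either series is constant


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for predicting y (latency) from x (traffic)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx
```

With a hypothetical `traffic` and `latency` history, `slope, intercept = fit_line(traffic, latency)` gives `slope * forecast + intercept` as the latency estimate for a forecast traffic level.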
  216. 216. Baselines Establish an agreed-upon set of metrics, and always compare to these baselines. What does “normal” look like? Weekly variance? Seasonality?
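One way to encode "what does normal look like" with weekly variance is a separate baseline per weekday. A Python sketch, assuming a daily metric and a three-standard-deviation tolerance (both arbitrary choices); seasonality would need a longer history, such as the same week in previous years:

```python
import statistics
from collections import defaultdict
from datetime import date


def weekday_baselines(history: list[tuple[date, float]]) -> dict[int, tuple[float, float]]:
    """Mean and standard deviation of a metric for each weekday (0 = Monday)."""
    by_weekday: dict[int, list[float]] = defaultdict(list)
    for day, value in history:
        by_weekday[day.weekday()].append(value)
    return {wd: (statistics.fmean(v), statistics.pstdev(v)) for wd, v in by_weekday.items()}


def is_abnormal(day: date, value: float, baselines: dict[int, tuple[float, float]], k: float = 3.0) -> bool:
    """Flag a value more than k standard deviations from that weekday's baseline."""
    mean, spread = baselines[day.weekday()]
    return abs(value - mean) > k * spread
```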
  217. 217. Why measure Tactical, to find and fix Strategic, to plan/trend What to measure: How long until a user can use the app as you intended? Part six Where to measure: Targeting metrics WAN, from everywhere Core app functionality to your audience Tiers of components How to measure it: Synth, to ensure it’s working RUM, to see where it’s broken Get the math right
  218. 218. Your goal is to be clearly understood.
  219. 219. Your goal is to be clearly understood. How technical are they?
  220. 220. Your goal is to be clearly understood. How technical are they? How will they use it?
  221. 221. Your goal is to be clearly understood. How technical are they? How will they use it? To fix something.
  222. 222. Your goal is to be clearly understood. How technical are they? How will they use it? To fix something; to escalate to others.
  223. 223. Your goal is to be clearly understood. How technical are they? How will they use it? To fix something; to escalate to others; to plan the future.
  224. 224. Your goal is to be clearly understood. How technical are they? Translate to their jargon. How will they use it? To fix something; to escalate to others; to plan the future.
  225. 225. Your goal is to be clearly understood. How technical are they? What words do they use? Translate to their jargon. How will they use it? To fix something; to escalate to others; to plan the future.
232. By timeframe
Type of metric | Timeframe | Delivery | Detail
Break/fix monitoring | Urgent | Push alerts to PDA | Simple messages
Daily reports | Automated | Mail PDF | Historical context
Quarterly planning | Prepared | Slide deck | Part of big picture
233. By medium: Where will this wind up? Dashboard, NOC screen, log file, someone’s spreadsheet, inbox. http://www.flickr.com/photos/warrenski/4190341621/
234. Part seven: Marching orders. Why measure: tactical, to find and fix; strategic, to plan and trend. What to measure: how long until a user can use the app as you intended? Where to measure: the WAN, from everywhere; core app functionality; tiers of components. How to measure it: synthetic testing, to ensure it’s working; RUM, to see where it’s broken. Get the math right.
235. So what should you do? Some homework.
236. First: Meet your analytics team. Find out: What are the key goals they’re monitoring? Where are visitors coming from? What are the most common entrance and exit pages?
237. Second: Pick the three processes, pages, or functions that matter most to you, such as landing pages or part of a conversion funnel.
238. Third: Set up monitoring of your site from many places (synthetic testing), your top 3 core business processes (synthetic or RUM), and your important infrastructure tiers (from agents plus synthetic, or RUM).
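At its core, a synthetic test is a scripted request made on a schedule and timed. The minimal sketch below uses only the Python standard library. It fetches one URL and records the status code, the elapsed time, and the response size. It measures only the time to retrieve the HTML, not a full browser page load with images and scripts, which commercial synthetic services do capture. Testing "from many places" would mean running it on hosts in several regions under a scheduler such as cron. The function name is hypothetical.

```python
import time
import urllib.error
import urllib.request

def synthetic_check(url, timeout=10.0):
    """Fetch `url` once; return (HTTP status, elapsed seconds, bytes received)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are still measurements; record the status code.
        body, status = b"", err.code
    # Connection failures and timeouts (URLError, OSError) propagate to the
    # caller, which should record them as outright failures.
    elapsed = time.perf_counter() - start
    return status, elapsed, len(body)
```

A call such as `synthetic_check("http://www.example.com/")` (a placeholder URL) returns a tuple that can be appended to a log and later fed into a baseline or an SLA check.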
239. Fourth: Wait a week or two to establish a baseline, to detect seasonal variance, and to show others and get buy-in.
240. Fifth: While you’re waiting, understand the elements of latency and how they affect your performance: DNS, SSL, network latency, host (server) latency, and client page load time.
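To see which element of latency dominates, you can record each one separately and compare its share of the total. The sketch below models one page load as the five elements named on slide 240, in milliseconds. It assumes the elements occur one after another and simply add up. Real browsers overlap some of this work, for example by using parallel connections, so treat the shares as approximate. The class name and the sample values are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class LatencyBreakdown:
    """One page load split into DNS, SSL, network, server and client time (ms)."""
    dns: float      # name resolution
    ssl: float      # SSL/TLS handshake
    network: float  # round trips and transfer time on the wire
    server: float   # host (server) processing time
    client: float   # browser work until the page has loaded

    def total(self):
        # Assumes the elements happen one after another and simply add up.
        return sum(getattr(self, f.name) for f in fields(self))

    def shares(self):
        """Fraction of the total contributed by each element."""
        t = self.total()
        return {f.name: getattr(self, f.name) / t for f in fields(self)}

    def biggest(self):
        """Name of the element that contributes the most latency."""
        s = self.shares()
        return max(s, key=s.get)

# Hypothetical measurement of a single page load.
sample = LatencyBreakdown(dns=50, ssl=100, network=300, server=400, client=650)
for name, share in sample.shares().items():
    print(f"{name:8s} {share:6.1%}")
```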
241. Set a target threshold: Now that you have an idea of what “normal” is, set a threshold ... but not just any threshold.
242. The login page (function) will have a total latency (metric) of under 4 seconds (target) with a cached browser copy (user situation) from any US branch office (testing point) 95% of the time (percentile) on weekdays, 8AM ET to 6PM PST (time window), measured by synthetic tests at 5-minute intervals (collection type).
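Each part of the SLA on slide 242 becomes a filter or a comparison once it is written as code. The sketch below checks one list of login-page samples against it. It assumes timestamps are recorded in US Eastern time. Because Pacific time is always three hours behind Eastern, the "8AM ET to 6PM PT" window becomes 08:00 to 21:00 ET. Applying "from any US branch office" would mean running the check separately on each office's samples. The function and constant names are hypothetical.

```python
from datetime import datetime, time

# Assumption: timestamps are recorded in US Eastern time. Pacific time is three
# hours behind Eastern all year, so "8AM ET to 6PM PT" is 08:00-21:00 ET.
WINDOW_START = time(8, 0)
WINDOW_END = time(21, 0)

def meets_sla(samples, target_s=4.0, required=0.95):
    """Check login-page samples (datetime_et, latency_s) against the SLA.

    Returns True/False, or None if no sample falls inside the time window.
    """
    in_window = [latency for ts, latency in samples
                 if ts.weekday() < 5  # Monday-Friday
                 and WINDOW_START <= ts.time() < WINDOW_END]
    if not in_window:
        return None
    under_target = sum(1 for latency in in_window if latency < target_s)
    return under_target / len(in_window) >= required
```

Returning `None` when the window is empty keeps "no data" distinct from "SLA met", which matters when a synthetic agent silently stops reporting.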
243. Apdex score = $\dfrac{\text{Satisfied requests} + \text{Tolerating requests}/2}{\text{All requests}}$
244. How Apdex works: Satisfied: 0-2 seconds. Tolerating: 2-8 seconds. Frustrated: over 8 seconds.
245. How Apdex works: Of 100 total requests, 65 were satisfied, 30 tolerating, and 5 frustrated, so Apdex = $\dfrac{65 + 30/2}{100} = 0.80$.
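The Apdex calculation is short enough to compute directly from raw response times. The sketch below uses a target threshold T of 2 seconds, which reproduces the 0-2 second and 2-8 second bands because the tolerating band ends at 4T. It represents failed requests as `math.inf` so that they count as frustrated. That representation is a convention chosen for this sketch, not part of any particular tool.

```python
import math

def apdex(latencies_s, t=2.0):
    """Apdex score for response times in seconds with target threshold t.

    Satisfied: <= t; tolerating: > t and <= 4t; frustrated: > 4t.
    Record errors as math.inf so they count as frustrated.
    """
    if not latencies_s:
        raise ValueError("Apdex is undefined for zero requests")
    satisfied = sum(1 for x in latencies_s if x <= t)
    tolerating = sum(1 for x in latencies_s if t < x <= 4 * t)
    return (satisfied + tolerating / 2) / len(latencies_s)

# The worked example: 65 satisfied, 30 tolerating, 5 frustrated requests.
example = [1.0] * 65 + [5.0] * 30 + [9.0] * 5
print(f"{apdex(example):.2f}")  # 0.80
```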
246. Train your audience: Visit key stakeholders and walk them through the report. Get them used to the information: in the same format, at the same time, from the same place.
247. Put monitoring into your release cycle: Talk to the development team about adding instrumentation, identifying new code functions that need testing, and verifying whether optimization worked.
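One low-friction way to add instrumentation is a timing decorator that developers can attach to new code functions. The recorded durations then let you compare medians before and after a release to verify whether an optimization worked. The sketch below keeps timings in an in-process dictionary. `TIMINGS`, `instrumented`, `median_ms`, and `render_login_page` are all hypothetical names. A real deployment would ship these numbers to your monitoring system instead of holding them in memory.

```python
import functools
import statistics
import time
from collections import defaultdict

# Hypothetical in-process store: function name -> list of call durations (seconds).
TIMINGS = defaultdict(list)

def instrumented(func):
    """Decorator that records the wall-clock duration of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper

def median_ms(name):
    """Median duration of a function's recorded calls, in milliseconds."""
    return statistics.median(TIMINGS[name]) * 1000

@instrumented
def render_login_page():  # hypothetical function under test
    time.sleep(0.01)
    return "<html>...</html>"

render_login_page()
print(f"render_login_page: {median_ms('render_login_page'):.1f} ms")
```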
248. Part eight: Some tools to check out
249. Paid
250. Synthetic: Keynote Systems, Gomez, Webmetrics, Alertsite, Dotcom Monitor, Pingdom, ...and many others
251. RUM: client-side AJAX (Gomez, Coradiant TrueSight Edge); agent-based (Aternity); inline sniffer/tap (Coradiant, Tealeaf, Beatbox (HP), Atomic Labs, Compuware Apdex); server-side (logfile, agent). Full disclosure: We both worked at Coradiant.
252. Analytics: Omniture, Webtrends, Coremetrics, Woopra, etc. (lots of specialization)
253. Open source
255. Firebug (getfirebug.com). Also: WebKit Inspector, Google Page Speed
256. Google Analytics: analytics.google.com
257. webpagetest.org
258. Monitor.us: mon.itor.us (Free ain’t pretty, and pretty ain’t free, but it works.)
259. AJAX measurement libraries. Collecting from visitors: Jiffy (http://code.google.com/p/jiffy-web/), where an AJAX client sends measurements to an Apache collector. Other resources: ZK-Gazer (http://code.google.com/p/zk-gazer/), http://www.ajaxperformance.com/ (Ryan Breen), http://www.opensourcetesting.org/performance.php
260. YSlow http://justtalkaboutweb.com/wp-content/uploads/2008/06/yslow.gif http://events.stanford.edu/events/196/19695/souders.jpg
261. Dashboard sites: Juice Analytics, Dashboard Spy blog, Insight’s gallery, Simple Complexity
262. Part nine: Planning for the future
264. AJAX: “As for your male and female slaves whom you may have: you may buy male and female slaves from among the nations that are around you.” (Leviticus 25:44)
265. Mobility http://www.flickr.com/photos/farhannasir/4577508824/
266. http://www.flickr.com/photos/andrewparnell/2738598951/
267.
GET /index.html HTTP/1.1
Host: www.stockprice.com
Cookie: sessionID=KDF74INED6
Accept: */*

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://gmpg.org/xfn/11">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Stock price for Apple</title>
<script type="text/javascript" src="http://www.bitcurrent.com/wp-content/themes/grid_focus_public/js/perftracker.js"></script>
</head>
<body id="gsr" topmargin="3" marginheight="3">
AAPL:243.20
</body>
</html>
268.
GET /index.html
Host: www.stockprice.com
Cookie: sessionID=KDF74INED6

AAPL:243.20
269. Web of documents (circa 1999); web of events (circa 2008). http://www.flickr.com/photos/dnorman/2781572080/ http://www.flickr.com/photos/adamkr/4650637393/
270. Recap: What you need to go and do now.
271. Metrics must be relevant (related to a core business assumption), actionable (the basis for a decision or improvement), reproducible (documented and generated cleanly), understandable (easy for stakeholders to grok), and accurate (providing the correct view of what happened).
272. Visit your analytics team and read your business model. Pick three core business functions to watch. Start monitoring one page from many places, your key business functions, and your infrastructure tiers. Take a deep breath and establish a baseline. Analyze the elements of latency while you wait. Set target thresholds using a meaningful SLA. Calculate a consistent score and train your audience. Make it part of the release cycle.
290.
Metric | Source | Target
Onload time | From many places | To the top landing page (uncached and cached)
Server time | From one place, often | To your core business process
Server time | From one place, often | To each tier (web, I/O, CPU)

List | Criteria | Segmentation
TopN | Worst pages | By error rate, server latency, network latency
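The TopN list on slide 290 is a sort followed by a slice, repeated once per segmentation criterion. The sketch below assumes per-page statistics have already been aggregated from RUM or logs. The `PageStats` fields, the `top_n_worst` helper, and the example pages are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    """Hypothetical per-page aggregate for one reporting period."""
    page: str
    requests: int
    errors: int
    server_ms: float   # median server latency
    network_ms: float  # median network latency

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0

def top_n_worst(stats, key, n=10):
    """Return the n pages with the highest value of `key`, worst first."""
    return sorted(stats, key=key, reverse=True)[:n]

pages = [
    PageStats("/login", 1000, 50, 300, 800),
    PageStats("/home", 5000, 10, 120, 400),
    PageStats("/cart", 200, 20, 900, 650),
]
for label, key in [("error rate", lambda p: p.error_rate),
                   ("server latency", lambda p: p.server_ms),
                   ("network latency", lambda p: p.network_ms)]:
    print(label, [p.page for p in top_n_worst(pages, key, n=2)])
```

Ranking on error rate alone can surface rarely visited pages, so some reports also require a minimum request count before a page qualifies.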
300. Got one report? [Chart over 16 periods: unique page views (0 to 5,000) split into <2s, 2-4s, and >4s or error, overlaid with conversions and revenue (total sales, $0 to $10,000)]
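To produce the single report on slide 300, each page view is placed into one of the chart's latency bands, and then conversions and revenue are totaled per band. The sketch below assumes you can join RUM page views with analytics conversion data into (latency in seconds, error flag, converted flag, revenue) records. That record shape and the function names are hypothetical. Grouping the same records by time period would give the 16 columns of the chart.

```python
from collections import Counter

def bucket(latency_s, error):
    """Assign a page view to one of the chart's latency bands."""
    if error or latency_s > 4:
        return ">4s or error"
    if latency_s >= 2:
        return "2-4s"
    return "<2s"

def conversion_by_bucket(views):
    """views: iterable of (latency_s, error, converted, revenue) records.

    Returns {band: (page views, conversion rate, revenue)}.
    """
    counts, conversions, revenue = Counter(), Counter(), Counter()
    for latency_s, error, converted, amount in views:
        b = bucket(latency_s, error)
        counts[b] += 1
        if converted:
            conversions[b] += 1
            revenue[b] += amount
    return {b: (counts[b], conversions[b] / counts[b], revenue[b]) for b in counts}
```

If the slow band converts noticeably worse than the fast one, that difference is the business case for performance work in a form that stakeholders can read at a glance.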
301. Thanks! @seanpower (sean@httpd.org), @acroll (alistair@bitcurrent.com), www.watchingwebsites.com (and go buy this.)