Metrics 101

Slides from the Velocity 2010 presentation, "Metrics 101" by Alistair Croll and Sean Power, authors of Complete Web Monitoring (O'Reilly, 2010)

Published in: Technology, Business

Metrics 101

  1. 1. Metrics 101 What to watch
  2. 2. What we’ll cover Why collect metrics Understanding web latency How to target your findings Concrete steps to get started
  3. 3. Part one Why collect metrics?
  4. 4. http://www.flickr.com/photos/chidorian/12411641/
  5. 5. Downtime costs
  6. 6. Downtime costs eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999)
  7. 7. Downtime costs eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999) Financial company down ($100K/h) 53.2% of finance companies lose over $100,000/hour (nextslm.org)
  8. 8. Downtime costs Amazon offline ($1M/h) Amazon loses nearly $1M/hour if down (NYT, 2008) eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999) Financial company down ($100K/h) 53.2% of finance companies lose over $100,000/hour (nextslm.org)
  9. 9. Downtime costs Amazon offline ($1M/h) Amazon loses nearly $1M/hour if down (NYT, 2008) Network downtime ($42K/h) 1 hour of network downtime costs $42,000 (Gartner, 2003) eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999) Financial company down ($100K/h) 53.2% of finance companies lose over $100,000/hour (nextslm.org)
  10. 10. Downtime costs Amazon offline ($1M/h) Amazon loses nearly $1M/hour if down (NYT, 2008) Network downtime ($42K/h) 1 hour of network downtime costs $42,000 (Gartner, 2003) eBay offline ($90K/h) 22h outage at eBay cost $2M ($90,909/h) (Internetnews, 1999) Financial company down ($100K/h) 53.2% of finance companies lose over $100,000/hour (nextslm.org) Let’s say $50K/h if you’re serious.
  11. 11. Availability | Downtime/year | Loss @$50K/h
        90% | 36.5 days | Can$43,800,000
        95% | 18.25 days | Can$21,900,000
        98% | 7.30 days | Can$8,760,000
        99% | 3.65 days | Can$4,380,000
        99.5% | 1.83 days | Can$2,196,000
        99.8% | 17.52 hours | Can$876,000
        99.9% | 8.76 hours | Can$438,000
        99.95% | 4.38 hours | Can$219,000
        99.99% | 52.6 minutes | Can$43,833
        99.999% | 5.26 minutes | Can$4,383
        99.9999% | 31.5 seconds | Can$438
  12. 12. Same table as slide 11, with a callout at 99.99% and above: less than an hour a year.
  13. 13. Same table as slide 11, with a second callout at 99.9999%: less than a minute a year.
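
The arithmetic behind the availability table can be reproduced in a few lines. This is a minimal sketch, assuming a 365-day year and the $50K/h figure from the slides; the function names are illustrative and do not come from the presentation.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year a site is down at the given availability percentage."""
    return HOURS_PER_YEAR * (100.0 - availability_pct) / 100.0


def annual_loss(availability_pct: float, cost_per_hour: float = 50_000.0) -> float:
    """Yearly cost of downtime at a flat hourly cost (the slides suggest $50K/h)."""
    return downtime_hours_per_year(availability_pct) * cost_per_hour


for pct in (90, 99, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} h/year down, loss {annual_loss(pct):,.0f}")
```

Computed from unrounded downtime, 99.99% comes to about 43,800 rather than the 43,833 on the slide, which appears to multiply the already-rounded 52.6 minutes.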
  14. 14. Harris poll conducted by Tealeaf in 2008
  15. 15. You really don’t want web users to call you. [Bar chart, $0 to $15: cost estimates (low, average, high) for web self-service, IVR, email, and live phone. Source: BiT Group White Paper, “Web Self-Service Lowers Call Center Costs and Improves Customer Service”]
  16. 16. You really don’t want web users to call you. [Same chart; value label added: Can$0.24]
  17. 17. You really don’t want web users to call you. [Same chart; value labels so far: Can$0.24, Can$0.45]
  18. 18. You really don’t want web users to call you. [Same chart; value labels so far: Can$0.24, Can$0.45, Can$3.00]
  19. 19. You really don’t want web users to call you. [Same chart; value labels: Can$0.24, Can$0.45, Can$3.00, Can$5.50]
  20. 20. http://www.flickr.com/photos/pagedooley/2811157950/
  21. 21. If you don’t know the past, you can’t know the future. If you don’t know the future, you can’t budget for it. Photo by Alan Cleaver from his Flickr Freestock set. Thanks, Alan! http://www.flickr.com/photos/alancleaver/2638883650/
  22. 22. “A plan so crazy, it just might work.”
  23. 23. http://www.flickr.com/photos/genewolf/147722350
  24. 24. http://www.flickr.com/photos/billselak/366692332/
  25. 25. Everything starts with a baseline.
  26. 26. Everything starts with a baseline. Know what’s worst.
  27. 27. Everything starts with a baseline. Know what’s Prove you worst. made it better.
  28. 28. The cycle of optimization Metrics & strategy
  29. 29. The cycle of optimization Metrics & strategy Collection
  30. 30. The cycle of optimization Metrics & strategy Collection Reporting
  31. 31. The cycle of optimization Metrics & strategy Collection Reporting Institutionalizing the results
  32. 32. The cycle of optimization Metrics & strategy Collection Link to KPI/ Reporting ROI Institutionalizing the results
  33. 33. The cycle of optimization Metrics & strategy Optimization Collection & change Link to KPI/ Reporting ROI Institutionalizing the results
  34. 34. The cycle of optimization Metrics & strategy Optimization Collection & change Link to KPI/ Reporting ROI Institutionalizing the results
  35. 35. http://www.flickr.com/photos/elsie/8229790/
  36. 36. Understanding your goals. http://www.flickr.com/photos/itsgreg/446061432/
  37. 37. Organic Ad Campaigns search network $ 1 1 1 Advertiser site Visitor 2 Offer 3 $ 8 Upselling 4 Abandonment Reach 5 Purchase step $ Mailing, alerts, Purchase step $ 9 promotions $ Conversion $ Disengagement 7 Enrolment 6 Impact on site $ Positive $ Negative
  38. 38. Bad $ 4 content Social Search Invitation network link results 4 Good content 1 $ 1 1 Collaboration site 2 Visitor Content creation Moderation $ 3 Spam & trolls $ Engagement 5 Viral 6 Social graph spread 7 Disengagement $ Impact on site $ Positive $ Negative
  39. 39. Enterprise subscriber $ 1 End user (employee) $ Refund $ 2 Renewal, upsell, SLA reference SaaS site violation Performance Good Bad 3 Helpdesk Support 5 $ Usability escalation costs 7 4 Good Bad Productivity Good Bad 6 Churn $ Impact on site $ Positive $ Negative
  40. 40. $ Media site Enrolment Targeted 2 embedded ad 5 $ 6 1 Ad Visitor network 4 3 5 Advertiser $ Departure $ site Impact on site $ Positive $ Negative
  41. 41. Why measure: Tactical, to find and fix; Strategic, to plan/trend. Part two: The elements of web latency
  42. 42. Slow sites suck
  43. 43. Slow sites suck Lower conversion rates
  44. 44. Slow sites suck Lower conversion rates Less likely to attract a loyal following
  45. 45. Slow sites suck Lower conversion rates Less likely to attract a loyal following Liable for damages
  46. 46. Slow sites suck Lower conversion rates Less likely to attract a loyal following Liable for damages Liable for refunds or service credits
  47. 47. Slow sites suck Lower conversion rates Less likely to attract a loyal following Liable for damages Liable for refunds or service credits Customers find other channels that cost more
  48. 48. Why the web is slow A crash course in performance & availability.
  49. 49. Load Web App Internet balancer server server DB Client www.example.com
  50. 50. Your website Load Web App Internet balancer server server DB Client www.example.com
  51. 51. DNS Load Web App Internet balancer server server DB Client DNS “www.example.com”
  52. 52. DNS DNS lookup Load Web App Internet balancer server server DB Client DNS “www.example.com”
  53. 53. DNS DNS lookup Load Web App Internet balancer server server DB Client DNS “www.example.com”
  54. 54. IP IP Load Web App Internet balancer server server DB Client
  55. 55. IP IP Load Web App Internet balancer server server DB Client Internet routing
  56. 56. IP R IP R Load Web App Internet R balancer server server DB Client R R Internet routing
  57. 57. IP R IP R Load Web App Internet R balancer server server DB Client R R TCP session
  58. 58. IP R IP R Load Web App Internet R balancer server server DB Client R R TCP session
  59. 59. Letter writing Postal service
  60. 60. You Them (sender) (receiver)
  61. 61. This is a sentence You Them (sender) (receiver)
  62. 62. This is a sentence You Them (sender) (receiver)
  63. 63. You Them (sender) (receiver)
  64. 64. You Them (sender) (receiver)
  65. 65. sentence This is a You Them (sender) (receiver)
  66. 66. You Them (sender) (receiver)
  67. 67. This is a sentence You Them (sender) (receiver)
  68. 68. This is a sentence 3 2 1 4 You Them (sender) (receiver)
  69. 69. You Them (sender) (receiver)
  70. 70. This is a sentence You Them (sender) (receiver)
  71. 71. This is sentence 2 1 You Them (sender) (receiver)
  72. 72. This is sentence 2 1 4 You Them (sender) (receiver)
  73. 73. This WTF? is sentence 2 1 4 You Them (sender) (receiver)
  74. 74. sentence a This 4 3 1
  75. 75. sentence a This 4 3 1 “Can you send #2 again?”
  76. 76. sentence a This 4 3 1 “Can you send #2 again?” is “Sure. Here you go.” 2
  77. 77. How computers “connect”
  78. 78. IP IP Load Web App Internet balancer server server DB Client
  79. 79. The HTTP “stack” IP IP Load Web App Internet balancer server server DB Client
  80. 80. The HTTP “stack” TCP TCP IP IP Load Web App Internet balancer server server DB Client
  81. 81. The HTTP “stack” SSL SSL TCP TCP IP IP Load Web App Internet balancer server server DB Client
  82. 82. The HTTP “stack” HTTP HTTP SSL SSL TCP TCP IP IP Load Web App Internet balancer server server DB Client
  83. 83. Getting a page by hand
  84. 84. Getting a page by hand Trying 67.205.65.12... Connected to bitcurrent.com. Escape character is '^]'.
  85. 85. Getting a page by hand Trying 67.205.65.12... Connected to bitcurrent.com. Escape character is '^]'. GET / <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/ xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head profile="http://gmpg.org/xfn/11"> <script type="text/javascript" src="http:// www.bitcurrent.com/wp-content/themes/ grid_focus_public/js/perftracker.js"></script> <script> </body> </html> Connection closed by foreign host.
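
The telnet session above can be approximated with a raw socket. This is a minimal sketch, not the presenters' method: the slide types a bare `GET /`, while this version assumes an HTTP/1.1 request with a `Host` header, and `build_get_request` and `fetch` are hypothetical helper names.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request, much as you would type it into telnet."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close so we know when to stop reading
        "",
        "",  # the blank line that ends the request headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/", timeout: float = 5.0) -> bytes:
    """Open a TCP connection, send the request, and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Example (needs network access):
#   status_line = fetch("example.com").split(b"\r\n", 1)[0]
```

Reading until the server closes the connection mirrors the "Connection closed by foreign host." line at the end of the slide's session.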
  86. 86. Static content HTTP HTTP HTTP SSL SSL TCP TCP IP IP Load Web App Internet balancer server server DB Client image.gif GET www.example.com/image.gif
  87. 87. Static content HTTP HTTP HTTP SSL SSL TCP TCP IP IP Load Web App Internet balancer server server DB Client image.gif GET www.example.com/image.gif
  88. 88. Static content Dynamic content HTTP HTTP HTTP SSL SSL TCP TCP IP IP Load Web App Internet balancer server server DB Client dynamic.jsp GET www.example.com/dynamic.jsp
  89. 89. Static content Dynamic content HTTP HTTP HTTP SSL SSL TCP TCP IP IP Load Web App Internet balancer server server DB Client dynamic.jsp GET www.example.com/dynamic.jsp
  90. 90. Static content Dynamic Stored content data HTTP HTTP HTTP SSL SSL TCP TCP IP IP Load Web App Internet balancer server server DB Client (Database) POST www.example.com/data.cgi
  91. 91. Static content Dynamic Stored content data HTTP HTTP HTTP SSL SSL TCP TCP IP IP Load Web App Internet balancer server server DB Client (Database) POST www.example.com/data.cgi
  92. 92. Browser Data center Server
  93. 93. Browser Data center Server
  94. 94. Browser Data center Server TCP SYN (“let’s talk”) TCP SYN ACK (“Agreed: let’s talk”) TCP ACK (“OK, we’re talking”)
  95. 95. Browser Data center Server TCP SYN (“let’s talk”) TCP SYN ACK (“Agreed: let’s talk”) TCP ACK (“OK, we’re talking”) SSL (“Someone might be listening!”) SSL (“Here’s a decoder ring”)
  96. 96. Browser Data center Server TCP SYN (“let’s talk”) TCP SYN ACK (“Agreed: let’s talk”) TCP ACK (“OK, we’re talking”) SSL (“Someone might be listening!”) SSL (“Here’s a decoder ring”) HTTP GET / (“Can I have your home page?”) HTTP 200 OK (“Sure!”) (thinks a bit) [index.html] (“Here it is!”) (Renders furiously) Bump, bump. [img js css] (“Have this too!”)
  97. 97. Browser Data center Server TCP SYN (“let’s talk”) TCP SYN ACK (“Agreed: let’s talk”) TCP ACK (“OK, we’re talking”) SSL (“Someone might be listening!”) SSL (“Here’s a decoder ring”) HTTP GET / (“Can I have your home page?”) HTTP 200 OK (“Sure!”) (thinks a bit) [index.html] (“Here it is!”) (Renders furiously) Bump, bump. [img js css] (“Have this too!”) TCP FIN (“Thanks! I’m done now.”) TCP FIN ACK (“You’re welcome. Have a nice day.”)
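
The exchange above can be turned into a rough latency budget by counting round trips. This is a back-of-envelope sketch under stated assumptions, none of which come from the slides: one round trip for the TCP handshake, two for a classic full SSL/TLS handshake (newer protocol versions need fewer), and one for the HTTP request and the start of the response. The function name and parameters are illustrative.

```python
def time_to_first_byte_ms(
    rtt_ms: float,
    server_think_ms: float,
    dns_ms: float = 0.0,
    tls: bool = True,
    tls_round_trips: int = 2,
) -> float:
    """Estimate the delay before the first byte of the response reaches the browser."""
    total = dns_ms                         # DNS lookup, if the name is not cached
    total += rtt_ms                        # TCP: SYN out, SYN ACK back
    if tls:
        total += tls_round_trips * rtt_ms  # SSL/TLS negotiation
    total += rtt_ms + server_think_ms      # GET out, server works, first byte back
    return total


# A visitor 100 ms away, a server that takes 200 ms to build the page, 50 ms of DNS
print(time_to_first_byte_ms(rtt_ms=100, server_think_ms=200, dns_ms=50))             # 650.0
print(time_to_first_byte_ms(rtt_ms=100, server_think_ms=200, dns_ms=50, tls=False))  # 450.0
```

Even in this simplified model, more than half of the wait comes from network round trips rather than server time, which is why the slides separate what you control from what you are blamed for.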
  98. 98. “Page load time” isn’t simple Documents versus event models AJAX Mobility CDNs Third-party content Embedded objects and plug-ins
  99. 99. Connections to load Connection 0 - www.bitcurrent.com (67.205.65.12) Connection 1 - www.bitcurrent.com (67.205.65.12) Connection 2 - 4qinvite.4q.iperceptions.com (64.18.71.70) Connection 3 - static.slideshare.net (66.114.49.24) Connection 4 - static.slideshare.net (66.114.49.24) Connection 5 - www.feedburner.com (66.150.96.123) Connection 6 - static.getclicky.com (204.13.8.18) Connection 7 - cetrk.com (208.67.183.100) Connection 8 - in.getclicky.com (204.13.8.18) Connection 9 - crazyegg.com (208.67.180.236) Connection 10 - www.google-analytics.com (72.14.223.147) Connection 11 - www.apture.com (67.192.46.19) Connection 12 - static.apture.com (67.192.46.25) Connection 13 - s.clicktale.net (66.114.49.24) Connection 14 - www.clicktale.net (75.125.82.70)
  100. 100. Analytics site Server Data center Browser Server Mashup Server site
  101. 101. Analytics site Server Data center Browser Server Snore. Mashup Server site
  102. 102. What ultimately matters: When can the user start using the application as its designers intended?
  103. 103. Part of the problem. You control: server latency; network latency for known content and network parameters. You’re blamed for: page rendering; total network latency; user environment.
  104. 104. Part of the problem. You control: server latency; network latency for known content and network parameters. You’re blamed for: page rendering; total network latency; user environment. You need diagnostic metrics so you can fix it.
  105. 105. Part of the problem. You control: server latency; network latency for known content and network parameters. You’re blamed for: page rendering; total network latency; user environment. You need diagnostic metrics so you can fix it. You need escalation metrics so you can prove it and make it someone else’s problem.
  106. 106. Why measure: Tactical, to find and fix; Strategic, to plan/trend. What to measure: How long until a user can use the app as you intended? Part three: Where to measure
  107. 107. Three tiers of data WAN accessibility: One test from many locations Can everybody get here? App functionality: Several tests of key processes Is my business model working correctly? Tiered tests: Frequent metrics of each tier Is network, service, CPU, data I/O to blame?
  108. 108. WAN accessibility Place A Task B Client Goal C ... Load Web App Internet balancer server server DB Client
  109. 109. Analytics can tell you a lot.
  110. 110. App functionality Page A Page B Client Event C Load Web App Internet balancer server server DB Client
  111. 111. http://www.flickr.com/photos/tinfoilraccoon/197640807/
  112. 112. Places and Tasks.
  113. 113. Landing page: View one story
  114. 114. Landing page: View one story Task: Log in Enter credentials Verify Recovery
  115. 115. Landing page: View one story Task: Log in Enter credentials Verify Recovery Task: Forward a story Enter recipients Enter message Send
  116. 116. Landing page: Task: View one story Create account Task: Log in Pick name Check if free Enter credentials Set Password Verify CAPTCHA Recovery Send mail Get confirm Task: Forward a story Enter recipients Enter message Send
  117. 117. Landing page: Task: View one story Create account Task: Log in Pick name Check if free Enter credentials Set Password Verify CAPTCHA Recovery Send mail Get confirm Task: Forward a story Task: Submit Enter recipients a new story Enter message Send Enter URL Describe Deduplicate Post it
  118. 118. Landing page: Task: View one story Create account Task: Log in Pick name Place: View stories Check if free Enter credentials Vote up Next 25 Set Password Verify Vote down Last 25 CAPTCHA Recovery Send mail Get confirm Task: Forward a story Task: Submit Enter recipients a new story Enter message Send Enter URL Describe Deduplicate Post it
  119. 119. Landing page: Task: View one story Create account Task: Log in Pick name Place: View stories Check if free Enter credentials Vote up Next 25 Set Password Verify Vote down Last 25 CAPTCHA Recovery Send mail Place: Read Get confirm poster comments Vote up Next 25 Task: Vote down Last 25 Forward a story Task: Submit Enter recipients a new story Enter message Send Enter URL Describe Deduplicate Post it
  120. 120. Landing page: Task: View one story Create account Task: Log in Pick name Place: View stories Check if free Enter credentials Vote up Next 25 Set Password Verify Vote down Last 25 CAPTCHA Recovery Send mail Place: Read Get confirm poster comments Vote up Next 25 Task: Vote down Last 25 Forward a story Task: Submit Enter recipients a new story Place: My Enter message Enter URL account Send Describe Change My address comments Deduplicate Change PW See karma Post it
  121. 121. Landing page: Create acct. View one story Task: Log in Place: View stories Place: Read poster comments Task: Forward a story Task: Submit a new story Place: My account
  122. 122. Landing page: Create acct. Create acct. View one story Form uptime Place: View stories Task: Log in # started Bad form Place: Read # CAPTCHA poster comments Mail uptime Task: Forward a story Mail bounced Task: Submit a new story Place: My Confirm & return account Return 3x
  123. 123. Landing page: Create acct. View one story Task: Log in Place: View stories Place: View stories Stories/visit Place: Read # up/down poster comments Time/story Top stories Task: Forward a story Task: Submit Refresh time Views/page a new story Place: My account
  124. 124. Landing page: Create acct. View one story Task: Log in Place: View stories Place: Read poster comments Task: Forward a story Task: Submit a new story Place: My account
  125. 125. Places Efficiency matters How quickly, how many, productivity Learning curve OK Leave when they’re bored Collect “aha” feedback A/B test content for pages/session, exits
  126. 126. Tasks Effectiveness matters Completion, abandonment Intuitiveness rules Leave when they change their mind or it breaks Collect “motivation” feedback A/B test layouts for conversion
  127. 127. 2 sides of the same coin. Web analytics: What did visitors do? End user monitoring: Could they do it?
  128. 128. For e-commerce sites Can people buy things?
  129. 129. For media sites Are ads loading quickly and successfully clicked through? Is content loading fast enough for visitors?
  130. 130. For collaboration sites Can visitors contribute (posting content, voting?) Is bad content being mitigated (trolling, spam)?
  131. 131. For SaaS sites Are your end users productive? Are they making fewer mistakes? Is the site working during customers’ business hours?
  132. 132. Tiered tests Place A Task B Client Goal C Load Web App Internet balancer server server DB Client
  133. 133. Testing the tiers Load Web App Internet balancer server server DB Client Request Do some Search a Request a uncached heavy dataset for big object object computing a string (Or watch (Or track CPU) query time)
  134. 134. [Chart; title, axis labels, and legend lost in extraction]
  135. 135. Why measure: Tactical, to find and fix; Strategic, to plan/trend. What to measure: How long until a user can use the app as you intended? Where to measure: WAN, from everywhere; Core app functionality; Tiers of components. Part four: How to measure performance data
  136. 136. Synthetic testing.
  137. 137. Load Web App Internet balancer server server DB Client
  138. 138. Management tool Load Web App Internet balancer server server DB Client
  139. 139. Load Web App Internet balancer server server DB Client
  140. 140. Load Web App Internet balancer server server DB Client
  141. 141. Test Testing config node Data center Testing node Website Testing node
  142. 142. Test Testing config node Data center Testing node Website Testing node
  143. 143. Test Testing config node Data center Testing node Website Testing node
  144. 144. Test Testing config node Data center Testing node Website Reporting service Testing node
  145. 145. Three things to watch for Cached vs. uncached Scripts vs. puppetry Simultaneous vs. sequential
  146. 146. [Bar chart: load time in seconds, 0 to 10, cached vs. uncached]
  147. 147. [Bar chart: load time in seconds, 0 to 10, cached vs. uncached. Cached: 3.157s]
  148. 148. [Bar chart: load time in seconds, 0 to 10, cached vs. uncached. Cached: 3.157s. Uncached: 13.349s]
  149. 149. Testing script Script interpreter
  150. 150. Testing script Site: test.com Page: index.html Script interpreter
  151. 151. Testing script Site: test.com Page: index.html Script interpreter HTTP GET www.test.com/index.html
  152. 152. Testing script Site: test.com Page: index.html Script interpreter 200 OK index.html image.gif stylesheet.css etc...
  153. 153. Testing script Site: test.com Test complete Page: index.html Script interpreter
  154. 154. Browser controller Actual browser
  155. 155. Browser controller DOM actions (“click on button 4”) Actual browser
  156. 156. Browser controller DOM actions (“click on button 4”) Actual browser HTTP GET www.test.com/index.html
  157. 157. Browser controller DOM actions (“click on button 4”) Actual browser 200 OK index.html image.gif stylesheet.css etc...
  158. 158. Browser controller DOM actions DOM contents (“click on button 4”) (“DIV contains ‘error’”) Actual browser
  159. 159. Simultaneous 5 tests at 15:00
  160. 160. Simultaneous Sequential 5 tests from 5 tests at 15:00 15:00 to 15:05
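
The difference between the two scheduling styles is only in how start times are spread. The sketch below is illustrative: `schedule` is a hypothetical helper, the node names are made up, and the date is arbitrary; only the 15:00 to 15:05 window comes from the slide.

```python
from datetime import datetime, timedelta


def schedule(start: datetime, nodes: list, window: timedelta = timedelta(0)) -> list:
    """Return (node, start time) pairs.

    A zero window fires every test at once (simultaneous); a non-zero
    window spreads the tests evenly across it (sequential).
    """
    if not nodes:
        return []
    step = window / len(nodes)
    return [(node, start + i * step) for i, node in enumerate(nodes)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical testing nodes
three_pm = datetime(2010, 6, 22, 15, 0)

simultaneous = schedule(three_pm, nodes)                      # 5 tests at 15:00
sequential = schedule(three_pm, nodes, timedelta(minutes=5))  # one per minute, 15:00 to 15:04

for node, when in sequential:
    print(node, when.strftime("%H:%M"))
```

Simultaneous tests give a snapshot from every location at one instant but hit the site in a burst; sequential tests smooth the load and sample more points in time.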
  161. 161. Synthetic pros & cons. Pros: Easy to set up; Only way to test without actual visitor traffic; Can compare to competitors; Easy baseline establishment; Detects a problem before visitors see it; Consistent data over time. Cons: Brittle; Detects macro outages, not user events; Good geographic & network coverage costs money, generates load; No measurement of traffic volume; Places load on the site under test.
  162. 162. Ultimately, Synthetic testing shows you if the site’s working.
  163. 163. Real User Monitoring.
  164. 164. Synthetic isn’t enough
  165. 165. Synthetic isn’t enough
  166. 166. Browser Web server
  167. 167. Browser Load Web balancer server
  168. 168. Browser Load Web Network balancer server tap
  169. 169. Browser Load Web Network balancer server tap
  170. 170. Browser Load Web Network balancer server tap
  171. 171. Browser Load Web Network balancer server tap
  172. 172. Browser Load Web Network balancer server tap
  173. 173. Browser Load Web Network balancer server tap
  174. 174. Browser Load Web Network balancer server tap User A
  175. 175. Browser Load Web Network balancer server tap User A User B User C
  176. 176. Browser Load Web Network balancer server tap User A User B User C Visit history P1 P2 P3
  177. 177. Browser Load Web Network balancer server tap User A User B User C Visit Aggregate history reports P1 P2 P3
  178. 178. Browser Load Web Network balancer server tap User A User B User C Visit Aggregate Alerts history reports ! P1 P2 P3
  179. 179. TopN, worstN RUM tools are excellent for more qualitative data What’s most broken? What’s biggest? What’s slowest? What’s most inconsistent?
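
A TopN/worstN report is a ranking over per-page statistics. The sketch below assumes RUM records arrive as `(page, seconds, ok)` tuples, a made-up format for illustration; real RUM tools expose their own schemas. It ranks pages by error rate (most broken), mean load time (slowest), and standard deviation (most inconsistent).

```python
from collections import defaultdict
from statistics import mean, pstdev


def rank_pages(records, n=3):
    """records: iterable of (page, seconds, ok) tuples, one per observed request."""
    times = defaultdict(list)
    errors = defaultdict(int)
    for page, seconds, ok in records:
        times[page].append(seconds)
        if not ok:
            errors[page] += 1

    def top(score):
        # Highest score first, limited to the n worst pages
        return sorted(times, key=score, reverse=True)[:n]

    return {
        "most_broken": top(lambda p: errors[p] / len(times[p])),
        "slowest": top(lambda p: mean(times[p])),
        "most_inconsistent": top(lambda p: pstdev(times[p])),
    }


# Hypothetical sample data
sample = [
    ("/login", 1.0, True), ("/login", 1.2, True),
    ("/checkout", 4.0, True), ("/checkout", 9.0, False),
    ("/invite", 2.0, True), ("/invite", 2.0, True),
]
print(rank_pages(sample))
```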
  180. 180. RUM pros & cons. Pros: Directly correlated with clickstream, analytics; Watches everything, not just the things you know about; Can be used to reproduce problems; Measures traffic as well as performance. Cons: May require physical installation; Can be a privacy risk; Doesn’t work if there’s no traffic; Need to filter out your own visits, crawlers, etc.
  181. 181. Ultimately RUM shows you if the site’s working.
  182. 182. Why measure: Tactical, to find and fix; Strategic, to plan/trend. What to measure: How long until a user can use the app as you intended? Where to measure: WAN, from everywhere; Core app functionality; Tiers of components. How to measure it: Synth, to ensure it’s working; RUM, to see where it’s broken. Part five: Getting the math right
  183. 183. http://upload.wikimedia.org/wikipedia/commons/0/0e/Count-von-count.jpg
  184. 184. [Histogram axis: age, 0 to 90]
  185. 185. [Histogram axis: age, 0 to 90] Average age = 10
  186. 186. [Histogram: count (0 to 20) by age (0 to 90)] Average age = 10
  187. 187. [Histogram: count (0 to 20) by age (0 to 90)] Average age = 10
  188. 188. [Histogram: count (0 to 20) by age (0 to 90)] Average age = 10
  189. 189. Average varies wildly, making it hard to threshold properly or see a real slow-down.
  190. 190. 80th percentile only spikes once for a legitimate slow-down (20% of users affected)
  191. 191. Setting a useful threshold on percentiles gives fewer false positives and more real alerts
  192. 192. [Histogram: # of requests (0 to 200) by page load time (0 to 20 seconds)]
  193. 193. [Histogram: # of requests (0 to 200) by page load time (0 to 20 seconds)] Average latency = 5s
  194. 194. [Histogram: # of requests (0 to 200) by page load time (0 to 20 seconds)] Average latency = 5s. 95th percentile latency = 19s
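
The gap between an average and a high percentile is easy to see on a skewed sample. The sketch below uses made-up load times, not the data behind the slide, and a nearest-rank percentile, which is one of several common definitions.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical page load times in seconds: most requests are fast, a tail is very slow
load_times = [2.0] * 80 + [5.0] * 10 + [19.0] * 10

print("average:", mean(load_times))                     # 4.0
print("80th percentile:", percentile(load_times, 80))   # 2.0
print("95th percentile:", percentile(load_times, 95))   # 19.0
```

The 4-second average describes almost nobody in this sample: four out of five requests finish in 2 seconds, while one in ten waits 19.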
  195. 195. KISS
  196. 196. “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.” http://media.photobucket.com/image/einstein/derekabril/einstein_010.png
  197. 197. “As simple as possible, but no simpler.” (FYI, this is irony.)
  198. 198. http://www.flickr.com/photos/evilerin/3540381299/ http://www.flickr.com/photos/golf_pictures/2538894627/
  199. 199. [Three histograms of response time, 1s to 12s: Login, Checkout, Invite]
  200. 200. Login: average 4s. Checkout: average 6s. Invite: average 9s.
  201. 201. Login: average 4s, 95% 8s. Checkout: average 6s, 95% 10s. Invite: average 9s, 95% 12s.
  202. 202. Login: average 4s, 95% 8s, mode 2s. Checkout: average 6s, 95% 10s, mode 5s. Invite: average 9s, 95% 12s, mode 1s.
  203. 203. Login: average 4s, 95% 8s, mode 2s. Checkout: average 6s, 95% 10s, mode 5s. Invite: average 9s, 95% 12s, mode 1s. Aggregate? Average 6s, 95% 12s, mode 5s.
  204. 204. Login, target <=4s: 740 requests below threshold, 260 above.
  205. 205. Login, target <=4s: 740 below threshold, 260 above. Total samples 1000; below threshold 740; percent below target threshold 74%.
  206. 206. Login, target <=4s: 740 below, 260 above; total samples 1000; percent below target threshold 74%. Checkout, target <=5s: 370 below, 630 above; total samples 1000; percent below target threshold 37%. Invite, target <=8s: 610 below, 390 above; total samples 1000; percent below target threshold 61%.
  207. 207. Login, target <=4s: 740 of 1000 below threshold (74%). Checkout, target <=5s: 370 of 1000 (37%). Invite, target <=8s: 610 of 1000 (61%). Aggregate? Total samples 3000; below threshold 1720; percent below target threshold 57%.
  208. 208. Login, target <=4s: 740 below, 260 above; total samples 1000. Checkout, target <=5s: 148 below, 252 above; total samples 400. Invite, target <=8s: 366 below, 234 above; total samples 600. Total samples 2000; below threshold 1254; percent below target threshold 63%.
  209. 209. Login, target <=4s: 740 of 1000 below threshold; weight 1. Checkout, target <=5s: 148 of 400; weight 5. Invite, target <=8s: 366 of 600; weight 2.
  210. 210. Total requests inside target. Login page (<=4s): 740/1000. Checkout page (<=5s): 148/400. Invite process (<=8s): 366/600.
  211. 211. Total requests inside target | Weight | Weighted. Login page (<=4s): 740/1000 | 1 | 740/1000. Checkout page (<=5s): 148/400 | 5 | 740/2000. Invite process (<=8s): 366/600 | 2 | 732/1200.
  212. 212. Total requests inside target | Weight | Weighted. Login page (<=4s): 740/1000 | 1 | 740/1000. Checkout page (<=5s): 148/400 | 5 | 740/2000. Invite process (<=8s): 366/600 | 2 | 732/1200. Total score: 2212/4200 = 53%.
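
The weighted score can be recomputed directly from the counts on the slides. In this sketch the counts and weights are the slides' own; the `score` helper and the dictionary layout are illustrative.

```python
# (requests inside target, total requests, weight) per page, as given on the slides
PAGES = {
    "Login page": (740, 1000, 1),
    "Checkout page": (148, 400, 5),
    "Invite process": (366, 600, 2),
}


def score(pages, weighted=True):
    """Return (requests inside target, total requests), each scaled by the page weight."""
    inside = 0
    total = 0
    for good, count, weight in pages.values():
        w = weight if weighted else 1
        inside += good * w
        total += count * w
    return inside, total


for label, weighted in (("unweighted", False), ("weighted", True)):
    inside, total = score(PAGES, weighted)
    print(f"{label}: {inside}/{total} = {inside / total:.0%}")
# unweighted: 1254/2000 = 63%
# weighted: 2212/4200 = 53%
```

Weighting checkout five times as heavily pulls the score down from 63% to 53%, because checkout is the page that misses its target most often.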
  213. 213. (Snore?)
  214. 214. [Chart; title, axis labels, and legend lost in extraction]
  215. 215. [Chart; title, axis labels, and legend lost in extraction] 71% correlation between traffic and latency. If you have traffic predictions, and latency is correlated with traffic, you may be able to estimate performance in the future from the business plan.* *It’s seldom this simple.
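
A correlation figure like this one can be computed from paired daily samples. The sketch below assumes the slide's "71% correlation" means a Pearson coefficient of about 0.71, which the source does not state, and the sample numbers are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical daily page views and average latency in seconds
traffic = [1200, 1500, 900, 2100, 2600, 1800, 1100]
latency = [2.1, 2.4, 2.2, 3.0, 3.9, 2.5, 2.6]
print(f"correlation: {pearson(traffic, latency):.2f}")
```

A high coefficient only says the two series move together in the sample; as the slide's footnote warns, projecting latency from a traffic forecast is seldom this simple.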
  216. 216. Baselines Establish an agreed-upon set of metrics, and always compare to these baselines. What does “normal” look like? Weekly variance? Seasonality?
  217. 217. Why measure: Tactical, to find and fix; Strategic, to plan/trend. What to measure: How long until a user can use the app as you intended? Where to measure: WAN, from everywhere; Core app functionality; Tiers of components. How to measure it: Synth, to ensure it’s working; RUM, to see where it’s broken; Get the math right. Part six: Targeting metrics to your audience
  218. 218. Your goal is to be clearly understood.
  219. 219. How technical are they? Your goal is to be clearly understood.
  220. 220. Your goal is to be clearly understood. How technical are they? How will they use it?
  221. 221. Your goal is to be clearly understood. How technical are they? How will they use it? To fix something.
  222. 222. Your goal is to be clearly understood. How technical are they? How will they use it? To fix something; to escalate to others.
  223. 223. Your goal is to be clearly understood. How technical are they? How will they use it? To fix something; to escalate to others; to plan the future.
  224. 224. Your goal is to be clearly understood. How technical are they? How will they use it? To fix something; to escalate to others; to plan the future. Translate to their jargon.
  225. 225. Your goal is to be clearly understood. How technical are they? How will they use it? To fix something; to escalate to others; to plan the future. What words do they use? Translate to their jargon.
  226. 226. By timeframe Type of metric Timeframe Delivery Detail
  227. 227. By timeframe Type of metric Timeframe Delivery Detail Break/fix monitoring
  228. 228. By timeframe Type of metric Timeframe Delivery Detail Break/fix monitoring Daily reports
  229. 229. By timeframe Type of metric Timeframe Delivery Detail Break/fix monitoring Daily reports Quarterly planning
  230. 230. By timeframe Type of metric Timeframe Delivery Detail Break/fix monitoring: Urgent; Push alerts to PDA; Simple messages. Daily reports Quarterly planning
  231. 231. By timeframe Type of metric Timeframe Delivery Detail Break/fix monitoring: Urgent; Push alerts to PDA; Simple messages. Daily reports: Automated; Mail PDF; Historical context. Quarterly planning
  232. 232. By timeframe Type of metric Timeframe Delivery Detail Break/fix monitoring: Urgent; Push alerts to PDA; Simple messages. Daily reports: Automated; Mail PDF; Historical context. Quarterly planning: Prepared; Slide deck; Part of big picture.
  233. 233. By medium Where will this wind up? Dashboard NOC screen Log file Someone’s spreadsheet Inbox http://www.flickr.com/photos/warrenski/4190341621/
  234. 234. Part seven Marching orders. Why measure: Tactical, to find and fix; Strategic, to plan/trend. What to measure: How long until a user can use the app as you intended? Where to measure: WAN, from everywhere; Core app functionality; Tiers of components. How to measure it: Synth, to ensure it’s working; RUM, to see where it’s broken; Get the math right.
  235. 235. So what should you do? Some homework.
  236. 236. First Meet your analytics team Find out What are the key goals they’re monitoring? Where are visitors coming from? What are the most common entrance and exit pages?
  237. 237. Second Pick the three processes, pages, or functions that matter most to you Landing pages, or part of a conversion funnel
  238. 238. Third Set up monitoring of: Your site from many places (synthetic testing) Your top 3 core business processes (synthetic or RUM) Your important infrastructure tiers (from agents + synthetic, or RUM)
  239. 239. Fourth Wait a week or two To establish a baseline To detect seasonal variance To show others and get buy-in
  240. 240. Fifth While you’re waiting, understand the elements of latency and how they affect your performance: DNS, SSL, Network latency, Host (server) latency, Client page load time
  241. 241. Set a target threshold Now that you have an idea of what “normal” is, set a threshold ... but not just any threshold.
  242. 242. The login page (Function) will have a total latency (Metric) of under 4 seconds (Target) with a cached browser copy (User situation) from any US branch office (Testing point) 95% of the time (Percentile) weekdays, 8AM ET to 6PM PST (Time window) by synth test at 5m intervals (Collection type)
  243. 243. Apdex score = (Satisfied requests + (Tolerating requests / 2)) / All requests
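A threshold phrased this way ("under 4 seconds, 95% of the time") can be checked against collected samples. The sketch below is one possible reading, assuming a nearest-rank percentile and assuming the samples have already been filtered to the stated function, testing point, and time window; `percentile` and `meets_target` are hypothetical helper names, and the sample data is invented.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def meets_target(latencies_s, target_s=4.0, pct=95):
    """True if at least pct% of the samples came in under the target latency."""
    return percentile(latencies_s, pct) < target_s

# Hypothetical synthetic-test results for the login page, in seconds.
good_week = [2.5] * 96 + [6.0] * 4
bad_week = [2.5] * 90 + [6.0] * 10

print(meets_target(good_week))  # True: 96% of tests were under 4 seconds
print(meets_target(bad_week))   # False: only 90% were under 4 seconds
```

Using a percentile rather than an average keeps a handful of very slow tests from being hidden by many fast ones.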
  244. 244. How Apdex works Frustrated: over 8 seconds Tolerating: 2-8 seconds Satisfied: 0-2 seconds
  245. 245. How Apdex works Satisfied: 65 hits Tolerating: 30 hits Frustrated: 5 hits Total requests: 100 Apdex score = (65 + (30/2)) / 100 = 0.80
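The Apdex arithmetic above is easy to script. This is a minimal sketch using the thresholds from the previous slide (satisfied up to 2 seconds, tolerating up to 8 seconds); the `apdex` function name, its parameters, and the sample latencies are illustrative assumptions.

```python
def apdex(latencies_s, satisfied_max=2.0, tolerating_max=8.0):
    """Apdex = (satisfied + tolerating / 2) / all requests."""
    if not latencies_s:
        raise ValueError("no requests to score")
    satisfied = sum(1 for t in latencies_s if t <= satisfied_max)
    tolerating = sum(1 for t in latencies_s if satisfied_max < t <= tolerating_max)
    # Frustrated requests (over tolerating_max) count only in the denominator.
    return (satisfied + tolerating / 2) / len(latencies_s)

# Hypothetical sample matching the slide: 65 satisfied, 30 tolerating, 5 frustrated.
sample = [1.0] * 65 + [5.0] * 30 + [10.0] * 5
score = apdex(sample)
print(f"Apdex score: {score:.2f}")  # Apdex score: 0.80
```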
  246. 246. Train your audience Visit key stakeholders and walk them through the report Get them used to the information In the same format At the same time From the same place
  247. 247. Put monitoring into your release cycle Talk to the development team Adding instrumentation Identifying new code functions that need testing Verifying whether optimization worked
  248. 248. Part eight Some tools to check out
  249. 249. Paid
  250. 250. Synthetic Keynote Systems Gomez Webmetrics Alertsite Dotcom Monitor Pingdom ...and many others
  251. 251. RUM Client-side AJAX (Gomez, Coradiant TrueSight Edge) Agent-based (Aternity) Inline (sniffer/tap): Coradiant, Tealeaf, Beatbox (HP), Atomic Labs, Compuware Apdex Server-side (logfile, agent) Full disclosure: We both worked at Coradiant.
  252. 252. Analytics Omniture Webtrends Coremetrics Woopra etc. (lots of specialization)
  253. 253. Open Source
  254. 254. Firebug getfirebug.com
  255. 255. Firebug getfirebug.com Also: Webkit inspector, Google Page Speed
  256. 256. Google Analytics analytics.google.com
  257. 257. webpagetest.org
  258. 258. Monitor.us (Free ain’t pretty, and pretty ain’t free, but it works.) mon.itor.us
  259. 259. AJAX measurement libraries Collecting from visitors: Jiffy (http://code.google.com/p/jiffy-web/) AJAX client sends measurements to Apache collector Other resources ZK-Gazer (http://code.google.com/p/zk-gazer/) http://www.ajaxperformance.com/ (Ryan Breen) http://www.opensourcetesting.org/performance.php
  260. 260. YSlow http://justtalkaboutweb.com/wp-content/uploads/2008/06/yslow.gif http://events.stanford.edu/events/196/19695/souders.jpg
  261. 261. Sites Dashboard Juice analytics’ Dashboard spy blog Insight’s gallery Simple Complexity
  262. 262. Part nine Planning for the future
  263. 263. AJAX
  264. 264. AJAX As for your male and female slaves whom you may have: you may buy male and female slaves from among the nations that are around you. - Leviticus 25:44
  265. 265. http://www.flickr.com/photos/farhannasir/4577508824/ Mobility
  266. 266. http://www.flickr.com/photos/andrewparnell/2738598951/
  267. 267. GET index.html HTTP/1.1 Host: www.stockprice.com Cookie: sessionID=KDF74INED6 Accept: */* <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/ xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <head profile="http://gmpg.org/xfn/11"> <title>Stock price for Apple</title> <script type="text/javascript" src="http:// www.bitcurrent.com/wp-content/themes/ grid_focus_public/js/perftracker.js"></script> <script> <body id=gsr topmargin=3 marginheight=3> AAPL:243.20 </body> </html>
  268. 268. GET index.html Host: www.stockprice.com Cookie: sessionID=KDF74INED6 AAPL:243.20
  269. 269. Web of documents (circa 1999) Web of events (circa 2008) http://www.flickr.com/photos/dnorman/2781572080/ http://www.flickr.com/photos/adamkr/4650637393/
  270. 270. Recap What you need to go and do now.
  271. 271. Metrics must be Relevant: related to a core business assumption Actionable: the basis for a decision or improvement Reproducible: documented and generated cleanly Understandable: easy for stakeholders to grok Accurate: providing the correct view of what happened
  272. 272. Visit your analytics team & read your business model Pick three core business functions to watch Start monitoring One page from many places Key business functions Infrastructure tiers Take a deep breath and establish a baseline Analyze elements of latency while you wait Set target thresholds using a meaningful SLA Calculate a consistent score & train your audience Make it part of the release cycle
  273. 273. Metric Source Target
  274. 274. Metric Source Target Onload time
  275. 275. Metric Source Target Onload time: From many places
  276. 276. Metric Source Target Onload time: From many places; To the top landing page
  277. 277. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached)
  278. 278. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time
  279. 279. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often
  280. 280. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process
  281. 281. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time
  282. 282. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time: From one place, often
  283. 283. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time: From one place, often; To each tier (web, I/O, CPU)
  284. 284. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time: From one place, often; To each tier (web, I/O, CPU). List Criteria Segmentation
  285. 285. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time: From one place, often; To each tier (web, I/O, CPU). List Criteria Segmentation TopN
  286. 286. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time: From one place, often; To each tier (web, I/O, CPU). List Criteria Segmentation TopN: Worst pages
  287. 287. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time: From one place, often; To each tier (web, I/O, CPU). List Criteria Segmentation TopN: Worst pages by
  288. 288. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time: From one place, often; To each tier (web, I/O, CPU). List Criteria Segmentation TopN: Worst pages by; Error rate
  289. 289. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time: From one place, often; To each tier (web, I/O, CPU). List Criteria Segmentation TopN: Worst pages by; Error rate, Server latency
  290. 290. Metric Source Target Onload time: From many places; To the top landing page (Uncached, Cached). Server time: From one place, often; To your core business process. Server time: From one place, often; To each tier (web, I/O, CPU). List Criteria Segmentation TopN: Worst pages by; Error rate, Server latency, Network latency
  291. 291. Got one report? [Chart: empty axes, x-axis 1 to 16]
  292. 292. Got one report? [Chart: unique page views (0 to 5,000), x-axis 1 to 16]
  293. 293. Got one report? [Chart: unique page views (0 to 5,000) split by latency band (<2s, 2-4s, >4s or error), x-axis 1 to 16]
  294. 294. Got one report? [Chart: unique page views (0 to 5,000) split by latency band (<2s, 2-4s, >4s or error), x-axis 1 to 16]
  295. 295. Got one report? [Chart: unique page views (0 to 5,000) split by latency band (<2s, 2-4s, >4s or error), x-axis 1 to 16]
  296. 296. Got one report? [Chart: unique page views (0 to 5,000) split by latency band (<2s, 2-4s, >4s or error), x-axis 1 to 16]
  297. 297. Got one report? [Chart: unique page views (0 to 5,000) split by latency band (<2s, 2-4s, >4s or error), x-axis 1 to 16]
  298. 298. Got one report? [Chart: unique page views (0 to 5,000) split by latency band (<2s, 2-4s, >4s or error), plus revenue (total sales, $0 to $10,000), x-axis 1 to 16]
  299. 299. Got one report? [Chart: unique page views (0 to 5,000) split by latency band (<2s, 2-4s, >4s or error), plus revenue (total sales, $0 to $10,000) and conversions, x-axis 1 to 16]
  300. 300. Got one report? [Chart: unique page views (0 to 5,000) split by latency band (<2s, 2-4s, >4s or error), plus revenue (total sales, $0 to $10,000) and conversions, x-axis 1 to 16]
  301. 301. Thanks! @seanpower sean@httpd.org @acroll alistair@bitcurrent.com www.watchingwebsites.com (and go buy this.)
