Measuring the web with boomerang
Philip Tellis / philip@bluesmoon.info
Boston Performance Meetup / 2010-09-16
Boston Performance Meetup / 2010-09-16 Measuring the web with boomerang
$ finger philip
Philip Tellis
philip@bluesmoon.info
@bluesmoon
yahoo
geek
http://bluesmoon.info/
Less than 20% of page load time is something we can measure
and fix during development
It’s what we can’t control that bites us
Too many variations
browsers
plugins
OSes
viruses
antiviruses
microwaves
baby monitors
naughty neighbours
file shares
governments
rodents
Try simulating all that in the lab!
We need to measure real end-user performance from the real end user’s box
Ask the user?
While this might work, it isn’t necessarily representative
Why not just use javascript in each page that you want to
measure?
But javascript can’t measure everything... we get as close as
we can
It also adds page weight
However, it is available everywhere and gives us flexibility
boomerang is...
A piece of javascript that you add to your web page, which measures the end user’s perceived performance of your page and beacons it back to you
How?
<script src="boomerang.js" type="text/javascript"></script>
<script type="text/javascript">
BOOMR.init({
    // The user's IP address, filled in server-side
    user_ip: "<user's ip address>",
    // The server-side script that receives the results
    beacon_url: "http://yoursite.com/beacon.php"
});
</script>
What does it do?
About once a week, measures the user’s bandwidth and latency to your server
On (almost) every request, measures the time it took to load the current page
Beacons these results back to your server
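A beacon can be as simple as an image request whose query string carries the results. The sketch below builds such a URL under that assumption; the function name is hypothetical, and the field names in the usage comment (t_done, bw, lat) are illustrative rather than a documented beacon format.

```javascript
// Build a beacon URL by appending measurements as query-string parameters.
// Assumption: results are sent as a GET (e.g. an image request) to beacon_url.
function buildBeaconUrl(beaconUrl, results) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(results)) {
    // Skip measurements that weren't taken on this page view
    if (value !== undefined && value !== null) {
      params.append(name, String(value));
    }
  }
  const query = params.toString();
  if (!query) return beaconUrl;
  // Use "&" if the beacon URL already carries a query string
  return beaconUrl + (beaconUrl.includes("?") ? "&" : "?") + query;
}

// In a browser, firing the beacon can be one line (hypothetical field names):
//   new Image().src = buildBeaconUrl("http://yoursite.com/beacon.php",
//                                    { t_done: 1234, bw: 240000, lat: 85 });
```

An image request needs no XHR support and isn't subject to cross-domain restrictions, which is why it suits a script that has to run on every page.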
How does it do it?
Let’s take them one at a time
Measuring latency
Download a 32-byte GIF 10 times in sequence
Measure the time to download each
Discard the first measurement because it’s overpriced
Calculate the arithmetic mean, standard deviation and margin of error of the rest
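The latency steps above reduce to a few lines of arithmetic. This sketch assumes the ten download times are already collected in milliseconds, uses a hypothetical function name, and computes the margin of error at a 95% confidence level (z = 1.96); the talk doesn't say which level boomerang uses.

```javascript
// Latency statistics: drop the first sample, then mean, sample standard
// deviation and margin of error (assumed 95% confidence, z = 1.96).
function latencyStats(timesMs) {
  // The first download may include a DNS lookup and TCP handshake
  const samples = timesMs.slice(1);
  const n = samples.length;
  if (n < 2) throw new Error("need at least 3 measurements");
  const mean = samples.reduce((sum, t) => sum + t, 0) / n;
  // Sample standard deviation (n - 1 in the denominator)
  const variance = samples.reduce((sum, t) => sum + (t - mean) ** 2, 0) / (n - 1);
  const stddev = Math.sqrt(variance);
  // Margin of error of the mean: z times the standard error
  const marginOfError = (1.96 * stddev) / Math.sqrt(n);
  return { mean, stddev, marginOfError };
}

// Ten sequential downloads, in ms; the slow first one is discarded
const stats = latencyStats([412, 85, 80, 92, 78, 88, 83, 90, 79, 86]);
```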
Wait, did you say overpriced?
The first image might require a DNS lookup and a TCP handshake
TCP slow start is not an issue, since a 32-byte image fits in a single packet
Measuring bandwidth
After the latency test is done, we download progressively larger images
Stop at the first image that times out
Re-download that image a few more times
Calculate the median, standard deviation and margin of error of the largest images
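Here is one way to turn those downloads into numbers. The sketch assumes each image's size is known in bytes, its download time was recorded in milliseconds (null if it never finished), and bandwidth is reported in bits per second; the function names, the timeout and the 95% confidence level are assumptions.

```javascript
// Index of the largest image that finished within the timeout, or -1 if none did.
// timesMs lists one download time per image, smallest image first.
function lastCompleted(timesMs, timeoutMs) {
  const firstSlow = timesMs.findIndex((t) => t === null || t > timeoutMs);
  return firstSlow === -1 ? timesMs.length - 1 : firstSlow - 1;
}

// Median, standard deviation and margin of error over repeated downloads.
// Each sample is { bytes, ms } for one download of the largest images.
function bandwidthStats(samples) {
  const n = samples.length;
  if (n < 2) throw new Error("need at least 2 samples");
  // Bits per second for each download
  const bps = samples.map(({ bytes, ms }) => (bytes * 8) / (ms / 1000));
  const sorted = [...bps].sort((a, b) => a - b);
  const mid = Math.floor(n / 2);
  const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = bps.reduce((sum, v) => sum + v, 0) / n;
  const stddev = Math.sqrt(bps.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1));
  // Assumed 95% confidence level
  const marginOfError = (1.96 * stddev) / Math.sqrt(n);
  return { median, stddev, marginOfError };
}
```

The index from lastCompleted picks the image to re-download; the repeated downloads then go to bandwidthStats. The median is used rather than the mean so that one stalled download doesn't drag the estimate down.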
Measuring latency before bandwidth helps here
Those 10 latency images do a lot to widen the TCP
window size
The bandwidth images make much better use of bandwidth
The image we end with uses the most bandwidth
How do we measure page load time?
In the onbeforeunload event, record the current time and store it in a cookie
In the onload event of the next page, read the cookie and subtract the stored time from the current time
We also make sure that the page that set the cookie is the referrer of the current page
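Both halves of this handshake can be written as plain functions, which keeps them testable outside a browser. This is a sketch, not boomerang's code: the function names, the cookie layout (s=<start time>&r=<url>) and the cookie name "RT" in the comments are assumptions.

```javascript
// onbeforeunload side: remember when we left, and from which page.
function makeTimerCookie(startMs, currentUrl) {
  // Referrers never include the #fragment, so drop it before storing
  const url = currentUrl.replace(/#.*$/, "");
  return "s=" + startMs + "&r=" + encodeURIComponent(url);
}

// onload side: returns the load time in ms, or null if the timer can't be trusted.
function pageLoadTime(cookieValue, referrer, nowMs) {
  if (!cookieValue) return null;
  const params = new URLSearchParams(cookieValue);
  const start = params.get("s");
  const url = params.get("r");
  if (!start || url === null) return null;
  // Only use the timer if the page that set it is the one that linked here
  if (url !== referrer) return null;
  const startMs = Number(start);
  return Number.isFinite(startMs) ? nowMs - startMs : null;
}

// Browser wiring (hypothetical cookie name "RT"):
//   window.onbeforeunload = () => {
//     document.cookie = "RT=" + makeTimerCookie(Date.now(), location.href) + "; path=/";
//   };
//   window.onload = () => {
//     const m = document.cookie.match(/(?:^|;\s*)RT=([^;]*)/);
//     const t = pageLoadTime(m && m[1], document.referrer, Date.now());
//   };
```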
What? Two pages?
Yes, this needs two pages and cookies. If either isn’t available, we try to use the WebTiming API.
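When the cookie route fails, the browser's own timing data can stand in. A minimal sketch with hypothetical function names, assuming the `performance.timing` object from the Web Timing draft (later standardized as Navigation Timing); the vendor-prefixed fallbacks are an assumption about early implementations.

```javascript
// Find a timing object on a window-like object, or null if there isn't one.
function getTiming(win) {
  const perf = win.performance || win.msPerformance || win.webkitPerformance || win.mozPerformance;
  return perf && perf.timing ? perf.timing : null;
}

// Page load time measured from onload. loadEventEnd is still 0 while onload
// handlers run, so measure from navigationStart up to "now" instead.
function loadTimeFromTiming(timing, nowMs) {
  if (!timing || !timing.navigationStart) return null;
  return nowMs - timing.navigationStart;
}

// In a browser: loadTimeFromTiming(getTiming(window), Date.now())
```

Unlike the cookie method, this also covers the first page of a visit, since it doesn't depend on a previous page having set anything.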
How accurate is it?
Latency measurements are very accurate (±1%)
Bandwidth is accurate to an order of magnitude; on bad connections it can be off by ±30%
Page load time sometimes has outliers, so you need post-filtering
The margin of error tells you how good your data is
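One way to act on the margin of error is to compare it with the value it qualifies and set aside measurements whose error is too large a fraction of it. A sketch with a hypothetical function name; the 10% threshold is a hypothetical default, not a figure from the talk.

```javascript
// True if the margin of error is small relative to the measured value.
// `value` is the mean (latency) or median (bandwidth) the error belongs to.
function isReliable(value, marginOfError, maxRelativeError = 0.1) {
  if (!(value > 0) || !(marginOfError >= 0)) return false;
  return marginOfError / value <= maxRelativeError;
}
```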
What do we do with the data?
Sanity checking to:
Remove fake data
Remove abusive data
Maybe just rate limiting
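Both checks fit in a few lines on the beacon server. In this sketch the function names, field names, plausible ranges and the per-IP limit are all hypothetical; pick values that match your own beacons and traffic.

```javascript
// Hypothetical plausible ranges for numeric beacon fields.
const LIMITS = {
  t_done: [0, 10 * 60 * 1000], // page load time, ms
  lat: [0, 60 * 1000],         // latency, ms
  bw: [0, 1e10],               // bandwidth, bits per second
};

// Reject beacons with non-numeric or implausible values; missing fields are allowed.
function isSane(beacon) {
  return Object.entries(LIMITS).every(([field, [min, max]]) => {
    const raw = beacon[field];
    if (raw === undefined) return true;
    const v = Number(raw);
    return raw !== "" && Number.isFinite(v) && v >= min && v <= max;
  });
}

// Fixed-window rate limiter: at most `max` beacons per IP in each `windowMs`.
function makeRateLimiter(max, windowMs) {
  const windows = new Map(); // ip -> { start, count }
  return function allow(ip, nowMs) {
    const w = windows.get(ip);
    if (!w || nowMs - w.start >= windowMs) {
      windows.set(ip, { start: nowMs, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= max;
  };
}
```

The map here grows without bound; a long-running server would also need to expire entries for IPs it hasn't seen recently.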
What do we do with the data?
Statistical analysis to:
Remove outliers
Aggregate based on bandwidth blocks
Measure trends over time and correlate them with code
changes
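The talk doesn't say which outlier test to use. One common choice is Tukey's fences: drop values more than 1.5 interquartile ranges outside the middle half of the data. The sketch below implements that as a stand-in, not as boomerang's own method; the function names are hypothetical.

```javascript
// Quantile of an already-sorted, non-empty array, interpolating between ranks.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Keep values within [Q1 - k*IQR, Q3 + k*IQR]; original order is preserved.
function removeOutliers(values, k = 1.5) {
  if (values.length < 4) return values.slice(); // too few points to judge
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const lower = q1 - k * iqr;
  const upper = q3 + k * iqr;
  return values.filter((v) => v >= lower && v <= upper);
}
```

Fences based on quartiles are themselves robust to the outliers they remove, unlike a cutoff of "N standard deviations from the mean", where a few huge page load times inflate the deviation and hide themselves.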
Bandwidth blocks
0-100 kbps
100-300 kbps
300-2000 kbps
2-6 Mbps
6+ Mbps
Bandwidth blocks
Group page load times based on bandwidth block
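Grouping by block is a bucket-then-summarize step. A sketch with hypothetical function names, assuming each beacon carries bandwidth as `bw` (bits per second) and page load time as `t_done` (ms); those field names, the treatment of block boundaries and the choice of the median as the summary are assumptions.

```javascript
// The bandwidth blocks, as upper bounds in bits per second.
// Assumptions: 1 kbps = 1000 bit/s, and each block includes its lower bound.
const BLOCKS = [
  { label: "0-100 kbps", max: 100e3 },
  { label: "100-300 kbps", max: 300e3 },
  { label: "300-2000 kbps", max: 2000e3 },
  { label: "2-6 Mbps", max: 6e6 },
  { label: "6+ Mbps", max: Infinity },
];

function blockFor(bps) {
  const block = BLOCKS.find((b) => bps < b.max);
  return block ? block.label : null; // null for NaN and other junk
}

// Median page load time (t_done, ms) per bandwidth block (bw, bit/s).
function medianLoadTimeByBlock(beacons) {
  const groups = new Map();
  for (const { bw, t_done } of beacons) {
    const label = blockFor(bw);
    if (label === null) continue;
    if (!groups.has(label)) groups.set(label, []);
    groups.get(label).push(t_done);
  }
  const result = {};
  for (const [label, times] of groups) {
    times.sort((a, b) => a - b);
    const mid = Math.floor(times.length / 2);
    result[label] = times.length % 2 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
  }
  return result;
}
```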
Bandwidth blocks
Ref: Analysing Bandwidth & Latency – YUI Blog
Bandwidth blocks
Data points from some countries may require narrower bands
Geographic data
Looking at latency from different geographic locations can tell
you where to put your next CDN
ISPs
Grouping data by ISP can tell you who’s behaving badly
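Geography and ISP are just different grouping keys. The sketch below ranks groups by median latency; it assumes each beacon has a `lat` field in ms and has been tagged with a field such as `country` or `isp` (for example by an IP lookup on the server, not shown). The function name and the `minSamples` cutoff for groups too small to trust are hypothetical.

```javascript
// Group latency samples by keyOf(beacon) and return the slowest groups first.
function slowestGroups(beacons, keyOf, minSamples = 30) {
  const groups = new Map();
  for (const b of beacons) {
    const key = keyOf(b);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(b.lat);
  }
  const ranked = [];
  for (const [key, lats] of groups) {
    if (lats.length < minSamples) continue; // not enough data to judge
    lats.sort((a, b) => a - b);
    const mid = Math.floor(lats.length / 2);
    const median = lats.length % 2 ? lats[mid] : (lats[mid - 1] + lats[mid]) / 2;
    ranked.push({ key, median, samples: lats.length });
  }
  return ranked.sort((a, b) => b.median - a.median);
}

// e.g. slowestGroups(beacons, (b) => b.isp) or slowestGroups(beacons, (b) => b.country)
```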
Storing the data
We log all beacon requests to Apache’s access log file
Low-traffic sites could write directly to a DB
Others have suggested using CouchDB as the beacon server
Daily summaries can be sent across to ShowSlow
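If beacons land in the access log, a daily job has to pull them back out. A sketch with a hypothetical function name, assuming Apache's Common or Combined Log Format and the beacon path /beacon.php from the BOOMR.init example; parameter values come back as strings.

```javascript
// host ident user [time] "GET target HTTP/x.y" status ...
const LOG_LINE = /^(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+) HTTP\/[\d.]+" (\d{3})/;

// Returns { ip, time, status, params } for beacon requests, or null otherwise.
function parseBeaconLine(line, beaconPath = "/beacon.php") {
  const m = LOG_LINE.exec(line);
  if (!m) return null;
  const [, ip, time, target, status] = m;
  // The base URL only exists so URL can parse a path-relative request target
  const url = new URL(target, "http://localhost");
  if (url.pathname !== beaconPath) return null;
  return { ip, time, status: Number(status), params: Object.fromEntries(url.searchParams) };
}
```

The parsed records can then be filtered, grouped and summarized before anything is sent on to ShowSlow.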
More data
Write plugins to get more performance data
We already have a DNS plugin
I’m thinking of an IPv6 vs. IPv4 plugin
What about a full WebTiming plugin?
Can we measure connection setup time?
You decide
Once you have the data, you can do anything with it
Thank you
http://github.com/yahoo/boomerang
http://yahoo.github.com/boomerang/doc/
Photo credits
flickr.com/photos/21233184@N02/4389412851
Contact me
Philip Tellis
yahoo
geek
@bluesmoon
http://bluesmoon.info/
slideshare.net/bluesmoon
philip@bluesmoon.info
References
github.com/yahoo/boomerang
More bandwidth doesn’t matter (much) – Mike Belshe
Analysing Bandwidth & Latency – YUI Blog
It’s the latency, stupid – Stuart Cheshire
The statistics of web performance