Have you ever thought about how your site’s performance compares to the web as a whole? Or maybe you’re curious how popular a particular web feature is. How much is too much JavaScript? The HTTP Archive has been keeping track of how the web is built since 2010. It enables you to find answers to questions about the state of the web past and present. In this talk we’ll explore how the HTTP Archive works, some of the ways people are using this dataset, and sneak a peek at things to come.
Rick Viscomi (@rick_viscomi) is an engineer with Google's developer relations team, focusing on web transparency and maintaining the HTTP Archive. In a past life Rick helped make YouTube fast and co-authored the O'Reilly book "Using WebPageTest".
2. 22
Rick Viscomi
● Maintainer, HTTP Archive
● Developer Programs Engineer, Google
● Former Web Dev, YouTube
● Co-author, Using WebPageTest
@rick_viscomi
@HTTPArchive
rick@httparchive.org
8. 8
How it Works
● Alexa’s top 500,000 websites
○ Home pages
○ Desktop and emulated mobile
● Powered by WebPageTest
○ Records HAR trace
○ Executes custom metrics
○ Records Lighthouse audits
● httparchive.org
○ Trends and stats
○ Discussion forum
● BigQuery and Cloud Storage
○ Queryable database
○ Raw HARs
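As a rough illustration of what a recorded HAR makes possible, the sketch below sums response body sizes for one page from a HAR-like dictionary. The field names (`log.entries`, `response.bodySize`) follow the HAR 1.2 format; the sample data and the `total_body_bytes` helper are hypothetical, and HTTP Archive's own byte accounting may differ.

```python
from typing import Any, Dict


def total_body_bytes(har: Dict[str, Any]) -> int:
    """Sum response body sizes over all entries in a HAR-like dict.

    HAR 1.2 uses -1 for an unknown bodySize, so non-positive values are skipped.
    """
    total = 0
    for entry in har.get("log", {}).get("entries", []):
        size = entry.get("response", {}).get("bodySize", -1)
        if size > 0:
            total += size
    return total


# Hypothetical three-request page, for illustration only.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://example.com/"},
             "response": {"status": 200, "bodySize": 12000}},
            {"request": {"url": "https://example.com/app.js"},
             "response": {"status": 200, "bodySize": 88000}},
            {"request": {"url": "https://example.com/missing.png"},
             "response": {"status": 404, "bodySize": -1}},
        ]
    }
}

print(total_body_bytes(sample_har) // 1024, "KB")
```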
12. 12
HTTP Archive referenced in research papers
goo.gl/kxgzM1

". . . In this article we utilize the httparchive.org [9] publicly available dataset of captured web performance metrics . . ."
Desktop and mobile web page comparison: characteristics, trends, and implications
IEEE Communications Magazine (Volume: 52, Issue: 9, September 2014)

". . . Recent stats from httparchive.org show that the top 300K URLs in the world need on average 38(!) TCP connections to display the site . . ."
HTTP2 explained
Computer Communication Review 44.3 (2014): 120-128.

". . . We make extensive use of the [...] data available at HTTP Archive to expose the characteristics of 3rd Party assets embedded into the top 16,000 Alexa webpages . . ."
Are 3rd Parties Slowing Down the Mobile Web?
Proceedings of the Eighth Wireless of the Students, by the Students, and for the Students Workshop. ACM, 2016.
19. 19
Querying the average number of bytes per page
goo.gl/3NJnWX
-- BigQuery legacy SQL: mean page weight in KB
SELECT
  INTEGER(AVG(bytesTotal)/1024)
FROM
  [httparchive:runs.2017_07_01_pages]
3034 KB
20. 20
Querying the median number of bytes per page
goo.gl/nD3whq
-- BigQuery legacy SQL: median page weight in KB (501st of 1001 quantile boundaries)
SELECT
  INTEGER(NTH(501, QUANTILES(bytesTotal, 1001))/1024)
FROM
  [httparchive:runs.2017_07_01_pages]
1402 KB
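The gap between the two results (3034 KB average versus 1402 KB median) is what a right-skewed distribution tends to produce: a small number of very heavy pages pulls the mean up while the median stays put. A minimal sketch with made-up page weights, not HTTP Archive data:

```python
import statistics

# Hypothetical page weights in KB; one very heavy page skews the mean.
page_weights_kb = [400, 900, 1200, 1400, 1600, 2100, 15000]

mean_kb = statistics.mean(page_weights_kb)
median_kb = statistics.median(page_weights_kb)

print(f"mean:   {mean_kb:.0f} KB")
print(f"median: {median_kb:.0f} KB")
```

For "how heavy is a typical page" questions, the median is usually the safer summary for this reason.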
21. 21
Case Study #2
Popular JavaScript libraries
1. What are the top libraries?
2. How fast are they?
22. 22
Finding the top 10 JS libraries detected by HTTP Archive
goo.gl/xJztDb
-- BigQuery standard SQL: approximate top 10 library names by number of detections
SELECT
  APPROX_TOP_COUNT(lib.name, 10)
FROM
  `httparchive.scratchspace.2017_07_01_js_libs`
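To make the aggregation concrete, here is a small sketch of the same idea in Python. It counts exactly with `collections.Counter`, whereas `APPROX_TOP_COUNT` returns approximate results; the detection rows are invented for illustration.

```python
from collections import Counter

# Hypothetical (url, library) detections; not real HTTP Archive rows.
detections = [
    ("https://a.example/", "jQuery"),
    ("https://a.example/", "Modernizr"),
    ("https://b.example/", "jQuery"),
    ("https://c.example/", "Bootstrap"),
    ("https://c.example/", "jQuery"),
    ("https://d.example/", "Bootstrap"),
]

# Count detections per library name, then keep the two most common.
library_counts = Counter(name for _url, name in detections)
top_libraries = library_counts.most_common(2)
print(top_libraries)
```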
27. 27
Joining libraries with their respective performance data
goo.gl/Wo9VFY
-- BigQuery standard SQL: per-library page count with median render start and JS bytes
SELECT
  RANK() OVER (ORDER BY COUNT(0) DESC) AS row,
  lib.name AS lib_name,
  COUNT(0) AS volume,
  APPROX_QUANTILES(page.renderStart, 1000)[OFFSET(500)] AS renderStart,
  APPROX_QUANTILES(page.bytesJs, 1000)[OFFSET(500)] AS bytesJs
FROM (
  SELECT url, lib.name AS name
  FROM `httparchive.scratchspace.2017_07_01_js_libs`) AS lib
JOIN (
  SELECT url, renderStart, bytesJs
  FROM `httparchive.runs.2017_07_01_pages`) AS page
ON
  lib.url = page.url
GROUP BY
  lib_name
ORDER BY
  volume DESC
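The join-then-aggregate shape of this query can be sketched in plain Python. The rows below are hypothetical stand-ins for the two tables, each page URL is assumed to appear once, and the medians are exact rather than approximate.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows standing in for the js_libs and pages tables.
libs = [
    {"url": "https://a.example/", "name": "jQuery"},
    {"url": "https://b.example/", "name": "jQuery"},
    {"url": "https://b.example/", "name": "Bootstrap"},
    {"url": "https://c.example/", "name": "jQuery"},
]
pages = [
    {"url": "https://a.example/", "renderStart": 2000, "bytesJs": 200_000},
    {"url": "https://b.example/", "renderStart": 2600, "bytesJs": 350_000},
    {"url": "https://c.example/", "renderStart": 2300, "bytesJs": 300_000},
]

# Index pages by URL, then inner-join each library detection to its page.
page_by_url = {page["url"]: page for page in pages}
joined = defaultdict(list)
for lib in libs:
    page = page_by_url.get(lib["url"])
    if page is not None:
        joined[lib["name"]].append(page)

# One summary row per library, most common first.
summary = sorted(
    (
        {
            "lib_name": name,
            "volume": len(rows),
            "renderStart": median(r["renderStart"] for r in rows),
            "bytesJs": median(r["bytesJs"] for r in rows),
        }
        for name, rows in joined.items()
    ),
    key=lambda row: row["volume"],
    reverse=True,
)
for position, row in enumerate(summary, start=1):
    print(position, row)
```

Note that `enumerate` gives a simple position, not a true `RANK()`: tied volumes would share a rank in SQL.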
28. 28
row  lib_name     volume  renderStart (ms)  bytesJs (bytes)
1    jQuery       395652  2300              273931
2    jQuery UI    104406  2700              354205
3    Modernizr    75964   2500              341983
4    Bootstrap    65775   2364              263412
5    yepnope      53493   2538              336586
6    FlexSlider   39389   2600              360767
7    SWFObject    20616   2391              347912
8    Underscore   19835   2533              407601
9    Google Maps  16788   2900              471497
10   Moment.js    15671   2634              425747
Morbid curiosity
How are JS library popularity and perf metrics correlated?
1. Run in BigQuery
2. Export data to Google Sheets
3. =CORREL(A2:A, D2:D) formula to get correlation coefficient
goo.gl/ovJwT2
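`CORREL` computes the Pearson correlation coefficient, so the spreadsheet step can also be reproduced in a few lines of Python. This is a sketch only: the `pearson` helper and the two sample columns are hypothetical, not the exported query results.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rank and render-time columns, not the exported data.
rows = [1, 2, 3, 4, 5]
render_start = [2300, 2700, 2500, 2400, 2600]
print(round(pearson(rows, render_start), 3))
```

Values near 0 indicate little linear relationship; values near 1 or -1 indicate a strong one.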
29. 29
Morbid curiosity
How are JS library popularity and perf metrics correlated?
row-render  row-bytes  volume-render  volume-bytes
0.093       0.189      -0.055         -0.108
WEAK SAUCE
goo.gl/ovJwT2
30. 30
Case Study #3
The lang attribute
1. What % are invalid?
2. Why are they invalid?
3. How can they be fixed?
31. 31
Querying ~500K Lighthouse audits with HTTP Archive on BigQuery
goo.gl/VBJpYE
-- BigQuery legacy SQL: number of reports per valid-lang audit score
SELECT
  JSON_EXTRACT_SCALAR(
    report, "$.audits.valid-lang.score") AS valid,
  COUNT(0) AS volume
FROM
  [httparchive:har.latest_lighthouse_mobile]
WHERE
  report IS NOT NULL
GROUP BY
  valid
HAVING
  valid IS NOT NULL
ORDER BY
  valid
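For readers less familiar with JSON functions in SQL, the same extract-and-count logic might look like this in Python. The report payloads are hypothetical and heavily trimmed; only the `audits.valid-lang.score` path comes from the query, and the `valid_lang_score` helper is an illustrative name.

```python
import json
from collections import Counter
from typing import Optional


def valid_lang_score(report_json: Optional[str]) -> Optional[str]:
    """Return audits["valid-lang"].score as JSON text, or None if absent."""
    if report_json is None:
        return None
    report = json.loads(report_json)
    score = report.get("audits", {}).get("valid-lang", {}).get("score")
    return None if score is None else json.dumps(score)


# Hypothetical, trimmed-down report payloads; None models a missing report.
reports = [
    '{"audits": {"valid-lang": {"score": true}}}',
    '{"audits": {"valid-lang": {"score": false}}}',
    '{"audits": {"valid-lang": {"score": true}}}',
    '{"audits": {}}',
    None,
]

# Skip missing scores (the HAVING clause), then count per score (GROUP BY).
volume_by_valid = Counter(
    score for score in map(valid_lang_score, reports) if score is not None
)
print(sorted(volume_by_valid.items()))
```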
33. 33
Extracting invalid lang attribute values
-- BigQuery standard SQL: collect every lang="..." value from the failing elements
SELECT
  REGEXP_EXTRACT_ALL(
    LOWER(
      JSON_EXTRACT(
        report,
        "$.audits.valid-lang.details.items"
      )
    ),
    r'\blang="([^"]*)"'
  ) AS langs
FROM
  `httparchive.har.2017_06_15_android_lighthouse`
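The extraction step can be tried locally with Python's `re` module. This sketch assumes the pattern is meant to begin with a word boundary (`\b`) before `lang`; the element snippets are made up for illustration.

```python
import re

# Word boundary before "lang" so attributes such as hreflang="..." are skipped.
# Note that xml:lang="..." still matches, because ":" is not a word character.
LANG_ATTR = re.compile(r'\blang="([^"]*)"')

# Hypothetical element snippets, similar in spirit to audit detail items.
snippets = '<html lang="EN_us"> <html xml:lang="en" lang=""> <a hreflang="fr">'

# Lowercase first, as the query does, then collect every captured value.
langs = LANG_ATTR.findall(snippets.lower())
print(langs)
```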
34. 34
Counting distinct invalid lang attribute values
-- BigQuery standard SQL: flatten the extracted values and count each one
SELECT
  COUNT(0) AS volume,
  lang
FROM (
  SELECT
    REGEXP_EXTRACT_ALL(LOWER(JSON_EXTRACT(report,
      "$.audits.valid-lang.details.items")),
      r'\blang="([^"]*)"') AS langs
  FROM
    `httparchive.har.2017_06_15_android_lighthouse`)
CROSS JOIN
  UNNEST(langs) AS lang
GROUP BY
  lang
ORDER BY
  volume DESC
goo.gl/2ZfMyr
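The flatten-then-count pattern used for the invalid values maps onto two standard-library tools in Python. The per-page lists below are invented; pages with no extracted values contribute nothing, just as unnesting an empty array yields no rows.

```python
from collections import Counter
from itertools import chain

# Hypothetical per-page lists of extracted lang values (one list per report).
langs_per_page = [
    ["en_us"],
    ["", "en_us"],
    [],
    ["english"],
    ["en_us"],
]

# chain.from_iterable plays the role of UNNEST; Counter does GROUP BY + COUNT.
volume_by_lang = Counter(chain.from_iterable(langs_per_page))
for lang, volume in volume_by_lang.most_common():
    print(volume, repr(lang))
```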