© AKAMAI - EDGE 2017
Tracking the Performance of the Web with HTTP Archive
October 12, 2017
2
Rick Viscomi
@rick_viscomi
HTTP Archive
Paul Calvano
@paulcalvano
Akamai
About Us
3
“The HTTP Archive tracks how the Web is built.”
httparchive.org
4
httparchive.org/interesting.php httparchive.org/trends.php
5
How it Works
● Alexa’s top 500,000 websites
○ Home pages
○ Desktop and emulated mobile
● Powered by WebPageTest
○ Records HAR trace
○ Executes custom metrics
○ Records Lighthouse audits
● httparchive.org
○ Trends and stats
○ Discussion forum
● BigQuery and Cloud Storage
○ Queryable database
○ Raw HARs
6
HTTP Archive Pipeline
[Pipeline diagram. Labels: 134, 500K, GCS, BQ, Biweekly]
7
The tip of the databerg
discuss.httparchive.org
Curated stats/trends
Raw data
8
Who uses the
HTTP Archive?
● Scholars
● Community
● Industry
9
HTTP Archive referenced in research papers
goo.gl/kxgzM1
● "... In this article we utilize the httparchive.org [9] publicly available dataset of captured web performance metrics ..."
○ Desktop and mobile web page comparison: characteristics, trends, and implications. IEEE Communications Magazine (Volume: 52, Issue: 9, September 2014)
● "... Recent stats from httparchive.org show that the top 300K URLs in the world need on average 38(!) TCP connections to display the site ..."
○ HTTP2 explained. Computer Communication Review 44.3 (2014): 120-128.
● "... We make extensive use of the [...] data available at HTTP Archive to expose the characteristics of 3rd Party assets embedded into the top 16,000 Alexa webpages ..."
○ Are 3rd Parties Slowing Down the Mobile Web? Proceedings of the Eighth Wireless of the Students, by the Students, and for the Students Workshop. ACM, 2016.
10
The web community uses HTTP Archive to answer questions about the state of the web
goo.gl/oJSIjm goo.gl/PvVHhL
11
Remember when the average page size exceeded Doom?
goo.gl/YPdLJu
12
Industrial tools use HTTP Archive data for calibration
goo.gl/Lnp1ne goo.gl/e8kXq9
13
Case Study #1
Compression
1. Updating Akamai Gzip Defaults
2. Brotli Compression
14
Last Mile Acceleration (LMA)
● Gzip compression at the Edge
● Compression is based on the HTTP Content-Type
● The default content types were not sufficient and usually required updating
● When Property Manager was released, we decided to update the defaults
15
Querying the HTTP Archive for Compression Stats
bit.ly/2hXNjbE
-- Per MIME type: request counts by Content-Encoding and the share that is compressed
SELECT mimeType, COUNT(*) total,
  SUM(IF(resp_content_encoding = "gzip", 1, 0)) gzip,
  SUM(IF(resp_content_encoding = "deflate", 1, 0)) deflate,
  SUM(IF(resp_content_encoding IN ("gzip", "deflate"), 0, 1)) NoCompression,
  -- CompressedPercentage is a fraction between 0 and 1, rounded to two decimals
  ROUND(
    SUM(IF(resp_content_encoding IN ("gzip", "deflate"), 1, 0)) / COUNT(*), 2
  ) CompressedPercentage
FROM `httparchive.runs.2017_09_01_requests`
GROUP BY mimeType
HAVING total > 10000
ORDER BY CompressedPercentage DESC
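To make the aggregation in the query above easier to follow, here is a minimal sketch that runs the same per-MIME-type grouping on a handful of made-up rows. It uses Python's built-in sqlite3 module rather than BigQuery, so the SQL dialect differs slightly (CASE WHEN instead of IF), and the sample rows and the `compression_stats` helper are assumptions for illustration only, not HTTP Archive data.

```python
import sqlite3

# Hypothetical sample rows: (mimeType, resp_content_encoding).
# These values are invented for illustration, not HTTP Archive data.
ROWS = [
    ("text/css", "gzip"),
    ("text/css", "gzip"),
    ("text/css", None),
    ("application/json", "deflate"),
    ("application/json", None),
    ("image/png", None),
]


def compression_stats(rows):
    """Return (mimeType, total, compressed, fraction) tuples, highest fraction first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE requests (mimeType TEXT, resp_content_encoding TEXT)")
    conn.executemany("INSERT INTO requests VALUES (?, ?)", rows)
    query = """
        SELECT mimeType,
               COUNT(*) AS total,
               SUM(CASE WHEN resp_content_encoding IN ('gzip', 'deflate')
                        THEN 1 ELSE 0 END) AS compressed
        FROM requests
        GROUP BY mimeType
    """
    # Compute the compressed fraction in Python to avoid SQLite integer division.
    stats = [
        (mime, total, compressed, round(compressed / total, 2))
        for mime, total, compressed in conn.execute(query)
    ]
    conn.close()
    return sorted(stats, key=lambda row: row[3], reverse=True)


for row in compression_stats(ROWS):
    print(row)
```

Rows with no Content-Encoding count as uncompressed, which mirrors how the NoCompression column treats them.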
16
Some New LMA Defaults
● text/javascript
● font/ttf
● application/javascript
● text/xml
● application/json
● application/xml
● ...
Many Content-Types did not match the original defaults!
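As a rough sketch of what "compression based on content type" can look like, the snippet below checks a response's Content-Type header against the types listed on this slide. The `should_compress` helper and its matching rules are assumptions for illustration; they do not describe how Last Mile Acceleration is actually implemented.

```python
# Content types taken from the "Some New LMA Defaults" list above.
# The matching logic is an assumption for illustration, not Akamai's implementation.
COMPRESSIBLE_TYPES = {
    "text/javascript",
    "font/ttf",
    "application/javascript",
    "text/xml",
    "application/json",
    "application/xml",
}


def should_compress(content_type_header):
    """Return True if the response's media type is in the compressible set."""
    if not content_type_header:
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in COMPRESSIBLE_TYPES


print(should_compress("application/json; charset=utf-8"))
print(should_compress("image/png"))
```

Stripping parameters before matching matters because real headers often carry a charset, which would otherwise cause an exact-match lookup to miss.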
17
Modern Set of Last Mile Acceleration Defaults
18
What About Brotli?
● New compression algorithm developed by Google researchers
● 5% - 25% Reduction over Gzip Compression
● Supported by most browsers
● Let’s extend the previous query to include Brotli compression
19
Compression Stats - gzip and brotli
bit.ly/2fUK6W9
-- Same breakdown as the previous query, with Brotli ("br") counted separately
SELECT mimeType, COUNT(*) total,
  SUM(IF(resp_content_encoding = "gzip", 1, 0)) gzip,
  SUM(IF(resp_content_encoding = "br", 1, 0)) brotli,
  SUM(IF(resp_content_encoding = "deflate", 1, 0)) deflate,
  SUM(IF(resp_content_encoding IN ("gzip", "deflate", "br"), 0, 1)) NoCompression,
  ROUND(
    SUM(IF(resp_content_encoding IN ("gzip", "deflate", "br"), 1, 0)) / COUNT(*), 2
  ) CompressedPercentage,
  ROUND(
    SUM(IF(resp_content_encoding = "br", 1, 0)) / COUNT(*), 2
  ) BrotliCompressedPercentage
FROM `httparchive.runs.2017_09_01_requests`
GROUP BY mimeType
HAVING total > 1000
ORDER BY brotli DESC
20
Examining Compression By Content Type - With Brotli
Brotli use has been growing, but where is it most prevalent?
21
Brotli Usage: Mostly JavaScript, CSS and HTML Resources
When we exclude Google and Facebook content, the bulk of Brotli-encoded content is JS and CSS.
22
Compression Level = Overhead
Data on compression speeds from quixdb.github.io/squash-benchmark/#results
Most byte savings are obtained by using the highest compression level.
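The size-versus-time trade-off can be tried locally. The sketch below is only an analogy: it uses zlib (the DEFLATE implementation in Python's standard library) because Brotli is not in the standard library, and the CSS-like payload is made up. Actual sizes and timings depend on the data and the machine, so no particular result is claimed here.

```python
import time
import zlib

# Hypothetical CSS-like payload; repeated rules make it compressible.
PAYLOAD = b"".join(
    b".item-%d { margin: %dpx; color: #333; display: block; }\n" % (i, i % 16)
    for i in range(5000)
)


def measure(level):
    """Compress PAYLOAD at the given zlib level; return (size in bytes, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(PAYLOAD, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: the compressed bytes must restore the original payload.
    assert zlib.decompress(compressed) == PAYLOAD
    return len(compressed), elapsed


for level in (1, 6, 9):
    size, seconds = measure(level)
    print(f"level {level}: {size} bytes in {seconds * 1000:.2f} ms")
```

Compressing offline and caching the result, as described on the next slide, pays the cost of the highest level once instead of on every response.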
23
Resource Optimizer: Automated Brotli Compression at the Edge
● Automatically compresses CSS and JS
● Resources are Compressed Offline and Cached
● Brotli Compression Level 11, without the overhead!
24
Case Study #2
Server Technologies
1. Akamai Varnish Connector
2. Security Vulnerability Research
25
Akamai Varnish Connector
developer.akamai.com/connector-for-varnish/
26
Explore the Data…
● Schema and Preview tabs will show you what data is captured.
● For this example, we are interested in anything that indicates whether a webpage is using Varnish.
27
These Columns Contain Some Useful Information
We are interested in firstHTML requests where cdn = Akamai and some of the other response headers indicate that Varnish is in use.
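A minimal sketch of that filter, under stated assumptions: Varnish commonly adds an `X-Varnish` response header and names itself in `Via`, so those two signals are used as the heuristic. The record layout (`first_html`, `cdn`, `headers`) and both helper functions are hypothetical and do not reflect the HTTP Archive schema or the query the authors ran.

```python
def looks_like_varnish(headers):
    """Heuristic: True if response headers suggest Varnish handled the request."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    if "x-varnish" in normalized:
        return True
    return "varnish" in normalized.get("via", "").lower()


def is_candidate(record):
    """True for first-HTML responses served via Akamai that appear to use Varnish."""
    return (
        record["first_html"]
        and record["cdn"] == "Akamai"
        and looks_like_varnish(record["headers"])
    )


# Hypothetical records; the field names are assumptions, not the real schema.
RECORDS = [
    {"first_html": True, "cdn": "Akamai", "headers": {"Via": "1.1 varnish"}},
    {"first_html": True, "cdn": "Akamai", "headers": {"Server": "nginx"}},
    {"first_html": False, "cdn": "Akamai", "headers": {"X-Varnish": "12345"}},
]

print([is_candidate(record) for record in RECORDS])
```

A header heuristic like this can miss origins that strip these headers, so the resulting list is a starting point for follow-up rather than a definitive count.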
28
How Many Akamai Customers Use Varnish at the Origin?
● HTTP Archive data helped determine what proportion of Akamai’s customers were using Varnish at the origin.
● Provided a list of sites for Akamai Product Management to discuss desirable functionality with customers.
29
Investigating Security Threats
● HTTP Archive
○ Investigate other sites that contain similar characteristics
■ Server, Via, URL regex patterns, other headers
○ Export a list of sites that appear vulnerable
○ Cross-reference with Akamai account data
■ Notify 24x7 security contacts
■ Help customers proactively protect themselves
● No CVE or historical attack patterns
○ Targeted attack observed and mitigated (Kona Managed Security)
○ Akamai WAF rules prepared
○ Vendor notified
● Example: zero-day vulnerability on an ecommerce app server
30
Taking HTTP Archive + Security to the Next Level
● HTTP Archive tracks Lighthouse reports for mobile sites
● As of Lighthouse 2.5.0 (Oct 2017), Lighthouse includes Snyk security audits.
31
Case Study #3
Third Party Research
1. How Do 3rd Parties Influence Render Time?
2. Researching a specific 3rd party
32
Which 3rd Party Content Types Load Before Render Start?
bit.ly/2xqlorD
-- Count third-party requests per MIME type that start before vs. after render start
SELECT mimeType,
  COUNT(*) num_requests,
  SUM(
    IF(req.startedDateTime < (pages.startedDateTime + (renderStart / 1000)), 1, 0)
  ) BeforeRenderStart,
  SUM(
    IF(req.startedDateTime < (pages.startedDateTime + (renderStart / 1000)), 0, 1)
  ) AfterRenderStart
FROM `httparchive.runs.2017_09_01_requests` req
JOIN (
  SELECT rank, NET.HOST(url) hostname, url, pageid, startedDateTime, renderStart
  FROM `httparchive.runs.2017_09_01_pages`
) pages ON pages.pageid = req.pageid
-- A request is third party when its host differs from the page's host
WHERE NET.HOST(req.url) != pages.hostname AND rank > 0 AND rank < 100000
GROUP BY mimeType
HAVING num_requests > 1000
ORDER BY BeforeRenderStart DESC
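The before/after comparison at the heart of this query can be sketched in a few lines. The snippet assumes, as the `renderStart / 1000` term in the query suggests, that `startedDateTime` is in seconds and `renderStart` is in milliseconds relative to the page start. The timestamps, sample requests, and the `render_phase` helper are made up for illustration.

```python
from collections import Counter


def render_phase(request_start_s, page_start_s, render_start_ms):
    """Classify a request as starting before or after the page's render start."""
    # renderStart is in milliseconds relative to the page start; convert to seconds.
    render_time_s = page_start_s + render_start_ms / 1000
    return "before" if request_start_s < render_time_s else "after"


# Hypothetical page and third-party requests (timestamps in epoch seconds).
PAGE_START = 1504224000
RENDER_START_MS = 1800
REQUESTS = [
    ("application/javascript", 1504224000),
    ("text/css", 1504224001),
    ("image/gif", 1504224003),
]

counts = Counter(
    (mime, render_phase(start, PAGE_START, RENDER_START_MS))
    for mime, start in REQUESTS
)

for (mime, phase), count in sorted(counts.items()):
    print(mime, phase, count)
```

With whole-second request timestamps, the comparison is only as precise as the timestamp resolution, which is worth keeping in mind when reading the results.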
33
What 3rd Party Content Loads Before RenderStart?
discuss.httparchive.org/t/which-3rd-party-content-loads-before-render-start/1084
34
Which 3rd Parties Load Before Render Start?
bit.ly/2glVquX
-- Same comparison as the previous query, grouped by third-party host and MIME type
SELECT NET.HOST(req.url) thirdparty,
  mimeType,
  COUNT(*) num_requests,
  SUM(
    IF(req.startedDateTime < (pages.startedDateTime + (renderStart / 1000)), 1, 0)
  ) BeforeRenderStart,
  SUM(
    IF(req.startedDateTime < (pages.startedDateTime + (renderStart / 1000)), 0, 1)
  ) AfterRenderStart
FROM `httparchive.runs.2017_09_01_requests` req
JOIN (
  SELECT rank, NET.HOST(url) hostname, url, pageid, startedDateTime, renderStart
  FROM `httparchive.runs.2017_09_01_pages`
) pages ON pages.pageid = req.pageid
WHERE NET.HOST(req.url) != pages.hostname AND rank > 0 AND rank < 100000
GROUP BY thirdparty, mimeType
HAVING num_requests > 100
ORDER BY BeforeRenderStart DESC
35
Which 3rd Party Domains Load Before RenderStart?
36
Which Websites Load <3rd Party> Before vs After Render Time?
bit.ly/2xsz00P
-- For one third party (here ensighten.com), count its requests before vs. after render start per site
SELECT rank, site,
  SUM(
    IF(req.startedDateTime < (pages.startedDateTime + (renderStart / 1000)), 1, 0)
  ) BeforeRenderStart,
  SUM(
    IF(req.startedDateTime < (pages.startedDateTime + (renderStart / 1000)), 0, 1)
  ) AfterRenderStart
FROM `httparchive.runs.2017_09_01_requests` req
INNER JOIN (
  SELECT rank, NET.HOST(url) site, pageid, startedDateTime, renderStart
  FROM `httparchive.runs.2017_09_01_pages`
) pages ON pages.pageid = req.pageid
WHERE NET.HOST(req.url) LIKE "%ensighten.com%" AND rank > 0
GROUP BY rank, site
HAVING AfterRenderStart > 0
ORDER BY rank ASC
37
What 3rd Party Content Loads Before RenderStart?
● This query outputs a summary containing the following information:
○ Which sites use the 3rd party?
○ How many resources are served by it?
○ How many of them are loaded before/after the page renders?
● Results can help sites learn from each other’s best practices
○ Even across industries!
38
Getting Started
1. Google Cloud project
2. BigQuery config
39
Up and running - Create a Google Cloud Project
bit.ly/httparchive-bigquery
● console.cloud.google.com/projectcreate
40
Up and running - Add the HTTP Archive Tables
● bigquery.cloud.google.com
bit.ly/httparchive-bigquery
41
Up and running - Start Exploring!
● Detailed Setup Instructions:
○ bit.ly/httparchive-bigquery
● BigQuery Standard SQL Documentation
○ cloud.google.com/bigquery/docs/reference/standard-sql/
● HTTP Archive Discussion Forum
○ discuss.httparchive.org
42
What’s New
● Lighthouse integration
● Vulnerability detection
● Third party JS libs
43
Coming Soon
● Platform detection
● Curated reports
● Improved data viz
● Video series
44
Wishlist
● Website categorization
● Millions of tests
45
HTTP Archive Sponsors
Contact us at team@httparchive.org if interested
46
Work With Us
● Chat: bit.ly/ha-slack
● Contribute: github.com/HTTPArchive
● Collaborate: discuss.httparchive.org
47
Continuing the Discussion…
discuss.httparchive.org
48
Thanks!
● Chat: bit.ly/ha-slack
● Contribute: github.com/HTTPArchive
● Collaborate: discuss.httparchive.org
@rick_viscomi
@paulcalvano
rick@httparchive.org
pacalvan@akamai.com
© AKAMAI - EDGE 2017

More Related Content

What's hot

MongoDB Scalability Best Practices
MongoDB Scalability Best PracticesMongoDB Scalability Best Practices
MongoDB Scalability Best PracticesJason Terpko
 
InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)Gene Leybzon
 
Java 9 Security Enhancements in Practice
Java 9 Security Enhancements in PracticeJava 9 Security Enhancements in Practice
Java 9 Security Enhancements in PracticeMartin Toshev
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cMartin Toshev
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for AnalyticsMongoDB
 
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...MongoDB
 
Doing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupDoing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupMongoDB
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and PythonMike Bright
 
OSMC 2018 | Logging is coming to Grafana by David kaltschmidt
OSMC 2018 | Logging is coming to Grafana by David kaltschmidtOSMC 2018 | Logging is coming to Grafana by David kaltschmidt
OSMC 2018 | Logging is coming to Grafana by David kaltschmidtNETWAYS
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTim Ysewyn
 
nHibernate Caching
nHibernate CachingnHibernate Caching
nHibernate CachingGuo Albert
 
Back to Basics, webinar 2: La tua prima applicazione MongoDB
Back to Basics, webinar 2: La tua prima applicazione MongoDBBack to Basics, webinar 2: La tua prima applicazione MongoDB
Back to Basics, webinar 2: La tua prima applicazione MongoDBMongoDB
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishMongoDB
 
(Fios#02) 2. elk 포렌식 분석
(Fios#02) 2. elk 포렌식 분석(Fios#02) 2. elk 포렌식 분석
(Fios#02) 2. elk 포렌식 분석INSIGHT FORENSIC
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012Jimmy Lai
 
MongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDBMongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDBMongoDB
 

What's hot (20)

MongoDB Scalability Best Practices
MongoDB Scalability Best PracticesMongoDB Scalability Best Practices
MongoDB Scalability Best Practices
 
Jdk 10 sneak peek
Jdk 10 sneak peekJdk 10 sneak peek
Jdk 10 sneak peek
 
InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)
 
Java 9 Security Enhancements in Practice
Java 9 Security Enhancements in PracticeJava 9 Security Enhancements in Practice
Java 9 Security Enhancements in Practice
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12c
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for Analytics
 
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
 
Doing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupDoing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookup
 
MongoDB and Python
MongoDB and PythonMongoDB and Python
MongoDB and Python
 
Python and MongoDB
Python and MongoDB Python and MongoDB
Python and MongoDB
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
 
OSMC 2018 | Logging is coming to Grafana by David kaltschmidt
OSMC 2018 | Logging is coming to Grafana by David kaltschmidtOSMC 2018 | Logging is coming to Grafana by David kaltschmidt
OSMC 2018 | Logging is coming to Grafana by David kaltschmidt
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
nHibernate Caching
nHibernate CachingnHibernate Caching
nHibernate Caching
 
Back to Basics, webinar 2: La tua prima applicazione MongoDB
Back to Basics, webinar 2: La tua prima applicazione MongoDBBack to Basics, webinar 2: La tua prima applicazione MongoDB
Back to Basics, webinar 2: La tua prima applicazione MongoDB
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at Wish
 
(Fios#02) 2. elk 포렌식 분석
(Fios#02) 2. elk 포렌식 분석(Fios#02) 2. elk 포렌식 분석
(Fios#02) 2. elk 포렌식 분석
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012
 
MongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDBMongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDB
 

Similar to Akamai Edge: Tracking the Performance of the Web with HTTP Archive

Fluent 2018: Tracking Performance of the Web with HTTP Archive
Fluent 2018: Tracking Performance of the Web with HTTP ArchiveFluent 2018: Tracking Performance of the Web with HTTP Archive
Fluent 2018: Tracking Performance of the Web with HTTP ArchivePaul Calvano
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performanceDataWorks Summit
 
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)Pierre Mavro
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 
Website & Internet + Performance testing
Website & Internet + Performance testingWebsite & Internet + Performance testing
Website & Internet + Performance testingRoman Ananev
 
Monitoring web application response times^lj a hybrid approach for windows
Monitoring web application response times^lj a hybrid approach for windowsMonitoring web application response times^lj a hybrid approach for windows
Monitoring web application response times^lj a hybrid approach for windowsMark Friedman
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoNathaniel Braun
 
ALT-F1.BE : The Accelerator (Google Cloud Platform)
ALT-F1.BE : The Accelerator (Google Cloud Platform)ALT-F1.BE : The Accelerator (Google Cloud Platform)
ALT-F1.BE : The Accelerator (Google Cloud Platform)Abdelkrim Boujraf
 
202107 - Orion introduction - COSCUP
202107 - Orion introduction - COSCUP202107 - Orion introduction - COSCUP
202107 - Orion introduction - COSCUPRonald Hsu
 
Set Up a Million-Core Cluster to Accelerate HPC Workloads (CMP404) - AWS re:I...
Set Up a Million-Core Cluster to Accelerate HPC Workloads (CMP404) - AWS re:I...Set Up a Million-Core Cluster to Accelerate HPC Workloads (CMP404) - AWS re:I...
Set Up a Million-Core Cluster to Accelerate HPC Workloads (CMP404) - AWS re:I...Amazon Web Services
 
Scaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in CloudScaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in CloudChangshu Liu
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceHBaseCon
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce HBaseCon
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheusBob Cotton
 

Similar to Akamai Edge: Tracking the Performance of the Web with HTTP Archive (20)

Fluent 2018: Tracking Performance of the Web with HTTP Archive
Fluent 2018: Tracking Performance of the Web with HTTP ArchiveFluent 2018: Tracking Performance of the Web with HTTP Archive
Fluent 2018: Tracking Performance of the Web with HTTP Archive
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
Sprint 62
Sprint 62Sprint 62
Sprint 62
 
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
 
JAMStack
JAMStackJAMStack
JAMStack
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
Website & Internet + Performance testing
Website & Internet + Performance testingWebsite & Internet + Performance testing
Website & Internet + Performance testing
 
Monitoring web application response times^lj a hybrid approach for windows
Monitoring web application response times^lj a hybrid approach for windowsMonitoring web application response times^lj a hybrid approach for windows
Monitoring web application response times^lj a hybrid approach for windows
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
ALT-F1.BE : The Accelerator (Google Cloud Platform)
ALT-F1.BE : The Accelerator (Google Cloud Platform)ALT-F1.BE : The Accelerator (Google Cloud Platform)
ALT-F1.BE : The Accelerator (Google Cloud Platform)
 
202107 - Orion introduction - COSCUP
202107 - Orion introduction - COSCUP202107 - Orion introduction - COSCUP
202107 - Orion introduction - COSCUP
 
Set Up a Million-Core Cluster to Accelerate HPC Workloads (CMP404) - AWS re:I...
Set Up a Million-Core Cluster to Accelerate HPC Workloads (CMP404) - AWS re:I...Set Up a Million-Core Cluster to Accelerate HPC Workloads (CMP404) - AWS re:I...
Set Up a Million-Core Cluster to Accelerate HPC Workloads (CMP404) - AWS re:I...
 
Sprint 68
Sprint 68Sprint 68
Sprint 68
 
Scaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in CloudScaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in Cloud
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Sprint 78
Sprint 78Sprint 78
Sprint 78
 
Sprint 73
Sprint 73Sprint 73
Sprint 73
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheus
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 

Recently uploaded

Li-Cycle-Investor-Presentation-February-2021.pdf
Li-Cycle-Investor-Presentation-February-2021.pdfLi-Cycle-Investor-Presentation-February-2021.pdf
Li-Cycle-Investor-Presentation-February-2021.pdfAylinAnarat1
 
Camil Institutional Presentation_Mai24.pdf
Camil Institutional Presentation_Mai24.pdfCamil Institutional Presentation_Mai24.pdf
Camil Institutional Presentation_Mai24.pdfCAMILRI
 
Investor Day 2024 Presentation Sysco 2024
Investor Day 2024 Presentation Sysco 2024Investor Day 2024 Presentation Sysco 2024
Investor Day 2024 Presentation Sysco 2024Sysco_Investors
 
Investor Day 2024 Presentation Sysco 2024
Investor Day 2024 Presentation Sysco 2024Investor Day 2024 Presentation Sysco 2024
Investor Day 2024 Presentation Sysco 2024Sysco_Investors
 
Binance crypto market report Q1 2024 You should read to understand the trend
Binance crypto market report Q1 2024 You should read to understand the trendBinance crypto market report Q1 2024 You should read to understand the trend
Binance crypto market report Q1 2024 You should read to understand the trendsaygkmuaku8
 
BRL - Investor Presentation - May 2024.pdf
BRL - Investor Presentation - May 2024.pdfBRL - Investor Presentation - May 2024.pdf
BRL - Investor Presentation - May 2024.pdfTyler Hosey
 
Pradhan Mantri Awas Yojana (PMAY) - Urban
Pradhan Mantri Awas Yojana (PMAY) - UrbanPradhan Mantri Awas Yojana (PMAY) - Urban
Pradhan Mantri Awas Yojana (PMAY) - Urbanmaheswarip13
 
Collective Mining | Corporate Presentation - May 2024
Collective Mining | Corporate Presentation - May 2024Collective Mining | Corporate Presentation - May 2024
Collective Mining | Corporate Presentation - May 2024CollectiveMining1
 

Recently uploaded (8)

Li-Cycle-Investor-Presentation-February-2021.pdf
Li-Cycle-Investor-Presentation-February-2021.pdfLi-Cycle-Investor-Presentation-February-2021.pdf
Li-Cycle-Investor-Presentation-February-2021.pdf
 
Camil Institutional Presentation_Mai24.pdf
Camil Institutional Presentation_Mai24.pdfCamil Institutional Presentation_Mai24.pdf
Camil Institutional Presentation_Mai24.pdf
 
Investor Day 2024 Presentation Sysco 2024
Investor Day 2024 Presentation Sysco 2024Investor Day 2024 Presentation Sysco 2024
Investor Day 2024 Presentation Sysco 2024
 
Investor Day 2024 Presentation Sysco 2024
Investor Day 2024 Presentation Sysco 2024Investor Day 2024 Presentation Sysco 2024
Investor Day 2024 Presentation Sysco 2024
 
Binance crypto market report Q1 2024 You should read to understand the trend
Binance crypto market report Q1 2024 You should read to understand the trendBinance crypto market report Q1 2024 You should read to understand the trend
Binance crypto market report Q1 2024 You should read to understand the trend
 
BRL - Investor Presentation - May 2024.pdf
BRL - Investor Presentation - May 2024.pdfBRL - Investor Presentation - May 2024.pdf
BRL - Investor Presentation - May 2024.pdf
 
Pradhan Mantri Awas Yojana (PMAY) - Urban
Pradhan Mantri Awas Yojana (PMAY) - UrbanPradhan Mantri Awas Yojana (PMAY) - Urban
Pradhan Mantri Awas Yojana (PMAY) - Urban
 
Collective Mining | Corporate Presentation - May 2024
Collective Mining | Corporate Presentation - May 2024Collective Mining | Corporate Presentation - May 2024
Collective Mining | Corporate Presentation - May 2024
 

Akamai Edge: Tracking the Performance of the Web with HTTP Archive

  • 1. © AKAMAI - EDGE 2017 Tracking the Performance of the Web with HTTP Archive October 12, 2017
  • 2. 2 Rick Viscomi @rick_viscomi HTTP Archive Paul Calvano @paulcalvano Akamai About Us
  • 3. 3 “ httparchive.org The HTTP Archive tracks how the Web is built.
  • 5. 5 How it Works ● Alexa’s top 500,000 websites ○ Home pages ○ Desktop and emulated mobile ● Powered by WebPageTest ○ Records HAR trace ○ Executes custom metrics ○ Records Lighthouse audits ● httparchive.org ○ Trends and stats ○ Discussion forum ● BigQuery and Cloud Storage ○ Queryable database ○ Raw HARs
  • 7. 7 discuss.httparchive.orgThe tip of the databerg Curated stats/trends Raw data
  • 8. 8 Who uses the HTTP Archive? ● Scholars ● Community ● Industry
  • 9. 9 goo.gl/kxgzM1HTTP Archive referenced in research papers . . . In this article we utilize the httparchive.org [9] publicly available dataset of captured web performance metrics . . . Desktop and mobile web page comparison: characteristics, trends, and implications IEEE Communications Magazine ( Volume: 52, Issue: 9, September 2014 ) . . . Recent stats from httparchive.org show that the top 300K URLs in the world need on average 38(!) TCP connections to display the site . . . HTTP2 explained Computer Communication Review 44.3 (2014): 120-128. . . . We make extensive use of the [...] data available at HTTP Archive to expose the characteristics of 3rd Party assets embedded into the top 16,000 Alexa webpages . . . Are 3rd Parties Slowing Down the Mobile Web? Proceedings of the Eighth Wireless of the Students, by the Students, and for the Students Workshop. ACM, 2016.
  • 10. 10goo.gl/oJSIjm The web community uses HTTP Archive to answer questions about the state of the web goo.gl/PvVHhL
  • 11. 11 goo.gl/YPdLJuRemember when the average page size exceeded Doom?
  • 12. 12 goo.gl/Lnp1ne goo.gl/e8kXq9Industrial tools use HTTP Archive data for calibration
  • 13. 13 Case Study #1 Compression 1. Updating Akamai Gzip Defaults 2. Brotli Compression
  • 14. 14 Last Mile Acceleration (LMA) ● Gzip Compression at the Edge. ● Compression is based on HTTP Content Type ● Default Content Types were not sufficient and usually required updating… ● When Property Manager was released, we decided to update this...
  • 15. 15bit.ly/2hXNjbEQuerying the HTTP Archive for Compression Stats SELECT mimeType, COUNT(*) total, SUM(IF(resp_content_encoding = "gzip",1,0)) gzip, SUM(IF(resp_content_encoding = "deflate",1,0)) deflate, SUM(IF(resp_content_encoding IN("gzip", "deflate"),0,1)) NoCompression, ROUND( SUM( IF(resp_content_encoding IN("gzip", "deflate"),1,0) ) / COUNT(*),2) CompressedPercentage FROM httparchive.runs.2017_09_01_requests GROUP BY mimeType HAVING total > 10000 ORDER BY CompressedPercentage DESC 1 2 3 4 5 6 7 8 9 10 11 12
  • 16. 16 Some New LMA Defaults ● text/javascript ● font/ttf ● application/javascript ● text/xml ● application/json ● application/xml ● ... Many Content-Types that did not match the original defaults!
  • 17. 17 Modern Set of Last Mile Acceleration Defaults
  • 18. 18 What About Brotli? ● New compression algorithm developed by Google researchers ● 5% - 25% Reduction over Gzip Compression ● Supported by most browsers ● Let’s extend the previous query to include Brotli compression
  • 19. 19 SELECT mimeType, count(*) total, SUM(IF(resp_content_encoding = "gzip",1,0)) gzip, SUM(IF(resp_content_encoding = "br",1,0)) brotli, SUM(IF(resp_content_encoding = "deflate",1,0)) deflate, SUM(IF(resp_content_encoding IN("gzip",”deflate",“br”),0,1)) NoCompression, ROUND( SUM( IF(resp_content_encoding IN("gzip", "deflate", "br"),1,0) ) / COUNT(*),2) CompressedPercentage, ROUND( SUM( IF(resp_content_encoding = "br",1,0) ) / COUNT(*),2) BrotliCompressedPercentage FROM httparchive.runs.2017_09_01_requests GROUP BY mimeType HAVING total > 1000 ORDER BY brotli DESC bit.ly/2fUK6W9Compression Stats - gzip and brotli 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
  • 20. 20 Examining Compression By Content Type - With Brotli Brotli use has been growing, but where is it most prevalent?
  • 21. 21 Brotli Usage: Mostly JavaScript, CSS and HTML Resources When we exclude Google and Facebook content, the bulk of Brotli encoded content is JS and CSS
  • 22. 22 Compression Level = Overhead Data on compression speeds from quixdb.github.io/squash-benchmark/#results Most byte savings are obtained by using the highest compression level.
  • 23. 23 Resource Optimizer: Automated Brotli Compression at the Edge ● Automatically compresses CSS and JS ● Resources are Compressed Offline and Cached ● Brotli Compression Level 11, without the overhead! + =
  • 24. 24 Case Study #2 Server Technologies 1. Akamai Varnish Connector 2. Security Vulnerability Research
  • 26. 26 Explore the Data… ● Schema and Preview tabs will show you what data is captured. ● For this example, we are interested in anything that will indicate whether a webpage is using Varnish.
  • 27. 27 These Columns Contain Some Useful Information We are interested in firstHTML where cdn=Akamai and some of the other headers indicate Varnish is in use…
  • 28. 28 How Many Akamai Customers Use Varnish at the Origin? ● HTTP Archive data helped determine what proportion of Akamai’s customers were using Varnish at the origin. ● Provided a list of sites for Akamai Product Management to discuss desirable functionality with customers.
  • 29. 29 Investigating Security Threats ● HTTP Archive ○ Investigate other sites that contain similar characteristics ■ Server, Via, Url Regex Patterns, Other Headers ○ Export a list of sites that appear vulnerable ○ Cross Reference with Akamai Account Data ■ Notify 24x7 Security Contacts, ■ Help customers proactively protect themselves ● No CVE or Historical Attack Patterns ○ Target attack observed and mitigated (Kona Managed Security) ○ Akamai WAF rules prepared ○ Vendor notified ● Example: 0 Day Vulnerability on an Ecommerce App Server
  • 30. Taking HTTP Archive + Security to the Next Level ● HTTP Archive tracks Lighthouse reports for mobile sites ● As of Lighthouse 2.5.0 (Oct 2017), Lighthouse includes Snyk security audits.
  • 31. Case Study #3: Third Party Research 1. How Do 3rd Parties Influence Render Time? 2. Researching a Specific 3rd Party
  • 32. Which 3rd Party Content Types Load Before Render Start? (bit.ly/2xqlorD)
      SELECT mimeType,
        COUNT(*) num_requests,
        SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),1,0) ) BeforeRenderStart,
        SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),0,1) ) AfterRenderStart
      FROM httparchive.runs.2017_09_01_requests req
      JOIN (
        SELECT rank, NET.HOST(url) hostname, url, pageid, startedDateTime, renderStart
        FROM httparchive.runs.2017_09_01_pages
      ) pages
      ON pages.pageid = req.pageid
      WHERE NET.HOST(req.url) != pages.hostname
        AND rank > 0 AND rank < 100000
      GROUP BY mimeType
      HAVING num_requests > 1000
      ORDER BY BeforeRenderStart DESC
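The core comparison in the slide 32 query is a timestamp test: a request counts as "before render start" when it began earlier than the page start time plus `renderStart`. The sketch below reproduces that test for a single page, assuming (as the `/1000` in the query implies) that `startedDateTime` is in seconds and `renderStart` in milliseconds. The function names and sample values are hypothetical.

```python
from urllib.parse import urlsplit

def host(url):
    """Hostname of a URL, a rough stand-in for BigQuery's NET.HOST."""
    return urlsplit(url).hostname

def split_by_render_start(page, requests):
    """Count third-party requests that started before vs. after render start."""
    # renderStart is in milliseconds; convert to seconds before adding.
    render_at = page["startedDateTime"] + page["renderStart"] / 1000
    page_host = host(page["url"])
    before = after = 0
    for req in requests:
        if host(req["url"]) == page_host:
            continue  # first-party request, excluded like the WHERE clause does
        if req["startedDateTime"] < render_at:
            before += 1
        else:
            after += 1
    return before, after

# Hypothetical sample data: the page renders 1.5 s after it starts loading.
PAGE = {"url": "https://www.site.example/", "startedDateTime": 1000.0, "renderStart": 1500}
REQUESTS = [
    {"url": "https://www.site.example/main.css", "startedDateTime": 1000.1},
    {"url": "https://tags.vendor.example/tag.js", "startedDateTime": 1000.2},
    {"url": "https://ads.vendor.example/ad.js", "startedDateTime": 1002.0},
]

counts = split_by_render_start(PAGE, REQUESTS)
print(counts)
```

Because the comparison is on exact hostnames, a resource served from a different subdomain of the same site is treated as third party here, just as it is in the query.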
  • 33. What 3rd Party Content Loads Before RenderStart? discuss.httparchive.org/t/which-3rd-party-content-loads-before-render-start/1084
  • 34. Which 3rd Parties Load Before Render Start? (bit.ly/2glVquX)
      SELECT NET.HOST(req.url) thirdparty,
        mimeType,
        COUNT(*) num_requests,
        SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),1,0) ) BeforeRenderStart,
        SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),0,1) ) AfterRenderStart
      FROM httparchive.runs.2017_09_01_requests req
      JOIN (
        SELECT rank, NET.HOST(url) hostname, url, pageid, startedDateTime, renderStart
        FROM httparchive.runs.2017_09_01_pages
      ) pages
      ON pages.pageid = req.pageid
      WHERE NET.HOST(req.url) != pages.hostname
        AND rank > 0 AND rank < 100000
      GROUP BY thirdparty, mimeType
      HAVING num_requests > 100
      ORDER BY BeforeRenderStart DESC
  • 35. Which 3rd Party Domains Load Before RenderStart?
  • 36. Which Websites Load <3rd Party> Before vs After Render Time? (bit.ly/2xsz00P)
      SELECT rank, site,
        SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),1,0) ) BeforeRenderStart,
        SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),0,1) ) AfterRenderStart
      FROM httparchive.runs.2017_09_01_requests req
      INNER JOIN (
        SELECT rank, NET.HOST(url) site, pageid, startedDateTime, renderStart
        FROM httparchive.runs.2017_09_01_pages
      ) pages
      ON pages.pageid = req.pageid
      WHERE NET.HOST(req.url) LIKE "%ensighten.com%"
        AND rank > 0
      GROUP BY rank, site
      HAVING AfterRenderStart>0
      ORDER BY rank ASC
  • 37. What 3rd Party Content Loads Before RenderStart? ● This query outputs a summary containing the following information: ○ Which sites use the 3rd party? ○ How many resources are served by it? ○ How many of them are loaded before/after the page renders? ● Results can help sites learn from each other’s best practices ○ Even across industries!!!
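The per-site summary described on slide 37 amounts to grouping one third party's requests by site, counting them before and after render start, and keeping only sites with at least one late request. A minimal sketch over already-joined rows follows; the function name, the row keys, and the sample host `cdn.vendor.example` are hypothetical, and the substring match only approximates the query's `LIKE` pattern.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def third_party_summary(joined_rows, needle):
    """Per-site before/after render-start counts for hosts containing `needle`."""
    counts = defaultdict(lambda: [0, 0])  # (rank, site) -> [before, after]
    for row in joined_rows:
        req_host = urlsplit(row["req_url"]).hostname or ""
        if needle not in req_host or row["rank"] <= 0:
            continue
        # render_start_ms is in milliseconds, the timestamps in seconds.
        render_at = row["page_started"] + row["render_start_ms"] / 1000
        index = 0 if row["req_started"] < render_at else 1
        counts[(row["rank"], row["site"])][index] += 1
    # Keep sites with at least one request after render start, ordered by rank.
    return [
        (rank, site, before, after)
        for (rank, site), (before, after) in sorted(counts.items())
        if after > 0
    ]

def make_row(rank, site, req_url, req_started):
    """Build a hypothetical joined row; every page starts at t=100 s and renders 1 s later."""
    return {"rank": rank, "site": site, "req_url": req_url, "req_started": req_started,
            "page_started": 100.0, "render_start_ms": 1000}

ROWS = [
    make_row(10, "a.example", "https://cdn.vendor.example/boot.js", 100.2),
    make_row(10, "a.example", "https://cdn.vendor.example/late.js", 102.0),
    make_row(10, "a.example", "https://fonts.other.example/f.woff", 100.3),
    make_row(5, "b.example", "https://cdn.vendor.example/boot.js", 100.1),
    make_row(20, "c.example", "https://cdn.vendor.example/late.js", 101.5),
]

summary = third_party_summary(ROWS, "vendor.example")
print(summary)
```

Sites that load the third party only before render start are dropped by the final filter, matching the HAVING clause in the slide 36 query.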
  • 38. Getting Started 1. Google Cloud project 2. BigQuery config
  • 39. Up and running - Create a Google Cloud Project (bit.ly/httparchive-bigquery) ● console.cloud.google.com/projectcreate
  • 40. Up and running - Add the HTTP Archive Tables (bit.ly/httparchive-bigquery) ● bigquery.cloud.google.com
  • 41. Up and running - Start Exploring! ● Detailed Setup Instructions: ○ bit.ly/httparchive-bigquery ● BigQuery Standard SQL Documentation ○ cloud.google.com/bigquery/docs/reference/standard-sql/ ● HTTP Archive Discussion Forum ○ discuss.httparchive.org
  • 42. What’s New ● Lighthouse integration ● Vulnerability detection ● Third party JS libs
  • 43. Coming Soon ● Platform detection ● Curated reports ● Improved data viz ● Video series
  • 44. Wishlist ● Website categorization ● Millions of tests
  • 45. HTTP Archive Sponsors: Contact us at team@httparchive.org if interested