SlideShare a Scribd company logo

Fluent 2018: Tracking Performance of the Web with HTTP Archive

Have you ever thought about how your site’s performance compares to the web as a whole? Or maybe you’re curious how popular a particular web feature is. How much is too much JavaScript? The HTTP Archive has been keeping track of how the web is built since 2010. It enables you to find answers to questions about the state of the web past and present. Paul Calvano explores how the HTTP Archive works, how people are using this dataset, and some ways that Akamai has leveraged data within the HTTP Archive to help its customers.

1 of 52
Download to read offline
©2016 AKAMAI | FASTER FORWARDTM
Tracking Performance of the Web with HTTP Archive
6/13/2018
Paul Calvano
Senior Web Performance Architect
@paulcalvano
pacalvan@akamai.com
2
Paul Calvano
@paulcalvano
Akamai
About Me
● Web Performance Architect @ Akamai
● HTTP Archive / BigQuery Addict :)
● Working on #WebPerf since 2000
● https://paulcalvano.com
● @paulcalvano on Twitter
3
“
httparchive.org
4https://www.igvita.com/2013/06/20/http-archive-bigquery-web-performance-answers/
5
How it Works
● Alexa’s top 500,000 websites
○ Home pages
○ Desktop and emulated mobile
○ Increasing to 1,000,000 soon!
● Powered by WebPageTest
○ Records HAR trace
○ Executes custom metrics
○ Records Lighthouse audits
● httparchive.org
○ Trends and stats
○ Discussion forum
● BigQuery and Cloud Storage
○ Queryable database
○ Raw HARs
6
HTTP Archive Pipeline
230
500K
GCS
BQ
Biweekly

Recommended

Real User Measurement Insights, London WebPerf 2018-Nov-06
Real User Measurement Insights, London WebPerf 2018-Nov-06Real User Measurement Insights, London WebPerf 2018-Nov-06
Real User Measurement Insights, London WebPerf 2018-Nov-06Paul Calvano
 
NYC WebPerf Meetup Feb 2020 - Measuring the Adoption of Web Performance Techn...
NYC WebPerf Meetup Feb 2020 - Measuring the Adoption of Web Performance Techn...NYC WebPerf Meetup Feb 2020 - Measuring the Adoption of Web Performance Techn...
NYC WebPerf Meetup Feb 2020 - Measuring the Adoption of Web Performance Techn...Paul Calvano
 
Common Traits of High Performing Websites, BairesWeb - Argentina
Common Traits of High Performing Websites,  BairesWeb  - ArgentinaCommon Traits of High Performing Websites,  BairesWeb  - Argentina
Common Traits of High Performing Websites, BairesWeb - ArgentinaPaul Calvano
 
Common Traits of High Performing Websites, WebPerfDays Amsterdam 07-Nov-2018
Common Traits of High Performing Websites, WebPerfDays Amsterdam 07-Nov-2018Common Traits of High Performing Websites, WebPerfDays Amsterdam 07-Nov-2018
Common Traits of High Performing Websites, WebPerfDays Amsterdam 07-Nov-2018Paul Calvano
 
News From the Front Lines - an update on Front-End Tech
News From the Front Lines - an update on Front-End TechNews From the Front Lines - an update on Front-End Tech
News From the Front Lines - an update on Front-End TechKevin Bruce
 
Comparing JVM Web Frameworks - TSSJS 2011
Comparing JVM Web Frameworks - TSSJS 2011Comparing JVM Web Frameworks - TSSJS 2011
Comparing JVM Web Frameworks - TSSJS 2011Matt Raible
 
Comparing JVM Web Frameworks - Spring I/O 2012
Comparing JVM Web Frameworks - Spring I/O 2012Comparing JVM Web Frameworks - Spring I/O 2012
Comparing JVM Web Frameworks - Spring I/O 2012Matt Raible
 

More Related Content

What's hot

Ingress? That’s So 2020! Introducing the Kubernetes Gateway API
Ingress? That’s So 2020! Introducing the Kubernetes Gateway APIIngress? That’s So 2020! Introducing the Kubernetes Gateway API
Ingress? That’s So 2020! Introducing the Kubernetes Gateway APIVMware Tanzu
 
Creating macOS Build Infrastructure in the Cloud
Creating macOS Build Infrastructure in the CloudCreating macOS Build Infrastructure in the Cloud
Creating macOS Build Infrastructure in the CloudMacStadium
 
Enterprise git - the hard bits
Enterprise git -  the hard bitsEnterprise git -  the hard bits
Enterprise git - the hard bitsMatthew Barr
 
Full-Stack Development with Spring Boot and VueJS
Full-Stack Development with Spring Boot and VueJSFull-Stack Development with Spring Boot and VueJS
Full-Stack Development with Spring Boot and VueJSVMware Tanzu
 
What’s new in grails framework 5?
What’s new in grails framework 5?What’s new in grails framework 5?
What’s new in grails framework 5?Puneet Behl
 
How to convert your Full Trust Solutions to the SharePoint Framework (SPFx)
How to convert your Full Trust Solutions to the SharePoint Framework (SPFx)How to convert your Full Trust Solutions to the SharePoint Framework (SPFx)
How to convert your Full Trust Solutions to the SharePoint Framework (SPFx)Brian Culver
 
Continuous Code Quality with the sonar ecosystem
Continuous Code Quality with the sonar ecosystemContinuous Code Quality with the sonar ecosystem
Continuous Code Quality with the sonar ecosystemRoman Pickl
 
Graph ql subscriptions on the jvm
Graph ql subscriptions on the jvmGraph ql subscriptions on the jvm
Graph ql subscriptions on the jvmGerard Klijs
 
Rendering: Or why your perfectly optimized content doesn't rank
Rendering: Or why your perfectly optimized content doesn't rankRendering: Or why your perfectly optimized content doesn't rank
Rendering: Or why your perfectly optimized content doesn't rankWeLoveSEO
 
Secrets of Successful Digital Transformers
Secrets of Successful Digital TransformersSecrets of Successful Digital Transformers
Secrets of Successful Digital TransformersVMware Tanzu
 
DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits
DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits
DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits Gene Kim
 
Choose Your Own Adventure with JHipster & Kubernetes - Denver JUG 2020
Choose Your Own Adventure with JHipster & Kubernetes - Denver JUG 2020Choose Your Own Adventure with JHipster & Kubernetes - Denver JUG 2020
Choose Your Own Adventure with JHipster & Kubernetes - Denver JUG 2020Matt Raible
 
9 steps to awesome with kubernetes
9 steps to awesome with kubernetes9 steps to awesome with kubernetes
9 steps to awesome with kubernetesBaraniBuuny
 
Evolution of GitLab Frontend
Evolution of GitLab FrontendEvolution of GitLab Frontend
Evolution of GitLab FrontendFatih Acet
 
EMC World 2016 12 Factor Apps FTW
EMC World 2016 12 Factor Apps FTWEMC World 2016 12 Factor Apps FTW
EMC World 2016 12 Factor Apps FTWTommy Trogden
 
To Microservices and Beyond
To Microservices and BeyondTo Microservices and Beyond
To Microservices and BeyondMatt Stine
 
Streaming Media West: Webrtc the future of low latency streaming
Streaming Media West: Webrtc the future of low latency streamingStreaming Media West: Webrtc the future of low latency streaming
Streaming Media West: Webrtc the future of low latency streamingAlexandre Gouaillard
 

What's hot (20)

Ingress? That’s So 2020! Introducing the Kubernetes Gateway API
Ingress? That’s So 2020! Introducing the Kubernetes Gateway APIIngress? That’s So 2020! Introducing the Kubernetes Gateway API
Ingress? That’s So 2020! Introducing the Kubernetes Gateway API
 
Creating macOS Build Infrastructure in the Cloud
Creating macOS Build Infrastructure in the CloudCreating macOS Build Infrastructure in the Cloud
Creating macOS Build Infrastructure in the Cloud
 
Enterprise git - the hard bits
Enterprise git -  the hard bitsEnterprise git -  the hard bits
Enterprise git - the hard bits
 
Full-Stack Development with Spring Boot and VueJS
Full-Stack Development with Spring Boot and VueJSFull-Stack Development with Spring Boot and VueJS
Full-Stack Development with Spring Boot and VueJS
 
What’s new in grails framework 5?
What’s new in grails framework 5?What’s new in grails framework 5?
What’s new in grails framework 5?
 
How to convert your Full Trust Solutions to the SharePoint Framework (SPFx)
How to convert your Full Trust Solutions to the SharePoint Framework (SPFx)How to convert your Full Trust Solutions to the SharePoint Framework (SPFx)
How to convert your Full Trust Solutions to the SharePoint Framework (SPFx)
 
Continuous Code Quality with the sonar ecosystem
Continuous Code Quality with the sonar ecosystemContinuous Code Quality with the sonar ecosystem
Continuous Code Quality with the sonar ecosystem
 
Graph ql subscriptions on the jvm
Graph ql subscriptions on the jvmGraph ql subscriptions on the jvm
Graph ql subscriptions on the jvm
 
Rendering: Or why your perfectly optimized content doesn't rank
Rendering: Or why your perfectly optimized content doesn't rankRendering: Or why your perfectly optimized content doesn't rank
Rendering: Or why your perfectly optimized content doesn't rank
 
Secrets of Successful Digital Transformers
Secrets of Successful Digital TransformersSecrets of Successful Digital Transformers
Secrets of Successful Digital Transformers
 
DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits
DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits
DOES SFO 2016 - Matthew Barr - Enterprise Git - the hard bits
 
VizEx View HTML5 workshop 2017
VizEx View HTML5 workshop 2017VizEx View HTML5 workshop 2017
VizEx View HTML5 workshop 2017
 
Choose Your Own Adventure with JHipster & Kubernetes - Denver JUG 2020
Choose Your Own Adventure with JHipster & Kubernetes - Denver JUG 2020Choose Your Own Adventure with JHipster & Kubernetes - Denver JUG 2020
Choose Your Own Adventure with JHipster & Kubernetes - Denver JUG 2020
 
9 steps to awesome with kubernetes
9 steps to awesome with kubernetes9 steps to awesome with kubernetes
9 steps to awesome with kubernetes
 
Evolution of GitLab Frontend
Evolution of GitLab FrontendEvolution of GitLab Frontend
Evolution of GitLab Frontend
 
Future of Grails
Future of GrailsFuture of Grails
Future of Grails
 
EMC World 2016 12 Factor Apps FTW
EMC World 2016 12 Factor Apps FTWEMC World 2016 12 Factor Apps FTW
EMC World 2016 12 Factor Apps FTW
 
Graalvm with Groovy and Kotlin - Madrid GUG 2019
Graalvm with Groovy and Kotlin - Madrid GUG 2019Graalvm with Groovy and Kotlin - Madrid GUG 2019
Graalvm with Groovy and Kotlin - Madrid GUG 2019
 
To Microservices and Beyond
To Microservices and BeyondTo Microservices and Beyond
To Microservices and Beyond
 
Streaming Media West: Webrtc the future of low latency streaming
Streaming Media West: Webrtc the future of low latency streamingStreaming Media West: Webrtc the future of low latency streaming
Streaming Media West: Webrtc the future of low latency streaming
 

Similar to Fluent 2018: Tracking Performance of the Web with HTTP Archive

Website & Internet + Performance testing
Website & Internet + Performance testingWebsite & Internet + Performance testing
Website & Internet + Performance testingRoman Ananev
 
Tracking the Performance of the Web Over Time with the HTTP Archive
Tracking the Performance of the Web Over Time with the HTTP ArchiveTracking the Performance of the Web Over Time with the HTTP Archive
Tracking the Performance of the Web Over Time with the HTTP ArchiveAkamai Developers & Admins
 
Akamai Edge: Tracking the Performance of the Web with HTTP Archive
Akamai Edge: Tracking the Performance of the Web with HTTP ArchiveAkamai Edge: Tracking the Performance of the Web with HTTP Archive
Akamai Edge: Tracking the Performance of the Web with HTTP ArchiveRick Viscomi
 
NY WebPerf Sept '22 - Performance Mistakes - An HTTP Archive Deep Dive
NY WebPerf Sept '22 - Performance Mistakes - An HTTP Archive Deep DiveNY WebPerf Sept '22 - Performance Mistakes - An HTTP Archive Deep Dive
NY WebPerf Sept '22 - Performance Mistakes - An HTTP Archive Deep DivePaul Calvano
 
Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Criteolabs
 
UI5 with Akamai - Introduction to the Content Delivery Network
UI5 with Akamai - Introduction to the Content Delivery NetworkUI5 with Akamai - Introduction to the Content Delivery Network
UI5 with Akamai - Introduction to the Content Delivery NetworkGokul Anand E, PMP®
 
Web performance optimization - MercadoLibre
Web performance optimization - MercadoLibreWeb performance optimization - MercadoLibre
Web performance optimization - MercadoLibrePablo Moretti
 
What is Nginx and Why You Should to Use it with Wordpress Hosting
What is Nginx and Why You Should to Use it with Wordpress HostingWhat is Nginx and Why You Should to Use it with Wordpress Hosting
What is Nginx and Why You Should to Use it with Wordpress HostingWPSFO Meetup Group
 
There is something about serverless
There is something about serverlessThere is something about serverless
There is something about serverlessgjdevos
 
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...DataBench
 
Exploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured StreamingExploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured Streamingt_ivanov
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedPromet Source
 
JS digest. Decemebr 2017
JS digest. Decemebr 2017JS digest. Decemebr 2017
JS digest. Decemebr 2017ElifTech
 
Altitude San Francisco 2018: HTTP Invalidation Workshop
Altitude San Francisco 2018: HTTP Invalidation WorkshopAltitude San Francisco 2018: HTTP Invalidation Workshop
Altitude San Francisco 2018: HTTP Invalidation WorkshopFastly
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Pythonwesley chun
 
Building Web Mobile App that don’t suck - FITC Web Unleashed - 2014-09-18
Building Web Mobile App that don’t suck - FITC Web Unleashed - 2014-09-18Building Web Mobile App that don’t suck - FITC Web Unleashed - 2014-09-18
Building Web Mobile App that don’t suck - FITC Web Unleashed - 2014-09-18Frédéric Harper
 
Web performance mercadolibre - ECI 2013
Web performance   mercadolibre - ECI 2013Web performance   mercadolibre - ECI 2013
Web performance mercadolibre - ECI 2013Santiago Aimetta
 

Similar to Fluent 2018: Tracking Performance of the Web with HTTP Archive (20)

Website & Internet + Performance testing
Website & Internet + Performance testingWebsite & Internet + Performance testing
Website & Internet + Performance testing
 
Tracking the Performance of the Web Over Time with the HTTP Archive
Tracking the Performance of the Web Over Time with the HTTP ArchiveTracking the Performance of the Web Over Time with the HTTP Archive
Tracking the Performance of the Web Over Time with the HTTP Archive
 
Akamai Edge: Tracking the Performance of the Web with HTTP Archive
Akamai Edge: Tracking the Performance of the Web with HTTP ArchiveAkamai Edge: Tracking the Performance of the Web with HTTP Archive
Akamai Edge: Tracking the Performance of the Web with HTTP Archive
 
Web Performance Optimization
Web Performance OptimizationWeb Performance Optimization
Web Performance Optimization
 
NY WebPerf Sept '22 - Performance Mistakes - An HTTP Archive Deep Dive
NY WebPerf Sept '22 - Performance Mistakes - An HTTP Archive Deep DiveNY WebPerf Sept '22 - Performance Mistakes - An HTTP Archive Deep Dive
NY WebPerf Sept '22 - Performance Mistakes - An HTTP Archive Deep Dive
 
JAMStack
JAMStackJAMStack
JAMStack
 
Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo?
 
UI5 with Akamai - Introduction to the Content Delivery Network
UI5 with Akamai - Introduction to the Content Delivery NetworkUI5 with Akamai - Introduction to the Content Delivery Network
UI5 with Akamai - Introduction to the Content Delivery Network
 
Web performance optimization - MercadoLibre
Web performance optimization - MercadoLibreWeb performance optimization - MercadoLibre
Web performance optimization - MercadoLibre
 
What is Nginx and Why You Should to Use it with Wordpress Hosting
What is Nginx and Why You Should to Use it with Wordpress HostingWhat is Nginx and Why You Should to Use it with Wordpress Hosting
What is Nginx and Why You Should to Use it with Wordpress Hosting
 
There is something about serverless
There is something about serverlessThere is something about serverless
There is something about serverless
 
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
 
Exploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured StreamingExploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured Streaming
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speed
 
JS digest. Decemebr 2017
JS digest. Decemebr 2017JS digest. Decemebr 2017
JS digest. Decemebr 2017
 
Altitude San Francisco 2018: HTTP Invalidation Workshop
Altitude San Francisco 2018: HTTP Invalidation WorkshopAltitude San Francisco 2018: HTTP Invalidation Workshop
Altitude San Francisco 2018: HTTP Invalidation Workshop
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Python
 
CloudStack Hyderabad Meetup: How the Apache community works
CloudStack Hyderabad Meetup: How the Apache community worksCloudStack Hyderabad Meetup: How the Apache community works
CloudStack Hyderabad Meetup: How the Apache community works
 
Building Web Mobile App that don’t suck - FITC Web Unleashed - 2014-09-18
Building Web Mobile App that don’t suck - FITC Web Unleashed - 2014-09-18Building Web Mobile App that don’t suck - FITC Web Unleashed - 2014-09-18
Building Web Mobile App that don’t suck - FITC Web Unleashed - 2014-09-18
 
Web performance mercadolibre - ECI 2013
Web performance   mercadolibre - ECI 2013Web performance   mercadolibre - ECI 2013
Web performance mercadolibre - ECI 2013
 

Recently uploaded

Augmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & DefenseAugmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & Defensethirdeyegen65
 
Red shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's CyberspaceRed shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's Cyberspacesttyk
 
Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023Damar Juniarto
 
Augmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical ProfessionalsAugmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical Professionalsthirdeyegen65
 
Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...ssuser7b7f4e
 
UGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptxUGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptxRitesh Sahu
 
Biometrics Technology Intresting PPT
Biometrics Technology Intresting PPTBiometrics Technology Intresting PPT
Biometrics Technology Intresting PPTPraveenKumarThota7
 
Model Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdfModel Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdfgalfinprihardiputra0
 
AWS Overview of AWS Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS  Clarify, Feature Store, Hyper parameter TuningAWS Overview of AWS  Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS Clarify, Feature Store, Hyper parameter TuningVarun Garg
 

Recently uploaded (9)

Augmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & DefenseAugmented and Mixed Reality Solutions for Aerospace & Defense
Augmented and Mixed Reality Solutions for Aerospace & Defense
 
Red shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's CyberspaceRed shadows ringing in Japan's Cyberspace
Red shadows ringing in Japan's Cyberspace
 
Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023Regulation is Coming - Trusted Media Summit 2023
Regulation is Coming - Trusted Media Summit 2023
 
Augmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical ProfessionalsAugmented and Mixed Reality Solutions for Frontline Medical Professionals
Augmented and Mixed Reality Solutions for Frontline Medical Professionals
 
Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...Obstructive jaundice is a medical condition characterized by the yellowing of...
Obstructive jaundice is a medical condition characterized by the yellowing of...
 
UGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptxUGB INTERNETBANKING FACILITY LAUNCHED.pptx
UGB INTERNETBANKING FACILITY LAUNCHED.pptx
 
Biometrics Technology Intresting PPT
Biometrics Technology Intresting PPTBiometrics Technology Intresting PPT
Biometrics Technology Intresting PPT
 
Model Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdfModel Jaringan network jaringan komputer.pdf
Model Jaringan network jaringan komputer.pdf
 
AWS Overview of AWS Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS  Clarify, Feature Store, Hyper parameter TuningAWS Overview of AWS  Clarify, Feature Store, Hyper parameter Tuning
AWS Overview of AWS Clarify, Feature Store, Hyper parameter Tuning
 

Fluent 2018: Tracking Performance of the Web with HTTP Archive

  • 1. ©2016 AKAMAI | FASTER FORWARDTM Tracking Performance of the Web with HTTP Archive 6/13/2018 Paul Calvano Senior Web Performance Architect @paulcalvano pacalvan@akamai.com
  • 2. 2 Paul Calvano @paulcalvano Akamai About Me ● Web Performance Architect @ Akamai ● HTTP Archive / BigQuery Addict :) ● Working on #WebPerf since 2000 ● https://paulcalvano.com ● @paulcalvano on Twitter
  • 5. 5 How it Works ● Alexa’s top 500,000 websites ○ Home pages ○ Desktop and emulated mobile ○ Increasing to 1,000,000 soon! ● Powered by WebPageTest ○ Records HAR trace ○ Executes custom metrics ○ Records Lighthouse audits ● httparchive.org ○ Trends and stats ○ Discussion forum ● BigQuery and Cloud Storage ○ Queryable database ○ Raw HARs
  • 10. 10 discuss.httparchive.orgThe tip of the databerg Curated stats/trends Raw data
  • 11. 11 A Peek Inside the Databerg… DataSet Description Size (GB) Rows summary_pages Summary of all Desktop and Mobile Pages ~340MB Desktop: ~460K Mobile: ~450K summary_requests Summary of all HTTP Requests for Desktop and Mobile ~45 GB Desktop: ~48 million Mobile: ~44 million pages JSON-encoded parent document HAR data ~5 GB Desktop: ~460K Mobile: ~450K requests JSON encoded subresource HAR data ~290 GB Desktop: ~48 million Mobile: ~44 million response_bodies JSON encoded response bodies for textual subresources ~915 GB Desktop: ~18 million Mobile: ~14 million lighthouse JSON encoded Lighthouse Report. Mobile only ~140 GB Mobile: ~450K * rows and size stats are based on 5/15/18 run
  • 12. 12 Who uses the HTTP Archive? ● Scholars ● Community ● Industry
  • 13. 13 goo.gl/kxgzM1HTTP Archive referenced in research papers . . . In this article we utilize the httparchive.org [9] publicly available dataset of captured web performance metrics . . . Desktop and mobile web page comparison: characteristics, trends, and implications IEEE Communications Magazine ( Volume: 52, Issue: 9, September 2014 ) . . . Recent stats from httparchive.org show that the top 300K URLs in the world need on average 38(!) TCP connections to display the site . . . HTTP2 explained Computer Communication Review 44.3 (2014): 120-128. . . . We make extensive use of the [...] data available at HTTP Archive to expose the characteristics of 3rd Party assets embedded into the top 16,000 Alexa webpages . . . Are 3rd Parties Slowing Down the Mobile Web? Proceedings of the Eighth Wireless of the Students, by the Students, and for the Students Workshop. ACM, 2016.
  • 14. 14goo.gl/oJSIjm The web community uses HTTP Archive to answer questions about the state of the web goo.gl/PvVHhL
  • 15. 15 goo.gl/YPdLJuRemember when the average page size exceeded Doom?
  • 16. 16 goo.gl/Lnp1ne goo.gl/e8kXq9Industrial tools use HTTP Archive data for calibration
  • 17. 17 bit.ly/2JohsOc bit.ly/2JsxN0mHTTP Archive is used for emerging Internet Standards
  • 18. 18 Case Study #1 Compression 1. Updating Akamai Gzip Defaults 2. Brotli Compression
  • 19. 19 Last Mile Acceleration (LMA) ● Akamai feature to gzip compress content at the CDN edge ○ Helps out when origins do not compress certain resources ● Compression is based on HTTP Content Type ● Old defaults were not sufficient and usually required updating… ● We updated this a few years ago, using HTTP Archive data
  • 20. 20 SELECT mimeType, count(*) total, SUM(IF(resp_content_encoding = "gzip",1,0)) gzip, SUM(IF(resp_content_encoding = "deflate",1,0)) deflate, SUM(IF(resp_content_encoding IN("gzip","deflate"),0,1)) NoCompression, ROUND( SUM( IF(resp_content_encoding IN("gzip", "deflate"),1,0) ) / COUNT(*),2) CompressedPercentage FROM httparchive.summary_requests.2018_05_15_desktop GROUP BY mimeType HAVING total > 1000 ORDER BY gzip DESC bit.ly/2y1fKNIQuerying the HTTP Archive for Compression Stats 1 2 3 4 5 6 7 8 9 10 11 12
  • 21. 21 Some New LMA Defaults ● text/javascript ● font/ttf ● application/javascript ● text/xml ● application/json ● application/xml ● ... Many Content-Types that did not match the original defaults!
  • 22. 22 Modern Set of Last Mile Acceleration Defaults
  • 23. 23 What About Brotli? ● New compression algorithm developed by Google researchers ● 5% - 25% Reduction over Gzip Compression ● Supported by most browsers ● Let’s extend the previous query to include Brotli compression
  • 24. 24 SELECT mimeType, count(*) total, SUM(IF(resp_content_encoding = "gzip",1,0)) gzip, SUM(IF(resp_content_encoding = "br",1,0)) brotli, SUM(IF(resp_content_encoding = "deflate",1,0)) deflate, SUM(IF(resp_content_encoding IN("gzip","deflate","br"),0,1)) NoCompression, ROUND( SUM( IF(resp_content_encoding IN("gzip", "deflate", "br"),1,0) ) / COUNT(*),2) CompressedPercentage, ROUND( SUM( IF(resp_content_encoding = "br",1,0) ) / COUNT(*),2) BrotliCompressedPercentage FROM httparchive.summary_requests.2018_05_15_desktop GROUP BY mimeType HAVING total > 1000 ORDER BY brotli DESC bit.ly/2Mcawl1Compression Stats - gzip and brotli 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
  • 25. 25 Examining Compression By Content Type - With Brotli Brotli use has been growing, but where is it most prevalent?
  • 26. 26 Brotli Usage: Mostly JavaScript, CSS and HTML Resources When we exclude Google and Facebook content, the bulk of Brotli encoded content is JS and CSS
  • 27. 27 Compression Level = Overhead Data on compression speeds from quixdb.github.io/squash-benchmark/#results Most byte savings are obtained by using the highest compression level.
  • 28. 28 Resource Optimizer: Automated Brotli Compression at the Edge ● Automatically compresses CSS and JS with Brotli ● Resources are compressed offline and then cached ● Brotli compression level 11, without the overhead! + =
  • 29. 29 Case Study #2 Server Technologies 1. Akamai Varnish Connector 2. Security Vulnerability Research
  • 31. 31 How Many Akamai Customers Use Varnish at the Origin? ● HTTP Archive data helped determine which Akamai’s customers were using Varnish at the origin. ● Akamai Product Management was able to discuss desirable functionality with existing customers.
  • 32. 32 Investigating Security Threats - 0 Day Vulnerability ● HTTP Archive ○ Investigate other sites that contain similar characteristics ■ Server, Via, Url Regex Patterns, Other Headers ○ Export a list of sites that appear vulnerable ○ Cross Reference with Akamai Account Data ■ Notify 24x7 Security Contacts, ■ Help customers proactively protect themselves ● No CVE or Historical Attack Patterns ○ Target attack observed and mitigated (Kona Managed Security) ○ Akamai WAF rules prepared ○ Vendor notified ● Example: 0 Day Vulnerability on an Ecommerce App Server
  • 33. 33 Another 0-Day: How Can We Identify Sites Running Drupal Drupal announced that a “highly critical” security release would be happening within a week Expectation is that it would give sites time to prepare for an emergency security patch before 0 Day exploits begin… https://www.drupal.org/sa-core-2018-002
  • 34. 34 Identifying Sites Running Drupal with HTTP Archive First Try: ● WHERE url LIKE "%drupal.js%" ● Found 97 sites using Akamai and Drupal Second Try: ● Expires header = 'Sun, 19 Nov 1978 05:00:00 GMT' ○ https://www.ostraining.com/blog/drupal/5-ways-drupal/ ● Found more sites using Akamai and Drupal ○ ~26K total requests with this expires header! What Did We do With this Data? ● Customer Outreach (Are you aware and prepared to patch?) ● Prepared WAF rules for those not able to apply patches immediately.
  • 35. 35 Investigating Security Threats - CryptoCurrency Miners ● Do any of my customers have cryptocurrency miners? ○ Are they aware? ○ Do they know how it got there? https://discuss.httparchive.org/t/the-performance-impact-of-cryptocurrency-mining-on-the-web/1126/
  • 36. 36 Now Easier with Wappalyzer! ● Wappalyzer is a Cross Platform utility that uncovers technologies used on websites. ● Integrated into HTTP Archive since April 2018 https://discuss.httparchive.org/t/using-wappalyzer-to-analyze-cpu-times-across-js-frameworks/1336/
  • 37. 37 Investigating Security Threats - What Domains are Serving Malware? ● Akamai’s ETP Service ○ Millions of malicious domains and IP addresses. ○ Are any of my customers serving 3rd party content from known malware hosts? ● HTTP Archive parsed against the ETP DB ○ Notified accounts if they served content to the HTTP Archive from known malware hosts
  • 38. 38 Case Study #3 Third Party Research 1. How 3rd Parties Influence Render Time? 2. Researching a specific 3rd party
  • 39. 39 Do Third Parties Impact Load Time? https://discuss.httparchive.org/t/analyzing-3rd-party-performance-via-http-archive-crux/1359 ● CrUX = Chrome User Experience Report ● JOINed’ w/ HTTP Archive data for Alexa Ranks ● Load times are faster for sites with less third party content.
  • 40. 40bit.ly/2sN2TJZWhich 3rd Party Content Types Load Before Render Start? SELECT mimeType, COUNT(*) num_requests, SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),1,0) ) BeforeRenderStart, SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),0,1) ) AfterRenderStart FROM httparchive.summary_requests.2017_09_01_desktop req JOIN ( SELECT rank, NET.HOST(url) hostname, url, pageid, startedDateTime, renderStart FROM httparchive.summary_pages.2017_09_01_desktop ) pages ON pages.pageid = req.pageid WHERE NET.HOST(req.url) != pages.hostname AND rank > 0 AND rank < 100000 GROUP BY mimeType HAVING num_requests > 1000 ORDER BY BeforeRenderStart DESC 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
  • 41. 41 What 3rd Party Content Loads Before RenderStart? discuss.httparchive.org/t/which-3rd-party-content-loads-before-render-start/1084
  • 42. 42bit.ly/2glVquXWhich 3rd Parties Load Before Render Start? SELECT NET.HOST(req.url) thirdparty, mimeType, COUNT(*) num_requests, SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),1,0) ) BeforeRenderStart, SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),0,1) ) AfterRenderStart FROM httparchive.summary_requests.2017_09_01_desktop req JOIN ( SELECT rank, NET.HOST(url) hostname, url, pageid, startedDateTime, renderStart FROM httparchive.summary_pages.2017_09_01_desktop ) pages ON pages.pageid = req.pageid WHERE NET.HOST(req.url) != pages.hostname AND rank > 0 AND rank < 100000 GROUP BY thirdparty, mimeType HAVING num_requests > 100 ORDER BY BeforeRenderStart DESC 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
  • 43. 43 What 3rd Party Domains Loads Before RenderStart?
  • 44. 44bit.ly/2LuD1JqWhich Websites Load <3rd Party> Before vs After Render Time? SELECT rank, site, SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),1,0) ) BeforeRenderStart, SUM( IF(req.startedDateTime < (pages.startedDateTime + (renderStart/1000) ),0,1) ) AfterRenderStart FROM httparchive.summary_requests.2017_09_01_desktop req INNER JOIN ( SELECT rank, NET.HOST(url) site, pageid, startedDateTime, renderStart FROM httparchive.summary_pages.2017_09_01_desktop ) pages ON pages.pageid = req.pageid WHERE NET.HOST(req.url) LIKE "%ensighten.com%" AND rank > 0 GROUP BY rank, site HAVING AfterRenderStart>0 ORDER BY rank ASC 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  • 45. 45 Which Sites Load a Specific 3rd Party Before RenderStart? ● This query outputs a summary containing the following information: ○ Which sites use the 3rd party? ○ How many resources are served by it? ○ How many of them are loaded before/after the page renders? ● Results can help sites learn from each other’s best practices ○ Even across industries!!!
  • 46. 46 Getting Started 1. Google Cloud project 2. BigQuery config
  • 47. 47 bit.ly/httparchive-bigqueryUp and running - Create a Google Cloud Project ⦁ console.cloud.google.com/projectcreate
  • 48. 48 Up and running - Add the HTTP Archive Tables ⦁ bigquery.cloud.google.com bit.ly/httparchive-bigquery
  • 49. 49 http://bit.ly/httparchive-bigqueryUp and running - Start Exploring! ● Detailed Setup Instructions: ○ bit.ly/httparchive-bigquery ● BigQuery Standard SQL Documentation ○ cloud.google.com/bigquery/docs/reference/standard-sql/ ● HTTP Archive Discussion Forum ○ discuss.httparchive.org
  • 51. 51 HTTP Archive Sponsors Contact team@httparchive.org if interested