HTTP/2 in Practice
Patrick Meenan
@PatMeenan
Outline
• Browser
• HTTP/1.x vs HTTP/2
• Resource Prioritization
• Network
• TCP Buffering
• Network Buffering
• Server
• Future
The Browser
https://developers.google.com/web/fundamentals/performance/critical-rendering-path/render-tree-construction
Basic Parser Rules
•Process 1 token at a time
•Stylesheets Block Render
•Non-Async Script tags block Parser/DOM until:
•Pending Stylesheets have loaded
•Script has loaded
Browser Events
•DOM Content Loaded
• When the parser reaches the end of the document
•Load
• When all of the document resources have finished loading
Late-Discovered Resources
• Fonts
• Background Images
• Script-injected content
• @import
(Simplified) Example
•1 HTML
•1 Stylesheet
•4 Scripts (2 blocking in the head)
•1 Web Font
•13 Images (5 visible)
In-browser Prioritization (Chrome)
[Animated diagram: the browser's internal priority queues for the example page.
VeryHigh: HTML, CSS
High: Script 1, Script 2
Medium: Image 1
Low: Async 1, Async 2
VeryLow: Image 2, Image 3, Image …
Once layout runs, the late-discovered Font and the five in-viewport images (Image 1-5) move ahead of the remaining below-the-fold images (Image 6, 7, 8, …).]
HTTP/1.x Prioritization
• 6 Connections per origin
• Pick next-highest for each origin as a connection becomes available
[Animated diagrams: the same priority queues drained over the 6 HTTP/1.x connections. As each connection frees up, the browser dispatches its next-highest-priority request; the late-discovered Font and the in-viewport images still have to wait for a connection to become available.]
HTTP/2 Prioritization
•All requests sent to server immediately
•Priorities specified in dependency tree
•Any “stream” can depend on another stream
•Peers can be weighted
•Priority changes communicated with a PRIORITY frame
https://developers.google.com/web/fundamentals/performance/http2/
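To make the dependency tree and weights concrete, here is a minimal C sketch (not tied to any HTTP/2 library; the stream IDs and weight are made-up illustration values) of the 14 bytes a client sends for an RFC 7540 PRIORITY frame:

/*
 * Sketch: the raw bytes of an RFC 7540 PRIORITY frame
 * (9-byte frame header + 5-byte payload).
 */
#include <stdint.h>
#include <stdio.h>

static size_t build_priority_frame(uint8_t out[14], uint32_t stream_id,
                                   uint32_t depends_on, int weight,
                                   int exclusive) {
    const uint32_t payload_len = 5;

    /* Frame header: 24-bit length, 8-bit type (0x2 = PRIORITY),
     * 8-bit flags, reserved bit + 31-bit stream identifier. */
    out[0] = (payload_len >> 16) & 0xff;
    out[1] = (payload_len >> 8) & 0xff;
    out[2] = payload_len & 0xff;
    out[3] = 0x2;                        /* PRIORITY */
    out[4] = 0x0;                        /* no flags defined */
    out[5] = (stream_id >> 24) & 0x7f;   /* reserved bit cleared */
    out[6] = (stream_id >> 16) & 0xff;
    out[7] = (stream_id >> 8) & 0xff;
    out[8] = stream_id & 0xff;

    /* Payload: E bit + 31-bit stream dependency, then weight - 1. */
    uint32_t dep = (depends_on & 0x7fffffffu) | (exclusive ? 0x80000000u : 0u);
    out[9]  = (dep >> 24) & 0xff;
    out[10] = (dep >> 16) & 0xff;
    out[11] = (dep >> 8) & 0xff;
    out[12] = dep & 0xff;
    out[13] = (uint8_t)(weight - 1);     /* wire value is weight minus 1 (1..256) */
    return 14;
}

int main(void) {
    uint8_t frame[14];
    /* "Stream 7 now depends exclusively on stream 3, weight 220." */
    size_t n = build_priority_frame(frame, 7, 3, 220, 1);
    for (size_t i = 0; i < n; i++)
        printf("%02x ", frame[i]);
    printf("\n");
    return 0;
}

Browsers differ mainly in how they use this frame, which is what the next slides show.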
Optimal Script Loading
[Waterfall screenshots: optimal vs. worst-case script loading order, with Start Render and Visually Complete markers, comparing HTTP/2 and HTTP/1.x]
Browser Differences
Chrome – Linear List
https://github.com/quicwg/wg-materials/raw/master/interim-19-05/priorities.pdf
Firefox - Groups
https://github.com/quicwg/wg-materials/raw/master/interim-19-05/priorities.pdf
Safari - Weighted
https://github.com/quicwg/wg-materials/raw/master/interim-19-05/priorities.pdf
Edge
¯\_(ツ)_/¯
Prioritizing across connections
Cross-connection prioritization
•3rd-parties
•Domain sharding
•www.example.com
•static1.example.com
•static2.example.com
Prioritizing across connections
HTTP/2 Connection coalescing
•New domain must resolve to the IP of an already-active connection
•The active connection's certificate must cover the new domain
•Still pay the DNS lookup cost
• ORIGIN frame / CERTIFICATE frame can relax these requirements
•Multi-CDN setups complicate coalescing
Edge/Origin
[Diagrams: the client's HTTP/2 connection terminates at an HTTP load balancer (HAProxy, Netscaler, etc.), a Layer 4 load balancer, or a CDN edge. Prioritization is broken (and has to be fixed) at that terminating hop.]
Edge/Upstream
Best effort: data is sent to the client as it becomes available from the upstream
ishttp2fastyet.com
External Fonts (before)
Same-origin fonts (wtf)
Start Render
Same-origin fonts (correct)
Start Render
https://twitter.com/zachleat/status/1055219667894259712
Testing for prioritization
https://www.webpagetest.org/http2priorities.html?image=<url-encoded-image-URL>
• Warm up the connection
• Request 30 below-the-fold images (low priority)
• Delay until 2 images complete
• Sequentially request 2 in-viewport images (high priority)
• The high-priority requests should interrupt the in-flight responses
• Ideal response time would match a single-image download
When it goes wrong
• Queued behind in-flight low-priority requests
• Round robin? ¯\_(ツ)_/¯
Fast Responses
Data Interleaving?
ishttp2fastyet.com
NO – 9 of 25 pass
The Good
• Akamai
• CDNsun
• Cloudflare
• Fastly
The Bad
• Amazon Cloudfront
• Cachefly
• CDNetworks
• ChinaCache
• Edgecast
• Highwinds
• Incapsula
• Instart Logic
• KeyCDN
• LeaseWeb CDN
• Level 3
• Limelight
• Medianova
• Netlify
• Rocket CDN
• StackPath/NetDNA/MaxCDN
• Jetpack CDN
• Yottaa
• Zenedge
The Ugly (cloud load balancers)
• Amazon AWS
• Google Cloud
• Microsoft Azure
Testing/Throttling
https://calendar.perfplanet.com/2016/testing-with-realistic-networking-conditions/
[Diagram: browser process with multiple renderer processes. Throttling can be applied inside the browser or by a packet-level shaper between the browser and the network.]
Performance Monitoring
• Understand what (if any) traffic-shaping is used
• Watch out for (avoid):
•Dev Tools
•Lighthouse
•Puppeteer
•Proxy
WHY it Goes Wrong
TCP
[Diagram: numbered packets in flight between the sender's Send Buffer and the receiver's Receive Buffer. The data on the wire at any moment is the Bandwidth-Delay Product (BDP).]
BDP
Round Trip Time: 100 ms (divide by 1000 to get seconds) = 0.100 second RTT
Bandwidth: 8 Mbps (divide by 8 bits per byte) = 1 MB per second
1 MB per second * 0.100 second RTT = 100 KB BDP
Bandwidth   Latency (RTT)   Calculation                          BDP
400 Kbps    400 ms          (400 / 8) KB/s * (400 / 1000) s      20 KB
1.6 Mbps    150 ms          (1.6 / 8) MB/s * (150 / 1000) s      30 KB
8 Mbps      100 ms          (8 / 8) MB/s * (100 / 1000) s        100 KB
50 Mbps     25 ms           (50 / 8) MB/s * (25 / 1000) s        1.6 MB
1 Gbps      100 ms          (1000 / 8) MB/s * (100 / 1000) s     12.5 MB
10 Gbps     200 ms          (10 / 8) GB/s * (200 / 1000) s       250 MB
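For reference, a small C sketch of the same arithmetic, using the 8 Mbps / 100 ms example:

#include <stdio.h>

int main(void) {
    double bandwidth_mbps = 8.0;   /* megabits per second */
    double rtt_ms = 100.0;         /* round-trip time */

    double bytes_per_sec = bandwidth_mbps / 8.0 * 1e6;  /* bits -> bytes */
    double rtt_sec = rtt_ms / 1000.0;
    double bdp_kb = bytes_per_sec * rtt_sec / 1000.0;

    printf("BDP = %.0f KB\n", bdp_kb);  /* 8 Mbps * 100 ms -> 100 KB */
    return 0;
}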
Congestion Window
[Diagram: the same send/receive buffer picture]
Data in flight = MIN(Receive Buffer, BDP, Send Buffer)
TCP Send Buffer
net.ipv4.tcp_wmem = 10240 87380 12582912
(Min, Default, Max)
TCP Auto Tuning
• Disabled if explicit socket buffer is set
• Starts at the Default
• Grows towards max as needed
TCP_NOTSENT_LOWAT
• Limits how much unsent data is buffered above the BDP
• The socket is signaled writable only when unsent data drops below the threshold
TCP_NOTSENT_LOWAT
net.ipv4.tcp_notsent_lowat = 16384
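A minimal sketch of the same knobs at the socket level, assuming a Linux server and an already-accepted connection fd: setting SO_SNDBUF explicitly disables kernel auto-tuning for that socket, while TCP_NOTSENT_LOWAT keeps auto-tuning and caps how much unsent data the kernel will queue.

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static void tune_socket(int fd) {
    /* Option A (usually avoid): an explicit send buffer size.
     * Setting SO_SNDBUF turns off kernel auto-tuning for this socket. */
    /*
    int sndbuf = 128 * 1024;
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
    */

    /* Option B: keep auto-tuning, but cap unsent bytes queued in the kernel
     * so a newly prioritized response does not sit behind stale data.
     * poll/epoll only report the socket writable while unsent data is
     * below this threshold. */
    int lowat = 16 * 1024;
    setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &lowat, sizeof(lowat));
}

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    tune_socket(fd);
    return 0;
}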
Over-buffering (TCP)
• Hurts reprioritization (data already queued in the kernel cannot be reordered)
• Skews server metrics (send time appears artificially low)
TCP Congestion Control
• Start with initial Congestion Window (CWND)
• Ramp-up until network is “full”
• Reduce
• lather, rinse, repeat
Bufferbloat
Loss-based congestion control
BBR
BBR starvation
https://blog.apnic.net/2017/05/09/bbr-new-kid-tcp-block/
Initial Congestion Window
TCP Slow Start
TCP Slow Start
• Start with Congestion Window (CW) = Initial Window (IW)
• Measured in packets; used to default to 4, 10 is now more common
• IW 10 ~ 15,000 bytes (assuming a 1500-byte MTU)
• Increase CW by 1 packet for every ACK received
• Doubles every RTT
• Round Trips ≈ log2(BDP / IW)
Bandwidth   Latency (RTT)   BDP       Slow Start Round Trips   Slow Start Time
400 Kbps    400 ms          20 KB     1                        400 ms
1.6 Mbps    150 ms          30 KB     1                        150 ms
8 Mbps      100 ms          100 KB    3                        300 ms
50 Mbps     25 ms           1.6 MB    7                        175 ms
1 Gbps      100 ms          12.5 MB   16                       1.6 s
10 Gbps     200 ms          250 MB    19                       3.8 s
(Assuming an Initial Congestion Window of 10)
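A back-of-the-envelope sketch of the log2(BDP / IW) estimate for the 8 Mbps / 100 ms row; the segment size is an assumption, and real connections (delayed ACKs, receive-window growth, loss) can take more round trips than this idealized number.

/* Compile with: cc slowstart.c -lm */
#include <math.h>
#include <stdio.h>

int main(void) {
    double bdp_bytes = 100.0 * 1024.0;  /* 8 Mbps * 100 ms RTT, from the table */
    double iw_bytes  = 10.0 * 1460.0;   /* IW10 with ~1460-byte segments (assumed) */

    /* Window starts at IW and roughly doubles each RTT until it covers the BDP. */
    int round_trips = (int)ceil(log2(bdp_bytes / iw_bytes));
    printf("~%d round trips of slow start\n", round_trips);  /* prints ~3 */
    return 0;
}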
Query Initial Congestion Window
Configuring IW
Set on a per-route basis:
sudo ip route change default via 192.168.1.1 dev eth0 proto static initcwnd 10
Server Buffers
[Diagram: data queues inside the server across the HTTP/2, TLS, and TCP layers before reaching the client socket (fd), plus the HTTP/TCP connections to upstreams.]
Upstream Limits
• 100+ simultaneous inbound requests
• Back-end request limits?
10 second read delay
Server HTTP/2 implementations
• H2O and Apache prioritize well
• Nginx – not so much
• https://trac.nginx.org/nginx/ticket/1763
istlsfastyet.com
Online Certificate Status Protocol (OCSP)
Has this certificate been revoked? Stop the world and query the OCSP server:
● DNS lookup
● TCP connect
● Wait for the server response
● Chrome blocks only on EV certs
● Other browsers may block on all certs (e.g. Firefox)
What if the OCSP check times out, gets blocked, etc.? See “Revocation still doesn’t work.”
Eliminating OCSP latency
Use OCSP stapling!
1. Server retrieves the OCSP response
2. Server “staples” response to certificate
3. Client verifies stapled response
TLS handshake with stapled OCSP response...
$> openssl s_client -connect example.com:443 -tls1 -tlsextdebug -status
OCSP Response Data:
OCSP Response Status: successful (0x0)
Response Type: Basic OCSP Response
Version: 1 (0x0)
Responder Id: C = IL, O = StartCom Ltd., CN = StartCom Class 1 Server OCSP Signer
Produced At: Feb 18 17:53:53 2014 GMT
Responses:
Certificate ID:
Hash Algorithm: sha1
Issuer Name Hash: 6568874F40750F016A3475625E1F5C93E5A26D58
Issuer Key Hash: EB4234D098B0AB9FF41B6B08F7CC642EEF0E2C45
Serial Number: 0B60D5
Cert Status: good
Stapled OCSP means no blocking!
OCSP stapling increases certificate size! Is this a problem for your site? Better check.
How many RTTs does your certificate incur?
● Average certificate chain depth: 2-3 certificates
● Average certificate size: ~1-1.5 KB
● Plus OCSP response…
● Many cert chains overflow the old TCP (4 packet) CWND
● Upgrade your servers to use IW10!
3+ RTT TLS handshake due to 2 RTT cert?
Check your server, you may be surprised...
● Capture a tcpdump of your handshake and check the exchange
● Some servers will pause on “large certificates” until they get an ACK for the first 4KB of the certificate (doh!)
nginx <1.5.6, HAProxy <1.5-dev22 incur extra RTT, even w/ IW10!
1-RTT non-resumed handshake with TLS False Start
Client sends application data
immediately after “Finished”.
● Eliminates 1RTT
● No protocol changes...
● Only timing is affected
In practice…
● Some servers break (ugh)
● Hence, opt-in behavior...
Deploying False Start...
Chrome and Firefox
● ALPN advertisement - e.g. “http/1.1”
● Forward secrecy ciphersuite - e.g. ECDHE
Safari
● Forward secrecy ciphersuite
Internet Explorer
● Blacklist + timeout
● If handshake fails, retry without False Start
TL;DR: enable ALPN advertisement and forward secrecy to get 1RTT handshakes.
Ingredients for a 1-RTT TLS experience…
1. False Start = 1-RTT handshake for new visitors
 a. New users have to perform the public-key crypto handshake
2. Session resumption = 1-RTT handshake for returning visitors
 a. Plus, we can skip public-key crypto by reusing previous parameters
3. OCSP stapling
 a. No OCSP blocking to verify certificate status
4. False Start + Session Resumption + OCSP stapling
 a. 1-RTT handshake for new and returning visitors
 b. Returning visitors can skip the public-key crypto
What’s wrong with this picture?
300ms RTT, 1.5Mbps...
● It’s a 2-RTT time to first byte!
Large records are buffered, which delays processing!
● It’s a 2-RTT handshake… we know better!
o At least there is no OCSP overhead!
TLS record size + latency gotchas...
This record is split across 8 TCP packets
TLS allows up to 16KB of application data per record
● New connection + 16KB record = CWND overflow and an extra RTT
● Lost or delayed packet delays processing of entire record
Optimizing record size…
1. Google servers implement dynamic record sizing
 a. New connections start with 1400-byte records (i.e., a single MTU)
 b. After ~1MB is sent, switch to 16KB records
 c. After ~1s of inactivity, reset to 1400-byte records
2. Most servers don’t optimize this case at all...
 a. HAProxy recently landed a dynamic sizing patch - yay!
 b. Nginx landed ssl_buffer_size: a static override - better, but meh...
 c. Cloudflare released a patch for Nginx dynamic record sizing
TL;DR: there is no “perfect record size”. Adjust dynamically.
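A minimal sketch of such a dynamic policy, not any particular server's implementation; the thresholds simply mirror the bullets above (about 1400-byte records to start, 16KB after roughly 1MB sent, reset after about 1s of idle):

#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define SMALL_RECORD   1400         /* ~1 MTU: decodable from a single packet */
#define LARGE_RECORD   (16 * 1024)  /* TLS maximum plaintext per record */
#define WARMUP_BYTES   (1 * 1024 * 1024)
#define IDLE_RESET_SEC 1

typedef struct {
    size_t bytes_since_reset;
    time_t last_write;
} record_sizer;

static size_t next_record_size(record_sizer *rs) {
    time_t now = time(NULL);
    if (now - rs->last_write >= IDLE_RESET_SEC)
        rs->bytes_since_reset = 0;   /* connection went idle: assume cwnd shrank */
    rs->last_write = now;
    return rs->bytes_since_reset < WARMUP_BYTES ? SMALL_RECORD : LARGE_RECORD;
}

static void note_bytes_written(record_sizer *rs, size_t n) {
    rs->bytes_since_reset += n;
}

int main(void) {
    record_sizer rs = {0, 0};
    size_t first = next_record_size(&rs);     /* 1400 on a fresh connection */
    printf("first record: %zu bytes\n", first);
    note_bytes_written(&rs, first);
    return 0;
}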
Quick sanity check...
theory is great, but does this all work in practice?
Tuning Nginx TLS Time To First Byte (TTTFB)
● Pre 1.5.7: bug for 4KB+ certs, resulting in 3RTT+ handshakes
● 1.7.1 added ssl_buffer_size: a 4KB record size removes an RTT
● 1.7.1 with NPN and forward secrecy → 1RTT handshake
https://www.igvita.com/2013/12/16/optimizing-nginx-tls-time-to-first-byte/
https://hpbn.co/
https://www.igvita.com/
@igrigorik
What to do?
•Use a “good” CDN
•Or…
What to do?
• Set reasonable default TCP send buffer sizes
• Enable TCP_NOTSENT_LOWAT
• Enable BBR
• Use a Web Server with good prioritization support
Linux
net.core.wmem_max = 250000000
net.ipv4.tcp_wmem = 10240 102400 250000000
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_notsent_lowat = 16384
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
HTTP/2 PUSH
• Disable it.
• May be automatically triggered by “preload” headers
HTTP/2 PUSH – Only works when
• There is a back-end that is slow to generate HTML
• You know HOW slow it is going to be
• You know exactly how much data you can send without delaying the HTML
Preloading
[Waterfall: font requests start only after layout]
<link rel="preload" href="//www.i.cdn.cnn.com/.a/fonts/cnn/3.7.2/cnnsans-regular.woff2" as="font" type="font/woff2" crossorigin="anonymous">
Preload/Appcache Chrome Bugs
Preloaded fonts block CSS (blocking render)
https://andydavies.me/blog/2019/02/12/preloading-fonts-and-the-puzzle-of-priorities/
https://twitter.com/AndyDavies/status/1060558550492155909
Resource Unbundling
• Avoid temptation to stop bundling JS/CSS
• Image sprites?
• Lose cross-resource compression efficiency
• Browser overhead with lots of requests
• Check cache
• Cross-process IPCs
• Ad-blocker/AV overhead
https://engineering.khanacademy.org/posts/js-packaging-http2.htm
Server-initiated prioritization
HTTP/3 (and QUIC)
• UDP-based (port 443)
• Moves TCP logic into application layer
• Fewer OS-level failure modes
• More application responsibility
• Moves loss recovery from per-connection to per-stream
• Will NEVER reach 100% availability
• Treat it as a progressive enhancement
HTTP/3 Prioritization?
• Currently replicating HTTP/2 tree
• Join the discussion
•https://httpwg.org/
Thank You
Patrick Meenan
pmeenan@webpagetest.org
@PatMeenan
We’re Hiring!
https://www.facebook.com/careers/
