This document summarises how the BBC scaled its use of Varnish caching across its infrastructure to handle high-traffic periods like the London 2012 Olympics. Varnish was first deployed in 2009 as a caching layer for the iPlayer application, and has since expanded to cache hundreds of applications across the BBC Platform. During the Olympics it helped serve over 10 million browsers per day to the sport section, handling both UK and international traffic on mobile and desktop. Varnish cached the HD streams very well, but its high-availability layer didn't hold up, so a load balancer's cache was used for streaming instead; elsewhere, a synthetic endpoint included via ESI let applications carry almost four times their previous load. The BBC aims to make Varnish a ubiquitous caching layer across the Platform and to improve monitoring.
1. Varnish at the BBC
Winning Gold in the London 2012 Olympic Games
Graham Lyons
2. Varnish at the BBC
● First deployed in 2009
○ Specifically as a caching layer for iPlayer
○ New dynamic Platform
● Platform has grown to 100s of applications
How do we scale Varnish across the Platform?
(It served LOTS of traffic during the Olympics)
3. In the BBC Infrastructure
● bbc.co.uk is made up of lots of applications
● Load balancer in front
● Sends request to Varnish
● Varnish sends request to another load balancer
● Second load balancer layer distributes load across application servers
○ All applications installed on all servers
5. Routing
● First load balancer adds header with name of a pool of servers
● Varnish forwards it on
● Second load balancer knows what to do with the header to route the request
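A minimal VCL sketch of that hop, assuming Varnish 2.x and a hypothetical header name `X-BBC-Pool` (the slides don't name the real header or hosts):

```vcl
backend second_lb {
    .host = "lb2.internal";  # hypothetical address of the second load balancer
    .port = "80";
}

sub vcl_recv {
    # The first load balancer has already added e.g. "X-BBC-Pool: sport".
    # Varnish doesn't act on it; the header simply travels with the request
    # to the second load balancer, which picks the right server pool.
    set req.backend = second_lb;
}
```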
6. How do we use Varnish
● General HTTP cache
● Make use of header manipulation for more efficient caching, e.g.
○ GeoIP
○ Device detection
○ Cookie decomposition
8. Where should we take it?
● BBC Platform HTTP cache
● Platform-wide features
● Different requirements to application-specific Varnish
9. ...2012 (What we changed)
● Removed application logic (mostly)
● Added features to be used generally
○ e.g. GeoIP, Device detection
● Features on by default - no special configuration
● Try to stay vanilla and RFC2616(ish)
10. Features? What features?
● GeoIP lookup
● Device meta information
● Cookie decomposition
○ 'Signed in' header
All exposed as headers added to the request
Companion PHP libraries to manage header access and Vary header on response
11. Geo and Device Information
● Looked up via an HTTP call to respective services
● Logic in C library
● Cached locally (in process, in memory cache)
○ 70% hit for geoip
○ >95% hit for device data
12. Cookies?
● Incoming Cookie header split into a header for each value
● e.g. Cookie: UID=4321...
○ ...becomes: X-Cookie-UID: 4321
Actually only operates on cookie values with particular prefixes (introduced for the Great EU Cookie Debacle)
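The split can be done with `regsub` in `vcl_recv`. A sketch, with hypothetical cookie and header names, in Varnish 2.x-era syntax:

```vcl
sub vcl_recv {
    # Pull the UID value out of the Cookie header into its own header,
    # so backends (and cache variations) can depend on just that value.
    if (req.http.Cookie ~ "(^|; )UID=") {
        set req.http.X-Cookie-UID =
            regsub(req.http.Cookie, "^.*(^|; )UID=([^;]*).*$", "\2");
    }
}
```

A real deployment would do this only for the whitelisted cookie prefixes mentioned above, rather than for one hard-coded name.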
13. 'Signed in' header
● Boolean
○ Signed in
○ Not signed in
● Allows caching of page for 'not signed in' state
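One way this boolean could be derived in VCL; the identity cookie name here is hypothetical:

```vcl
sub vcl_recv {
    # Collapse the presence of an identity cookie into a yes/no header.
    if (req.http.Cookie ~ "(^|; )IDENTITY=") {
        set req.http.X-Signed-In = "yes";
    } else {
        set req.http.X-Signed-In = "no";
    }
}
```

If the backend then responds with `Vary: X-Signed-In`, every signed-out visitor shares a single cached copy of the page, while signed-in traffic still reaches the application.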
14. Cache Variations
All these features allow more efficient cache variations.
Can cache variations based on:
● where the user is
● what type of device they're using
● any personalisations
e.g. Norwegian Android user who loves Eastenders gets served straight from the cache
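Underneath, this is ordinary HTTP variation: Varnish adds the request headers, the application names them in its `Vary` response header, and Varnish stores one object per combination. An illustrative exchange (header names and values are hypothetical):

```
GET /programmes/eastenders HTTP/1.1
Host: www.bbc.co.uk
X-Country: NO
X-Device-Class: android

HTTP/1.1 200 OK
Cache-Control: max-age=60
Vary: X-Country, X-Device-Class
```

The next request carrying the same `X-Country` and `X-Device-Class` values is served straight from the cache.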
15. Response to outside world
● External caches don't know about request headers Varnish adds
● Responses have to be reduced to being privately cacheable
● GeoIP exception
○ lookup is done on the last step outside our infrastructure
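A sketch of what that reduction could look like on the way out, assuming the internal variation headers all share an `X-` prefix (an assumption, not something the slides state):

```vcl
sub vcl_deliver {
    # External caches never see the internal X-* request headers we vary
    # on, so they must not hold shared copies of those variants.
    if (resp.http.Vary ~ "X-") {
        unset resp.http.Vary;
        set resp.http.Cache-Control = "private, max-age=0";
    }
}
```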
16. Vary: Cookie?
● Originally planned to send this out for responses using X-Cookie-...
● But there's an analytics cookie on the site that changes on each page...
● ...so Vary: Cookie would never hit
● Instead, send responses out as uncacheable
17. Setting a Unique Cookie
● Previously sent from backend
● Generate unique ID cookie in Varnish
● Allows cookie to be set and content served from cache
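A hypothetical sketch, assuming a Varnish version that exposes `req.xid` (the transaction ID stands in here for a properly generated unique value, and the cookie name is invented):

```vcl
sub vcl_deliver {
    # If the client arrived without our unique-ID cookie, set one on the
    # way out. The body itself can still be a cache hit, so Set-Cookie no
    # longer forces a trip to the backend.
    if (req.http.Cookie !~ "(^|; )BBC-UID=") {
        set resp.http.Set-Cookie =
            "BBC-UID=" req.xid "; path=/; domain=.bbc.co.uk";
    }
}
```

A production version would generate a genuinely unique random ID (e.g. via inline C) rather than reusing the transaction ID.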
21. Olympic Requirements
● UK and non-UK versions
● Mobile and Desktop versions
● Traffic served by multiple applications
22. Olympic Requirements
● UK and non-UK versions
● Mobile and Desktop versions
● Traffic served by multiple applications
I think we can handle this...
28. Varnish and HD Streaming
● 24 HD streams
● Planned to use Varnish at the front
● Cached very, very well
● Needed to be highly available
● HA layer didn't hold up
● Had to use a load balancer instead and use the cache there
29. What else has hurt?
ESI
● Increase in complexity
● Working out 'best practice'
● Seg faults!
○ Overflow of sess_workspace
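The workspace that overflowed is sized by a startup parameter. An illustrative fix (the value shown is arbitrary; the right size depends on how many headers and how much ESI state a request accumulates):

```
# sess_workspace is per-session scratch space used for headers, ESI
# processing, etc.; the Varnish 2.x default is small, and overflowing
# it crashed worker processes.
varnishd -f /etc/varnish/default.vcl -p sess_workspace=262144
```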
30. However...
● Synthetic end point generated in Varnish
● Included as ESI
● Very good performance...
○ Almost 4 times previous load
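A sketch of how such a synthetic endpoint can be built in Varnish 2.x-era VCL (URL, status code, and body are hypothetical): the fragment is answered by Varnish itself, never touching a backend.

```vcl
sub vcl_recv {
    if (req.url == "/esi/served-by") {
        # Jump to vcl_error, where the response body is synthesised.
        error 750 "Synthetic";
    }
}

sub vcl_error {
    if (obj.status == 750) {
        set obj.status = 200;
        set obj.response = "OK";
        set obj.http.Content-Type = "text/html; charset=utf-8";
        synthetic {"<p>Served by Varnish</p>"};
        return (deliver);
    }
}
```

The parent page would carry `<esi:include src="/esi/served-by"/>`, with ESI processing switched on for it in `vcl_fetch` (`esi;` in Varnish 2.x).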
31. Other pains
● No Saint mode
○ Load balancing behind and multiple apps
● Network bandwidth
○ As few boxes as possible
32. Next?
● Everywhere!
○ Ubiquitous caching layer
○ Already have most big players
● More monitoring
● Version 3
○ VMODs?
● Make it simpler
○ Remove anything we can
33. tl;dr
Took Varnish from being an application-specific component to a Platform-wide essential