Inside election night at The New York Times | Altitude NYC
Over the past two decades, The New York Times has successfully made the transition to a digital-first company while maintaining its reputation as one of the most trusted news sources in the world. CTO Nick Rockwell discusses the latest step in the Times’ journey: implementing Fastly in preparation for record traffic during the 2016 presidential election. He covers the impact the NYT saw on backend load and global performance, as well as the long-term implications for its infrastructure. And of course, he also discusses the timeline of election night, and how surprise and unpredictability led to rapid shifts in reader behavior and in the NYT’s response.
Preparation, News Style...
Good: Ready for Anything
Better: Ready for Anything + Exhaustive Prep
✘ Who’s responsible?
✘ What if something goes wrong?
✘ Oh it did go wrong in 2012…
✘ What if there’s more load than we expect?
✓ Team, Roles & Responsibilities
✓ Build an Election Night Runbook (16 pages!)
✓ Dry runs around debates
✓ Integrate a CDN...
8/21 - Olympics are a wrap
8/24 - First Election prep meeting
9/21 - Meet w/ Fastly
9/23 - Commit to using Fastly
10/25 - In production
11/5 - Agreement signed
11/8 - Election night!
Jon: can use 90 80 70, ok 60 percent of VCL code.
Plan B for Elections
8 Additional www-varnish for content requests
8 Additional www-varnish for userinfo requests
8 Additional www-fe (just in case)
4 Additional www-varnish for elections app
Mobileweb and video load tests next week to inform possible buildout
Final test tonight for MobileWeb
Also auth scaling, warming Amazon ELBs, etc.
You already know this but...
“A DDoS attack is like someone anonymously placing a
press ad including your phone number and offering an
Aston Martin for sale at $200. You’re bombarded by calls,
your life is misery, the callers aren’t aware they’re part of a
trick, and your attacker is almost impossible to trace.”
Joys of CDN
• Scaled caching
• Better performance due to edge delivery
• DDoS protection
Slightly less obvious:
• Consistent performance
• Better everything - TLS negotiation, compression, etc.
• Cascading effects of smaller, simpler infrastructure
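To make the "scaled caching" point concrete, here is a back-of-the-envelope sketch of how the edge cache hit ratio shrinks the load that reaches origin. The peak traffic figure and the hit ratios are hypothetical illustrations, not numbers from the talk, and `origin_rps` is a helper name invented for this example.

```python
def origin_rps(edge_rps: float, hit_ratio: float) -> float:
    """Requests per second that miss the edge cache and reach origin."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return edge_rps * (1.0 - hit_ratio)


# Hypothetical peak: 100,000 requests/second arriving at the edge.
PEAK_EDGE_RPS = 100_000

for ratio in (0.0, 0.5, 0.75):
    load = origin_rps(PEAK_EDGE_RPS, ratio)
    print(f"hit ratio {ratio:.0%}: {load:,.0f} rps at origin")
```

Under this simple model, every request served from the edge is one the origin fleet never sees, which is where the "smaller, simpler infrastructure" effect would come from.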
What is risk?
It’s not risk if someone else is responsible.
It’s not risk if there’s no chance of
It’s still risk if you mitigate it.
It’s still risk if you hedge, create
contingencies, and plan.
Risk and Accountability
Our current ideologies of risk-taking and
accountability are at odds.
Risk-taking can only take place within a
context of judgment that is opaque.
A culture that values “boldness”, action-bias,
or the appearance of certainty, usually
destroys true risk-taking.
Boring bullet list of stuff we’re changing
Logic changes in varnish if the request came from Fastly
Moving Abra back to the client-side (yay Ken)
Userinfo back to the client-side (can’t decrypt the session cookie..yet)
Audit what www services we can cache in Fastly
Connected CREAM to Fastly’s purge API
SO MANY THINGS
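As a rough illustration of the purge integration in the list above, the sketch below builds a purge-by-surrogate-key request against Fastly's public API. How CREAM actually calls Fastly is not described in the slides, so the function names, the `article-12345` key, and the placeholder service ID and token are all assumptions made for this example.

```python
import urllib.parse
import urllib.request

FASTLY_API = "https://api.fastly.com"


def build_purge_request(service_id: str, surrogate_key: str, api_token: str,
                        soft: bool = False) -> urllib.request.Request:
    """Build (but do not send) a purge-by-surrogate-key request."""
    key = urllib.parse.quote(surrogate_key, safe="")
    url = f"{FASTLY_API}/service/{service_id}/purge/{key}"
    headers = {"Fastly-Key": api_token, "Accept": "application/json"}
    if soft:
        # A soft purge marks objects stale instead of evicting them.
        headers["Fastly-Soft-Purge"] = "1"
    return urllib.request.Request(url, method="POST", headers=headers)


def send_purge(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder values; a hypothetical publish hook would supply real ones.
req = build_purge_request("SERVICE_ID", "article-12345", "API_TOKEN")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the hook easy to test without touching the network; a publish or update event would call `send_purge` with the keys attached to the changed content.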
When are things happening
10/4-5 - First rounds of production tests (WWW)
10/09 - Testing during debate (WWW)
10/13-19 - Testing with Mobile Web (internally, public, debate)
10/25 - Production launch
11/08 - Hide somewhere and hope Trump doesn’t win
11/10’ish - back to datacenter if necessary (it wasn’t…)
To end: we are just getting started...
• Continuing to shrink provisioning
• Continuing to “purge” or replace downstream caching
• Logs into BigQuery
• Looking at edge processing opportunities:
⛈ Load balancing, WAF
⛈ Image service
⛈ Auth & Meter