1. The World Is On Fire And
So Is Your Website
Architecting systems for extremely bursty web
traffic driven by the news cycle
Ann Lewis, CTO @ MoveOn
@ann_lewis
3. What is MoveOn?
● Grassroots campaigning
● Fighting for social justice, progressive policies,
progressive candidates
● A community of millions of progressives in all 50 states
4. What is MoveOn?
● Small, scrappy, fully-distributed team
● Nationally impactful programs powered by tech tools and
data
● A complex ecosystem of 30+ websites and tools that need
to scale on a nonprofit budget
5. Who am I?
● MoveOn’s CTO since 2015
● Software engineer and technical leader for 15+ years
● Alum of Carnegie Mellon, Amazon, Rosetta Stone, handful
of startups, consulting companies
● Excited about building tech that powers collective action
6. Agenda
● The new attention economy
● Story: a protest goes viral
● The tech behind mass mobilization infrastructure
● How to scale a complex system architecture in the new
attention economy, on a nonprofit budget
7. A walk down memory lane
Show of hands:
Who remembers Slashdot?
Who remembers the internet before
big social media?
9. The “Slashdot” effect of the 90s
A massive surge of web traffic that occurs when a popular website
links to a smaller website.
10. The attention economy
● As the volume of information and
news grows, attention becomes a
scarce resource
● All content publishers compete for
this aggregate attention
● Social media platforms attempt to
control engagement around viral
content
[Figure labels: Content, Attention]
11. The attention economy evolves
● Previous generation:
○ Social news sites like Slashdot aggregated attention
○ Virality happened via cumulative direct user actions, like
upvoting
● Today:
○ Dominance of social media platforms
○ Virality is controlled by the platforms, which make the rules
around who sees what, when, and why
12. Feedback Loops
The news cycle is a dumpster fire, and social media feedback
loops are very effective at quickly amplifying the most inflammatory
content to virality.
13. Oligarchy?
● America’s economic oligarchy: over the last
generation, a small number of people have grown
richer while middle- and working-class wages
have stagnated
● On most social media platforms, 0.1% of users
have > 100K followers, and 2% have 10K-100K
followers
● Almost everyone else has 700 followers or fewer
● Social media is an oligarchy too!
14. Influencers
● Influencers: social media users with > 100K
followers
● Micro-influencers: social media users with
10K-100K followers
● Influencers control the nature of virality in
today’s attention economy
● Yes, your favorite Gen Z-er was right about
becoming an Instagram influencer
16. No One Is Above The Law
● Nov 6: US election day. Everyone working on elections is
proud and exhausted. Highest turnout for a midterm since
1914!
● Nov 7, 2:40pm: Trump crosses a Mueller investigation
“red line”: fires Jeff Sessions and replaces him with a loyalist
● Nov 7, 5:10pm: ‘Trump Is Not Above the Law’ protest
coordination network launches
17. Trump Is Not Above the Law
● Nov 7, 5:10pm: Protest hub website lists 700 protest
events nationwide, with 400K RSVPs
● Nov 7, 7pm: Protest call-to-action gets tens of thousands of
retweets; we observe moderate surges of traffic
● Nov 7, 9pm: Influencer Rachel Maddow mentions the protest
website on her evening show, traffic surges to 3.5M views, and
the site falls over (but quickly comes back up)
19. Trump Is Not Above The Law
● 11/8/2018 12pm ET: Protest hub website has
accumulated ~1000 events nationwide and ~500K RSVPs.
That is 300 new events and 100K more RSVPs in 24
hours!
● 11/8/2018 5pm local time: Nationwide protests!
21. Key Technical Takeaways
● Today, the observed behavior of virality is
tightly controlled by the social media
platforms
● “Going viral” only means traffic surges if
the platforms decide it does.
● With a major exception: influencers can
still generate organic viral behavior
23. The Tech Behind Protest Networks
● Hub website: a database of protest events, protest prep
material content hub, event map and search tools
● Crowdsourced event creation: anyone can host a
protest
● Mobilization tools drive event creation and RSVPs: we
email, text, and buy targeted social media ads to find
people interested in nearby protest events
24. Stepping Up to Big Moments
● No one knows when the next big moment will happen
● We need to be able to react and launch quickly
● Massive scale is critical to impact
● ... all on a nonprofit budget!
25. Problems to Solve
● Can’t predict or control when content will go viral
● Can’t afford to maintain big company levels of tech
infrastructure all the time
● Our infrastructure = a complex 30+ entity ecosystem of
in-house and vendor tools. Scale testing complex
architecture is very time-consuming.
26. Monitoring and Measurement
● Monitoring is key: monitor everything,
through the architectural stack, including
vendor tools
● SLAs are key:
○ Aggressive SLAs for in-house tools
○ Observe vendor uptime and availability
○ Plan around cascading failures
EWarren has a plan.
Do you?
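One way to plan around cascading failures is to estimate how availability compounds when a request depends on several components in series. The sketch below assumes independent failures and uses hypothetical per-component figures (30 components at 99.9% each); none of these numbers are MoveOn measurements.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a request path that needs every component to be up.

    Assumes failures are independent, which real outages often are not.
    """
    return prod(availabilities)


def downtime_minutes_per_month(availability, minutes_in_month=30 * 24 * 60):
    """Expected downtime implied by an availability fraction."""
    return (1.0 - availability) * minutes_in_month


# Hypothetical example: 30 components, each meeting a 99.9% SLA.
components = [0.999] * 30
path = serial_availability(components)
print(f"end-to-end availability: {path:.4f}")
print(f"expected downtime: {downtime_minutes_per_month(path):.0f} min/month")
```

Even when every component meets its own SLA, the end-to-end figure is noticeably lower, which is one reason to monitor vendor tools alongside in-house ones.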
27. Vendors
● Your system doesn’t scale if your vendors don’t scale.
● Get SLAs and incident response plans into your contracts
● Build a strong relationship with vendors before the next
big scaling emergency.
● Do regular build-vs-buy and platform analyses, and
understand the cost of switching in case you need to
28. Scaling Incident Response Plans
● What to do before, during and after a scaling incident
● Who to call, what to check, what decisions to make
● Hot backup failover plans for in-house systems
● Static or stopgap backups for vendor systems.
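The failover ideas above (hot backup for in-house systems, static stopgap for vendor systems) can be sketched as an ordered list of sources tried in turn. This is a minimal illustration, not a description of MoveOn's setup: `fetch_with_failover` and the three stand-in functions are hypothetical names.

```python
def fetch_with_failover(sources):
    """Try each (name, fetch) pair in order and return the first success.

    `sources` is an ordered list such as primary, hot backup, static copy.
    Each `fetch` is a zero-argument callable that raises on failure.
    """
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all sources failed: {errors}")


# Hypothetical stand-ins for real systems.
def primary():
    raise ConnectionError("primary is down")


def hot_backup():
    raise TimeoutError("backup is overloaded")


def static_page():
    return "<html>Find a protest near you (cached list)</html>"


served_by, body = fetch_with_failover(
    [("primary", primary), ("hot_backup", hot_backup), ("static", static_page)]
)
print(served_by)
```

Writing the order down ahead of time means nobody has to decide it in the middle of an incident.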
29. Granular Autoscaling
● Fast reaction time is key
● Breakout virality will have a 100x scaling impact within
minutes, not hours
● The user action curve plays out on the order of minutes
● We can’t afford to lose 15 minutes waiting for autoscaling to kick in
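To see why reaction time matters, here is a toy simulation of a surge that grows 100x over a few minutes while capacity follows demand with a delay. Only the 100x figure comes from the slides; the baseline, the 10-minute ramp, the 2x headroom, and the "capacity tracks demand with a fixed lag" model are all simplifying assumptions.

```python
def demand_at(minute, baseline=1_000, peak_multiplier=100, ramp_minutes=10):
    """Requests per minute: geometric ramp from baseline to peak, then flat."""
    if minute <= 0:
        return baseline
    progress = min(minute / ramp_minutes, 1.0)
    return baseline * peak_multiplier ** progress


def dropped_requests(lag_minutes, duration=30, headroom=2.0):
    """Total requests over capacity when capacity tracks demand `lag_minutes` late."""
    dropped = 0.0
    for minute in range(duration):
        # Capacity is sized for the demand seen `lag_minutes` ago, plus headroom.
        capacity = headroom * demand_at(minute - lag_minutes)
        dropped += max(0.0, demand_at(minute) - capacity)
    return dropped


for lag in (1, 5, 15):
    print(f"lag {lag:>2} min -> ~{dropped_requests(lag):,.0f} requests dropped")
```

In this model a 1-minute lag stays inside the headroom, while longer lags drop an increasing share of the surge.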
30. Granular Autoscaling
● Consider microservices for scaling bottlenecks:
spinning up additional containers is much faster than
booting up additional virtual machines
● It’s often cheaper: the per-invocation cost of handling a
traffic surge is 10% of the cost of dedicated hardware
during the scaling period
31. Granular Autoscaling
● Scaling response plan should include all distributed
systems scaling levers:
○ Quickly add servers (or containerized capacity)
○ Just-in-time upgrade hardware
○ Enable additional caching
○ Queue up bursts of writes to process later
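The last lever, queueing bursts of writes to process later, can be sketched with an in-memory buffer that the request path appends to and a background worker drains in batches. `WriteBuffer` and the RSVP payload are hypothetical; a production system would more likely use a durable queue so that pending writes survive a crash.

```python
from collections import deque


class WriteBuffer:
    """Accept writes immediately and persist them later in fixed-size batches."""

    def __init__(self, persist_batch, batch_size=100):
        self._pending = deque()
        self._persist_batch = persist_batch  # callable that takes a list of writes
        self._batch_size = batch_size

    def submit(self, write):
        """Called on the request path: O(1), never touches the database."""
        self._pending.append(write)

    def drain_once(self):
        """Called by a background worker: persist up to one batch."""
        batch = []
        while self._pending and len(batch) < self._batch_size:
            batch.append(self._pending.popleft())
        if batch:
            self._persist_batch(batch)
        return len(batch)

    def __len__(self):
        return len(self._pending)


# Hypothetical usage: a burst of 250 RSVPs arrives at once.
stored = []
buffer = WriteBuffer(persist_batch=stored.extend, batch_size=100)
for i in range(250):
    buffer.submit({"event_id": 42, "rsvp": i})

while buffer.drain_once():
    pass
print(len(stored), len(buffer))
```

The trade-off is that users see their write acknowledged before it is stored, so reads may briefly lag behind.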
32. Don’t Forget the CAP Theorem
● Consistency, Availability and Partition Tolerance: pick 2
● Analyze your architecture ahead of the scaling incident
and map out the choices to make in the event of loss of
data consistency, component availability, and network
partitioning
● Include this in your scaling incident response plan, and be
prepared to make hard choices
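One way to map out these choices ahead of time is to record, per component, what to do when the primary data store is unreachable. The sketch below is illustrative only: the component names and the choices assigned to them are hypothetical, not MoveOn's actual plan.

```python
from enum import Enum


class OnPartition(Enum):
    SERVE_STALE = "favor availability: answer from the last known copy"
    REJECT = "favor consistency: refuse rather than risk a wrong answer"


# Hypothetical plan: component names and choices are illustrative only.
PLAN = {
    "event_search": OnPartition.SERVE_STALE,
    "rsvp_totals": OnPartition.SERVE_STALE,
    "event_creation": OnPartition.REJECT,
}


def read(component, primary_reachable, fresh_value, stale_value):
    """Return (value, is_stale) according to the pre-agreed plan."""
    if primary_reachable:
        return fresh_value, False
    if PLAN[component] is OnPartition.SERVE_STALE:
        return stale_value, True
    raise RuntimeError(f"{component} unavailable during partition")


print(read("event_search", False, fresh_value=None, stale_value=["Boston", "Austin"]))
```

Keeping the plan as data makes it easy to review with the team and to attach to the scaling incident response plan.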
33. Conclusion
● Big social media companies have changed the shape of
the attention economy
● Social media is an oligarchy, and influencers win
● Traffic surges happen in O(minutes) instead of O(hours)
● Scale planning is harder
● Scale planning is also key: monitor everything, create
scaling emergency response plans, get granular