r/place
How We Built and Scaled Reddit’s 2017 April Fools’ Project
Daniel Ellis
u/daniel
@I_am_Dan_Ellis
What Was It?
Individually you can create something.
Together you can create something more.
1.1M unique users
150K concurrent users
16.5M tiles placed
72 hours
Challenges
● You only get one shot to launch, and it’ll only last a few days.
● You shouldn’t affect the main site.
● It’s a small team, and you have other stuff to do.
● You have no idea what users will do with it.
No Pressure!
Development
Workflow
How it Worked
Overview - Getting the Board
Diagram: client, CDN (Fastly), app server
Overview - Setting a Pixel
Diagram: client, app server, websockets server, event collector (steps numbered 1 to 4)
Backend Choice
Cassandra vs. Redis
Cassandra
● Initial MVP in Cassandra
● Quite mature for us
○ 36 nodes
○ ~96TB of data
○ 90k reads/sec
○ 30k writes/sec
● Downsides
○ Doesn’t fit this project’s data model well
○ Potentially affects the main site
Redis
● Upsides
○ Fits the data model well
○ Doesn’t affect the main site
● Downsides
○ We don’t use it a lot, mostly for counting
redis
Format of the Board
SETBIT?
● SETBIT key offset value
● SETBIT canvas 100 1
SETBIT canvas 101 1
SETBIT canvas 102 1
SETBIT canvas 103 1
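The four SETBIT calls above appear to write a single 4-bit color one bit at a time, which is presumably why the slide title ends in a question mark. A minimal sketch of that cost, assuming 4 bits per pixel and most-significant-bit-first ordering; `setbit_commands` is a hypothetical helper for illustration, not something shown in the talk:

```python
BITS_PER_PIXEL = 4  # assumption: 16 colors, matching the u4 type used with BITFIELD


def setbit_commands(key, pixel_index, color):
    """Return the SETBIT commands needed to store one 4-bit color."""
    if not 0 <= color < 2 ** BITS_PER_PIXEL:
        raise ValueError("color must fit in 4 bits")
    first_bit = pixel_index * BITS_PER_PIXEL
    commands = []
    for i in range(BITS_PER_PIXEL):
        # Emit the most significant bit first.
        bit = (color >> (BITS_PER_PIXEL - 1 - i)) & 1
        commands.append(f"SETBIT {key} {first_bit + i} {bit}")
    return commands


for command in setbit_commands("canvas", 25, 15):
    print(command)
```

One pixel costs four commands this way, and the four writes are not atomic unless they are wrapped in a transaction.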
BITFIELD - Setting a Pixel
● BITFIELD key SET TYPE OFFSET VALUE
● BITFIELD canvas SET u4 #25 15
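In BITFIELD, the `#` prefix multiplies the offset by the width of the type, so `u4 #25` addresses bits 100 to 103, and the value 15 sets all four of them. The sketch below mimics that addressing on a local bytearray so the layout can be inspected without a Redis server. The function names are hypothetical, and the row-major mapping from (x, y) to an index with a caller-supplied width is an assumption, since the slides do not state how coordinates map to offsets.

```python
def pixel_index(x, y, width):
    """Assumed row-major mapping from board coordinates to a u4 index."""
    return y * width + x


def bitfield_set_u4(buf, index, value):
    """Mimic `BITFIELD key SET u4 #index value` on a local bytearray.

    Returns the previous value, as Redis does for SET.
    """
    if not 0 <= value <= 15:
        raise ValueError("u4 holds values 0..15")
    byte_pos, is_low_nibble = divmod(index, 2)
    if byte_pos >= len(buf):
        # Redis zero-pads the string up to the byte being written.
        buf.extend(bytes(byte_pos + 1 - len(buf)))
    old_byte = buf[byte_pos]
    if is_low_nibble:
        previous = old_byte & 0x0F
        buf[byte_pos] = (old_byte & 0xF0) | value
    else:
        previous = old_byte >> 4
        buf[byte_pos] = (old_byte & 0x0F) | (value << 4)
    return previous


canvas = bytearray()
bitfield_set_u4(canvas, 25, 15)  # same effect as: BITFIELD canvas SET u4 #25 15
print(len(canvas), hex(canvas[12]))
```

Two pixels share each byte: even indexes land in the high nibble and odd indexes in the low nibble, because Redis numbers bits from the most significant bit of the first byte.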
BITFIELD - Getting a Pixel
● Simple GET command
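Because the bitfield is stored as an ordinary Redis string, a plain `GET canvas` returns the packed bytes for every pixel in one call; a single pixel could instead be read with `BITFIELD canvas GET u4 #25`. The sketch below shows how a reader of the packed bytes might expand them into color indexes. `unpack_u4` is a hypothetical helper; the talk does not show how the board was decoded.

```python
def unpack_u4(raw):
    """Expand packed bytes into a list of 4-bit values, two per byte."""
    pixels = []
    for byte in raw:
        pixels.append(byte >> 4)    # even index: high nibble
        pixels.append(byte & 0x0F)  # odd index: low nibble
    return pixels


print(unpack_u4(bytes([0x3F, 0x01])))
```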
Load
Testing
● ~180k writes per second estimated
● 1 read per second == loss of 2k writes per second
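The two figures above imply a simple trade-off between reads and write throughput. The sketch below works it through, assuming the cost of each read per second stays constant as reads increase; that linear extrapolation is an assumption made here for illustration, not a measurement from the talk.

```python
ESTIMATED_WRITES_PER_SEC = 180_000  # load-test estimate from the slide
WRITES_LOST_PER_READ = 2_000        # cost of one read per second, from the slide


def remaining_write_capacity(reads_per_sec):
    """Estimated writes per second left over at a given read rate."""
    lost = reads_per_sec * WRITES_LOST_PER_READ
    return max(0, ESTIMATED_WRITES_PER_SEC - lost)


for reads in (0, 10, 45, 90):
    print(reads, remaining_write_capacity(reads))
```

Under this model, 90 reads per second would use up the entire estimated write budget, which is consistent with the deck's emphasis on caching reads.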
Board Load Time
redis 10ms
cassandra >30s
Caching
Diagram: client, CDN (Fastly), application, cache, redis, cassandra
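One way to picture a cache layer in front of the store is a small time-based cache, so that many board requests within a short window turn into a single backend read. This is a generic sketch, not Reddit's implementation; `TTLCache`, the fetch callback, and the one-second lifetime are all assumptions made for illustration.

```python
import time


class TTLCache:
    """Cache one value and refetch it once it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache miss or stale entry: hit the backend once and remember it.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


backend_reads = []


def fetch_board():
    # Stand-in for the expensive full-board read from the store.
    backend_reads.append(1)
    return b"\x00" * 8


board_cache = TTLCache(fetch_board, ttl_seconds=1.0)
for _ in range(1000):
    board_cache.get()
print(len(backend_reads))
```

The lifetime is the kind of value worth exposing as a tunable setting, since it trades board freshness against backend load.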
Takeaways
Have knobs you can tune, switches you can flip.
Some of them will be crucial (like changing cooldown timers), and some you won’t have to resort to at all (like changing caching behavior).
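A cooldown timer is a good example of such a knob: if the value lives in settings rather than in code, it can be changed while the event is running. A minimal sketch, with hypothetical names and an arbitrary 300-second default; the talk does not give the actual values or mechanism.

```python
from dataclasses import dataclass


@dataclass
class PlaceSettings:
    cooldown_seconds: float = 300.0  # hypothetical default, tunable at runtime


def can_place(last_placed_at, now, settings):
    """True if the user has waited out the current cooldown."""
    if last_placed_at is None:
        return True  # the user has never placed a tile
    return now - last_placed_at >= settings.cooldown_seconds


settings = PlaceSettings()
print(can_place(100.0, 200.0, settings))  # 100 s elapsed, 300 s cooldown
settings.cooldown_seconds = 60.0          # turn the knob
print(can_place(100.0, 200.0, settings))  # same request, shorter cooldown
```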
Takeaways
Load test everything with real-world data.
This showed the need for caching and for another backend store. Had we tested only simple reads and writes, we might have missed the problems arising from the disparity in size between gets and sets.
Takeaways
Use power principles.
You don’t have much time, so anything you choose to implement should have a disproportionate payoff.
Takeaways
Some things you think will matter won’t, some things you
don’t think will matter will, some things you think will matter
would have mattered but you’ll never know they mattered
because you did the thing that prevented them from
mattering.
Thanks!
Questions?

Talk: Altitude SF 2017: Reddit - How we built and scaled r/place

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

Altitude SF 2017: Reddit - How we built and scaled r/place

Editor's Notes

  1. Hey everybody, thanks for coming. I’m here today to talk about place, reddit’s april fools day project for 2017. Though I’m assuming at least a few people here know exactly what place was, I’ll give a quick overview just so no one is completely lost.
  2. At its core, place was a large scale social experiment. Users were given one pixel every 5 minutes they could place anywhere on this 1000x1000 grid board (a million pixels total) from a palette of 16 colors. The idea was that by yourself, you couldn’t really draw anything, but together, through collaboration, you could make some great stuff. And as you can see here, people really, really, really did make some really awesome stuff. It lasted about 3 days, at which point we felt like the canvas was pretty full, and then we decided to stop it. It was really really cool to see just how much effort people put into this. People had dedicated discord chats, I think for rainbow road they had almost a thousand people in discord, with different duties split out into different channels like maintenance, diplomacy, construction. They had dedicated negotiators, template designers. Truly an awesome thing to behold. As cool as that is and as long as I could go on about all the individual parts and wars and factions that broke out, I’m here specifically to talk
  3. Let’s start with some initial challenges. The first is that you only get one shot. There’s no opportunity to beta test on a scale any larger than internal employee tests. There’s no slowly rolling out to some users. There’s a launch and an announcement and a sudden flood of users. You have a hard deadline. There’s like… simply no way to delay the fact that april fools is on april 1st. The next challenge is that it’s only going to last a few days. If it’s a really crappy experience those few days, you don’t have a few weeks to make it better and have people forget about the rocky launch. That’s just… the whole project. Also, we’d like to prevent issues from affecting the main site. It’s a lot easier to argue that we should have some time to work on this stuff if we aren’t bringing down the site and affecting everyday users. You have other work to do. Yep. You have a small team. You simply can’t fix every bug or do everything you want to do, so you have to aggressively prioritize. Finally, and this is less of a technical problem, but this is the internet, and on the internet people sometimes do amazing things and sometimes do really crappy things. This might end up being a complete mess full of racism and a bunch of other garbage that will reflect negatively on you.
  4. So, ya know, no pressure.
  5. And so to manage all this we used a very high tech task management system: a spreadsheet. We’d put tasks on here, highlight them when done, sometimes put tasks on there, sometimes not. And this also had such tasks as “Client buffering thing to do”. We were also able to put notes about the tasks, like this gem “josh is idiot” which I didn’t actually notice until I went to prepare this slide. It’s nice because there’s not necessarily the pressure to be formal that comes with a full-fledged task management system. So, it seems simple or silly but it actually worked just fine. We would self-prioritize and self-manage and it worked out pretty well. I in general would argue that those huge, complex task management systems are mostly a distraction, and if you get the right type of people on a project they’ll self-organize and get the project done regardless.
  6. The first and simplest major path is actually reading the board. For this, the CDN played a key role. We cached the board “heavily” (with a TTL of 1 second), and let our CDN, fastly, do the work of serving the asset.
  7. This is for the most part pretty complete. There are some other minor things we haven’t included here like checking our postgres instance for the account age to make sure you could play the game, and of course, to make sure you actually have an account. But here is where the place-specific stuff was stored. The second major path is when drawing a pixel. For drawing a pixel, a request comes in and gets passed through the CDN and load balancer untouched, and the application server does a few checks to make sure you’re logged in, your account is of the right age, and that you haven’t placed a pixel in the required amount of time. If that all checks out, an update is sent out to a few places. The first is to rabbitMQ. A new item is added to a fanout exchange which every websockets server is set to consume from. This then sends out a websockets message to every connected consumer subscribed to the “place” namespace. Another update is sent to our event collector. This allows us to do analysis as well as publish a complete dump of events once the game is finished. Another update is sent to redis, where we are storing a bitmap of the board in one key. We’ll go over this in a lot more detail later, since this is redis conf and it probably is a bit more interesting to most folks here than the rest of the stuff. Finally we store the actual data about the user who drew the pixel in cassandra. This is to support the functionality allowing people to click a pixel to see who placed it and when.
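The checks on the write path can be sketched roughly as follows. This is a minimal illustration in Python, not Reddit's actual code: `User`, `rejection_reason`, and `min_account_age_seconds` are hypothetical names, and the minimum account age is left as a parameter because the talk does not state it. The 5-minute cooldown and the four update destinations come from the talk.

```python
from dataclasses import dataclass
from typing import Optional

COOLDOWN_SECONDS = 5 * 60  # one pixel every 5 minutes, per the talk

# Places an accepted pixel update is sent to, per the talk.
UPDATE_TARGETS = ("rabbitmq", "event_collector", "redis", "cassandra")


@dataclass
class User:
    """Hypothetical view of the account data the app server checks."""
    logged_in: bool
    account_age_seconds: float
    last_placed_at: Optional[float]  # None if the user has never placed a pixel


def rejection_reason(
    user: User,
    now: float,
    min_account_age_seconds: float,
    cooldown_seconds: float = COOLDOWN_SECONDS,
) -> Optional[str]:
    """Return why a placement is refused, or None if it may proceed."""
    if not user.logged_in:
        return "not logged in"
    if user.account_age_seconds < min_account_age_seconds:
        return "account too new"
    if user.last_placed_at is not None and now - user.last_placed_at < cooldown_seconds:
        return "cooldown active"
    return None
```

Keeping the cooldown as a parameter rather than a hard-coded constant is one way to get the kind of tunable knob the takeaways mention.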
  8. Key points: Series of uint4s, essentially a bitmap. Each uint4 value maps to a color index in a palette that is interpreted by the frontend. On the left you see the data as stored and transmitted, and on the right you see how it “wraps around” when placed on the grid. To do a write, we can just address a particular place in the bitmap based on the coordinates (x + y * canvas_size). This is great because we can store pixel information for 1 million pixels in 500KB.
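As a quick check of the addressing and the 500KB figure, here is a small Python sketch. The function and constant names are made up for illustration; only the formula, the board dimensions, and the 4-bit pixel size come from the talk.

```python
CANVAS_SIZE = 1000   # board is 1000x1000, per the talk
BITS_PER_PIXEL = 4   # a 16-color palette fits in one uint4


def pixel_offset(x: int, y: int, canvas_size: int = CANVAS_SIZE) -> int:
    """Row-major index of a pixel in the flat sequence of uint4 values."""
    if not (0 <= x < canvas_size and 0 <= y < canvas_size):
        raise ValueError(f"({x}, {y}) is outside the {canvas_size}x{canvas_size} board")
    return x + y * canvas_size


def board_size_bytes(canvas_size: int = CANVAS_SIZE) -> int:
    """Total storage for the board: two 4-bit pixels per byte."""
    return canvas_size * canvas_size * BITS_PER_PIXEL // 8


print(pixel_offset(25, 0))   # 25
print(pixel_offset(0, 1))    # 1000: the first pixel of the second row
print(board_size_bytes())    # 500000 bytes, i.e. 500KB
```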
  9. Initially we were eyeing SETBIT, which lets you set individual bits. We started wondering about atomicity, which we figured we could handle with transactions, but also: maybe we didn’t really care. If a pixel was highly contested they might stomp over one another and a random mix of the colors would result, but maybe it wasn’t that big of a deal. Either way, this seemed attractive because it also meant we could store a color from a 16 color palette in a half-byte, making our total board of 1 million pixels only take up 500KB.
  10. Then we came across something that redis added recently that seemed to fit our use case even better: BITFIELD. This would let us define our integer size (in this case uint4), the offset we wanted to write to, and the value, and redis would even do the work of calculating the actual bit offset and writing the value! [show the basic usage here] This seemed to fit our use case perfectly, no transactions necessary. This meant very little code was involved for the setting. Essentially we just had to translate our X & Y coordinates into a single offset, calculated by multiplying the Y value by the canvas width and adding the X coordinate. By putting the hash sign (#) in front of the offset, redis will automatically calculate the bit position. You can also string these together to set multiple values at once. So we could have easily done some batching here if we had wanted to.
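To make the command shape concrete, here is a hedged Python sketch that only builds the argument list for a BITFIELD call; it does not talk to a server. The helper names are hypothetical, and the key name `canvas` is taken from the slide. How the list is sent depends on whichever raw-command call your redis client offers.

```python
from typing import Iterable, List, Tuple

CANVAS_SIZE = 1000  # board width, per the talk


def _set_clause(x: int, y: int, color: int, canvas_size: int) -> List[str]:
    """One 'SET u4 #offset value' clause for a single pixel."""
    if not 0 <= color <= 15:
        raise ValueError("color must fit in a uint4 (0-15)")
    offset = x + y * canvas_size
    # The leading '#' asks redis to multiply the offset by the integer width.
    return ["SET", "u4", f"#{offset}", str(color)]


def bitfield_set_args(key: str, x: int, y: int, color: int,
                      canvas_size: int = CANVAS_SIZE) -> List[str]:
    """Arguments for setting a single pixel."""
    return ["BITFIELD", key] + _set_clause(x, y, color, canvas_size)


def bitfield_batch_args(key: str, pixels: Iterable[Tuple[int, int, int]],
                        canvas_size: int = CANVAS_SIZE) -> List[str]:
    """Arguments for setting several pixels in one BITFIELD call."""
    args = ["BITFIELD", key]
    for x, y, color in pixels:
        args += _set_clause(x, y, color, canvas_size)
    return args


print(" ".join(bitfield_set_args("canvas", 25, 0, 15)))
# BITFIELD canvas SET u4 #25 15
```

The batch helper reflects the point about stringing several SET clauses together in one command, which the team noted they could have used but did not.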
  11. For the GET, we don’t actually need to use BITFIELD at all. We actually just use a simple GET, and let the client handle the parsing out of the pixels. The code here ends up being a bit more complicated since there are no uint4arrays in javascript, so we need to do our own bit shifting, but it’s not too bad.
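The bit shifting involved is small. The real client was JavaScript; the sketch below shows the same idea in Python, with a hypothetical function name. It assumes the first pixel of each byte sits in the high nibble, which matches how redis numbers bits within a string (most significant bit first).

```python
from typing import List


def unpack_board(raw: bytes) -> List[int]:
    """Split each byte into two 4-bit palette indices, high nibble first."""
    pixels: List[int] = []
    for byte in raw:
        pixels.append(byte >> 4)     # first pixel: upper four bits
        pixels.append(byte & 0x0F)   # second pixel: lower four bits
    return pixels


print(unpack_board(bytes([0x1F, 0xA0])))  # [1, 15, 10, 0]
```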
  12. At this point we started doing some load testing. Initial tests showed we’d be able to get about 75k/sec writes before we maxed out the network. We were hitting about 40% CPU. It turns out we were doing the load testing on a 1gbps instance. So if we scaled up to the next bottleneck we figured we should be able to hit somewhere around 180k writes/sec. At this point it was obvious we were overengineering, since that would be an order of magnitude higher traffic than we get for all of reddit, and we’d hit a lot more limits before we got near that threshold. One thing that I think was really important that came out of this was the unexpected tradeoff between gets and sets. Initially load testing was done with sets only, but it seemed like peppering in even 1 GET of the key per second caused us to take a huge hit in number of sets. The reason is kind of obvious as soon as you see it: a set is pretty small, setting a single pixel value, but a GET is getting them all! There’s obviously things like TCP overhead and the overhead of the redis command itself, but it seemed like each GET request translated to a loss of ~2K writes per second. I wish I had better graphs for this but it’s mostly all stored in my memory and off-handed slack comments. So anyway, it became clear that redis would more than handle our write throughput and give us plenty of room to really lower the pixel cooldown timer if we wanted to, but made us realize we might want to cache the reads.
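The arithmetic here can be written down as a rough Python sketch. It rests on two assumptions that are mine, not the talk's: that write throughput scales linearly with CPU once the network stops being the limit, and that every extra GET per second costs the same ~2K writes per second. The function names are hypothetical.

```python
def estimate_cpu_bound_writes(observed_writes_per_sec: int, cpu_percent: int) -> int:
    """Extrapolate throughput to 100% CPU, assuming linear scaling."""
    return observed_writes_per_sec * 100 // cpu_percent


def writes_remaining(max_writes_per_sec: int, gets_per_sec: int,
                     writes_lost_per_get: int = 2_000) -> int:
    """Write budget left after board reads, assuming a fixed cost per GET."""
    return max(0, max_writes_per_sec - gets_per_sec * writes_lost_per_get)


# 75k writes/sec at about 40% CPU extrapolates to the same ballpark
# as the ~180k estimate from the talk.
print(estimate_cpu_bound_writes(75_000, 40))   # 187500

# Under the linear assumption, 90 uncached board reads per second
# would use up the entire ~180k write budget.
print(writes_remaining(180_000, 1))    # 178000
print(writes_remaining(180_000, 90))   # 0
```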
  13. One of the things you first have to start thinking about when building one of these projects is what kind of backend infrastructure you’re going to use. reddit’s engineering team is still pretty small, so we don’t really have the tools at our disposal that we might otherwise have if we had entire teams dedicated to managing one datastore or something like that. So we tend to be biased towards things we already know and have exercised well. This lets us do napkin math and estimations and have a pretty good sense of where we might have issues. So our initial idea for the project was to store most of our data in Cassandra, since we already are putting about 17TB of data in there and are writing at something like 80k/sec total. [get real cass numbers] That being said, we recognized that while it might make sense for some things like storing individual pixel information, it didn’t *feel* right for storing the entire board. It felt wasteful, since it would mean querying a million columns just to reconstruct the board, and would mean a hotspot in the ring, but we went ahead and did it for the initial MVP since we figured with caching it might not matter too much. If we could still return the board in under a second and cache that response, we figured we’d be good. At this point we had been eyeing redis but some of the people on the team felt that if we could do it with existing tools it would be a safer option. Still… redis seemed really attractive. It made me realize numbers really win conversations. At this point, we still weren’t necessarily convinced we should use redis since we were more familiar with cassandra. So we loaded up the board with 1 million pixels and compared the load times: [diagram showing redis @ 50ms and cass > 30 seconds]. This isn’t a knock on cassandra at all, just that it wasn’t necessarily the best use case for this project. 
We started to think through the failure cases, and as a backup in case we had issues with redis and lost the data on it, we made a function to load the data back up from cassandra. Initially, some of us figured it wouldn’t be too bad to lose the board in case of a failure, since people would be able to get it filled back up again pretty quickly. In retrospect, I’m really really glad we didn’t do this. I think that would have fundamentally damaged the culture and factions built in the project. Redis ended up faring really well and we were really happy with the decision to go with it.
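A restore function of this kind could look something like the sketch below. It is only an illustration: the row shape `(x, y, color)` is a hypothetical stand-in for whatever the Cassandra schema actually held, and the rows are assumed to contain the latest color for each pixel. The packing puts the first pixel of each byte in the high nibble, matching redis's bit numbering.

```python
from typing import Iterable, Tuple


def rebuild_board(rows: Iterable[Tuple[int, int, int]], canvas_size: int = 1000) -> bytes:
    """Pack (x, y, color) rows into the uint4 board format; unset pixels stay 0."""
    board = bytearray((canvas_size * canvas_size + 1) // 2)
    for x, y, color in rows:
        if not (0 <= x < canvas_size and 0 <= y < canvas_size):
            raise ValueError(f"({x}, {y}) is outside the board")
        if not 0 <= color <= 15:
            raise ValueError("color must fit in a uint4 (0-15)")
        byte_index, is_low_nibble = divmod(x + y * canvas_size, 2)
        if is_low_nibble:
            board[byte_index] = (board[byte_index] & 0xF0) | color
        else:
            board[byte_index] = (board[byte_index] & 0x0F) | (color << 4)
    return bytes(board)


# A tiny 4x4 board: pixels at offsets 0, 1, and 4.
print(rebuild_board([(0, 0, 1), (1, 0, 15), (0, 1, 10)], canvas_size=4).hex())
# 1f00a00000000000
```

The resulting bytes could then be written back to the board key with a plain SET.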
  14. So another thing that ended up being very useful here and felt like a worthwhile investment was putting some effort into caching. From the load testing we did on redis we knew it would be a pretty big deal if we let all reads just fall through to the main store. And plus, we had live updates, so in theory we could just wait the TTL’s length before requesting the board while receiving the live updates and then request it and we should have a pretty good live picture of what the board looked like. We decided to set a TTL of 1 second. The best part about this is that our CDN would automatically pick up on those headers and do the hard work of caching for us. This essentially brought the number of requests that could hit us in a given second to the number of POPs our CDN had, since the asset would be cached and shared among nodes at each POP. In the case of fastly, that should come out to around 30/sec, which is pretty small and reasonable. But still… could we go further? We also looked into shielding, which essentially puts an origin in front of ours that will cache the asset, and the other caches make requests from that. That would limit us to a mere 1 request per second! Unfortunately we didn’t do it. With the way our config was set up it was gonna be kind of convoluted, and with convoluted solutions come confusing issues, and it felt like we were overengineering at that point anyway. We added another layer of caching internally since all of our application servers have local caches. We did this just in case CDN caching failed spectacularly and we needed a quick way to fall back to something else without hitting redis directly. Thankfully we never ended up needing this! The nice part about all of this was that it wasn’t too hard to add these extra layers and would have made us feel way more comfortable in the case that something bad happened. [not sure how to explain this, something about game time decisions. 
it’s nice to have simple ways to fall back and lower load or fix problems that are ready to go.] Finally, as I mentioned earlier, we were worried that the CDN might start serving old assets. We decided to put a 32 bit timestamp at the beginning of the board that clients could then either report back to us and we could keep an eye on or that we could then use to make a decision about avoiding the caching layer, but we ended up not doing it.
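The timestamp prefix could be handled along these lines. This is a sketch under assumptions: the talk only says the timestamp was 32 bits and sat at the beginning of the board, so the big-endian byte order, the seconds resolution, and all function names here are guesses made for illustration.

```python
import struct
from typing import Tuple

TIMESTAMP_BYTES = 4  # 32-bit timestamp, per the talk


def add_timestamp(board: bytes, now: int) -> bytes:
    """Prefix the board with an unsigned 32-bit timestamp (big-endian assumed)."""
    return struct.pack(">I", now) + board


def split_timestamp(payload: bytes) -> Tuple[int, bytes]:
    """Separate the timestamp prefix from the packed board data."""
    if len(payload) < TIMESTAMP_BYTES:
        raise ValueError("payload is too short to hold a timestamp")
    (timestamp,) = struct.unpack(">I", payload[:TIMESTAMP_BYTES])
    return timestamp, payload[TIMESTAMP_BYTES:]


def is_stale(timestamp: int, now: int, max_age_seconds: int) -> bool:
    """True if the board is older than the client is willing to accept."""
    return now - timestamp > max_age_seconds
```

A client could report stale boards back to the server, or use the check to decide whether to bypass the caching layer.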
  15. May seem overengineered, but was super easy to implement given our current setup. You can see from the code here that adding the caching for our CDN was as simple as adding an additional header with the TTL. Using the local cache in reddit applications is as simple as checking for the value. And finally we get from redis if none of that worked. We were also writing to cassandra in case of major failure, so we could manually load the board back into redis if need be. Again, all of this took very little actual code but paid off greatly in terms of how it reduced load to the system.
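The slide's code is not reproduced in these notes, so here is a stand-in Python sketch of the layering described: a TTL header for the CDN, a local cache check, then redis. Everything named here is hypothetical, including `LocalCache`, `get_board`, and the `fetch_from_redis` callback. The use of `Cache-Control: max-age` is also an assumption, since the talk does not say which header carried the TTL.

```python
import time
from typing import Callable, Dict, Optional, Tuple

BOARD_TTL_SECONDS = 1  # TTL from the talk


class LocalCache:
    """Tiny in-process TTL cache standing in for an app server's local cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: bytes, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)


def get_board(cache: LocalCache,
              fetch_from_redis: Callable[[], bytes]) -> Tuple[bytes, Dict[str, str]]:
    """Serve the board from the local cache, falling through to redis on a miss."""
    board = cache.get("board")
    if board is None:
        board = fetch_from_redis()
        cache.set("board", board, BOARD_TTL_SECONDS)
    # The CDN reads this header and caches the response for the same TTL.
    headers = {"Cache-Control": f"max-age={BOARD_TTL_SECONDS}"}
    return board, headers
```

With this shape, each app server hits redis at most about once per TTL for the board, even if the CDN layer stops caching.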
  16. But place kind of ended up being a load test for rabbit and this workload, so there ya go. Also, kind of as a joke, I created a script that would blast out only websockets updates during our employee testing period and would just spell out my name in huge letters on the canvas. This turned out to be a mini load test for the frontend, because we realized that a couple hundred pixels coming in in one huge burst would actually take a while to render! This led to one of our frontend people fixing the problem. So again, load test everything!
  17. [Mention shielding here vs just one per POP.]
  18. And finally the last takeaway. You’re gonna miss some stuff, in this case, rabbitMQ. Some unexpected stuff will arise. You also are gonna end up doing work that you don’t even end up using. I’m assuming there’s some people in software here, so you probably know this. A lot of the features and stuff we write only gets barely used, and then some other stuff gets used a lot more than we expected. We added a timestamp we didn’t use. We had layers of caching we could fall back on we didn’t use. We had a backup PR for batching websockets messages we didn’t use. We also did a lot right, but we don’t really know what would have happened if we hadn’t done those things. But because they didn’t fail catastrophically, you never hear about it. So maybe if we hadn’t cached it would have overloaded our database and caused writes to take forever and block and clients to back up and… we never saw it. Which is sometimes the unfortunate part about this job -- if you do everything right, people think it was just easy.