Altitude SF 2017: Reddit - How we built and scaled r/place

Fastly

This talk goes over how the Reddit team built and scaled their 2017 April Fools’ project, r/place.

r/place
How We Built and Scaled Reddit’s 2017 April Fools’ Project
Daniel Ellis
u/daniel
@I_am_Dan_Ellis

Individually you can create something.
Together you can create something more.

Challenges
● You only get one shot to launch, and it’ll only last a few days.
● You shouldn’t affect the main site.
● It’s a small team, and you have other stuff to do.
● You have no idea what users will do with it.

Overview - Getting the Board
[Diagram: client → CDN (fastly) → app server]

Overview - Setting a Pixel
[Diagram: client, app server, websockets server, and event collector, linked by numbered steps 1-4]
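The four components in the diagram can be sketched as one app-server handler that hands each placement to the other parts. This is a hypothetical illustration, not Reddit's code: the step numbering on the slide isn't recoverable from the extracted text, so the order here (write, broadcast, record) is an assumption, as are the `Pixel` type, the callables standing in for each component, and the 1000×1000 board and 16-color palette defaults.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pixel:
    x: int
    y: int
    color: int


def set_pixel(pixel: Pixel,
              write_board: Callable[[Pixel], None],
              broadcast: Callable[[Pixel], None],
              record_event: Callable[[Pixel], None],
              width: int = 1000, height: int = 1000, colors: int = 16) -> None:
    """Hypothetical app-server handler for a single pixel placement."""
    # Reject anything outside the board or the palette before touching storage.
    if not (0 <= pixel.x < width and 0 <= pixel.y < height):
        raise ValueError("pixel outside the board")
    if not 0 <= pixel.color < colors:
        raise ValueError("color outside the palette")
    write_board(pixel)    # update the stored board
    broadcast(pixel)      # hand the change to the websockets server for fan-out
    record_event(pixel)   # send a copy to the event collector
```

Passing the components in as callables keeps the handler testable without a live websockets server or event collector.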

Cassandra
● Initial MVP in Cassandra
● Quite mature for us
○ 36 nodes
○ ~96TB of data
○ 90k reads/sec
○ 30k writes/sec
● Downsides
○ Doesn’t fit this project’s data model well
○ Potentially affects the main site

Redis
● Upsides
○ Fits the data model well
○ Doesn’t affect the main site
● Downsides
○ We don’t use it a lot, mostly for counting

SETBIT?
● SETBIT key offset value
● SETBIT canvas 100 1
SETBIT canvas 101 1
SETBIT canvas 102 1
SETBIT canvas 103 1
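The four commands above write bits 100 through 103, which is one 4-bit pixel (index 25) set to 15, the same pixel the BITFIELD slide addresses as `#25`. A minimal sketch, assuming the board is a flat array of 4-bit pixels, shows why SETBIT is awkward: every pixel costs four commands, and they are not applied atomically unless wrapped in a MULTI/EXEC transaction. The helper name is hypothetical.

```python
def setbit_commands(key: str, index: int, color: int, bits_per_pixel: int = 4) -> list[str]:
    """Return the SETBIT commands needed to write one pixel bit by bit.

    Redis numbers bits from the most significant bit of the first byte, so the
    color's highest-order bit goes to the pixel's lowest bit offset.
    """
    start = index * bits_per_pixel
    commands = []
    for i in range(bits_per_pixel):
        bit = (color >> (bits_per_pixel - 1 - i)) & 1
        commands.append(f"SETBIT {key} {start + i} {bit}")
    return commands


for cmd in setbit_commands("canvas", 25, 15):
    print(cmd)  # SETBIT canvas 100 1 ... SETBIT canvas 103 1
```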

BITFIELD - Setting a Pixel
● BITFIELD key SET TYPE OFFSET VALUE
● BITFIELD canvas SET u4 #25 15
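In `BITFIELD`, a `#` offset is multiplied by the type width, so `u4 #25` starts at bit 25 × 4 = 100 and the one command replaces the four SETBITs above. The in-memory emulation below is a sketch for illustration only: it assumes a row-major board 1000 pixels wide (a hypothetical value), uses a preallocated buffer (Redis grows the string as needed), and rejects out-of-range values instead of emulating Redis' overflow modes. Like Redis, it returns the field's old value.

```python
def pixel_index(x: int, y: int, width: int = 1000) -> int:
    """Row-major index of pixel (x, y); the 1000-pixel width is an assumption."""
    return y * width + x


def bitfield_set_u4(buf: bytearray, index: int, value: int) -> int:
    """Mimic `BITFIELD key SET u4 #index value` on an in-memory copy of the board.

    Each byte holds two pixels: the even index in the high nibble, the odd index
    in the low nibble, because bit offsets run from each byte's most significant bit.
    Returns the previous value of the field.
    """
    if not 0 <= value <= 0xF:
        raise ValueError("u4 holds values 0-15")
    byte, is_low = divmod(index, 2)
    old = buf[byte] & 0x0F if is_low else buf[byte] >> 4
    if is_low:
        buf[byte] = (buf[byte] & 0xF0) | value
    else:
        buf[byte] = (buf[byte] & 0x0F) | (value << 4)
    return old


board = bytearray(16)
bitfield_set_u4(board, 25, 15)  # same effect as SETBIT on bits 100-103
```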

BITFIELD - Getting a Pixel
● Simple GET command
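Because the whole board is one Redis string, a single `GET` returns every pixel at once; the reader then unpacks two 4-bit pixels per byte, high nibble first, matching the `u4` layout. A minimal sketch, with a hypothetical function name; where the decoding actually happened (client or server) is an assumption. The optional `total_pixels` pads with color 0, since a Redis string only extends as far as the highest bit written.

```python
from typing import Optional


def decode_board(raw: bytes, total_pixels: Optional[int] = None) -> list[int]:
    """Unpack the bytes returned by `GET canvas` into one color index per pixel."""
    pixels = []
    for b in raw:
        pixels.append(b >> 4)    # even-indexed pixel: high nibble
        pixels.append(b & 0x0F)  # odd-indexed pixel: low nibble
    if total_pixels is not None:
        # Trim to the board size, padding unwritten pixels with color 0.
        pixels = pixels[:total_pixels] + [0] * max(0, total_pixels - len(pixels))
    return pixels
```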

Load Testing
● ~180k writes per second estimated
● 1 read per second == loss of 2k writes per second
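The two numbers above imply a hard limit. Assuming the cost scales linearly (an assumption; the slide gives only the single ratio), 90 uncached full-board reads per second would use up the entire estimated write budget. The size gap explains the ratio: assuming a 1000×1000 board at 4 bits per pixel, each `GET` returns 500,000 bytes, while each write touches 4 bits.

```python
WRITE_BUDGET = 180_000        # estimated writes/sec, from the slide
WRITES_LOST_PER_READ = 2_000  # writes/sec lost per 1 read/sec, from the slide


def remaining_writes(reads_per_sec: float) -> float:
    """Writes/sec left after serving full-board reads, under a linear model."""
    return WRITE_BUDGET - WRITES_LOST_PER_READ * reads_per_sec


def board_bytes(width: int, height: int, bits_per_pixel: int = 4) -> int:
    """Size of one full-board GET in bytes."""
    return width * height * bits_per_pixel // 8


print(WRITE_BUDGET / WRITES_LOST_PER_READ)  # 90.0 reads/sec exhaust the budget
print(board_bytes(1000, 1000))              # 500000 bytes per GET (assumed board size)
```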

Board Load Time
● redis: 10 ms
● cassandra: >30 s

[Diagram: client → CDN (fastly) → application cache → redis / cassandra]
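The application cache layer can be sketched as a read-through cache: it serves a stored copy of the board for a short TTL and only then goes back to the data store. In this hypothetical sketch, Redis is treated as the primary store and Cassandra as a fallback if Redis fails; that fallback relationship, the class name, and the 1-second TTL are all assumptions. The injectable clock exists only for testing.

```python
import time
from typing import Callable, Optional


class BoardCache:
    """Hypothetical app-server cache in front of the board stores."""

    def __init__(self, primary: Callable[[], bytes], fallback: Callable[[], bytes],
                 ttl: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.primary = primary    # e.g. a GET against Redis
        self.fallback = fallback  # e.g. a much slower Cassandra read
        self.ttl = ttl            # seconds a cached board may be served
        self.clock = clock
        self._value: Optional[bytes] = None
        self._fetched_at = float("-inf")

    def get(self) -> bytes:
        now = self.clock()
        # Serve the cached copy while it is still fresh.
        if self._value is not None and now - self._fetched_at < self.ttl:
            return self._value
        try:
            value = self.primary()
        except Exception:
            value = self.fallback()
        self._value, self._fetched_at = value, now
        return value
```

The sketch is not thread-safe, and a real server would also want to stop many concurrent requests from all refreshing the board at the moment the TTL expires.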

Takeaways
Have knobs you can tune, switches you can flip.
Some of them will be crucial (like changing cooldown timers), and some you won’t have to resort to at all (like changing caching behavior).
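A cooldown timer is a good example of such a knob. A minimal sketch, assuming per-user state held in process memory (a multi-server deployment would need it in a shared store): the `seconds` attribute can be changed while the service runs, and the next check uses the new value. The class name and the 300-second figure in the usage are hypothetical.

```python
import time
from typing import Callable, Dict


class Cooldown:
    """Per-user placement cooldown whose length can be changed at runtime."""

    def __init__(self, seconds: float, clock: Callable[[], float] = time.monotonic):
        self.seconds = seconds  # the knob: adjustable without a deploy
        self.clock = clock
        self._last: Dict[str, float] = {}

    def try_place(self, user: str) -> bool:
        """Return True and start a new cooldown if the user may place a pixel now."""
        now = self.clock()
        last = self._last.get(user)
        if last is not None and now - last < self.seconds:
            return False
        self._last[user] = now
        return True


cooldown = Cooldown(300)  # hypothetical starting value
cooldown.seconds = 60     # turn the knob mid-event
```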

Takeaways
Load test everything with real-world data.
This showed the need for caching and for another backend store. Had we just tested simple reads and writes, we might have missed the problems arising from the disparity in size between gets and sets.

Takeaways
Use power principles.
You don’t have much time, so anything you choose to implement should have a disproportionate payoff.

Takeaways
Some things you think will matter won’t, some things you
don’t think will matter will, some things you think will matter
would have mattered but you’ll never know they mattered
because you did the thing that prevented them from
mattering.


FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf

FIDO Alliance

UiPath Test Automation using UiPath Test Suite series, part 4

DianaGray10

Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap. The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies. Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques What will you get from this session? 1. Insights into SAP testing best practices 2. Heatmap utilization for testing 3. Optimization of testing processes 4. Demo Topics covered: Execution from the test manager Orchestrator execution result Defect reporting SAP heatmap example with demo Speaker: Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...

Ramesh Iyer

In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.

GraphRAG is All You need? LLM & Knowledge Graph

Guy Korland

Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs. 1. Unifying Large Language Models and Knowledge Graphs: A Roadmap. https://arxiv.org/abs/2306.08302 2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs: https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...

Product School

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf

91mobiles

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...

Product School

DevOps and Testing slides at DASA Connect

Kari Kakkonen

PHP Frameworks: I want to break free (IPC Berlin 2024)

Ralf Eggert

In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development. This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...

Product School

Recently uploaded (20)

When stars align: studies in data quality, knowledge graphs, and machine lear...

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf

Essentials of Automations: Optimizing FME Workflows with Parameters

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...

ODC, Data Fabric and Architecture User Group

Transcript: Selling digital books in 2024: Insights from industry leaders - T...

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality

How world-class product teams are winning in the AI era by CEO and Founder, P...

Neuro-symbolic is not enough, we need neuro-*semantic*

The Art of the Pitch: WordPress Relationships and Sales

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf

UiPath Test Automation using UiPath Test Suite series, part 4

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...

GraphRAG is All You need? LLM & Knowledge Graph

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...

DevOps and Testing slides at DASA Connect

PHP Frameworks: I want to break free (IPC Berlin 2024)

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...

Altitude SF 2017: Reddit - How we built and scaled r/place

1. r/place How We Built and Scaled Reddit’s 2017 April Fools’ Project Daniel Ellis u/daniel @I_am_Dan_Ellis

2. What Was It?

3. Individually you can create something. Together you can create something more.

6. 1.1M Unique Users

7. 150K Concurrent Users

8. 16.5M Tiles Placed

9. 72 Hours

10. Challenges

11. Challenges
● You only get one shot to launch, and it’ll only last a few days.
● You shouldn’t affect the main site.
● It’s a small team, and you have other stuff to do.
● You have no idea what users will do with it.

12. No Pressure!

13. Development Workflow

16. How it Worked

17. Overview - Getting the Board (diagram: client → CDN (Fastly) → app server)

18. Overview - Setting a Pixel (diagram: client, app server, websockets server, event collector; numbered steps 1 to 4)

19. Backend Choice Cassandra vs. Redis

20. Cassandra
● Initial MVP in Cassandra
● Quite mature for us
○ 36 nodes
○ ~96TB of data
○ 90k reads/sec
○ 30k writes/sec
● Downsides
○ Doesn’t fit this project’s data model well
○ Potentially affects the main site

21. Redis
● Upsides
○ Fits the data model well
○ Doesn’t affect the main site
● Downsides
○ We don’t use it a lot, mostly for counting

22. redis

23. Format of the Board

24. SETBIT?
● SETBIT key offset value
● SETBIT canvas 100 1
SETBIT canvas 101 1
SETBIT canvas 102 1
SETBIT canvas 103 1

25. BITFIELD - Setting a Pixel
● BITFIELD key SET TYPE OFFSET VALUE
● BITFIELD canvas SET u4 #25 15

26. BITFIELD - Getting a Pixel
● Simple GET command

27. Load Testing
● ~180k writes per second estimated
● 1 read per second == loss of 2k writes per second

28. Board Load Time
Redis: 10ms
Cassandra: >30s

29. Caching

30. (diagram) client → CDN (Fastly) → application cache → Redis → Cassandra

32. Takeaways
Have knobs you can tune, switches you can flip. Some of them will be crucial (like changing cooldown timers), and some you won’t have to resort to at all (like changing caching behavior).

33. Takeaways
Load test everything with real-world data. This showed the need for caching and for another backend store. Had we just tested simple reads and writes, we might have missed the problems arising from the disparity in sizes for gets vs. sets.

34. Takeaways
Use power principles. You don’t have much time, so anything you choose to implement should have a disproportionate payoff.

35. Takeaways
Some things you think will matter won’t, some things you don’t think will matter will, some things you think will matter would have mattered but you’ll never know they mattered because you did the thing that prevented them from mattering.

36. Thanks!

37. Questions?

Editor's Notes

Hey everybody, thanks for coming. I’m here today to talk about place, Reddit’s April Fools’ Day project for 2017. Though I’m assuming at least a few people here know exactly what place was, I’ll give a quick overview so no one is completely lost.
At its core, place was a large-scale social experiment. Users were given one pixel every 5 minutes that they could place anywhere on this 1000x1000 grid (a million pixels total), choosing from a palette of 16 colors. The idea was that by yourself you couldn’t really draw anything, but together, through collaboration, you could make some great stuff. And as you can see here, people really, really did make some awesome stuff. It lasted about 3 days, at which point we felt the canvas was pretty full and decided to stop it. It was really cool to see just how much effort people put into this. People had dedicated Discord chats; I think for Rainbow Road they had almost a thousand people in Discord, with different duties split out into different channels like maintenance, diplomacy, and construction. They had dedicated negotiators and template designers. Truly an awesome thing to behold. As cool as that is, and as long as I could go on about all the individual parts and wars and factions that broke out, I’m here specifically to talk
Let’s start with some initial challenges. The first is that you only get one shot. There’s no opportunity to beta test at any scale larger than internal employee tests. There’s no slow rollout to some users. There’s a launch, an announcement, and a sudden flood of users. You have a hard deadline: there’s simply no way to delay the fact that April Fools’ Day is on April 1st. The next challenge is that it’s only going to last a few days. If it’s a really crappy experience for those few days, you don’t have a few weeks to make it better and let people forget about the rocky launch. Those few days are the whole project. Also, we’d like to keep issues from affecting the main site. It’s a lot easier to argue that we should have time to work on this stuff if we aren’t bringing down the site and affecting everyday users. You also have other work to do. Yep. And you have a small team. You simply can’t fix every bug or do everything you want to do, so you have to prioritize aggressively. Finally, and this is less of a technical problem, this is the internet, and on the internet people sometimes do amazing things and sometimes do really crappy things. This might end up being a complete mess full of racism and a bunch of other garbage that reflects negatively on you.
So, ya know, no pressure.
And so to manage all this we used a very high-tech task management system: a spreadsheet. We’d put tasks on here and highlight them when done; sometimes tasks made it onto the sheet, sometimes not. It also had such tasks as “Client buffering thing to do”. We could also put notes on the tasks, like this gem, “josh is idiot”, which I didn’t actually notice until I went to prepare this slide. It’s nice because there isn’t necessarily the pressure to be formal that comes with a full-fledged task management system. It seems simple or silly, but it actually worked just fine. We would self-prioritize and self-manage, and it worked out pretty well. In general I would argue that those huge, complex task management systems are mostly a distraction, and if you get the right type of people on a project, they’ll self-organize and get the project done regardless.
The first and simplest major path is reading the board. For this, the CDN played a key role. We cached the board “heavily” (with a TTL of 1 second) and let our CDN, Fastly, do the work of serving the asset.
This is, for the most part, pretty complete. There are some minor things we haven’t included here, like checking our Postgres instance for the account age to make sure you could play the game and, of course, to make sure you actually have an account. But this is where the place-specific stuff was stored. The second major path is drawing a pixel. A request comes in and passes through the CDN and load balancer untouched, and the application server does a few checks to make sure you’re logged in, your account is old enough, and you haven’t placed a pixel within the cooldown period. If that all checks out, an update is sent to a few places. The first is RabbitMQ: a new item is added to a fanout exchange that every websockets server consumes from, and each websockets server then sends a message to every connected consumer subscribed to the “place” namespace. Another update is sent to our event collector. This allows us to do analysis as well as publish a complete dump of events once the game is finished. Another update is sent to Redis, where we store a bitmap of the board in one key. We’ll go over this in a lot more detail later, since this is RedisConf and it’s probably a bit more interesting to most folks here than the rest of the stuff. Finally, we store the actual data about the user who drew the pixel in Cassandra. This supports the feature that lets people click a pixel to see who placed it and when.
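The write path above (check login, account age, and cooldown, then fan the event out to several stores) can be sketched as follows. This is a minimal Python model under stated assumptions, not Reddit’s code: the 5-minute cooldown comes from the talk, but `MIN_ACCOUNT_AGE_SECONDS` is a hypothetical value, and the `sinks` callables are hypothetical stand-ins for RabbitMQ, the event collector, Redis, and Cassandra.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

COOLDOWN_SECONDS = 5 * 60            # one pixel every 5 minutes, as described in the talk
MIN_ACCOUNT_AGE_SECONDS = 24 * 3600  # HYPOTHETICAL threshold; the talk doesn't give the real value


@dataclass
class User:
    name: str
    logged_in: bool
    created_at: float  # account creation time, Unix seconds


@dataclass
class PlaceService:
    # Each sink stands in for one downstream write (RabbitMQ fanout, event
    # collector, Redis bitmap, Cassandra); these are hypothetical callables.
    sinks: List[Callable[[dict], None]]
    last_placed: Dict[str, float] = field(default_factory=dict)

    def check(self, user: User, now: float) -> Optional[str]:
        """Return a rejection reason, or None if the user may place a pixel."""
        if not user.logged_in:
            return "not logged in"
        if now - user.created_at < MIN_ACCOUNT_AGE_SECONDS:
            return "account too new"
        last = self.last_placed.get(user.name)
        if last is not None and now - last < COOLDOWN_SECONDS:
            return "cooldown"
        return None

    def place(self, user: User, x: int, y: int, color: int, now: float) -> Optional[str]:
        """Validate the request, record the placement time, then fan the event out."""
        reason = self.check(user, now)
        if reason is not None:
            return reason
        self.last_placed[user.name] = now
        event = {"user": user.name, "x": x, "y": y, "color": color, "ts": now}
        for sink in self.sinks:
            sink(event)
        return None
```

A rejected request never touches the sinks, so only accepted placements reach the downstream stores. In this sketch the cooldown lives in a local dict; a real deployment with many app servers would need it in shared storage.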
Key points: the board is a series of uint4s, essentially a bitmap. Each uint4 value maps to a color index in a palette that is interpreted by the frontend. On the left you see the data as stored and transmitted, and on the right you see how it “wraps around” when placed on the grid. To do a write, we can just address a particular place in the bitmap based on the coordinates (x + y * canvas_size). This is great because we can store pixel information for 1 million pixels in 500KB.
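To make the layout concrete, here is a small local model of the packed board in Python. It assumes a square 1000x1000 canvas and packs two 4-bit color indices per byte with the even pixel in the high nibble, which matches Redis’s convention that bit 0 is the most significant bit of the first byte. The helper names are illustrative, not from the original code.

```python
CANVAS_SIZE = 1000   # the board is 1000 x 1000
BITS_PER_PIXEL = 4   # a 16-color palette index fits in a uint4


def pixel_offset(x: int, y: int, width: int = CANVAS_SIZE) -> int:
    """Index of a pixel in the flat uint4 array: x + y * width."""
    if not (0 <= x < width and 0 <= y < width):
        raise ValueError("coordinate outside the canvas")
    return x + y * width


def board_bytes(width: int = CANVAS_SIZE) -> int:
    """Bytes needed for width * width pixels at 4 bits each."""
    return width * width * BITS_PER_PIXEL // 8


def set_pixel(board: bytearray, offset: int, color: int) -> None:
    """Store a color index; even offsets use the high nibble (Redis bit order is MSB-first)."""
    if not 0 <= color < 16:
        raise ValueError("color index must fit in 4 bits")
    byte_index, is_low = divmod(offset, 2)
    if is_low:
        board[byte_index] = (board[byte_index] & 0xF0) | color
    else:
        board[byte_index] = (board[byte_index] & 0x0F) | (color << 4)


def get_pixel(board: bytes, offset: int) -> int:
    """Read a color index back out of the packed board."""
    byte_index, is_low = divmod(offset, 2)
    value = board[byte_index]
    return value & 0x0F if is_low else value >> 4
```

`board_bytes()` returns 500,000, which is the 500KB figure above: a million pixels at half a byte each.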
Initially we were eyeing SETBIT, which lets you set individual bits. We started wondering about atomicity, which we figured we could handle with transactions, but also: maybe we didn’t really care. If a pixel was highly contested, concurrent writes might stomp over one another and a random mix of the colors would result, but maybe that wasn’t a big deal. Either way, this seemed attractive because it also meant we could store a color from a 16-color palette in half a byte, making our whole board of 1 million pixels take up only 500KB.
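The atomicity worry is easy to see once a single 4-bit write is expanded into SETBIT calls. The sketch below (illustrative helper names, local simulation only, no Redis connection) generates the four commands for one pixel, which for pixel 25 and color 15 are exactly the `SETBIT canvas 100 1` … `103 1` lines on the slide, and then replays two clients’ commands interleaved on an in-memory buffer that mimics Redis’s MSB-first bit numbering.

```python
from typing import List, Tuple

Command = Tuple[str, str, int, int]


def setbit_commands(key: str, pixel_offset: int, color: int, bits: int = 4) -> List[Command]:
    """Expand one uint4 pixel write into per-bit SETBIT commands, most significant bit first."""
    if not 0 <= color < (1 << bits):
        raise ValueError("color does not fit in the given number of bits")
    first_bit = pixel_offset * bits
    return [
        ("SETBIT", key, first_bit + i, (color >> (bits - 1 - i)) & 1)
        for i in range(bits)
    ]


def apply_setbit(buf: bytearray, bit_offset: int, value: int) -> None:
    """Local model of SETBIT: bit 0 is the most significant bit of byte 0, as in Redis."""
    byte_index, bit_in_byte = divmod(bit_offset, 8)
    mask = 0x80 >> bit_in_byte
    if value:
        buf[byte_index] |= mask
    else:
        buf[byte_index] &= ~mask & 0xFF
```

If one client writes color 15 and another writes color 0 to the same pixel, and their commands interleave as a0, a1, b0, b1, b2, b3, a2, a3, the pixel ends up as 3 (binary 0011), a color neither client chose. That is the “random mix of the colors” described above.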
Then we came across something Redis had added recently that seemed to fit our use case even better: BITFIELD. This lets you define the integer size (in this case uint4), the offset you want to write to, and the value, and Redis even does the work of calculating the actual bit offset and writing the value! [show the basic usage here] This fit our use case perfectly, with no transactions necessary, which meant very little code was involved in setting a pixel. Essentially we just had to translate our X and Y coordinates into a single offset, calculated by multiplying the Y value by the canvas width and adding the X coordinate. By putting a hash sign (#) in front of the offset, Redis automatically multiplies it by the integer width to get the bit position. You can also chain several SET operations in one command to set multiple fields at once, so we could easily have done some batching here if we had wanted to.
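A minimal sketch of building that command from coordinates, assuming a 1000-pixel-wide canvas and the key name `canvas` from the slides. The function only assembles the argument list; it does not talk to Redis, and the batching variant is the option mentioned above rather than something the team shipped.

```python
from typing import Iterable, List, Tuple

CANVAS_WIDTH = 1000


def pixel_offset(x: int, y: int, width: int = CANVAS_WIDTH) -> int:
    """Flatten (x, y) into the uint4 index: x + y * width."""
    return x + y * width


def bitfield_set_args(key: str, writes: Iterable[Tuple[int, int, int]],
                      width: int = CANVAS_WIDTH) -> List[str]:
    """Build BITFIELD key SET u4 #<offset> <color> [SET ...] for one or more pixels.

    The '#' prefix tells Redis to multiply the offset by the type width (4 bits),
    so pixel 25 lands at bit offset 100.
    """
    args = ["BITFIELD", key]
    for x, y, color in writes:
        if not 0 <= color < 16:
            raise ValueError("u4 colors must be in 0..15")
        args += ["SET", "u4", f"#{pixel_offset(x, y, width)}", str(color)]
    return args
```

`bitfield_set_args("canvas", [(25, 0, 15)])` yields the slide’s `BITFIELD canvas SET u4 #25 15`. With the redis-py client, an argument list like this could be sent as-is via `execute_command(*args)`, assuming a reachable server; because it is a single command, all chained SETs run together with no transaction needed.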
For reads, we don’t actually need BITFIELD at all. We just use a simple GET and let the client handle parsing out the pixels. The client code ends up being a bit more complicated, since JavaScript has no 4-bit typed array, so we need to do our own bit shifting, but it’s not too bad.
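The client-side parsing amounts to splitting every byte into two nibbles. The real client was JavaScript; the following is a Python rendering of the same bit shifting, under the assumption (consistent with Redis’s bit order) that the even pixel sits in the high nibble. Names are illustrative.

```python
from typing import List


def decode_board(payload: bytes) -> List[int]:
    """Unpack a packed uint4 board into one color index per pixel (high nibble first)."""
    colors: List[int] = []
    for byte in payload:
        colors.append(byte >> 4)     # even pixel: upper 4 bits
        colors.append(byte & 0x0F)   # odd pixel: lower 4 bits
    return colors


def color_at(colors: List[int], x: int, y: int, width: int = 1000) -> int:
    """Look up a decoded pixel by coordinates, using the same x + y * width layout."""
    return colors[x + y * width]
```

A 500,000-byte GET response decodes to 1,000,000 palette indices, which the frontend then maps to actual colors.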
At this point we started doing some load testing. Initial tests showed we’d be able to get about 75k writes/sec before we maxed out the network, while hitting about 40% CPU. It turns out we were load testing on a 1 Gbps instance, so if we scaled up to the next bottleneck, we figured we should be able to hit somewhere around 180k writes/sec. At that point it was obvious we were overengineering, since that would be an order of magnitude more traffic than we get for all of Reddit, and we’d hit a lot of other limits before we got near that threshold. One thing that I think was really important came out of this: the unexpected tradeoff between gets and sets. Initially, load testing was done with sets only, but peppering in even 1 GET of the key per second seemed to cause a huge hit in the number of sets. The reason is kind of obvious as soon as you see it: a set is pretty small, setting a single pixel value, but a GET fetches the whole board! There are obviously things like TCP overhead and the overhead of the Redis command itself, but it seemed like each GET per second translated to a loss of ~2K writes per second. I wish I had better graphs for this, but it’s mostly stored in my memory and off-hand Slack comments. So anyway, it became clear that Redis would more than handle our write throughput and give us plenty of room to really lower the pixel cooldown timer if we wanted to, but it also made us realize we might want to cache the reads.
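The GET/SET tradeoff can be turned into back-of-envelope arithmetic. This sketch assumes a simple linear model (each full-board GET per second displaces a fixed ~2k writes per second) and applies it to the ~180k writes/sec estimate; combining the two measured numbers this way is an extrapolation, not something the load tests confirmed.

```python
ESTIMATED_WRITE_CAPACITY = 180_000  # writes/sec, the estimate from load testing
WRITES_LOST_PER_GET = 2_000         # ~2k writes/sec lost per full-board GET/sec


def remaining_write_capacity(gets_per_sec: float,
                             capacity: float = ESTIMATED_WRITE_CAPACITY,
                             cost_per_get: float = WRITES_LOST_PER_GET) -> float:
    """Linear model (an assumption): each GET/sec displaces a fixed number of writes/sec."""
    return max(0.0, capacity - gets_per_sec * cost_per_get)


def gets_to_saturate(capacity: float = ESTIMATED_WRITE_CAPACITY,
                     cost_per_get: float = WRITES_LOST_PER_GET) -> float:
    """Board GETs per second at which the model predicts no write capacity is left."""
    return capacity / cost_per_get
```

Under this model, only about 90 uncached board loads per second would consume the whole write budget. With roughly 30 CDN POPs each fetching the board once per 1-second TTL, the model leaves about 120k writes/sec; with a single shield fetching once per second, about 178k. That is why caching the reads mattered far more than raw write throughput.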
One of the first things you have to think about when building one of these projects is what kind of backend infrastructure you’re going to use. Reddit’s engineering team is still pretty small, so we don’t have the tools at our disposal that we might have if we had entire teams dedicated to managing a single datastore. So we tend to be biased towards things we already know and have exercised well. This lets us do napkin math and estimations and have a pretty good sense of where we might have issues. So our initial idea for the project was to store most of our data in Cassandra, since we already put about 17TB of data in there and write at something like 80k/sec total. [get real cass numbers] That being said, we recognized that while it might make sense for some things, like storing individual pixel information, it didn’t *feel* right for storing the entire board. It felt wasteful, since it would mean querying a million columns just to reconstruct the board, and it would create a hotspot in the ring. But we went ahead and did it for the initial MVP, since we figured that with caching it might not matter too much: if we could still return the board in under a second and cache that response, we’d be good. At this point we had been eyeing Redis, but some people on the team felt that if we could do it with existing tools, that would be the safer option. Still… Redis seemed really attractive. This is where I realized that numbers really win conversations. We still weren’t convinced we should use Redis, since we were more familiar with Cassandra, so we loaded up the board with 1 million pixels and compared the load times: [diagram showing redis @ 50ms and cass > 30 seconds]. This isn’t a knock on Cassandra at all; it just wasn’t the best fit for this project.
We started to think through the failure cases, and as a backup in case we had issues with Redis and lost the data on it, we made a function to load the data back up from Cassandra. Initially, some of us figured it wouldn’t be too bad to lose the board in a failure, since people would be able to fill it back up pretty quickly. In retrospect, I’m really glad we didn’t settle for that. I think it would have fundamentally damaged the culture and factions built up in the project. Redis ended up faring really well, and we were really happy with the decision to go with it.
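The restore function itself isn’t shown in the talk. Here is a minimal sketch of what rebuilding the packed board from per-pixel history could look like, assuming (hypothetically) that each Cassandra row carries coordinates, color, timestamp, and user, and that the most recent placement per pixel wins.

```python
from typing import Dict, Iterable, NamedTuple


class PixelRecord(NamedTuple):
    # HYPOTHETICAL shape of a per-pixel row: where, what color, when, and who placed it.
    x: int
    y: int
    color: int
    timestamp: float
    user: str


def rebuild_board(records: Iterable[PixelRecord], width: int = 1000) -> bytearray:
    """Rebuild the packed uint4 board, keeping only the latest placement per pixel."""
    latest: Dict[int, PixelRecord] = {}
    for rec in records:
        offset = rec.x + rec.y * width
        prev = latest.get(offset)
        if prev is None or rec.timestamp >= prev.timestamp:
            latest[offset] = rec
    board = bytearray((width * width + 1) // 2)  # two pixels per byte
    for offset, rec in latest.items():
        byte_index, is_low = divmod(offset, 2)
        if is_low:
            board[byte_index] = (board[byte_index] & 0xF0) | rec.color
        else:
            board[byte_index] = (board[byte_index] & 0x0F) | (rec.color << 4)
    return board
```

Because the result has the same byte layout Redis uses for the bitmap, it could be written back with a single SET of the `canvas` key, assuming nothing else is writing to it during the restore.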
So another thing that ended up being very useful here, and felt like a worthwhile investment, was putting some effort into caching. From the load testing we did on Redis, we knew it would be a pretty big deal if we let all reads fall through to the main store. Plus, we had live updates, so in theory a client could start receiving live updates, wait one TTL, then request the board, and it should have a pretty good live picture of what the board looked like. We decided to set a TTL of 1 second. The best part about this is that our CDN would automatically pick up on those headers and do the hard work of caching for us. This essentially capped the number of requests that could hit us in a given second at the number of POPs our CDN had, since the asset would be cached and shared among nodes at each POP. In the case of Fastly, that should come out to around 30/sec, which is pretty small and reasonable. But still… could we go further? We also looked into shielding, which essentially designates one POP as a shield in front of our origin that caches the asset, and the other POPs make their requests to it. That would limit us to a mere 1 request per second! Unfortunately we didn’t do it. With the way our config was set up, it was going to be kind of convoluted, convoluted solutions bring confusing issues, and it felt like we were overengineering at that point anyway. We added another layer of caching internally, since all of our application servers have local caches. We did this just in case CDN caching failed spectacularly and we needed a quick way to fall back to something else without hitting Redis directly. Thankfully we never ended up needing it! The nice part about all of this was that it wasn’t too hard to add these extra layers, and they would have made us feel way more comfortable if something bad had happened. [not sure how to explain this, something about game time decisions.
it’s nice to have simple ways to fall back and lower load or fix problems that are ready to go.] Finally, as I mentioned earlier, we were worried that the CDN might start serving old assets. So we put a 32-bit timestamp at the beginning of the board that clients could either report back to us, so we could keep an eye on staleness, or that we could use to decide to bypass the caching layer, but in the end we never used it.
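A sketch of the timestamp header idea: prefix the board bytes with a 32-bit timestamp on the server and peel it off on the client. The encoding (big-endian unsigned Unix seconds) and the staleness policy with its `max_age` are assumptions for illustration; the talk only says a 32-bit timestamp was placed at the start of the board.

```python
import struct
import time
from typing import Optional, Tuple

# ASSUMPTION: big-endian unsigned 32-bit Unix seconds; the talk doesn't specify the encoding.
TIMESTAMP = struct.Struct(">I")


def prepend_timestamp(board: bytes, ts: Optional[int] = None) -> bytes:
    """Server side: prefix the packed board with the time it was read from Redis."""
    if ts is None:
        ts = int(time.time())
    return TIMESTAMP.pack(ts) + board


def split_timestamp(payload: bytes) -> Tuple[int, bytes]:
    """Client side: separate the timestamp header from the board bytes."""
    (ts,) = TIMESTAMP.unpack_from(payload, 0)
    return ts, payload[TIMESTAMP.size:]


def looks_stale(payload: bytes, now: float, max_age: float = 5.0) -> bool:
    """HYPOTHETICAL policy: flag boards older than max_age seconds (to report, or to bypass the cache)."""
    ts, _ = split_timestamp(payload)
    return now - ts > max_age
```

The header costs 4 bytes on a 500KB payload, which is part of why it was cheap to add even though it ended up unused.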
This may seem overengineered, but it was super easy to implement given our existing setup. You can see from the code here that adding caching for our CDN was as simple as adding an additional header with the TTL. Using the local cache in Reddit applications is as simple as checking for the value. And finally, we get the board from Redis if none of that worked. We were also writing to Cassandra in case of major failure, so we could manually load the board back into Redis if need be. Again, all of this took very little actual code but paid off greatly in reducing load on the system.
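The layered read path just described (CDN header with a TTL, then the app server’s local cache, then Redis) can be sketched in Python. The slide code isn’t reproduced here: `LocalCache` and `fetch_from_redis` are hypothetical stand-ins, and using `Cache-Control: max-age=1` is an assumed choice of header, since the talk only says “an additional header with the TTL”.

```python
from typing import Callable, Dict, Optional, Tuple

BOARD_TTL_SECONDS = 1


class LocalCache:
    """Minimal in-process TTL cache, standing in for the app servers' local caches."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        item = self._items.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: bytes, ttl: float) -> None:
        self._items[key] = (self._clock() + ttl, value)


def get_board(cache: LocalCache, fetch_from_redis: Callable[[str], bytes],
              key: str = "canvas") -> Tuple[bytes, Dict[str, str]]:
    """Serve the board: local cache first, then Redis; the header lets the CDN cache it too."""
    headers = {"Cache-Control": f"max-age={BOARD_TTL_SECONDS}"}  # ASSUMED header choice
    board = cache.get(key)
    if board is None:
        board = fetch_from_redis(key)
        cache.set(key, board, BOARD_TTL_SECONDS)
    return board, headers
```

Injecting the clock keeps the expiry logic testable. Each app server then hits Redis at most about once per TTL, and the CDN header caps origin traffic further upstream.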
But place kind of ended up being a load test for RabbitMQ and this workload, so there ya go. Also, kind of as a joke, I created a script that would blast out only websockets updates during our employee testing period and spell out my name in huge letters on the canvas. This turned out to be a mini load test for the frontend, because we realized that a couple hundred pixels arriving in a huge burst would actually take a while to render! This led one of our frontend people to fix the problem. So again, load test everything!
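The talk doesn’t say how the frontend burst problem was fixed (and the backup PR for batching websockets messages went unused). One common approach is to buffer incoming updates and apply them in a single pass per render tick, letting a later update to the same pixel replace an earlier one. Here is a language-neutral sketch of that idea in Python; the real client was JavaScript, and `UpdateBuffer` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Update = Tuple[int, int, int]  # (x, y, color)


class UpdateBuffer:
    """Collects pixel updates between render ticks and hands them out as one batch."""

    def __init__(self) -> None:
        self._pending: Dict[Tuple[int, int], int] = {}

    def push(self, x: int, y: int, color: int) -> None:
        # A later update to the same pixel replaces the earlier one; the dict
        # keeps the position where that pixel was first seen in this batch.
        self._pending[(x, y)] = color

    def drain(self) -> List[Update]:
        """Return all pending updates (one per pixel) and reset the buffer."""
        batch = [(x, y, color) for (x, y), color in self._pending.items()]
        self._pending.clear()
        return batch
```

A render loop would call `drain()` once per frame and redraw only those pixels, so a burst of a few hundred messages costs one redraw pass instead of hundreds.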
[Mention shielding here vs just one per POP.]
And finally, the last takeaway. You’re gonna miss some stuff; in this case, RabbitMQ. Some unexpected stuff will arise. You’re also gonna end up doing work that you never use. I’m assuming there are some people in software here, so you probably know this. A lot of the features and stuff we write barely gets used, and some other stuff gets used a lot more than we expected. We added a timestamp we didn’t use. We had layers of caching to fall back on that we didn’t use. We had a backup PR for batching websockets messages that we didn’t use. We also did a lot right, but we don’t really know what would have happened if we hadn’t done those things, because when they don’t fail catastrophically, you never hear about it. So maybe if we hadn’t cached, it would have overloaded our database and caused writes to take forever and block, and clients to back up and… we never saw it. Which is sometimes the unfortunate part about this job: if you do everything right, people think it was just easy.
