Develop, deploy, and operate services at Reddit scale (OSCON 2018)

The last few years have been a period of tremendous growth for Reddit. Process, tooling, and culture have all had to adapt to an organization that has tripled in size and ambition. Greg Taylor discusses Reddit's evolution and explains how one of the world’s busiest sites develops, deploys, and operates services at significant scale.

Presented at OSCON 2018 in Portland, Oregon

Published in: Technology

  1. 1. Develop, deploy, and operate services at Reddit scale Greg Taylor – EM, Reddit Infrastructure /u/gctaylor
  2. 2. What is Reddit?
  3. 3. /r/CrappyDesign
  4. 4. /r/theocho
  5. 5. /r/news
  6. 6. /r/daddit
  7. 7. /r/IAMA
  8. 8. Reddit by the numbers ● 5th/8th Alexa Rank (US/World) ● 330M+ Monthly active users ● 138K Active communities ● 12M Posts per month ● 2B Votes per month
  9. 9. Rapid Recent Growth ● 25 Engineers in early 2016 ● 185 Engineers today ● 7X Increase in less than three years
  10. 10. Reddit Engineering in 2016 Monolith
  11. 11. 2016 - The Infrastructure Team ● Provisioned and configured all infrastructure ● Operated most of our systems ● Handled most non-trivial debugging
  12. 12. 2016 - The Stack
  13. 13. Mid 2016 - Rapid growth
  14. 14. Diminishing returns
  15. 15. Determining the path forward
  16. 16. Service-oriented Architecture (SOA)
  17. 17. The first service (diagram labels: Monolith, Service)
  18. 18. Growing pains: Automated testing ● Problem: Unit/integration testing as an afterthought ○ “We know we should do better, buut….” ○ We needed CI for mere mortals ● Solution: Cultural shift ○ Git hooks are not enough! ○ Testing, linting, integration, oh my! ○ Green master branches ● https://drone.io/
  19. 19. Growing pains: Something to build on ● Problem: Artisanal, hand-crafted services ● Solution: Reddit service framework ○ Named “baseplate” ○ Consistency between services ○ Batteries-included ● https://baseplate.readthedocs.io
  20. 20. Hey! That was pretty cool. Let’s do it again! (diagram labels: Service, Monolith, Service)
  21. 21. Growing pains: Artisanal Infrastructure ● Problem: Artisanal, hand-crafted Infrastructure ● Solution: Infrastructure-as-code ○ Declarative, repeatable ○ Code review… for infrastructure ○ Reusable modules ● https://www.terraform.io/
  22. 22. Once more, with feeling! (diagram labels: Service, Service, Monolith, Service, Service)
  23. 23. Growing pains: Staging/integration woes ● Problem: Inadequate staging solution for SOA ● Outlandish idea: Give Kubernetes a shot ○ Let’s see what we can MVP in 3 weeks! ○ Pile of manifests and bash scripts ○ MVP very positively received ● https://kubernetes.io/
  24. 24. Reddit Engineering in 2017 (diagram labels: Service, Service, Service, Service, Monolith)
  25. 25. Growing pains: Infra team as a bottleneck ● Problem: Development is too dependent on Infra ○ Service provisioning ○ Ongoing operation ○ Debugging and performance work ● Short-term “solution”: Train and deputize infrastructure-oriented teams ○ Allows for more self-sufficiency ○ Results in accelerated dev velocity
  26. 26. One size fits some, not all ● Not all teams want to operate the full stack for their service ● “Don’t make me infrastructure!” ● “My toaster runs Docker”
  27. 27. You want me to do WHAT?!?
  28. 28. What do we really really want?
  29. 29. Service ownership! A service owner is empowered to: ● Dev and test their service in a prod-like env ● Do most of the work to get to production ● Own the health of their service ● Diagnose issues
  30. 30. Service ownership challenges The learning curve looms large
  31. 31. Service ownership challenges With great power comes great responsibility
  32. 32. Service ownership challenges Mistakes are going to happen… often!
  33. 33. Early 2018 - How to enable Service Ownership?
  34. 34. Reddit Infrastructure as a Product
  35. 35. Selecting the right foundation
  36. 36. How to build an Infrastructure Product
  37. 37. Limit the surface area
  38. 38. The user/operator contract ● Product Users (Service Owners) ○ Learn some Kubernetes basics ○ Deploy and operate own services ● Product Operators (Reddit Infrastructure) ○ Keep the Kubernetes clusters running ○ Provision AWS resources ○ Support and advise Product Users
  39. 39. Batteries-included Provide as much as we can out-of-the-box: ● Test, build, and deploy pipelines ● Observability ● Service discovery ● Baseline security
  40. 40. Paint-by-numbers Enabling service ownership for all backgrounds: ● Limit learning expectations ● You want to do X? Here’s a guide for that ● Training and extensive documentation ● Auto-generate all the things An engineer should not require deep infra experience in order to be productive!
  41. 41. Consistency throughout As a developer takes an application through its development cycle, the operating environment should be as consistent as possible ● Local dev ● Staging ● Production
  42. 42. Deploy with confidence A developer should have tools and processes that will help them build confidence as work approaches production ● Code Review ● Unit Testing ● Integration Testing ● Some
  43. 43. Guardrails and Safeties Things that prevent or minimize damage ● Resource limits ● Throttling ● Network policy and Access controls ● Scanning for common mistakes ● Docker Image policies
  44. 44. Reddit Infrastructure Product Core Tenets ● Limit the surface area ● The user/operator contract ● Batteries-included ● Paint-by-numbers ● Consistency throughout ● Deploy with confidence ● Guardrails and Safeties
  45. 45. What does all of this buy us?
  46. 46. Infra team: From Operators to Enablers
  47. 47. Infra team: From Operators to Enablers ● Infra provisions all infrastructure → Infra provides infra as a product ● Infra deploys new services → Service owners deploy new services ● Infra operates most services → Service owners operate services ● Infra is a blocking dependency → Infra is an adviser and enabler
  48. 48. Organizational scalability
  49. 49. Closing Remarks
  50. 50. reddit.com/jobs
  51. 51. Presenter Info + Resources ● Greg Taylor - Reddit Infrastructure ● /u/gctaylor ● @gctaylor ● github.com/gtaylor ● reddit.com/r/kubernetes ● redditblog.com/topic/technology
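
Slide 43 lists "Resource limits", "Scanning for common mistakes", and "Docker Image policies" among the guardrails. The slides do not describe how Reddit implements these checks, so the following is only a minimal sketch of what such a scan could look like. It assumes a Kubernetes pod spec has already been loaded into a plain Python dict; the function name `scan_pod_spec`, the registry `registry.example.com`, and the specific rules are hypothetical and chosen purely for illustration.

```python
from typing import Any, Dict, List, Tuple


def scan_pod_spec(pod_spec: Dict[str, Any],
                  allowed_registries: Tuple[str, ...]) -> List[str]:
    """Return findings for one pod spec dict (hypothetical guardrail rules)."""
    findings: List[str] = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")

        # Resource limits: flag containers without a CPU or memory limit.
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(f"{name}: missing {resource} limit")

        # Image policy (assumed rule): image must come from an allowed registry.
        if not image.startswith(allowed_registries):
            findings.append(f"{name}: image not from an allowed registry")

        # Image policy (assumed rule): image must be pinned to a tag other
        # than 'latest', or to a digest. Only the last path segment is
        # inspected, so a registry port such as ':5000' is not mistaken for a tag.
        last_segment = image.rsplit("/", 1)[-1]
        if "@" not in last_segment:
            tag = last_segment.partition(":")[2]
            if tag in ("", "latest"):
                findings.append(f"{name}: image tag is missing or 'latest'")
    return findings


if __name__ == "__main__":
    example = {
        "containers": [
            {
                "name": "web",
                "image": "registry.example.com/web:1.4.2",
                "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
            },
            {
                "name": "sidecar",
                "image": "docker.io/library/busybox:latest",
                "resources": {"limits": {"memory": "64Mi"}},
            },
        ]
    }
    for finding in scan_pod_spec(example, ("registry.example.com/",)):
        print(finding)
```

A check like this could run in a CI pipeline before deployment, so that a service owner sees the findings during review rather than in production. Returning a list of findings instead of raising on the first problem lets the developer fix everything in one pass.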
