(DVO203) The Life of a Netflix Engineer Using 37% of the Internet

Netflix is a large and ever-changing ecosystem made up of hundreds of production changes every hour, thousands of microservices, tens of thousands of instances, millions of concurrent customers, and billions of metrics every minute. And I'm the guy with the pager. This is an in-the-trenches look at what operating at Netflix scale in the cloud is really like. It covers how Netflix views the velocity of innovation, expected failures, high availability, engineer responsibility, and obsessing over the quality of the customer experience. It also explains why freedom and responsibility are key, trust is required, and chaos is your friend.

Published in: Technology

  1. 1. DVO203: A Day in the Life of a Netflix Engineer Using 37% of the Internet. Dave Hahn, Operations and Reliability Engineering (dhahn@netflix.com | @relix42). © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. 2. A bit about me Dave Hahn Senior * Engineer Critical Operations and Response Engineering Team (CORE)
  3. 3. Operations Crisis Handling Cloud Performance Reliability Insight Network Engagement Hardware Software *
  4. 4. Operations Crisis Handling Cloud Performance Reliability Insight Network Engagement Hardware Software Making things better *
  5. 5. CORE team Part of Operations Engineering Responsible for: Crisis management Availability reporting Reliability best practices AWS relationship Operations education
  6. 6. CORE team SREs PMs Crisis leaders
  7. 7. CORE goals Protect the customer experience
  8. 8. CORE goals Protect the customer experience
  9. 9. Cannot connect to the Netflix service Try Again
  10. 10. Go outside and play Sunshine!
  11. 11. CORE goals Protect the customer experience
  12. 12. CORE goals Protect the customer experience
  13. 13. CORE goals Protect the customer experience Unique failures
  14. 14. CORE goals Protect the customer experience Unique failures Constant Improvement
  15. 15. A bit about Netflix
  16. 16. A bit about Netflix Media and entertainment company Goal Delight our customers and win moments of truth
  17. 17. Moment
  18. 18. The Netflix cloud journey
  19. 19. The Netflix cloud journey: ’09 Start cloud effort; ’10 First device talking to AWS; ’11 Serving from EU-WEST-1; ’13 Serving from US-WEST-2; ’15 Migration complete
  20. 20. Netflix architecture
  21. 21. AZ1 Global deployment US-West US-East EU-West
  22. 22. Netflix architecture Microservices 100s & 100s of microservices
  23. 23. Netflix architecture
  24. 24. Netflix ecosystem 100s of microservices 1000s of daily production changes 10,000s of instances 100,000s of customer interactions per minute 1,000,000s of customers 1,000,000,000s of metrics 10,000,000,000 hours of streamed
  25. 25. Netflix ecosystem 100s of microservices 1000s of daily production changes 10,000s of instances 100,000s of customer interactions per minute 1,000,000s of customers 1,000,000,000s of metrics 10,000,000,000 hours of streamed 10s of operations engineers
  26. 26. Netflix ecosystem 100s of microservices 1000s of daily production changes 10,000s of instances 100,000s of customer interactions per minute 1,000,000s of customers 1,000,000,000s of metrics 10,000,000,000 hours of streamed 10s of operations engineers No NOC
  27. 27. How? DevOps culture
  28. 28. DevOps culture 100% ownership Code Test Deploy Run Support
  29. 29. DevOps culture 100% ownership On call 24x7
  30. 30. DevOps culture 100% ownership On call 24x7 Incident reviews
  31. 31. DevOps culture 100% ownership On call 24x7 Incident reviews Honest and open feedback
  32. 32. How? DevOps culture Easy ownership
  33. 33. Easy ownership
  34. 34. Easy ownership Service discovery
  35. 35. Easy ownership Solid communication
  36. 36. Easy ownership Continuous deployment
  37. 37. Easy ownership Data persistence
  38. 38. How? DevOps culture Easy ownership Insight
  39. 39. Insight Metrics
  40. 40. Insight
  41. 41. Insight
  42. 42. Insight
  43. 43. Insight Operational insight
  44. 44. Insight
  45. 45. Insight
  46. 46. Insight
  47. 47. How? DevOps culture Easy ownership Insight Cloud thinking
  48. 48. Cloud thinking
  49. 49. Cloud thinking Verbs not nouns
  50. 50. How? DevOps culture Easy ownership Insight Cloud thinking Remove surprises
  51. 51. Cloud guarantees Your instances will die
  52. 52. R.I.P. <your favorite instance> Stateless applications High data spread and redundancy Production failure injection
  53. 53. Cloud guarantees Your instances will die You will share resources
  54. 54. Cloud guarantees Your instances will die You will share resources The architecture will change
  55. 55. Cloud guarantees Your instances will die You will share resources The architecture will change You never see the lights
  56. 56. A day in the life What would you say you do here?
  57. 57. Well Bob Crisis handling
  58. 58. Well Bob Crisis handling Engagement
  59. 59. Well Bob Crisis handling Engagement Making things
  60. 60. Making things
  61. 61. Well Bob Crisis handling Engagement Automation Education
  62. 62. What I don’t do
  63. 63. What I do Make things better
  64. 64. Making things better for you netflix.github.io
  65. 65. Making things better for you jobs.netflix.com
  66. 66. Running Spark and Presto on the Netflix Big Data Platform | Speaker: Daniel Weeks | When: Thu @ 11am | Where: Palazzo F. Splitting the Check on Compliance and Security: Keeping Developers and Auditors Happy in the Cloud | Speaker: Jason Chan | When: Thu @ 11am | Where: Marcello 4501B. Visit the Netflix booth: speakers there to answer questions
  67. 67. Thank you! Dave Hahn dhahn@netflix.com @relix42
  68. 68. Remember to complete your evaluations!