Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Josh Evans - Director of Operations Engineering
November 16, 2015
Beyond DevOps:
How Netflix Bridges the Gap
Technical Debt
• Java 6
• Perforce
• Single Master Jenkins
• Ant
• CentOS
• Asgard/Mimir
Fall 2013
How do we drive broad-based change?
The Paved Road
• Java 7
• Stash
• Jenkins Shards
• Gradle
• Ubuntu
Some said
• You’re overloading us
• Too many projects
• Poor targeting
Others said
• What took you so long?
• We’ve moved ...
• Expectations gap
– Division of labor
– Timing of solutions
– Leadership
• Affects
– Reputation
– Relationships
– Lost op...
How do we bridge the gap?
“Remember that TIME is money…”
Time is a form of currency
• Product Engineering
• Operations Engineering
• Challenges & Strategies
Our time today…
• Product Engineering
• Operations Engineering
• Challenges & Strategies
Our time today…
Product Innovation
winning moments of truth
● Every facet of the product
● 1400 AB tests in the last year & accelerating
Continuous Innovation
But wait, there’s more…
Build It
• design
• code
• build
• bake
• test
• deploy
Run It
• configure
• monitor
• triage
• fix
…at scale, globally
Yo...
Internet
• 1000s of starts per second
• 100,000s of requests per second
• 100,000,000 hours of content / day
• 3 AWS Regio...
Relentless product innovation
Building & running micro-
services at scale, globally
• Product Engineering
• Operations Engineering
• Challenges & Strategies
Our time today…
DevOps is a software development method that
emphasizes the roles of both software developers and
other information-techno...
Why? How?
Quality Velocity
Operational Excellence
Operational Excellence is the continuous improvement of
the management, design, and function of operational
environments t...
• Engineering Tools
• Insight & Real-time Analytics
• Performance & Reliability
Operations Engineering is the application ...
Operations Engineering
• Service provider
• Operational excellence driver
• Cross-cutting solutions
• Undifferentiated hea...
• Product Engineering
• Operations Engineering
• Challenges & Strategies
Our time today…
• You’re overloading us
• What took you so long?
Remember that feedback?
• We made assumptions
– Requirements – what & whe...
• Move from assumptions to knowledge
• Affect change without imposing a tax?
• Achieve and sustain operational excellence?...
Time is a form of currency
5 strategies for success
in time-based economies
software & organizational engineering
1. Reach out
• What are your biggest operational pain points?
• How can we help?
• How well are we meeting your needs today?
• What wou...
Grease the Squeaky Wheels
• low tolerance for tax
• more vocal than most
• High impact solutions
• Clarity on deliverables
• Lower operational tax
• Leadership, innovation, and partnership
What t...
• Deliver on solutions
• Better road map definition & communication
• A more aggressive stance on automation
• Deeper inve...
2. Make an impact
• Apply what you’ve learned
• Deliver what matters
• global cloud console
• end to end delivery
• automation platform
• velocity with confidence
Pipelines - Automated Global Delivery
3. Make it easy to do the right thing
• Engineering time is scarce
• We must do more heavy lifting
Supply & Demand
• Spinnaker manual step
• Automated migrations – Mimir
Provide on-ramps
Automate proven practices
• Alerting and Monitoring
• Apache & Tomcat Hardening
• Automated Canary Analysis
• Autoscaling
• Chaos Participation
• Co...
• Alerting and Monitoring
• Apache & Tomcat Hardening
• Automated Canary Analysis
• Autoscaling
• Chaos Participation
• Co...
Old Version (v1.0)
New Version
(v1.1)
Load BalancerCustomers
100 Servers
5 Servers
95%
5%
Metrics
Canaries
Old Version (v1.0)
New Version
(v1.1)
Load BalancerCustomers
0 Servers
100 Servers
100%
Metrics
Canaries
Define
• Metrics
• A threshold
Every n minutes
● Classify metrics
● Compute score
● Make a decision
Automated Canary Analy...
Canary Analysis
Performance
Integration Tests
Chaos
Conformity
Static
Unit Tests
Make it easy to do the
right thing
Static...
4. Reduce the cost of change
• Ongoing migrations
• Library propagation
• 100s of micro-services
• Complex dependencies
Continuous, Broad-based Change
Change Engineering
• Locate
• Communicate
• Facilitate
• Automated forensics
– Who last touched x?
– What team?
– Who was their manager?
Who owns this artifact, repository, serv...
Whitepages
• Workday wrapper
• App & REST API
• Organization hierarchy
• Metadata
• Change log
(###) ###-####
Krieger
• REST-based service
• Sources
– Whitepages
– Stash
– Edda
– Jenkins
– Spinnaker
– Etc…
{
"content": {},
"_links":...
/api/employees?q=jevans "employees": [
{
"id": "241",
"firstName": "Josh",
"lastName": "Evans",
"username": "jevans",
"ema...
• Security vulnerabilities
– Who owns this service?
• Platform updates
– Who is using this version of this library?
Today ...
Automated, efficient technical
project management
• Communication
• Guidance
• Tracking
Low tax for TPMs & engineers
Secur...
5. Develop Partnerships
Beyond supply & demand
• Nearing completion
• Aggressive schedule
• Unexpected delays
• Commitment to June delivery
Spinnaker 1.0 – 1H 2015
• Built their own continuous delivery solution
• Not positioned for engineering-wide support
• Believes common solutions
E...
Partnership in Action
• Strong relationship
• Open discussions about concerns
• Decision - leaned forward
• +2 engineers o...
Moving Forward Together
• Containers?
• Achieving alignment
• Collaborative exploration
– Edge, Platform, Operations
– A n...
• Paved Road adopted
– Adding new ones
• Production Ready ongoing
• Migrations easier
• Reputation improving
• Improved
– ...
Putting it to the test in 2016
• Streaming production & test - EC2 Classic to VPC
• Highly cross-functional
• Complex depe...
Five Strategies
1. Reach out
2. Make an impact
3. Make it easy to do the right thing
4. Reduce the cost of change
5. Devel...
Open Sourced!
https://netflix.github.io/
Josh Evans
jevans@netflix.com
@ops_engineering
Questions?
Beyond DevOps - How Netflix Bridges the Gap
Beyond DevOps - How Netflix Bridges the Gap
Beyond DevOps - How Netflix Bridges the Gap
Beyond DevOps - How Netflix Bridges the Gap
Upcoming SlideShare
Loading in …5
×

606

Share

Download to read offline

Beyond DevOps - How Netflix Bridges the Gap

Download to read offline

Operating a massively scalable, constantly changing, distributed global service is a daunting task. We innovate at breakneck speed to attract new customers and stay ahead of the competition. Simultaneously improving service quality and enabling rapid, continuous change seems impossible on the surface.

At Netflix, Operations Engineering is a centralized organization whose charter is to accomplish just that by applying high-leverage software engineering practices like continuous delivery. real-time analytics, and automation to solve operational problems. It's well established that many traditional IT Operations teams struggle to bridge the gap with software engineering. Operations Engineering is no exception. And while DevOps as a construct seeks to address this gap, it doesn't go far enough. It does not explain how to bridge the gap or even why it's important to do so.

In this talk we’ll use Netflix Operations Engineering as a case study to address these questions. We'll explore common challenges faced by operational teams and strategies to overcome them.

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all

Beyond DevOps - How Netflix Bridges the Gap

  1. Josh Evans - Director of Operations Engineering November 16, 2015 Beyond DevOps: How Netflix Bridges the Gap
  2. Technical Debt • Java 6 • Perforce • Single Master Jenkins • Ant • CentOS • Asgard/Mimir Fall 2013
  3. How do we drive broad-based change?
  4. The Paved Road • Java 7 • Stash • Jenkins Shards • Gradle • Ubuntu
  5. Some said • You’re overloading us • Too many projects • Poor targeting Others said • What took you so long? • We’ve moved on • Now we need to migrate That’s great but… We’re paying a high tax
  6. • Expectations gap – Division of labor – Timing of solutions – Leadership • Affects – Reputation – Relationships – Lost opportunities Organizational Debt
  7. How do we bridge the gap?
  8. “Remember that TIME is money…”
  9. Time is a form of currency
  10. • Product Engineering • Operations Engineering • Challenges & Strategies Our time today…
  11. • Product Engineering • Operations Engineering • Challenges & Strategies Our time today…
  12. Product Innovation winning moments of truth
  13. ● Every facet of the product ● 1400 AB tests in the last year & accelerating Continuous Innovation
  14. But wait, there’s more…
  15. Build It • design • code • build • bake • test • deploy Run It • configure • monitor • triage • fix …at scale, globally You build it, you run it
  16. Internet • 1000s of starts per second • 100,000s of requests per second • 100,000,000 hours of content / day • 3 AWS Regions, 3 AZs per region
  17. Relentless product innovation Building & running micro- services at scale, globally
  18. • Product Engineering • Operations Engineering • Challenges & Strategies Our time today…
  19. DevOps is a software development method that emphasizes the roles of both software developers and other information-technology (IT) professionals with an emphasis on IT Operations. - Wikipedia The Gap
  20. Why? How?
  21. Quality Velocity Operational Excellence
  22. Operational Excellence is the continuous improvement of the management, design, and function of operational environments to achieve greater quality, velocity, and competitive advantage.
  23. • Engineering Tools • Insight & Real-time Analytics • Performance & Reliability Operations Engineering is the application of software engineering practices to achieve and sustain operational excellence.
  24. Operations Engineering • Service provider • Operational excellence driver • Cross-cutting solutions • Undifferentiated heavy lifting
  25. • Product Engineering • Operations Engineering • Challenges & Strategies Our time today…
  26. • You’re overloading us • What took you so long? Remember that feedback? • We made assumptions – Requirements – what & when – Time for non-product work
  27. • Move from assumptions to knowledge • Affect change without imposing a tax? • Achieve and sustain operational excellence? How do we…
  28. Time is a form of currency
  29. 5 strategies for success in time-based economies software & organizational engineering
  30. 1. Reach out
  31. • What are your biggest operational pain points? • How can we help? • How well are we meeting your needs today? • What would you like to see from us in the future? Listen Shower, rinse, repeat Talk to your engineering customers
  32. Grease the Squeaky Wheels • low tolerance for tax • more vocal than most
  33. • High impact solutions • Clarity on deliverables • Lower operational tax • Leadership, innovation, and partnership What they wanted
  34. • Deliver on solutions • Better road map definition & communication • A more aggressive stance on automation • Deeper investment into leadership, innovation, planning Our commitments
  35. 2. Make an impact • Apply what you’ve learned • Deliver what matters
  36. • global cloud console • end to end delivery • automation platform • velocity with confidence
  37. Pipelines - Automated Global Delivery
  38. 3. Make it easy to do the right thing
  39. • Engineering time is scarce • We must do more heavy lifting Supply & Demand
  40. • Spinnaker manual step • Automated migrations – Mimir Provide on-ramps
  41. Automate proven practices
  42. • Alerting and Monitoring • Apache & Tomcat Hardening • Automated Canary Analysis • Autoscaling • Chaos Participation • Consistent Naming • ELB Configuration • Healthcheck Configured • Red-Black Pipeline • Squeeze Testing • Timeout & Fallback Tuning • Workload Reliability Production Ready?
  43. • Alerting and Monitoring • Apache & Tomcat Hardening • Automated Canary Analysis • Autoscaling • Chaos Participation • Consistent Naming • ELB Configuration • Healthcheck Configured • Red-Black Pipeline • Squeeze Testing • Timeout & Fallback Tuning • Workload Reliability Production Ready?
  44. Old Version (v1.0) New Version (v1.1) Load BalancerCustomers 100 Servers 5 Servers 95% 5% Metrics Canaries
  45. Old Version (v1.0) New Version (v1.1) Load BalancerCustomers 0 Servers 100 Servers 100% Metrics Canaries
  46. Define • Metrics • A threshold Every n minutes ● Classify metrics ● Compute score ● Make a decision Automated Canary Analysis
  47. Canary Analysis Performance Integration Tests Chaos Conformity Static Unit Tests Make it easy to do the right thing Static & Functional Testing
  48. 4. Reduce the cost of change
  49. • Ongoing migrations • Library propagation • 100s of micro-services • Complex dependencies Continuous, Broad-based Change
  50. Change Engineering • Locate • Communicate • Facilitate
  51. • Automated forensics – Who last touched x? – What team? – Who was their manager? Who owns this artifact, repository, service?
  52. Whitepages • Workday wrapper • App & REST API • Organization hierarchy • Metadata • Change log (###) ###-####
  53. Krieger • REST-based service • Sources – Whitepages – Stash – Edda – Jenkins – Spinnaker – Etc… { "content": {}, "_links": { "employees": { "href": "/api/employees/" }, "projects": { "href": "/api/projects/" }, "teams": { "href": "/api/teams/" }, "applications": { "href": "/api/applications/" }, "jobs": { "href": "/api/build/jobs" }, "masters": { "href": "/api/build/masters" }, "projectDistribution": { "href": "/api/teams/projectDistribution" } } }
  54. /api/employees?q=jevans "employees": [ { "id": "241", "firstName": "Josh", "lastName": "Evans", "username": "jevans", "email": "jevans@netflix.com", "jobTitle": "Director of Operations Engineering", "isManager": true, "isCurrent": true, "title": "Josh Evans (jevans) - Operations Engineering", "_links": { "self": { "href": "/api/employees/241" }, "manager": { "href": "/api/employees/117890" }, "team": { "href": "/api/teams/f9134a81" }, "projects": { "href": "/api/teams/f9134a81/projects" } } } ] }
  55. • Security vulnerabilities – Who owns this service? • Platform updates – Who is using this version of this library? Today – Targeted Coordination
  56. Automated, efficient technical project management • Communication • Guidance • Tracking Low tax for TPMs & engineers Security Fix Guava Future – Change Campaigns
  57. 5. Develop Partnerships Beyond supply & demand
  58. • Nearing completion • Aggressive schedule • Unexpected delays • Commitment to June delivery Spinnaker 1.0 – 1H 2015
  59. • Built their own continuous delivery solution • Not positioned for engineering-wide support • Believes common solutions Edge Engineering
  60. Partnership in Action • Strong relationship • Open discussions about concerns • Decision - leaned forward • +2 engineers on Spinnaker • Successful 1.0 launch
  61. Moving Forward Together • Containers? • Achieving alignment • Collaborative exploration – Edge, Platform, Operations – A new paved road?
  62. • Paved Road adopted – Adding new ones • Production Ready ongoing • Migrations easier • Reputation improving • Improved – Service uptime – Rate of change Payoffs
  63. Putting it to the test in 2016 • Streaming production & test - EC2 Classic to VPC • Highly cross-functional • Complex dependencies • Zero downtime Stay tuned…
  64. Five Strategies 1. Reach out 2. Make an impact 3. Make it easy to do the right thing 4. Reduce the cost of change 5. Develop partnerships
  65. Open Sourced! https://netflix.github.io/
  66. Josh Evans jevans@netflix.com @ops_engineering Questions?
  • LauraVVanLaningham

    Jul. 3, 2021
  • deepakpandit

    May. 28, 2021
  • AmanSolanki33

    May. 12, 2021
  • AdinaNeagu3

    Apr. 15, 2021
  • AbouCamara5

    Apr. 5, 2021
  • BryantBooker

    Mar. 26, 2021
  • EwnetuZeleke

    Mar. 25, 2021
  • RoyalSbachl1

    Mar. 20, 2021
  • krishnakumar2397

    Mar. 20, 2021
  • SyedaAfshan4

    Mar. 10, 2021
  • amamaj

    Mar. 10, 2021
  • paulkdl

    Mar. 9, 2021
  • 734003128

    Mar. 2, 2021
  • JABARSONRICHARD

    Feb. 27, 2021
  • MousamiKalidindi

    Feb. 25, 2021
  • SiannRorke

    Jan. 30, 2021
  • IsumJayawardhana

    Jan. 26, 2021
  • maheshwarilakshman

    Jan. 21, 2021
  • ShimaaMekhlafy

    Jan. 9, 2021
  • RajeshkumarRavipati

    Nov. 27, 2020

Operating a massively scalable, constantly changing, distributed global service is a daunting task. We innovate at breakneck speed to attract new customers and stay ahead of the competition. Simultaneously improving service quality and enabling rapid, continuous change seems impossible on the surface. At Netflix, Operations Engineering is a centralized organization whose charter is to accomplish just that by applying high-leverage software engineering practices like continuous delivery. real-time analytics, and automation to solve operational problems. It's well established that many traditional IT Operations teams struggle to bridge the gap with software engineering. Operations Engineering is no exception. And while DevOps as a construct seeks to address this gap, it doesn't go far enough. It does not explain how to bridge the gap or even why it's important to do so. In this talk we’ll use Netflix Operations Engineering as a case study to address these questions. We'll explore common challenges faced by operational teams and strategies to overcome them.

Views

Total views

168,292

On Slideshare

0

From embeds

0

Number of embeds

637

Actions

Downloads

587

Shares

0

Comments

0

Likes

606

×