How do you eat a whale?
One byte at a time!
O’Reilly Velocity Conference 2017
Oct 3, New York, NY
Kelly Looney,
Director of DevOps Consulting
Skytap Inc
* No whales were harmed in the making of this presentation. Skytap does not promote the eating or harming of whales.
Topics
o Where we started
o Where we are now
o How we got here
o Organization
o Education
o Technology
o Parting wisdom
Skytap: Key Stats
o Regions
o 7 Multi-tenant (3 US, TOR, EMEA,
AUS, APAC)
o 3 Single-tenant (US)
o 18,057,400 VMs deployed
o Up to 44,500 / day
o 10,356,700 virtual L2 networks deployed
o Up to 19,600 per day
o 604 petabytes of allocated virtual storage
Starting Situation (circa 2014)
o Complex distributed system deployed across several regions
o The service was (mostly) reliable and scalable
o Deployments once a month, patched as needed, but scary
o Heavy involvement from operations
o Difficult for devs to develop, test, and deploy
Starting Point
Current Situation
o All new services since 1/2016 run in K8S
o All proprietary high churn
services run in K8S
o Integrated CI/CD pipeline
o Ops focused on high value projects
o Release as needed – with confidence!
Current Situation
K8s clusters in Skytap
o Production
o 11 clusters
o 70 nodes
o 185 namespaces
o ~1K pods at any given time
o Staging & Preprod
o 9 clusters
o 34 nodes
o 400 pods at any given time
What We Were Aiming For
o Reduce the unit of deployment
o Micro-services
o Complexity will only increase
o Comprehensive monitoring,
service discovery, and orchestration
o Easy stuff first
o Stateless and immutable services
First Steps…
Guiding Principles
o Change as little as possible
o New tools harmonize with
existing tools
o New stuff in the new framework
Actions
o Get key players on board
o Inventory and categorize services
o Determine how to concurrently
run old/new
Organization
o Recruit a dedicated tools team
o Not a part-time job
o Ideally members have
o Deep technical ability
o Architectural knowledge of
o Major system components
o CI/CD Tools
o RM & Deployment Practices
o Ability to teach
The SRE Role
• An alternative top of the tech ladder
• Start reactive with goal of being mostly
proactive
• Fire Chief to SRE story
• Let this be your primary means of
improvement
• Make the system easy to change first
• Goal to be unafraid to replace or
re-implement when needed
• Be an educator and mentor
Education
o Buddy system
o Documentation
o Support channels
o Reusing existing tools
Our World
o Devs own image generation &
deployment config
o Prebaked templates and custom
builds
o Educational Areas
o Dependency management
o Dockerfile authoring and image
caching
o Implementing K8S health checks
o Estimating resource usage
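
As an illustration of the dependency management item, a pre-build check can reject requirements that are not pinned to an exact version. This is a minimal sketch, not something shown in the talk: it assumes pip-style requirement lines (the talk does not say which package manager was used), and the function name `unpinned_requirements` is hypothetical.

```python
import re

# Matches "name==version", optionally with extras such as name[extra]==1.0.
# Assumption: pip-style requirement syntax; other package managers differ.
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[^=\s]+$")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned to an exact version."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank or comment-only line
        if not _PINNED.match(line):
            problems.append(line)
    return problems


if __name__ == "__main__":
    sample = ["requests==2.31.0", "flask>=2.0", "pyyaml"]
    for line in unpinned_requirements(sample):
        print("not pinned:", line)
```

Running a check like this before the image build keeps two builds of the same source from silently picking up different package versions.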
Technology
o Developers are human
o Release management process
o Kube-native vs. traditional CI/CD
o Which services to move to
Kubernetes first?
Service Categorization
o Application Tier
• Services (Web)
• Platform
• Infra
o Communication model
• Socket based
• Message passing (MQ)
o Application type:
• Stateful
• Stateless
Our CI/CD & Kube
o Deploy ~150 3rd party and
proprietary services to ~1,000
machines in 10+ regions
o Custom CI/CD tools on
Capistrano and Jenkins
o Kube integrated with existing
CI/CD framework
Testability is a P-Zero
o Deployment tools are hard to test
o Failed deployments == Dirty test
environments
o Automated multi-fidelity
environment builds
Fatal Mistakes to Avoid
o Underestimating what you have
o Not considering code, state, & data
o Transient technology choices
o Trying to deliver too much at once
Parting Wisdom
o A customer first attitude will drive
adoption
o Start with compute
o Pick up networking & storage later
o Consider your existing toolchain
o Ability to reset environments will keep
you moving fast
o Much easier for container services to talk
to legacy than vice versa
Thank You!
Contain Yourself:
Incremental Adoption
for Modernization
CoreOS Fest, 2017
San Francisco, CA
Petr Novodvorskiy, Development Lead
Dan Jones, Director of Product Management


Editor's Notes

  1. Speaker: Dan
  2. Speaker: Dan
  3. Speaker: Dan
  4. Speaker: Petr
     Skytap, like any big public cloud, is a big distributed system. We run on vSphere clusters, not GCE/AWS/Azure. We have around 150 microservices. Because of ties between two management systems (source code in Mercurial and binaries managed by Puppet) and the inability to roll back, deployments require a lot of orchestration between different teams in the company and happen in big chunks. Developers only partially own the system they are working on, which makes it harder to develop and to use newer tools to test.
  5. Speaker: Petr
     This is an extremely simplified diagram of our system circa 2014. Everything runs in VMs. All connections come in through an F5 load balancer and go to the web nodes. All other services communicate over RabbitMQ.
  6. Speaker: Petr
     It was a long road, with tons of mistakes and different organizations pushing back on our agenda, but we pulled through. We worked with dev teams to understand which services have the highest release churn and prioritized those first. Developers on those teams usually had the highest level of frustration with the existing deployment tools, so convincing them to move to Kubernetes wasn't a big problem. Ops no longer maintains anything inside developers' VMs, which lets them focus on high-value projects. QA no longer worries about discrepancies between the state provisioned with Puppet and the deployed source code. QA is confident they can roll back a broken build on the staging environment without involving developers.
  7. Speaker: Petr
     A highly simplified version of our current state. Services with high release churn have moved to Kubernetes. Some proprietary services are still running in VMs, and we have no short-term plan to move them. We are considering moving MQ and MySQL Galera to Kubernetes next.
  8. Speaker: Petr
  9. Speaker: Dan
  10. Speaker: Dan
  11. Speaker: Dan
  12. Speaker: Dan
  13. Speaker: Petr
     In the new world developers have far more power. However, with power comes responsibility. As with any transfer of responsibility, we needed to educate developers and explain the advantages of the approach. We needed to explain what immutable builds are, and why it is important to track and pin the versions of the packages they install. We covered image caching and faster builds, explained why health checks and readiness checks are important and useful, and worked with developers to teach them how to profile their systems and estimate resource usage.
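
The difference between a health (liveness) check and a readiness check can be sketched with two HTTP endpoints. This is an illustrative example only: the paths `/healthz` and `/readyz`, the port, and the `ready` flag are assumptions, not details from the talk. Kubernetes HTTP probes treat any status from 200 to 399 as success, so returning 503 from the readiness endpoint keeps traffic away until the service's dependencies are usable.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def probe_response(path, ready):
    """Return (status_code, body_dict) for a probe request."""
    if path == "/healthz":
        # Liveness: the process is up and able to answer at all.
        return 200, {"status": "alive"}
    if path == "/readyz":
        # Readiness: only accept traffic once dependencies are usable.
        if ready:
            return 200, {"status": "ready"}
        return 503, {"status": "not ready"}
    return 404, {"error": "unknown path"}


class ProbeHandler(BaseHTTPRequestHandler):
    # Hypothetical flag; a real service would set it after connecting
    # to its message queue, database, and so on.
    ready = False

    def do_GET(self):
        status, body = probe_response(self.path, type(self).ready)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    ProbeHandler.ready = True  # pretend start-up has finished
    HTTPServer(("0.0.0.0", 8080), ProbeHandler).serve_forever()
```

Keeping the decision in a plain function (`probe_response`) makes the probe logic testable without starting a server.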
  14. Speaker: Petr
     Another problem we experienced when introducing Kubernetes to the company was fear of change. Kubernetes is a very opinionated system, for good reasons, but people usually have their own opinions too. While people are happy to take advantage of building their own images, they already have release management and deployment tools that they know and use. Instead of adopting Kubernetes-native CI/CD tools, we decided to take our existing tools and adapt them to Kubernetes. We also chose to port first the services that require the least change to start running in Kubernetes and whose developers are most interested in the Kubernetes feature set (high churn).
  15. Speaker: Petr
     We split services into several categories: high release churn, communication model, and stateful/stateless. I don't want to spread the myth that it's impossible to run stateful services, and network policies and ingress rules are really helping us now. However, it is harder to run stateful services than stateless ones, and setting up efficient direct network communication between non-Kubernetes and Kubernetes services is hard in the absence of cloud-provided load balancers. So: MQ-based, high-churn, stateless services first. The first candidate was the workflow service, then the web workers.
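
The ordering described in this note (stateless, MQ-based, high-churn services first) can be sketched as a simple sort over a service inventory. This is a minimal illustration, not Skytap's tooling: the `Service` record, its field names, the sample services other than the workflow service and web workers, and all release counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    # All fields are illustrative; none of these names come from the talk.
    name: str
    tier: str                # "web", "platform", or "infra"
    communication: str       # "mq" or "socket"
    stateful: bool
    releases_per_month: int  # stand-in for "release churn"


def migration_order(services):
    """Sort so stateless, MQ-based, high-churn services come first."""
    return sorted(
        services,
        key=lambda s: (
            s.stateful,               # False sorts before True: stateless first
            s.communication != "mq",  # MQ-based before socket-based
            -s.releases_per_month,    # higher churn first
            s.name,                   # readable tie-break
        ),
    )


if __name__ == "__main__":
    inventory = [
        Service("mysql-galera", "infra", "socket", True, 1),
        Service("web-worker", "web", "mq", False, 6),
        Service("workflow", "platform", "mq", False, 9),
        Service("api-frontend", "web", "socket", False, 8),
    ]
    for svc in migration_order(inventory):
        print(svc.name)
```

The sort key encodes the priorities as a tuple, so changing the policy (for example, weighting churn above communication model) is a matter of reordering the tuple.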
  16. Speaker: Petr
     Instead of adopting Kubernetes-native CI/CD tools, we decided to take our existing tools and adapt them to Kubernetes. Our deployment tool is based on a heavily modified Capistrano. It is fairly archaic, and we considered throwing it away and replacing it with something better. However, we realized that it would be too much change on top of the introduction of Kubernetes, and that these tools contain knowledge that is not explicit but was accumulated over years of use and fixes. We integrated Kubernetes with these custom tools in a way that lets us later transition parts of the product to Helm/Tiller.
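
One way an existing deployment tool could hand work to Kubernetes is to render a manifest from the service record it already keeps. The talk does not describe how Skytap's tools do this, so the sketch below is purely an assumption: the function `render_deployment`, its parameters, the `/readyz` probe path, and the example registry name are all hypothetical, and `apps/v1` is the current Deployment API version rather than necessarily the one in use at the time.

```python
import json


def render_deployment(name, image, replicas, cpu_request, memory_request, port):
    """Build a Kubernetes Deployment manifest as a plain dict."""
    tag_part = image.rsplit("/", 1)[-1]
    if ":" not in tag_part or image.endswith(":latest"):
        # Immutable builds need an explicit, pinned image tag.
        raise ValueError("image must carry a pinned tag, got %r" % image)
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Requests come from profiling the service, as discussed in the talk.
        "resources": {"requests": {"cpu": cpu_request, "memory": memory_request}},
        # Hypothetical readiness endpoint exposed by the service.
        "readinessProbe": {"httpGet": {"path": "/readyz", "port": port}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    manifest = render_deployment(
        "workflow", "registry.example.com/workflow:1.4.2", 3, "250m", "256Mi", 8080
    )
    # kubectl accepts JSON as well as YAML, so this output can be piped
    # to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Generating plain manifests keeps the existing tool in charge of ordering and approvals, and leaves room to move the templating to another tool later.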
  17. Speaker: Petr
     Deployment tools and deployment processes are hard to test. Transitioning to new deployment processes is even harder to test. Without testing you will have more problems in the transition and can lose the confidence of developers, which can compromise the whole project. While Kubernetes has nice Deployment objects that you can roll back, there was nothing like that for maintaining a Kubernetes cluster as a whole (until Tectonic came out, and Tectonic would cover only the part of the system that had already migrated to CoreOS/Kubernetes). We ended up creating a tool that lets us build fully functional copies of production environments on demand, from high fidelity to low fidelity. Without it, a confident transition to Kubernetes would not have been possible. Each developer gets a Kubernetes environment as it is deployed in production.
  18. Speaker: Dan
     Tech choices: F5; Mesos decision (just went to a conference).
  19. Speaker: Dan
  20. Speaker: Dan