© 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019
Kubernetes - 7 lessons learned from 7 data centers in 7 months
Kubernetes for Developers Meetup – May 13, 2019
Mike Tougeron, Senior Site Reliability Engineer @ Adobe
$ whoami && id | grep Adobe
 Mike Tougeron
 Senior Site Reliability Engineer @ Adobe
 Twitter: @mtougeron
 Started using Kubernetes in 2015
Agenda
 Quick Introduction to Adobe Advertising Cloud’s Kubernetes Infrastructure
 Lesson 1: Communication, Teamwork & Training
 Lesson 2: Code to production pipelines
 Lesson 3: The ABCs of Production apps
 Lesson 4: Multi-cloud challenges
 Lesson 5: Knowing your application
 Lesson 6: Metrics based monitoring
 Lesson 7: Take a deep breath
High Traffic: 350 billion requests a day
Latency: <50ms @ 95th percentile
Huge Datasets: Billions of objects to store
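For a sense of scale, the daily request figure can be turned into an average per-second rate. This is only a back-of-the-envelope sketch: it assumes traffic is spread evenly over the day, which real traffic is not, so peak rates will be higher than the average it prints.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_requests_per_second(requests_per_day: float) -> float:
    """Average rate, assuming requests are spread evenly across the day."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_requests_per_second(350e9)
    print(f"{rate:,.0f} requests/second on average")
```

At 350 billion requests a day this works out to roughly 4 million requests per second on average.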
Adobe Advertising Cloud’s Kubernetes Overview
 ~225 worker nodes; growing to ~300 in May/June
 6 OpenStack data centers across 4 regions
 Running on VMs
 No persistent storage
 No autoscaling; “fixed” footprint
 Smaller but growing
 3 AWS clusters in us-east-1
 Running on m5d.12xlarge EC2 instances
 EBS volumes for persistent storage
 Uses cluster-autoscaler
 Autoscaling events many times per hour
 Prometheus for monitoring
 Dozens of machine learning workloads in AWS
 Reason for frequent autoscaling events
 Cluster updates done via a new image and a rolling update of existing nodes
 Updates are deployed approximately every 4-6 weeks
Lesson 1: Communication, Teamwork & Training
Communication: Reaching large, distributed teams
Teamwork: Who’s responsible for what?
Abstraction vs Experts
 Teams need an understanding of core resources, but also need easy onboarding
 Pair programming training sessions
 Remove the need for boilerplate
 Don’t avoid abstraction so much that teams duplicate each other’s efforts
 Don’t abstract to the point where you’re not using Kubernetes
 kubectl should *not* be your entrypoint
Lesson 2: Code to production pipelines
[Pipeline diagram; stage labels: Dev, Pull Request, Unit testing, merge, master, Deploy bot, Integration testing, Production]
Insert your steps here!
Tools to help build application resources
 Helm (templating and/or Tiller)
 Kustomize
 Kapitan
 and more…
 We use a combination of Helm templating for infrastructure/3rd-party and Kustomize for application teams
$> helm template --name opa --namespace opa \
     --values ./values/globals.yaml \
     --values ./values/mgmt/cluster.yaml \
     --values ./values/mgmt/adcloud-opa/values.yaml \
     --output-dir ../../../cloud/opa/mgmt charts/adcloud-opa
versus
$> ./build.py --chart adcloud-opa --cluster mgmt
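The deck does not show what `build.py` contains, so the following is only a guess at the idea: a thin wrapper that expands a chart name and a cluster name into the long `helm template` invocation. The function name `helm_template_args`, the `adcloud-` prefix rule for deriving the release name, and the directory layout are assumptions inferred from the single command shown. The flags are the ones that command uses.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a build.py-style wrapper around `helm template`."""
import shlex
from typing import List


def helm_template_args(chart: str, cluster: str, prefix: str = "adcloud-") -> List[str]:
    """Expand a chart and cluster name into a full `helm template` argument list."""
    # Assumption: release name and namespace are the chart name minus a team prefix.
    release = chart[len(prefix):] if chart.startswith(prefix) else chart
    return [
        "helm", "template",
        "--name", release,
        "--namespace", release,
        # Values files go from most general to most specific.
        "--values", "./values/globals.yaml",
        "--values", f"./values/{cluster}/cluster.yaml",
        "--values", f"./values/{cluster}/{chart}/values.yaml",
        "--output-dir", f"../../../cloud/{release}/{cluster}",
        f"charts/{chart}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no Helm install is needed.
    print(" ".join(shlex.quote(a) for a in helm_template_args("adcloud-opa", "mgmt")))
```

The point of a wrapper like this is that conventions (where values live, how releases are named) are encoded once rather than retyped by every engineer.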
Lesson 3: The ABCs of Production
 HorizontalPodAutoscaler
 PodDisruptionBudget
 "DevOps"
 Cluster Upgrades
HorizontalPodAutoscaler
 Easily scale on CPU or memory usage
 Can also scale on custom metrics, such as http_requests from Ingress resources
 Don’t set replicas in your Deployment; the HPA owns the replica count, and a hard-coded value can reset it on every apply
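The scaling decision behind a HorizontalPodAutoscaler can be sketched in a few lines. This follows the formula in the Kubernetes documentation (desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a tolerance that defaults to 0.1). It is a simplification: it leaves out the minReplicas/maxReplicas clamp, readiness handling, and stabilization windows, and the function name is illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Replica count an HPA would aim for, per the documented scaling formula."""
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 Pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

Because the controller recomputes this continuously, a fixed `replicas` value in the Deployment manifest works against it.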
PodDisruptionBudget
 Not the same thing as a Deployment strategy
 Helps prevent taking down so many Pods that the application is overwhelmed
 Set either minAvailable or maxUnavailable, as a number or a percentage
 Good for helping keep quorum
 Enforced for evictions such as node drains; doesn’t apply to manual deletions
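To make the minAvailable/maxUnavailable arithmetic concrete, here is a small sketch of how many Pods a budget would let an eviction take down. It assumes percentages are rounded up to a whole Pod, as the Kubernetes documentation describes. The function names are illustrative and this is not the controller's actual code.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]


def scaled_value(value: IntOrPercent, total: int) -> int:
    """Resolve an absolute number or a percentage string such as "50%" against total."""
    if isinstance(value, str):
        percent = int(value.rstrip("%"))
        return (total * percent + 99) // 100  # integer ceiling: percentages round up
    return value


def allowed_disruptions(expected_pods: int, healthy_pods: int,
                        min_available: Optional[IntOrPercent] = None,
                        max_unavailable: Optional[IntOrPercent] = None) -> int:
    """Number of Pods that may be evicted right now without violating the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        required_healthy = scaled_value(min_available, expected_pods)
    else:
        required_healthy = max(expected_pods - scaled_value(max_unavailable, expected_pods), 0)
    return max(healthy_pods - required_healthy, 0)


if __name__ == "__main__":
    # A 3-member quorum service that must keep 2 members up.
    print(allowed_disruptions(expected_pods=3, healthy_pods=3, min_available=2))
```

For the quorum case, a three-member service with `minAvailable: 2` tolerates one eviction while all members are healthy and none once a member is already down.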
DevOps
 Expertise/specialists
 But empowerment & speed
 Things get lost in shuffle
 Everyone can do everything; aka don’t forget your guardrails
# Note: ingresses, isCreateOrUpdate, hasAnnotation and makeAnnotationPatch
# are assumed to be defined or imported elsewhere in the policy (not shown).

# Deny a new Ingress whose host is already used by an Ingress in another namespace.
deny[msg] {
    input.request.kind.kind = "Ingress"
    input.request.operation = "CREATE"
    host = input.request.object.spec.rules[_].host
    ingress = ingresses[other_ns][other_ingress]
    other_ns != input.request.namespace
    ingress.spec.rules[_].host = host
    msg = sprintf("invalid ingress host %q (conflicts with %v/%v)", [host, other_ns, other_ingress])
}

# Default the ingress class to nginx-internal when no class annotation is set.
patch[patchCode] {
    isCreateOrUpdate
    input.request.kind.kind == "Ingress"
    not hasAnnotation(input.request.object, "kubernetes.io/ingress.class")
    patchCode = makeAnnotationPatch("add", "kubernetes.io/ingress.class", "nginx-internal", "")
}
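For readers new to Rego, the logic of the `deny` rule can be restated in plain Python. This is only an illustration of what the rule checks, not how OPA evaluates it. The `existing` mapping (namespace to Ingress name to hosts) is a simplified stand-in for the `ingresses` data the policy reads, and the function name is hypothetical.

```python
from typing import Dict, List


def conflicting_hosts(request_namespace: str, request_hosts: List[str],
                      existing: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return one denial message per host already used in another namespace."""
    messages = []
    for host in request_hosts:
        for other_ns, ingresses in existing.items():
            if other_ns == request_namespace:
                continue  # reuse within the same namespace is not a conflict
            for other_ingress, hosts in ingresses.items():
                if host in hosts:
                    messages.append(
                        f'invalid ingress host "{host}" '
                        f"(conflicts with {other_ns}/{other_ingress})"
                    )
    return messages


if __name__ == "__main__":
    existing = {"team-a": {"web": ["app.example.com"]}}
    print(conflicting_hosts("team-b", ["app.example.com"], existing))
```

An empty result corresponds to the rule producing no `msg`, meaning the request is allowed through.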
Cluster Upgrades - Blue/Green or Canary?
 Who really has the hardware to run a 2nd full Kubernetes cluster in their datacenter?
 Public cloud is easier, but you still have cost considerations
 Are the application team(s) able to handle deploying to a 2nd mirrored cluster?
 Does it make more sense to run N workers of a different version/config for a period of time?
 Do you have the visibility into the cluster to know how one performs vs the other?
Lesson 4: Multi-Cloud Challenges
Multiple code bases but consistent infrastructure
 Packer – Shared modular code base, different builders
 Terraform – Separate but closely aligned code bases
 Puppet – Same code base
 Helm – Same modular code base
 Leverage templating to build the same deployments for different (and future) clouds
 Re-use, re-use, re-use!
 Lab environments in all clouds
 OSSIA for HV/rack metadata for region/zone
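One way to picture "the same deployments for different clouds" is shared defaults plus a small per-cloud override, which is how Helm layers multiple `--values` files (later files win). The sketch below mimics that layering with a recursive merge. The keys and values are hypothetical, chosen only to echo the storage difference between the OpenStack and AWS clusters described earlier.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_values(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical values: shared defaults plus one small override per cloud.
GLOBALS = {"image": {"tag": "1.0.0"}, "storage": {"persistent": False}, "replicas": 3}
CLOUD_OVERRIDES = {
    "openstack": {},
    "aws": {"storage": {"persistent": True, "class": "ebs"}},
}


def values_for(cloud: str) -> Dict[str, Any]:
    """Effective values for one cloud: globals with that cloud's overrides applied."""
    return merge_values(GLOBALS, CLOUD_OVERRIDES[cloud])


if __name__ == "__main__":
    for cloud in CLOUD_OVERRIDES:
        print(cloud, values_for(cloud))
```

Supporting a future cloud then means adding one more override entry rather than forking the templates.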
Lesson 5: Knowing your applications
 Seems like an obvious statement but it’s easy to forget to think about
 Kubernetes brings advantages, but not all the ones that bare metal and virtual machines bring out of the box
 Think about how your app actually functions
 Service Discovery
 Persistent Storage
 Shared Storage (e.g. replication, sharding, etc.)
 Scheduling / Restarting
 Networking Ingress / Egress
 Think about how your app is going to handle the way Kubernetes does things
https://imgur.com/gallery/B4D7Lf1
Elasticsearch as Deployment (What We Did)
https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
Oops…yeah Touge, I think something is wrong…
Elasticsearch as StatefulSet (What We Should Have Done)
https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
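The slide's diagram is not reproduced in this transcript, but one general reason a StatefulSet suits a clustered datastore is that each Pod keeps a stable ordinal name and, behind a headless Service, a stable DNS entry, whereas Deployment Pods get random, interchangeable names. The sketch below generates those names following the Kubernetes naming convention. The StatefulSet, Service, and namespace names are hypothetical.

```python
from typing import List


def statefulset_pod_dns(name: str, service: str, namespace: str, replicas: int,
                        cluster_domain: str = "cluster.local") -> List[str]:
    """Stable per-Pod DNS names for a StatefulSet behind a headless Service."""
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    # Hypothetical names: StatefulSet "es-data", headless Service "es", namespace "search".
    for host in statefulset_pod_dns("es-data", "es", "search", 3):
        print(host)
```

Names like these can be listed as seed hosts for cluster discovery, since they survive Pod restarts and rescheduling.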
Lesson 6: Metrics-Based Monitoring
Lesson 7: Take a deep breath
 Same team so we all learn & fix together
 Experience has been enlightening & engineers have had fun
 Teams already onboarded are moving faster than before
 Dev cycle to production is faster as we integrate more automated testing
Thanks!
Slides: https://touge.me/k8s-7lessons-meetup
Mike Tougeron
Email: tougeron@adobe.com
Twitter: @mtougeron
Images from https://stock.adobe.com

Kubernetes for Developers - 7 lessons learned from 7 data centers in 7 months (meetup)

  • 1. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Kubernetes - 7 lessons learned from 7 data centers in 7 months Kubernetes for Developers Meetup – May 13, 2019 Mike Tougeron – Senior Site Reliability Engineer @ Adobe
  • 2. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 $ whoami && id | grep Adobe  Mike Tougeron  Senior Site Reliability Engineer @ Adobe  Twitter: @mtougeron  Started using Kubernetes in 2015
  • 3. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Agenda  Quick Introduction to Adobe Advertising Cloud’s Kubernetes Infrastructure  Lesson 1: Communication, Teamwork & Training  Lesson 2: Code to production pipelines  Lesson 3: The ABCs of Production apps  Lesson 4: Multi-cloud challenges  Lesson 5: Knowing your application  Lesson 6: Metrics based monitoring  Lesson 7: Take a deep breath
  • 4. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 High Traffic 350 billion requests a day Latency <50ms @ 95th percentile Huge Datasets Billions of objects to store
  • 5. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Adobe Advertising Cloud’s Kubernetes Overview  ~225 worker nodes; growing to ~300 in May/June  6 OpenStack data centers across 4 regions  Running on VMs  No persistent storage  No autoscaling; “fixed” footprint  Smaller but growing  3 AWS clusters in us-east-1  Running on m5d.12xlarge ec2 instances  EBS volumes for persistent storage  Uses cluster-autoscaler  Autoscaling events many times per hour  Prometheus for monitoring  Dozens of Machine Learning workloads in AWS  Reason for frequent autoscaling events  Cluster updates done via new Image and rolling update of existing nodes  Updates are deployed approx every 4-6 weeks
  • 6. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019
  • 7. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Lesson 1: Communication, Teamwork & Training
  • 8. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Communication: Reaching large, distributed teams
  • 9. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Teamwork: Who’s responsible for what?
  • 10. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Abstraction vs Experts  Need understanding of core resources but also need easy onboarding  Pair programming training sessions  Remove need for boiler plate  Don’t duplicate efforts by avoiding abstraction  Don’t abstract to the point where you’re not using Kubernetes  kubectl should *not* be your entrypoint
  • 11. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Lesson 2: Code to production pipelines De v Pull Request maste r Unit testin g merge Deplo y bot Production Integration testing Insert your steps here!
  • 12. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Tools to help build application resources  Helm (templating and/or tiller)  Kustomize  Kapitan  and more…  We use a combination of Helm templating for infrastructure/3rd-party and Kustomize for application teams
  • 13. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 $> helm template --name opa --namespace opa --values ./values/globals.yaml --values ./values/mgmt/cluster.yaml --values ./values/mgmt/adcloud- opa/values.yaml --output-dir ../../../cloud/opa/mgmt charts/adcloud-opa versus $> ./build.py --chart adcloud-opa --cluster mgmt
  • 14. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Lesson 3: The ABCs of Production  HorizontalPodAutoscaler  PodDisruptionBudget  "DevOps"  Cluster Upgrades
  • 15. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 HorizontalPodAutoscaler  Easily scale on CPU or Memory usage  Also able to scale on custom metrics like http_requests from Ingress resources  Don’t set replicas in your Deployment
  • 16. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 PodDisruptionBudget  Not the same thing as a Deployment strategy  Helps prevent taking down so many Pods that the application is overwhelmed  Can set by minAvailable or maxUnavailable by number or percentage  Good for helping keep quorum  Doesn’t apply to manual deletions
  • 17. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 DevOps  Expertise/specialists  But empowerment & speed  Things get lost in shuffle  Everyone can do everything; aka don’t forget your guardrails
  • 18. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 deny[msg] { input.request.kind.kind = "Ingress" input.request.operation = "CREATE" host = input.request.object.spec.rules[_].host ingress = ingresses[other_ns][other_ingress] other_ns != input.request.namespace ingress.spec.rules[_].host = host msg = sprintf("invalid ingress host %q (conflicts with %v/%v)", [host, other_ns, other_ingress]) } patch[patchCode] { isCreateOrUpdate input.request.kind.kind == "Ingress" not hasAnnotation(input.request.object, "kubernetes.io/ingress.class") patchCode = makeAnnotationPatch("add", "kubernetes.io/ingress.class", "nginx- internal", "") }
  • 19. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Cluster Upgrades - Blue/Green or Canary?  Who really has the hardware to run a 2nd full Kubernetes cluster in their datacenter?  Public cloud is easier, but you still have cost considerations  Are the application team(s) able to handle deploying to a 2nd mirrored cluster?  Does it make more sense to run N workers of a different version/config for a period of time?  Do you have the visibility into the cluster to know how one performs vs the other?
  • 20. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Lesson 4: Multi-Cloud Challenges
  • 21. Multiple code-bases but consistent infrastructure: Packer – shared modular code base, different builders. Terraform – separate but closely aligned code bases. Puppet – same code base. Helm – same modular code base. Leverage templating to build the same deployments for different (and future) clouds. Re-use, re-use, re-use! Lab environments in all clouds. OSSIA for HV/rack metadata for region/zone.
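One way to picture the "same templates, different clouds" idea is a shared set of base values with a small per-cloud override layer. The sketch below assumes that shape; the key names and the `ebs` storage class label are hypothetical, while the on/off choices follow the cluster overview earlier in the deck (no persistent storage or autoscaling on OpenStack, EBS volumes and cluster-autoscaler on AWS, Prometheus for monitoring).

```python
import copy

# Values shared by every cluster, regardless of cloud.
BASE_VALUES = {
    "monitoring": "prometheus",
    "persistence": {"enabled": False},
    "autoscaling": {"enabled": False},
}

# Only the differences are listed per cloud (key names are illustrative).
CLOUD_OVERRIDES = {
    "openstack": {},
    "aws": {
        "persistence": {"enabled": True, "storageClass": "ebs"},
        "autoscaling": {"enabled": True},
    },
}


def deep_merge(base, override):
    """Return a new dict where `override` wins, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def values_for(cloud):
    """Values to feed the shared templates for one target cloud."""
    if cloud not in CLOUD_OVERRIDES:
        raise KeyError("unknown cloud: %s" % cloud)
    return deep_merge(BASE_VALUES, CLOUD_OVERRIDES[cloud])


if __name__ == "__main__":
    for cloud in sorted(CLOUD_OVERRIDES):
        print(cloud, values_for(cloud))
```

Adding a future cloud then means adding one override entry rather than forking the templates.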
  • 22. Lesson 5: Knowing your applications. Seems like an obvious statement, but it’s easy to forget to think about. Kubernetes brings advantages, but not all the ones that bare metal and virtual machines bring out of the box. Think about how your app actually functions: service discovery; persistent storage; shared storage (e.g. replication, sharding, etc.); scheduling / restarting; networking ingress / egress. Think about how your app is going to handle the way Kubernetes does things. https://imgur.com/gallery/B4D7Lf1
  • 23. Elasticsearch as Deployment (What We Did) https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
  • 24. Oops…yeah Touge, I think something is wrong…
  • 25. Elasticsearch as StatefulSet (What We Should Have Done) https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
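Part of what a StatefulSet offers here is a stable identity: each Pod gets an ordinal name and, behind a headless Service, a predictable DNS record that survives rescheduling. The sketch below just computes those names and the majority quorum for a set of master-eligible nodes. The `es-master` name comes from the speaker notes; the Service name, namespace, and replica count are assumptions for illustration.

```python
def stateful_pod_hosts(statefulset, service, namespace, replicas,
                       cluster_domain="cluster.local"):
    """DNS names Kubernetes assigns to StatefulSet pods behind a headless Service."""
    return [
        "%s-%d.%s.%s.svc.%s" % (statefulset, ordinal, service, namespace, cluster_domain)
        for ordinal in range(replicas)
    ]


def quorum(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    # Hypothetical names: headless Service "es-master-headless" in namespace "search".
    for host in stateful_pod_hosts("es-master", "es-master-headless", "search", 3):
        print(host)
    print("quorum:", quorum(3))
```

With a Deployment, by contrast, replacement Pods come back under new random names, so nothing comparable to this list can be written down ahead of time.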
  • 26. Lesson 6: Metrics-Based Monitoring
  • 27. Lesson 7: Take a deep breath. Same team, so we all learn & fix together. Experience has been enlightening & engineers have had fun. Teams already onboarded are moving faster than before. Dev cycle to production is faster as we integrate more automated testing.
  • 28. Thanks! Slides: https://touge.me/k8s-7lessons-meetup Mike Tougeron Email: tougeron@adobe.com Twitter: @mtougeron Images from https://stock.adobe.com

Editor's Notes

  1. Adobe Advertising Cloud allows you to manage video, display, and search advertising across traditional TV and digital formats.
  2. ./deploy-ami.py master --context aws-lab
  3. Repeat, repeat, repeat. There's always a medium that someone doesn't read, even if they are supposed to. Shout it from the mountain top. Still drives me nuts.
  4. Deploybot deploys YAML after it is committed to git. Team A wrote the app, Team X had a failure: who gets the alerts? Assumptions were made by all parties involved. Same type of problem with the Registry server. It all boils down to lack of communication.
  5. Don’t have a good answer for everyone. Balance is key to success.
  6. Crucial to success. A slow pipeline slows down adoption and creates friction. An easy pipeline creates the “that’s it?” question far too often :)
  7. We chose canary: app teams are not far enough along to support cross-cluster load balancing.
  8. Most data warehousing and analytics processing happens in AWS. Bidding and ad serving then happen via one of our six OpenStack regions throughout the world. This allows us the best of both worlds: burstable compute and storage when we need it, and the cheap, fast, low-latency compute that the majority of our workload needs.
  9. We re-used much of the AWS code and adapted it to be modular based on the target cloud. Consistency across clusters and clouds. Write once, target. OSSIA (Open Stack Simple Inventory API) was written in-house by Mykola Moglyenko. It allows us to tag pods by their physical location in the cage and make decisions that evenly spread out workloads. Adobe will be open-sourcing this tool this spring.
  10. Does a fixed hostname make a difference (for example, ZooKeeper)? How does the app/service save its state: in memory or on disk? What about cluster data? Is it sharded? Replicated? How well does it handle rescheduling? How do other applications or teams access the app/service?
  11. How many people have run an Elasticsearch cluster, or at least know about Elasticsearch? We followed a blog post to set it up in K8s. Not a bad thing! We just didn’t think in a Kubernetes way. It looked like this. This lived in our AWS cluster, where our ML jobs were causing a lot of auto-scaling up and down, so there was a fair amount of volatility. When we first deployed it, it worked! Then we upgraded our nodes, which meant draining and replacing them one at a time: lots of app rescheduling, lots of autoscaler activity.
  12. While deploying new worker images to our nodes, we noticed this happening to Elasticsearch. Everything was suddenly in CrashLoopBackOff (CLBO), with unassigned primaries and replicas. When we got things back up, we found we had lost 7% of our data (this was in dev).
  13. Converted the es-master deployment to a StatefulSet. This makes sure that master nodes are gracefully removed and re-added, without impacting quorum. Adjusted the cluster deployment scripts: respect the pod disruption budget for longer timeouts; pre-cordon nodes; increase the size of the cluster before draining nodes; disabled the cluster-autoscaler (so the cluster will stay inflated).
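The adjusted rollout described in note 13 can be sketched as an ordered list of commands. This is not the team's actual script: the ordering, the autoscaler's Deployment name and namespace, the 30-minute drain timeout, and the final step that re-enables the autoscaler are all assumptions made for illustration. The function only builds command strings (it runs nothing); `kubectl drain` evicts Pods through the eviction API, so it waits on PodDisruptionBudgets until the timeout expires.

```python
def node_rollout_plan(old_nodes, autoscaler="cluster-autoscaler",
                      namespace="kube-system", drain_timeout="30m"):
    """Build the ordered kubectl commands for replacing `old_nodes`."""
    scale = "kubectl -n %s scale deployment %s --replicas=%%d" % (namespace, autoscaler)
    plan = []
    # 1. Stop the autoscaler so the temporarily inflated cluster is not shrunk.
    plan.append(scale % 0)
    # 2. Adding replacement capacity is cloud-specific, so it stays a placeholder.
    plan.append("# add replacement worker capacity here (cloud-specific, not shown)")
    # 3. Pre-cordon every old node so evicted pods cannot land on another old node.
    plan.extend("kubectl cordon %s" % node for node in old_nodes)
    # 4. Drain one node at a time with a long timeout so disruption budgets are honored.
    plan.extend(
        "kubectl drain %s --ignore-daemonsets --timeout=%s" % (node, drain_timeout)
        for node in old_nodes
    )
    # 5. Assumed final step: turn the autoscaler back on once the old nodes are gone.
    plan.append(scale % 1)
    return plan


if __name__ == "__main__":
    for step in node_rollout_plan(["worker-1", "worker-2"]):
        print(step)
```

Cordoning everything before the first drain is the detail that matters most: without it, Pods evicted from one old node can be rescheduled onto another old node and disrupted a second time.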