Moving Our Entire Stack to K8S Within a Year - 7 Lessons Learned
October 12, 2018
Chris Homer
Co-Founder & CTO at thredUP
● Largest Consignment Store
● $130M+ invested
● 1000+ employees
● 4 distribution centers
● Kiev & SF Engineering Offices
● We’re Hiring!
Co-Founder & CTO at thredUP
Solution Specialist at Microsoft
Princeton University & Harvard Business School
Chris Homer - @chrishomer
Confidential 4
The thredUP Marketplace
● Convenient Pre-Paid Bag
● Earn Cash or Donate
● Do Good
● Amazing prices
● Wide assortment
● Fresh selection every day
Visualizing The thredUP Marketplace
Operating the Marketplace
Augmenting the Marketplace
● Supplier Scoring
● Partners
● Supply Lifecycle
● Quality + Expected Value
● Proprietary Pricing Algorithm
● Personalization
● Search
● Notifications
● Discounting Algorithm
● Marketing
Infrastructure Timeline
A little history of our journey towards the promised land
2009 2010 2014 2015 2016 2017 2018 2019 ...
● K8S Migration Begins
● Slicehost
● Manual Config
● Capistrano Deploy
● Manual Tests
● AWS Hosted
● Manually saved AMIs
● Staging & Dev - cleansed prod copy
● “Outsourcing DevOps”
● Back to Chef
● “Microservices”
● Hand-crafted Staging
● Chef
● Ansible all the things
● “Insourcing DevOps”
● Back to Ansible - One Source of Truth
● Infrastructure Team
● DevOps is about Culture
● Security Assessment
● Terraform
● Ansible Hardening
● Dynamic Staging
● Service Mesh
● DevSecOps
● Docker & ECS “Attempt”
The Current Infrastructure Stack
After the migration, the picture is getting clearer and increasingly rational
[Diagram: the stack across the prod, staging, and dev environments]
Why Docker & Kubernetes?
● Obviously because it’s cool & hype :)
● Popularity - widely supported
● Scalable & fault-tolerant out of the box
● Flexibility & deep control
● Standardization & ownership
● Speed up development lifecycle
● Encourage more & smaller services
● Linux Foundation & CNCF
Learning #1 - Fear, Uncertainty & Doubt => Excitement & Ownership
● Not everyone will be on board
● Share the vision, explain the advantages, pains and shortcomings
● A simple demo application helps “make it real”
● Emphasize that success requires app team and infra team ownership
● Cultivate champions and use their help
● Momentum is your friend
● Milestones are important for larger services
● Technical debt opportunities
● Knowledge sharing & workshops along the way and after
Learning #2 - Pay close attention to performance
➢ Set up a k8s VPC that is peered with the prod VPC
○ Redis
○ Memcached
○ Aurora
➢ Scale HAProxy instances
➢ Update Kubernetes nodes to c5.2xlarge
➢ Disable ingress controller
➢ Disable kubeDNS
Learning #2 - Pay close attention to performance
[Chart: p90 response time, EC2 vs. k8s]
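For readers who want to run the same kind of comparison: p90 is the latency that 90% of requests stay at or under. Below is a minimal Python sketch using the nearest-rank method. The function name and the sample latencies are invented for illustration; they are not thredUP's measurements.

```python
def p90(latencies_ms):
    """Nearest-rank 90th percentile of a list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # 1-based nearest rank: ceil(0.9 * n), computed with integers only
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


# Hypothetical samples in milliseconds, not real measurements.
ec2_ms = [110, 120, 125, 130, 135, 140, 150, 160, 180, 400]
k8s_ms = [115, 125, 130, 135, 140, 150, 165, 190, 260, 900]

print("ec2 p90:", p90(ec2_ms), "ms")
print("k8s p90:", p90(k8s_ms), "ms")
```

Comparing a tail percentile rather than the mean keeps a few very slow requests visible, which is usually where a migration regression shows up first.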
Learning #2 cont’d - Internal communication is way faster
[Chart: response time with access by cluster IP vs. access by public DNS name]
Learning #3 - Liveness probe is not always your friend
[Chart: response time over time, split into "Our Code" and "External Request", against the k8s healthcheck timeout]
Learning #3 - Liveness probe is not always your friend
[Chart: response time over time]
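One reading of the chart is that a health endpoint which waits on an external request can exceed the probe timeout even when the service's own code is healthy, so the kubelet restarts pods that are not broken. The Python sketch below models that under stated assumptions: `timeout_s` and `failure_threshold` mirror the real `timeoutSeconds` and `failureThreshold` probe fields, while the latencies and the "deep" vs. "shallow" probe split are hypothetical. The deck does not say how thredUP changed its probes.

```python
from dataclasses import dataclass


@dataclass
class ProbeConfig:
    timeout_s: float = 1.0      # plays the role of livenessProbe.timeoutSeconds
    failure_threshold: int = 3  # plays the role of livenessProbe.failureThreshold


def count_restarts(probe_latencies_s, cfg):
    """Count restarts triggered by consecutive probe timeouts."""
    consecutive_failures = 0
    restarts = 0
    for latency in probe_latencies_s:
        if latency > cfg.timeout_s:
            consecutive_failures += 1
            if consecutive_failures >= cfg.failure_threshold:
                restarts += 1
                consecutive_failures = 0  # a fresh container starts clean
        else:
            consecutive_failures = 0
    return restarts


# Hypothetical numbers: our code is fast, the external dependency has a slow spell.
OWN_CODE_S = 0.05
external_s = [0.1, 0.1, 2.0, 2.5, 3.0, 2.0, 0.1]

deep_probe = [OWN_CODE_S + e for e in external_s]  # health check calls the dependency
shallow_probe = [OWN_CODE_S for _ in external_s]   # health check covers only our code

cfg = ProbeConfig()
print("restarts with deep probe:", count_restarts(deep_probe, cfg))
print("restarts with shallow probe:", count_restarts(shallow_probe, cfg))
```

In this model the restarts do nothing to fix the slow dependency, which is the sense in which the liveness probe is "not always your friend".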
Learning #4 – DNS
Many DNS errors and ~5-second delays
● It’s a well-known issue with UDP & dynamic NAT
● Bug report: https://github.com/kubernetes/kubernetes/issues/56903
● A good explanation of the problem: https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
Solution: use TCP for DNS lookups
    dnsConfig:
      options:
        - name: use-vc
    dnsPolicy: ClusterFirst
Another solution
    dnsConfig:
      options:
        - name: single-request-reopen
    dnsPolicy: ClusterFirst
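To check whether a pod is affected, or whether one of the options above helped, it can be useful to time repeated lookups and look for durations clustered near 5 seconds, which matches the glibc resolver's default retry timeout. The following is a rough Python sketch, not a thredUP tool: the function names, the 4.5-second threshold, and the attempt count are assumptions chosen for illustration.

```python
import socket
import time


def time_lookups(host, attempts=20, resolve=socket.getaddrinfo):
    """Resolve `host` repeatedly and return the duration of each attempt in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolve(host, 80)
        except socket.gaierror:
            pass  # failed lookups still count; their duration is what we care about
        durations.append(time.monotonic() - start)
    return durations


def slow_lookups(durations, threshold_s=4.5):
    """Return the durations that look like a resolver timeout followed by a retry."""
    return [d for d in durations if d >= threshold_s]
```

Running `slow_lookups(time_lookups("some-service.example"))` inside a pod before and after changing `dnsConfig` gives a quick before/after comparison; the host name here is a placeholder.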
Learning #5 - Too many open files
Ok, Google =)
max_user_watches=8192 → this looks too low, let's bump it a little!
That did seem to help... for some time.
Learning #5 - Too many open files
Spikes seem to correlate with pod crash loops. Why?
Learning #5 - Too many open files
[Diagram: Docker containers each have a log file; the log aggregator client holds a file descriptor (fd) for each log file]
Learning #5 - Too many open files
[Diagram: the same setup with additional container log files; the log aggregator client holds fds for the current log files and also for earlier ones. Label on the earlier fds: "These are still opened"]
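The link between crash loops and fd spikes can be shown with a toy model. If every container restart produces a new log file and the log client never closes descriptors for the old ones, the number of open fds grows with the number of restarts. The Python sketch below is a simulation only: the `LogTailer` class, the log paths, and the fake fd numbers are hypothetical and do not describe any particular log aggregator.

```python
import itertools


class LogTailer:
    """Toy log client that tracks one (fake) fd per container log file."""

    def __init__(self, close_stale):
        self.close_stale = close_stale
        self.open_fds = {}                    # log path -> fake fd number
        self._next_fd = itertools.count(3)    # 0-2 are stdin/stdout/stderr

    def sync(self, live_log_paths):
        """Open new log files; optionally close fds for files that are gone."""
        for path in live_log_paths:
            if path not in self.open_fds:
                self.open_fds[path] = next(self._next_fd)
        if self.close_stale:
            for path in list(self.open_fds):
                if path not in live_log_paths:
                    del self.open_fds[path]


def simulate_crash_loop(tailer, pods=3, restarts=10):
    """Each restart gives every pod a new container, hence a new log file."""
    for generation in range(restarts + 1):
        live = {f"/var/log/containers/pod{p}-gen{generation}.log" for p in range(pods)}
        tailer.sync(live)
    return len(tailer.open_fds)


print("leaky client:", simulate_crash_loop(LogTailer(close_stale=False)))
print("tidy client:", simulate_crash_loop(LogTailer(close_stale=True)))
```

In this model, raising a limit only delays the failure for the leaky client, which is consistent with the bump helping "for some time".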
Learning #6 - Pod Distribution after Cluster Maintenance
Worker node#1 Worker node#2 Worker node#3
Service A pod Service A pod Service A pod
Learning #6 - Pod Distribution after Cluster Maintenance
Under Maintenance
Worker node#1 Worker node#2 Worker node#3
Service A pod Service A pod
Service A pod
Service A pod
Learning #6 - Pod Distribution after Cluster Maintenance
Alive and functioning
Worker node#1 Worker node#2 Worker node#3
Service A pod Service A pod
Service A pod
Learning #6 - Pod Distribution after Cluster Maintenance
Worker node#1 Worker node#2 Worker node#3
Service A pod Service A pod
Service A pod
Learning #6 - Pod Distribution after Cluster Maintenance
Worker node#1 Worker node#2 Worker node#3
Service A pod Service A pod
Service A pod
Service A pod
Service A pod
All traffic goes here
Solution: Redeploy to redistribute pods
Learning #6 - Pod Distribution after Cluster Maintenance
Worker node#1 Worker node#2 Worker node#3
Service A pod Service A pod Service A pod
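The sequence above can be reproduced with a small simulation. Kubernetes places pods only when they are created, so a node that comes back from maintenance stays empty until something creates new pods. The Python sketch below uses a simplified "least-loaded node" rule as a stand-in for the real scheduler, and models a redeploy as recreating every pod at once; the node names and function names are hypothetical.

```python
def place(pods_per_node, schedulable):
    """Put one new pod on the least-loaded schedulable node (ties: name order)."""
    target = min(sorted(schedulable), key=lambda n: pods_per_node[n])
    pods_per_node[target] += 1


def drain(pods_per_node, node):
    """Evict all pods from `node` and reschedule them on the remaining nodes."""
    evicted = pods_per_node[node]
    pods_per_node[node] = 0
    others = [n for n in pods_per_node if n != node]
    for _ in range(evicted):
        place(pods_per_node, others)


def redeploy(pods_per_node):
    """Simplified redeploy: recreate every pod, letting all nodes take part."""
    total = sum(pods_per_node.values())
    for node in pods_per_node:
        pods_per_node[node] = 0
    for _ in range(total):
        place(pods_per_node, list(pods_per_node))


pods = {"node1": 1, "node2": 1, "node3": 1}

drain(pods, "node1")       # maintenance: node1's pod moves elsewhere
after_maintenance = dict(pods)
# node1 is back and healthy, but running pods are not moved onto it

redeploy(pods)
after_redeploy = dict(pods)

print("after maintenance:", after_maintenance)
print("after redeploy:   ", after_redeploy)
```

A real rolling update replaces pods gradually rather than all at once, so the resulting spread depends on the scheduler's scoring at each step; the sketch only shows why a redeploy gives the returned node a chance to receive pods.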
Learning #7 - Building Docker Images within the K8s Cluster
[Diagram: a Kubernetes worker node runs the Docker daemon (docker.sock), Container A, Container B, and a Jenkins slave container; the Docker CLI in the Jenkins slave runs "docker build ..." from the Jenkinsfile through docker.sock]
Learning #7 - Building Docker Images within the K8s Cluster
[Diagram: the same worker node; the Docker CLI in the Jenkins slave runs "docker rm ..." from the Jenkinsfile through docker.sock, against the daemon that also runs Container A and Container B]
Learning #7 - Building Docker Images within the K8s Cluster
[Diagram: the Jenkins slave and its Docker CLI stay on the Kubernetes worker node with Container A and Container B; the Docker daemon used for builds runs on a separate EC2 instance]
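The deck does not say how the Jenkins slave reaches the daemon on the separate instance. One common way to point a Docker CLI at a daemon on another host is the `DOCKER_HOST` environment variable, and the Python sketch below assumes that approach. The helper name, image tag, and host address are hypothetical; the function only prepares the command and environment and does not run anything.

```python
import os


def remote_build_command(image_tag, context_dir, docker_host):
    """Return (argv, env) for a `docker build` aimed at a remote daemon.

    Assumption: the remote daemon is reachable at `docker_host`; nothing is
    executed here, so this is safe to call without Docker installed.
    """
    env = dict(os.environ, DOCKER_HOST=docker_host)
    argv = ["docker", "build", "-t", image_tag, context_dir]
    return argv, env


# Hypothetical values for illustration only.
argv, env = remote_build_command("myapp:abc123", ".", "tcp://build-host.internal:2376")
print(" ".join(argv), "with DOCKER_HOST =", env["DOCKER_HOST"])
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)`. Because builds and cleanup commands such as `docker rm` then run against the build host's daemon, they no longer touch containers on the Kubernetes worker node.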
Was it worth it? YES!
● Deployment time more than halved (main service: from 12 min to 5 min)
● Rollback is very easy and fast (nearly instant)
● Provisioned hardware decreased by a factor of 3
● Pod autoscaling eliminated manual work to support traffic spikes
● System-level upgrades are now non-blocking and easy to execute
● Time to provision and deploy a new service in production changed from days/weeks to minutes/hours
● Each project has its own simple Helm chart in its project repo; ~3200 Ansible config files deprecated
What’s next?
● Dynamic Staging Environments
○ Encourage better development workflow
○ Easily enable cross-team review with design, marketing and others
● Telepresence for Complex Local Development
○ Easier onboarding & dev env refresh
○ More consistent behavior with production
● End-to-end integration suite
● Iterate for Improvements
○ Faster builds
○ Cluster Performance
○ Observability
○ Cost Improvements
● Service mesh with Istio
Thank You!
chris@thredup.com
@chrishomer
PS. We’re Hiring :)

Community pharmacy- Social and preventive pharmacy UNIT 5Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5
sayalidalavi006
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 

Chris Homer - Moving the entire stack to k8s within a year – lessons learned

  • 1. Moving Our Entire Stack to K8S Within a Year - 7 Lessons Learned October 12, 2018
  • 2. Chris Homer (@chrishomer): Co-Founder & CTO at thredUP; Solution Specialist at Microsoft; Princeton University & Harvard Business School. thredUP: ● Largest Consignment Store ● $130M+ invested ● 1000+ employees ● 4 distribution centers ● Kiev & SF Engineering Offices ● We’re Hiring!
  • 3.
  • 4. Confidential 4 The thredUP Marketplace ● Convenient Pre-Paid Bag ● Earn Cash or Donate ● Do Good ● Amazing prices ● Wide assortment ● Fresh selection every day
  • 5. Visualizing The thredUP Marketplace
  • 7. Augmenting the Marketplace ● Supplier Scoring ● Partners ● Supply Lifecycle ● Quality + Expected Value ● Proprietary Pricing Algorithm ● Personalization ● Search ● Notifications ● Discounting Algorithm ● Marketing
  • 8. Infrastructure Timeline: A little history of our journey towards the promised land (timeline axis: 2009, 2010, 2014, 2015, 2016, 2017, 2018, 2019 ...) ● K8S Migration Begins ● Slicehost ● Manual Config ● Capistrano Deploy ● Manual Tests ● AWS Hosted ● Manual Saved AMIs ● Staging & Dev - cleansed prod copy ● “Outsourcing DevOps” ● Back to Chef ● “Microservices” ● Hand-crafted Staging ● Chef ● Ansible all the things ● “Insourcing DevOps” ● Back to Ansible - One Source of Truth ● Infrastructure Team ● DevOps is about Culture ● Security Assessment ● Terraform ● Ansible Hardening ● Dynamic Staging ● Service Mesh ● DevSecOps ● Docker & ECS “Attempt”
  • 9. The Current Infrastructure Stack: After the migration, the picture is getting clearer and increasingly rational (diagram labels: prod, staging, dev)
  • 10. Why Docker & Kubernetes? ● Obviously because it’s cool & hype :) ● Popularity - widely supported ● Scalable & fault-tolerant out of the box ● Flexibility & deep control ● Standardization & ownership ● Speed up development lifecycle ● Encourage more & smaller services ● Linux Foundation & CNCF
  • 11. Learning #1 - Fear, Uncertainty & Doubt => Excitement & Ownership ● Not everyone will be on board ● Share the vision, explain the advantages, pains and shortcomings ● A simple demo application helps “make it real” ● Emphasize that success requires app team and infra team ownership ● Cultivate champions and use their help ● Momentum is your friend ● Milestones are important for larger services ● Technical debt opportunities ● Knowledge sharing & workshops along the way and after
  • 12. Learning #2 - Pay close attention to performance ➢ Set up a k8s VPC that is peered with the prod VPC ○ Redis ○ Memcached ○ Aurora ➢ Scale haproxy instances ➢ Update Kubernetes nodes to c5.2xlarge ➢ Disable ingress controller ➢ Disable kubeDNS
  • 13. Learning #2 - Pay close attention to performance (charts: ec2 response time p90; k8s response time p90)
  • 14. Learning #2 cont’d - Internal communication is way faster (charts: access by cluster IP; access by public DNS name)
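One way to act on this lesson is to point callers at the in-cluster Service name instead of the public hostname. The fragment below is a minimal sketch, not thredUP's actual configuration: the Service name `service-a`, the `default` namespace, the image and the `SERVICE_A_URL` variable are all hypothetical, and it assumes the default `cluster.local` cluster domain.

```yaml
# Hypothetical caller Deployment; every name, the image and the env var are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: caller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: caller
  template:
    metadata:
      labels:
        app: caller
    spec:
      containers:
        - name: caller
          image: example/caller:latest  # placeholder image
          env:
            # In-cluster Service DNS name: <service>.<namespace>.svc.<cluster domain>.
            # For a regular ClusterIP Service it resolves to the cluster IP,
            # so the request never leaves the cluster network.
            - name: SERVICE_A_URL
              value: http://service-a.default.svc.cluster.local
            # A public name (for example https://service-a.example.com) would
            # typically route out of the cluster and back in through the
            # external load balancer.
```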
  • 15. Learning #3 - Liveness probe is not always your friend (chart: response time over time, with the k8s healthcheck timeout marked; diagram labels: External Request, Our Code)
  • 16. Learning #3 - Liveness probe is not always your friend (chart: response time over time)
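The slides do not show the probe settings that were used, so the following is only a sketch of the knobs involved, with made-up values. It assumes the problem the chart suggests: response times under load exceed the health check timeout, and the resulting restarts make things worse. The `/healthz` path, the port, the image and every number are assumptions.

```yaml
# Fragment of a pod spec; all values are hypothetical.
containers:
  - name: app
    image: example/app:latest  # placeholder image
    ports:
      - containerPort: 8080
    # A failed liveness probe restarts the container, so leave headroom
    # above slow-but-healthy response times.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5      # Kubernetes default is 1 second
      failureThreshold: 6    # Kubernetes default is 3
    # A failed readiness probe only removes the pod from Service endpoints;
    # the container keeps running and can recover.
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3
```

With these numbers the container is restarted only after six consecutive failed checks, roughly a minute of unresponsiveness, while the readiness probe takes it out of rotation much sooner.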
  • 17. Learning #4 – DNS: many DNS errors and ~5-second delays
  • 18. Learning #4 – DNS: many DNS errors and ~5-second delays ● It’s a well-known issue with UDP & dynamic NAT ● It has a bug report: https://github.com/kubernetes/kubernetes/issues/56903 ● And a good explanation of the problem: https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
      Solution – use TCP as the protocol:
        dnsConfig:
          options:
            - name: use-vc
        dnsPolicy: ClusterFirst
      Another solution:
        dnsConfig:
          options:
            - name: single-request-reopen
        dnsPolicy: ClusterFirst
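For context, `dnsPolicy` and `dnsConfig` are fields of the pod spec and sit next to `containers`. The manifest below is a sketch of where the slide's settings go; only those two fields come from the slide, while the pod name and image are hypothetical. Both options are resolver options from `resolv.conf` that Kubernetes writes into the pod's `/etc/resolv.conf`, so images built on a C library other than glibc may not honor them.

```yaml
# Hypothetical Pod; only dnsPolicy and dnsConfig are taken from the slide.
apiVersion: v1
kind: Pod
metadata:
  name: dns-tcp-example
spec:
  dnsPolicy: ClusterFirst
  dnsConfig:
    options:
      # use-vc makes the resolver send DNS queries over TCP instead of UDP.
      - name: use-vc
      # Alternative from the slide: open a new socket for the second lookup
      # when two requests from the same port are not handled correctly.
      # - name: single-request-reopen
  containers:
    - name: app
      image: example/app:latest  # placeholder image
```

In a Deployment the same two fields go under `spec.template.spec`.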
  • 19. Learning #5 - Too many open files ● Ok, Google =) ● max_user_watches=8192 → this looks too low, let's bump it a little! ● That did seem to help ... for some time ...
  • 20. Learning #5 - Too many open files ● Spikes seem to correlate with pod crash loops? Why?
  • 21. Learning #5 - Too many open files (diagram: Docker with three containers, one log file per container; the log aggregator client holds one fd per log file)
  • 22. Learning #5 - Too many open files (diagram: same layout with additional fds held by the log aggregator client, labeled “These are still opened”)
  • 23. Learning #6 - Pod Distribution after Cluster Maintenance (diagram: Worker node #1, Worker node #2, Worker node #3; Service A pod ×3)
  • 24. Learning #6 - Pod Distribution after Cluster Maintenance (diagram: one node labeled “Under Maintenance”; Worker node #1, Worker node #2, Worker node #3; Service A pod ×4)
  • 25. Learning #6 - Pod Distribution after Cluster Maintenance (diagram: node labeled “Alive and functioning”; Worker node #1, Worker node #2, Worker node #3; Service A pod ×3)
  • 26. Learning #6 - Pod Distribution after Cluster Maintenance (diagram: Worker node #1, Worker node #2, Worker node #3; Service A pod ×3)
  • 27. Learning #6 - Pod Distribution after Cluster Maintenance (diagram: Worker node #1, Worker node #2, Worker node #3; Service A pod ×5; label “All traffic goes here”)
  • 28. Learning #6 - Pod Distribution after Cluster Maintenance (diagram: Worker node #1, Worker node #2, Worker node #3; Service A pod ×5; label “All traffic goes here”) Solution: Redeploy to redistribute pods
  • 29. Learning #6 - Pod Distribution after Cluster Maintenance (diagram: Worker node #1, Worker node #2, Worker node #3; Service A pod ×3)
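The scheduler places pods only when they are created, so pods that moved during maintenance stay where they are once the drained node returns; hence the redeploy. The sketch below shows one way a redeploy can be triggered and nudged toward an even spread. It is not taken from the slides: the annotation key, its value, the image and the names are hypothetical, and the anti-affinity rule is an addition the talk does not mention.

```yaml
# Hypothetical Deployment for "Service A".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 3
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
      annotations:
        # Any change to the pod template starts a rolling update, so bumping
        # this made-up annotation redeploys without changing the image.
        example.com/redeployed-at: "2018-10-12T00:00:00Z"
    spec:
      affinity:
        # Soft rule: prefer nodes that do not already run a service-a pod.
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: service-a
                topologyKey: kubernetes.io/hostname
      containers:
        - name: service-a
          image: example/service-a:latest  # placeholder image
```

The anti-affinity rule is only a scheduling preference; it does not move running pods, so a redeploy is still needed after maintenance.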
  • 30. Learning #7 - Building Docker Images within the K8s Cluster (diagram labels: Kubernetes worker node; Docker daemon; docker.sock; Jenkins slave; Docker CLI; Containers: Container A, Container B; Jenkinsfile: docker build ...)
  • 31. Learning #7 - Building Docker Images within the K8s Cluster (diagram labels: Kubernetes worker node; Docker daemon; docker.sock; Jenkins slave; Docker CLI; Containers: Container A, Container B; Jenkinsfile: docker rm ...)
  • 32. Learning #7 - Building Docker Images within the K8s Cluster (diagram labels: Kubernetes worker node; Docker daemon; Jenkins slave; Docker CLI; Containers: Container A, Container B; Separate ec2 instance)
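The first two diagrams appear to describe the common pattern of giving a build agent pod the worker node's Docker socket, which the last diagram replaces with a separate ec2 instance. As a sketch of what that shared-socket setup usually looks like (the pod name and image are hypothetical, and the slides do not show the actual manifest):

```yaml
# Hypothetical build agent pod that talks to the node's Docker daemon.
apiVersion: v1
kind: Pod
metadata:
  name: jenkins-agent
spec:
  containers:
    - name: agent
      image: example/jenkins-agent-with-docker-cli:latest  # placeholder image
      volumeMounts:
        # The Docker CLI inside the container uses this socket, so
        # "docker build" and "docker rm" run against the node's daemon,
        # the same daemon that runs the node's other containers.
        - name: docker-sock
          mountPath: /var/run/docker.sock
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
        type: Socket
```

Because the daemon is shared, a cleanup command issued from a Jenkinsfile can touch containers that Kubernetes is managing on that node, which is one reason to move builds off the worker nodes.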
  • 33. Was it worth it? YES! ● Deployment time roughly halved (main service: from 12 min to 5 min) ● Rollback is very easy and fast (nearly instant) ● Provisioned hardware decreased by a factor of 3 ● Pod autoscaling eliminated manual work to support traffic spikes ● System-level upgrades are now non-blocking and easy to execute ● Time to provision and deploy a new service in production changed from days/weeks to minutes/hours ● Each project has its own simple Helm chart in its project repo ● ~3200 Ansible config files deprecated
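The pod autoscaling mentioned above is typically configured with a HorizontalPodAutoscaler. The slides give no details, so this is a sketch with invented names and thresholds; it uses the `autoscaling/v2` API of current Kubernetes releases, whereas a cluster from 2018 would have used an earlier API version.

```yaml
# Hypothetical autoscaler for a Deployment named service-a.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: service-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service-a
  minReplicas: 3     # made-up floor
  maxReplicas: 12    # made-up ceiling
  metrics:
    # Add replicas when average CPU use across pods exceeds 70% of requests.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

CPU utilization is measured against each container's CPU request, so the target only works if the pods declare requests.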
  • 34. What’s next? ● Dynamic Staging Environments ○ Encourage better development workflow ○ Easily enable cross-team review with design, marketing and others ● Telepresence for Complex Local Development ○ Easier onboarding & dev env refresh ○ More consistent behavior with production ● End-to-end integration suite ● Iterate for Improvements ○ Faster builds ○ Cluster Performance ○ Observability ○ Cost Improvements ● Service mesh with Istio