Kubernetes @ Squarespace
Microservices on Kubernetes in a Datacenter
Kevin Lynch
klynch@squarespace.com
Agenda
01 The problem with static infrastructure
02 Kubernetes Fundamentals
03 Adapting Microservices to Kubernetes
04 Kubernetes in a datacenter?
Microservices Journey: A Story of Growth
2013: small (< 50 engineers)
build product & grow customer base
whatever works
2014: medium (< 100 engineers)
we have a lot of customers now!
whatever works doesn't work anymore
2016: large (100+ engineers)
architect for scalability and reliability
organizational structures
?: XL (200+ engineers)
Challenges with a Monolith
● Reliability
● Performance
● Engineering agility/speed, cross-team coupling
● Engineering time spent firefighting rather than building new functionality
What were the increasingly difficult challenges with a monolith?
Challenges with a Monolith
● Minimize failure domains
● Developers are more confident in their changes
● Squarespace can move faster
Solution: Microservices!
Operational Challenges
● Engineering org grows…
● More features...
● More services…
● More infrastructure to spin up…
● Ops becomes a blocker...
Stuck in a loop
Traditional Provisioning Process
● Pick ESX with available resources
● Pick IP
● Register host to Cobbler
● Register DNS entry
● Create new VM on ESX
● PXE boot VM and install OS and base configuration
● Install system dependencies (LDAP, NTP, CollectD, Sensu…)
● Install app dependencies (Java, FluentD/Filebeat, Consul, Mongo-S…)
● Install the app
● App registers with discovery system and begins receiving traffic
Containerization & Kubernetes Orchestration
● Difficult to find resources
● Slow to provision and scale
● Discovery is a must
● Metrics system must support short lived metrics
● Alerts are usually per instance
Static infrastructure and microservices do not mix!
Kubernetes Provisioning Process
● kubectl apply -f app.yaml
Kubernetes Fundamentals
● ApiVersion & Kind
○ type of object
● Metadata
○ Names, annotations, labels
● Spec & Status
○ What you want to happen...
○ … versus reality
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
  annotations:
    squarespace.net/build: nginx-42
  labels:
    app: frontend
  ...
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ...
status:
  hostIP: 10.122.1.201
  podIP: 10.123.185.9
  phase: Running
  qosClass: BestEffort
  startTime: 2017-07-31T02:08:25Z
  ...
Kubernetes Fundamentals
● Labels
○ KV pairs used for identification
○ Indexed for efficient querying
● Annotations
○ Non identifying information
○ Can be unstructured
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
  annotations:
    squarespace.net/build: nginx-42
  labels:
    app: frontend
  ...
spec:
  containers:
  - name: nginx
    image: nginx:1.8.1
    ...
status:
  hostIP: 10.122.1.201
  podIP: 10.123.185.9
  phase: Running
  qosClass: BestEffort
  startTime: 2017-07-31T02:08:25Z
  ...
Common Objects: Pods
● Basic deployable workload
● Group of 1+ containers
● Defines resource requirements
● Defines storage volumes
○ Ephemeral storage
○ Shared storage (NFS, CephFS)
○ Block storage (RBD)
○ Secrets
○ ConfigMaps
○ more...
spec:
  containers:
  - name: location
    image: .../location:master-269
    ports: ...
    resources:
      limits:
        cpu: 2
        memory: 4Gi
      requests:
        cpu: 2
        memory: 4Gi
    volumeMounts:
    - name: config
      mountPath: /service/config
    - name: log-dir
      mountPath: /data/logs
  volumes:
  - name: config
    configMap:
      name: location-config
  - name: log-dir
    emptyDir: {}
Common Objects: Deployments
● Declarative
● Defines a type of pod to run
● Defines the desired number of replicas
● Supports basic operations
○ Can be rolled back quickly!
○ Can be scaled up/down
● Meant for stateless apps!
kind: Deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      service: location
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    ... pod info here ...
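To make the rolling-update settings above concrete, here is a minimal Python sketch of how `maxSurge` and `maxUnavailable` bound the pod count during a rollout. The function names are hypothetical and this is not the Deployment controller's code. It assumes the documented Kubernetes rounding behaviour: percentages of `maxSurge` round up, percentages of `maxUnavailable` round down.

```python
def resolve_bound(value, replicas, round_up):
    """Turn an absolute count or a percentage string into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return the pod-count envelope a rolling update stays within."""
    surge = resolve_bound(max_surge, replicas, round_up=True)
    unavailable = resolve_bound(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


# Values from the slide: 3 replicas, maxSurge 100%, maxUnavailable 0.
print(rollout_bounds(3, "100%", 0))
```

With the slide's values, a full second set of pods may be started before any old pod is removed, so capacity never drops below 3 while up to 6 pods may run at once.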
Common Objects: Services
● Make pods addressable
● Assigned an IP
● Addressable DNS entries!
apiVersion: v1
kind: Service
metadata:
  name: location
  namespace: core-services
spec:
  type: ClusterIP
  clusterIP: 10.123.79.211
  selector:
    service: location
  ports:
  - name: traffic
    port: 8080
  - name: admin
    port: 8081
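The Service above finds its pods through the `selector` field. As a rough illustration of equality-based label matching, the Python sketch below selects pods whose labels contain every key/value pair in the selector. The pod names and labels are made up for the example, and the real matching happens inside Kubernetes, not in client code like this.

```python
def selector_matches(selector, labels):
    """True when every selector key/value pair is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_pods(selector, pods):
    """Return the names of the pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if selector_matches(selector, pod["labels"])]


# Hypothetical pods; only the `service: location` label comes from the slides.
pods = [
    {"name": "location-1", "labels": {"service": "location", "build": "269"}},
    {"name": "location-2", "labels": {"service": "location", "build": "270"}},
    {"name": "nginx", "labels": {"app": "frontend"}},
]

print(select_pods({"service": "location"}, pods))
```

Extra labels on a pod do not prevent a match, which is why pods from two different builds can both sit behind the same Service during a rollout.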
Common Objects: Namespaces
● Namespaces
○ Isolates groups of objects
■ Developer
■ Team
■ System or Service
○ Good for permission boundaries
○ Good for network boundaries
● Most objects are namespaced
apiVersion: v1
kind: Namespace
metadata:
  name: core-services
  annotations:
    squarespace.net/contact: |
      team@squarespace.com
  creationTimestamp: 2017-06-14T..
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
Microservice Pod Definition
resources:
  requests:
    cpu: 2
    memory: 4Gi
  limits:
    cpu: 2
    memory: 4Gi
[Diagram: a Microservice Pod containing a Java Microservice container alongside fluentd and consul containers]
Future Work: Updating Common Dependencies
● Custom Initializers
○ Inject container dependencies into deployments (consul, fluentd)
○ Configure Prometheus instances for each namespace
● Trigger rescheduling of pods when dependencies need updating
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: location
  namespace: core-services
  annotations:
    initializer.squarespace.net/consul: "true"
Future Work: Enforce Squarespace Standards
● Custom Admission Controller requires all services, deployments, etc. to meet certain standards
○ Resource requests/limits
○ Owner annotations
○ Service labels
Quality of Service Classes
resources:
  requests:
    cpu: 2
    memory: 4Gi
  limits:
    cpu: 2
    memory: 4Gi
● BestEffort
○ No resource constraints
○ First to be killed under pressure
● Guaranteed
○ Requests == Limits
○ Last to be killed under pressure
○ Easier to reason about resources
● Burstable
○ Take advantage of unused resources!
○ Can be tricky with some languages
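The three classes above follow from how requests and limits are set. The Python sketch below is a simplified classifier for a single container and is only an approximation of the Kubernetes rules: the function name is hypothetical, values are compared as written rather than as parsed quantities, and an unset request is assumed to default to the limit.

```python
RESOURCES = ("cpu", "memory")


def qos_class(requests, limits):
    """Classify one container's resources into a QoS class (simplified)."""
    if not requests and not limits:
        return "BestEffort"
    guaranteed = all(
        name in limits and requests.get(name, limits[name]) == limits[name]
        for name in RESOURCES
    )
    return "Guaranteed" if guaranteed else "Burstable"


# The resources block from the slide: requests equal limits.
print(qos_class({"cpu": 2, "memory": "4Gi"}, {"cpu": 2, "memory": "4Gi"}))
```

A pod that sets a CPU request of 1 with a limit of 2 would come out as Burstable, since it may use spare CPU beyond what it reserved.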
Microservice Pod Definition
resources:
  requests:
    cpu: 2
    memory: 4Gi
  limits:
    cpu: 2
    memory: 4Gi
● Kubernetes assumes no other processes are consuming significant resources
● Completely Fair Scheduler (CFS)
○ Schedules a task based on CPU Shares
○ Throttles a task once it hits CPU Quota
● OOM Killed when memory limit exceeded
Microservice Pod Definition
resources:
  requests:
    cpu: 2
    memory: 4Gi
  limits:
    cpu: 2
    memory: 4Gi
● Shares = CPU Request * 1024
● Total Kubernetes Shares = # Cores * 1024
● Quota = CPU Limit * 100ms
● Period = 100ms
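As a worked example of the formulas above, this Python sketch converts a CPU request and limit into the corresponding cgroup v1 values. The function name is hypothetical; the file names in the result are the standard cgroup v1 CPU controller files, and the quota and period are expressed in microseconds (100ms = 100,000µs).

```python
CFS_PERIOD_US = 100_000   # 100ms scheduling period
SHARES_PER_CORE = 1024    # one full core is worth 1024 shares


def cpu_cgroup_settings(cpu_request, cpu_limit):
    """Map a CPU request and limit (in cores) to cgroup v1 CPU values."""
    return {
        "cpu.shares": int(cpu_request * SHARES_PER_CORE),
        "cpu.cfs_period_us": CFS_PERIOD_US,
        "cpu.cfs_quota_us": int(cpu_limit * CFS_PERIOD_US),
    }


# The slide's pod: request 2 cores, limit 2 cores.
print(cpu_cgroup_settings(2, 2))
```

For the slide's pod this gives 2048 shares and a quota of 200,000µs per 100,000µs period, meaning the container is throttled once its threads have used two cores' worth of CPU time within a period.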
Java in a Container
● JVM is able to detect # of cores via sysconf(_SC_NPROCESSORS_ONLN)
● Many libraries rely on Runtime.getRuntime().availableProcessors()
○ Jetty
○ ForkJoinPool
○ GC Threads
○ That mystery dependency...
Java in a Container
● Provide a base container that calculates the container’s resources!
● Detect # of “cores” assigned
○ /sys/fs/cgroup/cpu/cpu.cfs_quota_us divided by /sys/fs/cgroup/cpu/cpu.cfs_period_us
● Automatically tune the JVM:
○ -XX:ParallelGCThreads=${core_limit}
○ -XX:ConcGCThreads=${core_limit}
○ -Djava.util.concurrent.ForkJoinPool.common.parallelism=${core_limit}
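The detection step above can be sketched in Python as follows. This is not the base container's actual script, which the slides do not show. The function names are hypothetical, rounding fractional limits up to a whole core is an assumption, and a quota of -1 (no limit set) is treated as "use the host's core count".

```python
import math
import os

CGROUP_CPU_DIR = "/sys/fs/cgroup/cpu"  # cgroup v1 path from the slide


def core_limit(quota_us, period_us, host_cores):
    """Derive a whole-core limit from the CFS quota and period."""
    if quota_us <= 0 or period_us <= 0:
        return host_cores  # no quota configured: fall back to the host
    return max(1, math.ceil(quota_us / period_us))


def jvm_flags(cores):
    """Build the JVM tuning flags listed on the slide."""
    return [
        f"-XX:ParallelGCThreads={cores}",
        f"-XX:ConcGCThreads={cores}",
        f"-Djava.util.concurrent.ForkJoinPool.common.parallelism={cores}",
    ]


def detect_core_limit(cgroup_dir=CGROUP_CPU_DIR):
    """Read the quota and period files and compute the core limit."""
    def read_int(name):
        with open(os.path.join(cgroup_dir, name)) as handle:
            return int(handle.read().strip())

    return core_limit(
        read_int("cpu.cfs_quota_us"),
        read_int("cpu.cfs_period_us"),
        os.cpu_count() or 1,
    )


print(" ".join(jvm_flags(core_limit(200_000, 100_000, host_cores=64))))
```

Keeping `core_limit` separate from the file reads makes the arithmetic easy to check without running inside a container.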
Java in a Container
● Use Linux preloading to override availableProcessors()
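#include <stdlib.h>
#include <unistd.h>

/* Preloaded override of the JVM's processor-count lookup: use the container's
   core limit from the environment if set, else the number of online CPUs. */
int JVM_ActiveProcessorCount(void) {
    char* val = getenv("CONTAINER_CORE_LIMIT");
    return val != NULL ? atoi(val) : sysconf(_SC_NPROCESSORS_ONLN);
}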
https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling
Service Lifecycle
● How do we observe the health of services?
● How do we handle rollbacks?
Monitoring
● Graphite does not scale well with ephemeral instances
● Easy to have combinatoric explosion of metrics
Traditional Monitoring & Alerting
● Application and system alerts are tightly coupled
● Difficult to create alerts on SLAs
● Difficult to route alerts
Traditional Monitoring & Alerting
Falco: Centralized Service Management
● Kubernetes Dashboard is too complex and powerful
● Centralized deployment status and history
● Manual rollbacks of deploys
● Quick access to scaling controls
Kubernetes Dashboard
http://kubernetes-dashboard.kube-system.svc.eqx.dal.prod.kubernetes/
Falco: Centralized Service Management
● Efficient for ephemeral instances
● Stores tagged data
● Easy to have many smaller instances (per team or complex system)
● Prometheus Operator runs everything in Kubernetes!
Kubernetes Monitoring & Alerting
● Alerts are defined with the application code!
● Easy to define SLA alerts
● Routing is still difficult
Kubernetes Monitoring & Alerting
Prometheus Operator
Kubernetes Monitoring & Alerting
● Sensu checks for all core components
● Sent to PagerDuty
Kubernetes Monitoring & Alerting
http://prometheus.kube-system.svc.eqx.dal.prod.kubernetes:9090/alerts
Kubernetes in a datacenter?
Kubernetes Architecture
Kubernetes Networking
Spine and Leaf Layer 3 Clos Topology
● All work is performed at the leaf/ToR switch
● Each leaf switch is separate Layer 3 domain
● Each leaf is a separate BGP domain (ASN)
● No Spanning Tree Protocol issues seen in L2 networks (convergence time, loops)
[Diagram: two spine switches above four leaf switches]
Spine and Leaf Layer 3 Clos Topology
● Simple to understand
● Easy to scale
● Predictable and consistent latency (hops = 2)
● Allows for Anycast IPs
[Diagram: two spine switches above four leaf switches]
Calico Networking
● No network overlay required!
○ No nasty MTU issues
○ No performance impact
● Communicates directly with existing L3 network
● BGP Peering with Top of Rack switch
Calico Networking
● Engineers can think of Pod IPs as normal hosts (they’re not)
○ Ping works
○ Consul works normally
○ Browser communication works
○ Shell sorta works (kubectl exec -it pod sh)
Calico Networking
● Each worker announces its pod IP ranges
○ Aggregated to /26
● Each master announces an External Anycast IP
○ Used for component communication
● Each ingress tier announces the Service IP range
ip addr add 10.123.0.0/17 dev lo
etcdctl set \
  /calico/bgp/v1/global/custom_filters/v4/services \
  'if ( net = 10.123.0.0/17 ) then { accept; }'
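To get a feel for the address ranges above, the following Python sketch uses the standard `ipaddress` module to check which addresses from earlier slides fall inside the announced Service range, and to show the /26 block a given pod IP would belong to. The helper names are hypothetical, and how Calico actually assigns blocks to workers is not covered by the slides.

```python
import ipaddress

# Service IP range announced by the ingress tier (from the slide).
SERVICE_RANGE = ipaddress.ip_network("10.123.0.0/17")


def is_service_ip(address):
    """True when the address lies inside the announced Service range."""
    return ipaddress.ip_address(address) in SERVICE_RANGE


def pod_block_for(pod_ip, prefix_len=26):
    """Return the aggregated block of the given prefix length containing pod_ip."""
    return ipaddress.ip_network(f"{pod_ip}/{prefix_len}", strict=False)


print(is_service_ip("10.123.79.211"))   # clusterIP of the location Service
print(is_service_ip("10.123.185.9"))    # podIP of the nginx pod
print(pod_block_for("10.123.185.9"), pod_block_for("10.123.185.9").num_addresses)
```

The /17 covers 10.123.0.0 through 10.123.127.255, so the Service's cluster IP is inside it while the pod IP is not; each /26 announcement covers 64 addresses.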
Calico Networking: Firewalls
● Calico supports NetworkPolicy firewall rules
○ We aren’t using this yet!
● Add DefaultDeny to block traffic into namespace
● Add Ingress rules for whitelisted communication
○ Works across namespaces
○ Works with raw IP ranges
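The default-deny-plus-whitelist model above can be illustrated with a small Python sketch. It is a toy model of the decision only, not Calico's or Kubernetes' NetworkPolicy API: the rule structure, the namespace names and the CIDR are all hypothetical.

```python
import ipaddress


def ingress_allowed(source_ip, source_namespace, rules):
    """Default deny: allow traffic only when some whitelist rule matches."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        if source_namespace in rule.get("namespaces", ()):
            return True
        if any(address in ipaddress.ip_network(cidr) for cidr in rule.get("cidrs", ())):
            return True
    return False


# Hypothetical whitelist for one namespace: one peer namespace and one raw IP range.
rules = [
    {"namespaces": ["frontend"]},
    {"cidrs": ["10.120.201.0/24"]},
]

print(ingress_allowed("10.123.185.9", "frontend", rules))
print(ingress_allowed("10.120.201.33", None, rules))
print(ingress_allowed("192.0.2.10", "batch", rules))
```

With an empty rule list every connection is rejected, which is the DefaultDeny behaviour; each added rule opens exactly one namespace or IP range.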
QUESTIONS?
Thank you!
squarespace.com/careers
Kevin Lynch
klynch@squarespace.com
Future Work: Security!
● PodSecurityPolicy
● Mutual TLS in our environment:
○ Kubernetes WG Components WG
○ SIG Auth
○ SPIFFE
○ Istio
How do I connect to the cluster?
● Look at Getting Started guide on the wiki
● Generate a kubeconfig file
○ curl --user $(whoami) https://kubeconfig-generator.squarespace.net
● Uses Keycloak OIDC to authenticate users
● Automatically refreshes credentials!
● Audit Logs are sent to logs.squarespace.net
Squarespace Clusters
Dallas Production
8 nodes
512 cores
2 TB RAM
NJ Production
4 nodes
256 cores
1 TB RAM
Dallas Staging
6 nodes
384 cores
1.5 TB RAM
NJ Staging
Coming soon!
Dallas Corp
8 nodes
416 cores
1.5 TB RAM
NJ Corp
Coming soon!
Testbed
6 Mac Minis :-p
Kube-Proxy: Internal Networking
● Runs on every host
● Routes service IPs to pods
● Watches for changes
● Updates IPTables rules
Communication With External Services
● Environment specific services should not be encoded in application
● Single deployment for all environments and datacenters
● Federation API expects same deployment
● Not all applications are using consul
Communication With External Services
Communication With External Services
apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: elk
spec:
  type: ClusterIP
  clusterIP: None
  sessionAffinity: None
  ports:
  - port: 9092
    protocol: TCP
    targetPort: 9092
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kafka
  namespace: elk
subsets:
- addresses:
  - ip: 10.120.201.33
  - ip: 10.120.201.34
  - ip: 10.120.201.35
  ...
  ports:
  - port: 9092
    protocol: TCP

Kubernetes @ Squarespace: Kubernetes in the Datacenter

  • 1. Kubernetes @ Squarespace Microservices on Kubernetes in a Datacenter Kevin Lynch klynch@squarespace.com
  • 2. Agenda 01 The problem with static infrastructure 02 Kubernetes Fundamentals 03 Adapting Microservices to Kubernetes 04 Kubernetes in a datacenter?
  • 3. Microservices Journey: A Story of Growth 2013: small (< 50 engineers) build product & grow customer base whatever works 2014: medium (< 100 engineers) we have a lot of customers now! whatever works doesn't work anymore 2016: large (100+ engineers) architect for scalability and reliability organizational structures ?: XL (200+ engineers)
  • 4. Challenges with a Monolith ● Reliability ● Performance ● Engineering agility/speed, cross-team coupling ● Engineering time spent fire fighting rather than building new functionality What were the increasingly difficult challenges with a monolith?
  • 5. Challenges with a Monolith ● Minimize failure domains ● Developers are more confident in their changes ● Squarespace can move faster Solution: Microservices!
  • 6. Operational Challenges ● Engineering org grows… ● More features... ● More services… ● More infrastructure to spin up… ● Ops becomes a blocker... Stuck in a loop
  • 7. Traditional Provisioning Process ● Pick ESX with available resources ● Pick IP ● Register host to Cobbler ● Register DNS entry ● Create new VM on ESX ● PXE boot VM and install OS and base configuration ● Install system dependencies (LDAP, NTP, CollectD, Sensu…) ● Install app dependencies (Java, FluentD/Filebeat, Consul, Mongo-S…) ● Install the app ● App registers with discovery system and begins receiving traffic
  • 8. Containerization & Kubernetes Orchestration ● Difficult to find resources ● Slow to provision and scale ● Discovery is a must ● Metrics system must support short lived metrics ● Alerts are usually per instance Static infrastructure and microservices do not mix!
  • 9. Kubernetes Provisioning Process ● kubectl apply -f app.yaml
  • 10. Kubernetes Fundamentals ● ApiVersion & Kind ○ type of object ● Metadata ○ Names, annotations, labels ● Spec & Status ○ What you want to happen... ○ … versus reality apiVersion: v1 kind: Pod metadata: name: nginx namespace: default annotations: squarespace.net/build: nginx-42 labels: app: frontend ... spec: containers: - name: nginx image: nginx:latest ... status: hostIP: 10.122.1.201 podIP: 10.123.185.9 phase: Running qosClass: BestEffort startTime: 2017-07-31T02:08:25Z ...
  • 11. Kubernetes Fundamentals ● Labels ○ KV pairs used for identification ○ Indexed for efficient querying ● Annotations ○ Non identifying information ○ Can be unstructured apiVersion: v1 kind: Pod metadata: name: nginx namespace: default annotations: squarespace.net/build: nginx-42 labels: app: frontend ... spec: containers: - name: nginx image: nginx:1.8.1 ... status: hostIP: 10.122.1.201 podIP: 10.123.185.9 phase: Running qosClass: BestEffort startTime: 2017-07-31T02:08:25Z ...
  • 12. Common Objects: Pods ● Basic deployable workload ● Group of 1+ containers ● Define resource requirements ● Defines storage volumes ○ Ephemeral storage ○ Shared storage (NFS, CephFS) ○ Block storage (RBD) ○ Secrets ○ ConfigMaps ○ more... spec: containers: - name: location image: .../location:master-269 ports: ... resources: limits: cpu: 2 memory: 4Gi requests: cpu: 2 memory: 4Gi volumeMounts: - name: config mountPath: /service/config - name: log-dir mountPath: /data/logs volumes: - name: config configMap: name: location-config - name: log-dir emptyDir: {}
  • 13. Common Objects: Deployments ● Declarative ● Defines a type of pod to run ● Defines desired # ● Supports basic operations ○ Can be rolled back quickly! ○ Can be scaled up/down ● Meant to be stateless apps! kind: Deployment spec: replicas: 3 selector: matchLabels: service: location strategy: rollingUpdate: maxSurge: 100% maxUnavailable: 0 type: RollingUpdate template: ... pod info here ...
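
The rolling update settings on the Deployment slide (replicas: 3, maxSurge: 100%, maxUnavailable: 0) bound how many pods can exist during a rollout. The sketch below works through that arithmetic, assuming the documented Kubernetes rounding (surge percentages round up, unavailable percentages round down). The function name `rollout_bounds` is hypothetical and not part of any Kubernetes client.

```python
def rollout_bounds(replicas: int, max_surge_pct: int, max_unavailable_pct: int) -> tuple[int, int]:
    """Return (max_total_pods, min_available_pods) during a rolling update."""
    # maxSurge percentages are rounded up to a whole number of pods.
    surge = -(-replicas * max_surge_pct // 100)
    # maxUnavailable percentages are rounded down.
    unavailable = replicas * max_unavailable_pct // 100
    return replicas + surge, replicas - unavailable


# Values from the Deployment slide: 3 replicas, maxSurge 100%, maxUnavailable 0.
max_total, min_available = rollout_bounds(3, 100, 0)
print(max_total, min_available)  # 6 3
```

With these settings a full second set of pods may be started before any old pod is removed, so the cluster needs headroom for twice the replica count during a deploy.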
  • 14. Common Objects: Services ● Make pods addressable ● Assigned an IP ● Addressable DNS entries! apiVersion: v1 kind: Service metadata: name: location namespace: core-services spec: type: ClusterIP clusterIP: 10.123.79.211 selector: service: location ports: - name: traffic port: 8080 - name: admin port: 8081
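
A Service finds its pods through its label selector: a pod is selected when every key/value pair in the selector also appears in the pod's labels. A minimal sketch of that equality-based matching follows. The pod names and the `selects` helper are hypothetical, and real selection is also scoped to the Service's namespace, which this sketch ignores.

```python
def selects(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """True when every selector pair is present in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Selector taken from the Service slide; pods are made up for illustration.
selector = {"service": "location"}
pods = {
    "location-abc": {"service": "location", "app": "backend"},
    "nginx": {"app": "frontend"},
}

matching = [name for name, labels in pods.items() if selects(selector, labels)]
print(matching)  # ['location-abc']
```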
  • 15. Common Objects: Namespaces ● Namespaces ○ Isolates groups of objects ■ Developer ■ Team ■ System or Service ○ Good for permission boundaries ○ Good for network boundaries ● Most objects are namespaced apiVersion: v1 kind: Namespace metadata: name: core-services annotations: squarespace.net/contact: | team@squarespace.com creationTimestamp: 2017-06-14T.. spec: finalizers: - kubernetes status: phase: Active
  • 16. Microservice Pod Definition resources: requests: cpu: 2 memory: 4Gi limits: cpu: 2 memory: 4Gi Microservice Pod Java Microservice fluentd consul
  • 17. Future Work: Updating Common Dependencies ● Custom Initializers ○ Inject container dependencies into deployments (consul, fluentd) ○ Configure Prometheus instances for each namespace ● Trigger rescheduling of pods when dependencies need updating apiVersion: extensions/v1beta1 kind: Deployment metadata: name: location namespace: core-services annotations: initializer.squarespace.net/consul: "true"
  • 18. Future Work: Enforce Squarespace Standards ● Custom Admission Controller requires all services, deployments, etc. meet certain standards ○ Resource requests/limits ○ Owner annotations ○ Service labels
  • 19. Quality of Service Classes resources: requests: cpu: 2 memory: 4Gi limits: cpu: 2 memory: 4Gi ● BestEffort ○ No resource constraints ○ First to be killed under pressure ● Guaranteed ○ Requests == Limits ○ Last to kill under pressure ○ Easier to reason about resources ● Burstable ○ Take advantage of unused resources! ○ Can be tricky with some languages
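
The three classes on this slide can be derived from a container's requests and limits. The sketch below is a simplification for a single-container pod: it compares quantities as plain strings (so "2" and "2000m" would not be treated as equal) and assumes, as Kubernetes does, that a missing request defaults to the limit. The helper name `qos_class` is hypothetical.

```python
def qos_class(requests: dict[str, str], limits: dict[str, str]) -> str:
    """Classify a single-container pod by its resource requests and limits."""
    if not requests and not limits:
        return "BestEffort"  # no constraints at all
    # A request that is not set falls back to the corresponding limit.
    effective_requests = {**limits, **requests}
    resources = ("cpu", "memory")
    has_all_limits = all(r in limits for r in resources)
    if has_all_limits and all(effective_requests[r] == limits[r] for r in resources):
        return "Guaranteed"  # requests == limits for both cpu and memory
    return "Burstable"


# The resources block from the slide: requests equal limits.
print(qos_class({"cpu": "2", "memory": "4Gi"}, {"cpu": "2", "memory": "4Gi"}))  # Guaranteed
```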
  • 20. Microservice Pod Definition resources: requests: cpu: 2 memory: 4Gi limits: cpu: 2 memory: 4Gi ● Kubernetes assumes no other processes are consuming significant resources ● Completely Fair Scheduler (CFS) ○ Schedules a task based on CPU Shares ○ Throttles a task once it hits CPU Quota ● OOM Killed when memory limit exceeded
  • 21. Microservice Pod Definition resources: requests: cpu: 2 memory: 4Gi limits: cpu: 2 memory: 4Gi ● Shares = CPU Request * 1024 ● Total Kubernetes Shares = # Cores * 1024 ● Quota = CPU Limit * 100ms ● Period = 100ms
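
The four formulas on this slide can be checked with a few lines of arithmetic. The sketch below applies them to the slide's `cpu: 2` request and limit. The 64-core node is an assumption for illustration (512 cores across 8 nodes in the Dallas Production figures), and the function names are hypothetical.

```python
CFS_PERIOD_US = 100_000  # Period = 100ms, expressed in microseconds


def cpu_shares(cpu_request: float) -> int:
    """Shares = CPU Request * 1024."""
    return int(cpu_request * 1024)


def cfs_quota_us(cpu_limit: float) -> int:
    """Quota = CPU Limit * 100ms, expressed in microseconds."""
    return int(cpu_limit * CFS_PERIOD_US)


def node_shares(cores: int) -> int:
    """Total Kubernetes Shares = # Cores * 1024."""
    return cores * 1024


print(cpu_shares(2))    # 2048
print(cfs_quota_us(2))  # 200000
print(node_shares(64))  # 65536
```

Under these assumptions the pod holds 2048 of 65536 shares, or 1/32 of the node's CPU weight under contention, and is throttled once it uses 200ms of CPU time within any 100ms period.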
  • 22. Java in a Container ● JVM is able to detect # of cores via sysconf(_SC_NPROCESSORS_ONLN) ● Many libraries rely on Runtime.getRuntime().availableProcessors() ○ Jetty ○ ForkJoinPool ○ GC Threads ○ That mystery dependency...
  • 23. Java in a Container ● Provide a base container that calculates the container’s resources! ● Detect # of “cores” assigned ○ /sys/fs/cgroup/cpu/cpu.cfs_quota_us divided by /sys/fs/cgroup/cpu/cpu.cfs_period_us ● Automatically tune the JVM: ○ -XX:ParallelGCThreads=${core_limit} ○ -XX:ConcGCThreads=${core_limit} ○ -Djava.util.concurrent.ForkJoinPool.common.parallelism=${core_limit}
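
The core detection described on this slide can be sketched as follows. This is not the base container's actual script: it is a minimal illustration that reads the two cgroup v1 files named on the slide, divides quota by period, and builds the three JVM flags listed there. Rounding up to a whole core and falling back to the host CPU count when no quota is set are assumptions. On cgroup v2 hosts these files do not exist (the equivalent value lives in `cpu.max`), so the sketch falls back there too.

```python
import math
import os
from pathlib import Path


def core_limit(cgroup_dir: str = "/sys/fs/cgroup/cpu", fallback: int = os.cpu_count() or 1) -> int:
    """Number of "cores" assigned to the container: quota divided by period."""
    try:
        quota = int(Path(cgroup_dir, "cpu.cfs_quota_us").read_text())
        period = int(Path(cgroup_dir, "cpu.cfs_period_us").read_text())
    except (OSError, ValueError):
        return fallback  # files missing or unreadable
    if quota <= 0 or period <= 0:
        return fallback  # a quota of -1 means no CPU limit is set
    return max(1, math.ceil(quota / period))


def jvm_flags(cores: int) -> list[str]:
    """The JVM tuning flags from the slide, filled in with the detected limit."""
    return [
        f"-XX:ParallelGCThreads={cores}",
        f"-XX:ConcGCThreads={cores}",
        f"-Djava.util.concurrent.ForkJoinPool.common.parallelism={cores}",
    ]


print(" ".join(jvm_flags(core_limit())))
```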
  • 24. Java in a Container ● Use Linux preloading to override availableProcessors() #include <stdlib.h> #include <unistd.h> int JVM_ActiveProcessorCount(void) { char* val = getenv("CONTAINER_CORE_LIMIT"); return val != NULL ? atoi(val) : sysconf(_SC_NPROCESSORS_ONLN); } https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling
  • 25. Service Lifecycle ● How do we observe the health of services? ● How do we handle rollbacks?
  • 27. Traditional Monitoring & Alerting ● Graphite does not scale well with ephemeral instances ● Easy to have combinatoric explosion of metrics ● Application and system alerts are tightly coupled ● Difficult to create alerts on SLAs ● Difficult to route alerts
  • 29. Falco: Centralized Service Management ● Kubernetes Dashboard is too complex and powerful ● Centralized deployment status and history ● Manual rollbacks of deploys ● Quick access to scaling controls
  • 32. Kubernetes Monitoring & Alerting ● Efficient for ephemeral instances ● Stores tagged data ● Easy to have many smaller instances (per team or complex system) ● Prometheus Operator runs everything in Kubernetes! ● Alerts are defined with the application code! ● Easy to define SLA alerts ● Routing is still difficult
  • 35. Kubernetes Monitoring & Alerting ● Sensu checks for all core components ● Sent to PagerDuty
  • 36. Kubernetes Monitoring & Alerting http://prometheus.kube-system.svc.eqx.dal.prod.kubernetes:9090/alerts
  • 37. Kubernetes in a datacenter?
  • 40. Spine and Leaf Layer 3 Clos Topology ● All work is performed at the leaf/ToR switch ● Each leaf switch is separate Layer 3 domain ● Each leaf is a separate BGP domain (ASN) ● No Spanning Tree Protocol issues seen in L2 networks (convergence time, loops) Leaf Leaf Leaf Leaf Spine Spine
  • 41. Spine and Leaf Layer 3 Clos Topology ● Simple to understand ● Easy to scale ● Predictable and consistent latency (hops = 2) ● Allows for Anycast IPs Leaf Leaf Leaf Leaf Spine Spine
  • 42. Calico Networking ● No network overlay required! ○ No nasty MTU issues ○ No performance impact ● Communicates directly with existing L3 network ● BGP Peering with Top of Rack switch
  • 43. Calico Networking ● Engineers can think of Pod IPs as normal hosts (they’re not) ○ Ping works ○ Consul works normally ○ Browser communication works ○ Shell sorta works (kubectl exec -it pod sh)
  • 44. Calico Networking ● Each worker announces its pod IP ranges ○ Aggregated to /26 ● Each master announces an External Anycast IP ○ Used for component communication ● Each ingress tier announces the Service IP range ip addr add 10.123.0.0/17 dev lo etcdctl set /calico/bgp/v1/global/custom_filters/v4/services 'if ( net = 10.123.0.0/17 ) then { accept; }'
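
The address ranges on this slide can be explored with Python's standard `ipaddress` module. The sketch below checks the example addresses from the earlier slides against the 10.123.0.0/17 Service range and computes the /26 block that would contain the example pod IP. Treating that block as the one a worker actually announces is an assumption, and `aggregate_block` is a hypothetical helper.

```python
import ipaddress

SERVICE_RANGE = ipaddress.ip_network("10.123.0.0/17")


def aggregate_block(ip: str, prefix: int = 26) -> ipaddress.IPv4Network:
    """The aggregated block of the given prefix length that contains ip."""
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


cluster_ip = ipaddress.ip_address("10.123.79.211")  # Service clusterIP from slide 14
pod_ip = ipaddress.ip_address("10.123.185.9")       # podIP from slide 10

print(cluster_ip in SERVICE_RANGE)  # True
print(pod_ip in SERVICE_RANGE)      # False

block = aggregate_block(str(pod_ip))
print(block, block.num_addresses)   # 10.123.185.0/26 64
```

Each /26 covers 64 addresses, so a top-of-rack switch carries one route per block rather than one per pod.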
  • 45. Calico Networking: Firewalls ● Calico supports NetworkPolicy firewall rules ○ We aren’t using this yet! ● Add DefaultDeny to block traffic into namespace ● Add Ingress rules for whitelisted communication ○ Works across namespaces ○ Works with raw IP ranges
  • 47. Future Work: Security! ● PodSecurityPolicy ● Mutual TLS in our environment: ○ Kubernetes WG Components WG ○ SIG Auth ○ SPIFFE ○ ISTIO
  • 48. How do I connect to the cluster? ● Look at Getting Started guide on the wiki ● Generate a kubeconfig file ○ curl --user $(whoami) https://kubeconfig-generator.squarespace.net ● Uses KeyCloak OIDC to authenticate users ● Automatically refreshes credentials! ● Audit Logs are sent to logs.squarespace.net
  • 49. Squarespace Clusters Dallas Production 8 nodes 512 cores 2 TB RAM NJ Production 4 nodes 256 cores 1 TB RAM Dallas Staging 6 nodes 384 cores 1.5 TB RAM NJ Staging Coming soon! Dallas Corp 8 nodes 416 cores 1.5 TB RAM NJ Corp Coming soon! Testbed 6 Mac Minis :-p
  • 50. Kube-Proxy: Internal Networking ● Runs on every host ● Routes service IPs to pods ● Watches for changes ● Updates IPTables rules
  • 51. Communication With External Services ● Environment specific services should not be encoded in application ● Single deployment for all environments and datacenters ● Federation API expects same deployment ● Not all applications are using consul
  • 53. Communication With External Services apiVersion: v1 kind: Service metadata: name: kafka namespace: elk spec: type: ClusterIP clusterIP: None sessionAffinity: None ports: - port: 9092 protocol: TCP targetPort: 9092 apiVersion: v1 kind: Endpoints metadata: name: kafka namespace: elk subsets: - addresses: - ip: 10.120.201.33 - ip: 10.120.201.34 - ip: 10.120.201.35 ... ports: - port: 9092 protocol: TCP
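
The Endpoints object on the last slide lists static IPs by hand. One way to keep such a list current is to generate the manifest from whatever inventory holds the addresses. The sketch below only builds the object and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The `external_endpoints` helper is hypothetical, and where the IP list comes from is left open.

```python
import json


def external_endpoints(name: str, namespace: str, ips: list[str], port: int) -> dict:
    """Build an Endpoints manifest pointing a selector-less Service at external IPs."""
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # The name and namespace must match the Service the endpoints belong to.
        "metadata": {"name": name, "namespace": namespace},
        "subsets": [
            {
                "addresses": [{"ip": ip} for ip in ips],
                "ports": [{"port": port, "protocol": "TCP"}],
            }
        ],
    }


# Values taken from the kafka example on the slide.
manifest = external_endpoints(
    "kafka", "elk", ["10.120.201.33", "10.120.201.34", "10.120.201.35"], 9092
)
print(json.dumps(manifest, indent=2))
```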

Editor's Notes

  1. Not so great for operations
  2. last year less than a dozen services existed, today more than 50 are in production or actively developed
  3. Typical workflow for provisioning a VM at Squarespace. Currently takes about 15 minutes to provision a VM. There are definitely some optimizations to be made here: use VM templates (hard to generalize space constraints in general, but not so much of a problem for microservices); use VMware vMotion and other tools for auto-migrating and finding free resources.
  4. The big takeaway. Requires a robust discovery mechanism for services; can’t easily get by with static names. This can be as simple as DNS or load balancers, or something more complex (ZooKeeper, etcd, Consul). Each has tradeoffs. Metrics: Graphite metrics are not meant to be ephemeral; they are long-lived metrics that are expensive to create and are not efficiently aggregated (no tagging support!). Difficult to control where data is coming from and how much data is coming in. Easy to blow out disk or send faulty metrics. Centralized metrics can lead to Alerts Sensu alerts are per instance; system
  5. A bit simplified, as there are a lot of moving parts Declarative Infrastructure
  6. All objects are represented by YAML descriptions
  7. Kubernetes resource constraints aren’t enough. Need an understanding of cgroups.
  8. Kubernetes resource constraints aren’t enough. Need an understanding of cgroups.
  9. Kubernetes resource constraints aren’t enough
  10. Push vs Pull metrics Same Grafana Same ELK
  11. Sensu: app and system alerts are tightly coupled. Overwhelming and confusing to everyone except the person who designed the system; does not present a sense of ownership. Hard to get a single view: Graphite checks vs. instance checks.
  12. Alerts are defined with code. Encourages developer ownership. Only relevant alerts are defined: active requests, error rates, response times, # of instances up.
  13. Deployment logic is not colocated with code
  14. Depends on Networking
  15. Very SIMPLE Each leaf is a Top of Rack switch All devices are exactly the same number of segments away
  16. Calico is backed by Etcd… It’s super easy to leverage this
  17. TODO: add graphic of KeyCloak interaction
  18. We’re not moving all infrastructure to Kubernetes anytime soon stateful systems and hardware dependent services like Databases, Kafka, ELK will remain statically provisioned We need a way to automatically update these endpoints
  19. Solution: encode external endpoints into services
  20. Headless services return A records