SlideShare a Scribd company logo
1 of 53
Kubernetes @ Squarespace
Microservices on Kubernetes in a Datacenter
Kevin Lynch
klynch@squarespace.com
Agenda
01 The problem with static infrastructure
02 Kubernetes Fundamentals
03 Adapting Microservices to Kubernetes
04 Kubernetes in a datacenter?
Microservices Journey: A Story of Growth
2013: small (< 50 engineers)
build product & grow customer base
whatever works
2014: medium (< 100 engineers)
we have a lot of customers now!
whatever works doesn't work anymore
2016: large (100+ engineers)
architect for scalability and reliability
organizational structures
?: XL (200+ engineers)
Challenges with a Monolith
● Reliability
● Performance
● Engineering agility/speed, cross-team coupling
● Engineering time spent fire fighting rather than building new
functionality
What were the increasingly difficult challenges with a
monolith?
Challenges with a Monolith
● Minimize failure domains
● Developers are more confident in their changes
● Squarespace can move faster
Solution: Microservices!
Operational Challenges
● Engineering org grows…
● More features...
● More services…
● More infrastructure to spin up…
● Ops becomes a blocker...
Stuck in a loop
Traditional Provisioning Process
● Pick ESX with available resources
● Pick IP
● Register host to Cobbler
● Register DNS entry
● Create new VM on ESX
● PXE boot VM and install OS and base configuration
● Install system dependencies (LDAP, NTP, CollectD, Sensu…)
● Install app dependencies (Java, FluentD/Filebeat, Consul, Mongo-
S…)
● Install the app
● App registers with discovery system and begins receiving traffic
Containerization & Kubernetes Orchestration
● Difficult to find resources
● Slow to provision and scale
● Discovery is a must
● Metrics system must support short lived metrics
● Alerts are usually per instance
Static infrastructure and microservices do not mix!
Kubernetes Provisioning Process
● kubectl apply -f app.yaml
Kubernetes Fundamentals
● ApiVersion & Kind
○ type of object
● Metadata
○ Names, annotations, labels
● Spec & Status
○ What you want to happen...
○ … versus reality
apiVersion: v1
kind: Pod
metadata:
name: nginx
namespace: default
annotations:
squarespace.net/build: nginx-42
labels:
app: frontend
...
spec:
containers:
- name: nginx
image: nginx:latest
...
status:
hostIP: 10.122.1.201
podIP: 10.123.185.9
phase: Running
qosClass: BestEffort
startTime: 2017-07-31T02:08:25Z
...
Kubernetes Fundamentals
● Labels
○ KV pairs used for identification
○ Indexed for efficient querying
● Annotations
○ Non identifying information
○ Can be unstructured
apiVersion: v1
kind: Pod
metadata:
name: nginx
namespace: default
annotations:
squarespace.net/build: nginx-42
labels:
app: frontend
...
spec:
containers:
- name: nginx
image: nginx:1.8.1
...
status:
hostIP: 10.122.1.201
podIP: 10.123.185.9
phase: Running
qosClass: BestEffort
startTime: 2017-07-31T02:08:25Z
...
Common Objects: Pods
● Basic deployable workload
● Group of 1+ containers
● Define resource requirements
● Defines storage volumes
○ Ephemeral storage
○ Shared storage (NFS, CephFS)
○ Block storage (RBD)
○ Secrets
○ ConfigMaps
○ more...
spec:
containers:
- name: location
image: .../location:master-269
ports: ...
resources:
limits:
cpu: 2
memory: 4Gi
requests:
cpu: 2
memory: 4Gi
volumeMounts:
- name: config
mountPath: /service/config
- name: log-dir
mountPath: /data/logs
volumes:
- name: config
configMap:
name: location-config
- name: log-dir
emptyDir: {}
Common Objects: Deployments
● Declarative
● Defines a type of pod to run
● Defines desired #
● Supports basic operations
○ Can be rolled back quickly!
○ Can be scaled up/down
● Meant to be stateless apps!
kind: Deployment
spec:
replicas: 3
selector:
matchLabels:
service: location
strategy:
rollingUpdate:
maxSurge: 100%
maxUnavailable: 0
type: RollingUpdate
template:
... pod info here ...
Common Objects: Services
● Make pods addressable
● Assigned an IP
● Addressable DNS entries!
apiVersion: v1
kind: Service
metadata:
name: location
namespace: core-services
spec:
type: ClusterIP
clusterIP: 10.123.79.211
selector:
service: location
ports:
- name: traffic
port: 8080
- name: admin
port: 8081
Common Objects: Namespaces
● Namespaces
○ Isolates groups of objects
■ Developer
■ Team
■ System or Service
○ Good for permission boundaries
○ Good for network boundaries
● Most objects are namespaced
apiVersion: v1
kind: Namespace
metadata:
name: core-services
annotations:
squarespace.net/contact: |
team@squarespace.com
creationTimestamp: 2017-06-14T..
spec:
finalizers:
- kubernetes
status:
phase: Active
Microservice Pod Definition
resources:
requests:
cpu: 2
memory: 4Gi
limits:
cpu: 2
memory: 4Gi
Microservice Pod
Java Microservice
fluentd consul
Future Work: Updating Common Dependencies
● Custom Initializers
○ Inject container dependencies into deployments (consul, fluentd)
○ Configure Prometheus instances for each namespace
● Trigger rescheduling of pods when dependencies need updating
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: location
namespace: core-services
annotations:
initializer.squarespace.net/consul: "true"
Future Work: Enforce Squarespace Standards
● Custom Admission Controller requires all services, deployments, etc.
meet certain standards
○ Resource requests/limits
○ Owner annotations
○ Service labels
Quality of Service Classes
resources:
requests:
cpu: 2
memory: 4Gi
limits:
cpu: 2
memory: 4Gi
● BestEffort
○ No resource constraints
○ First to be killed under pressure
● Guaranteed
○ Requests == Limits
○ Last to kill under pressure
○ Easier to reason about resources
● Burstable
○ Take advantage of unused resources!
○ Can be tricky with some languages
Microservice Pod Definition
resources:
requests:
cpu: 2
memory: 4Gi
limits:
cpu: 2
memory: 4Gi
● Kubernetes assumes no other processes are
consuming significant resources
● Completely Fair Scheduler (CFS)
○ Schedules a task based on CPU Shares
○ Throttles a task once it hits CPU Quota
● OOM Killed when memory limit exceeded
Microservice Pod Definition
resources:
requests:
cpu: 2
memory: 4Gi
limits:
cpu: 2
memory: 4Gi
● Shares = CPU Request * 1024
● Total Kubernetes Shares = # Cores * 1024
● Quota = CPU Limit * 100ms
● Period = 100ms
Java in a Container
● JVM is able to detect # of cores via sysconf(_SC_NPROCESSORS_ONLN)
● Many libraries rely on Runtime.getRuntime.availableProcessors()
○ Jetty
○ ForkJoinPool
○ GC Threads
○ That mystery dependency...
Java in a Container
● Provide a base container that calculates the container’s resources!
● Detect # of “cores” assigned
○ /sys/fs/cgroup/cpu/cpu.cfs_quota_us divided by
/sys/fs/cgroup/cpu/cpu.cfs_period_us
● Automatically tune the JVM:
○ -XX:ParallelGCThreads=${core_limit}
○ -XX:ConcGCThreads=${core_limit}
○ -Djava.util.concurrent.ForkJoinPool.common.parallelism=${core_limit}
Java in a Container
● Use Linux preloading to override availableProcessors()
#include <stdlib.h>
#include <unistd.h>
int JVM_ActiveProcessorCount(void) {
char* val = getenv("CONTAINER_CORE_LIMIT");
return val != NULL ? atoi(val) : sysconf(_SC_NPROCESSORS_ONLN);
}
https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling
Service Lifecycle
● How do we observe the health of services?
● How do we handle rollbacks?
Monitoring
● Graphite does not scale well with ephemeral instances
● Easy to have combinatoric explosion of metrics
Traditional Monitoring & Alerting
● Application and system alerts are tightly coupled
● Difficult to create alerts on SLAs
● Difficult to route alerts
Traditional Monitoring & Alerting
Falco: Centralized Service Management
● Kubernetes Dashboard is too complex and powerful
● Centralized deployment status and history
● Manual rollbacks of deploys
● Quick access to scaling controls
Kubernetes Dashboard
http://kubernetes-dashboard.kube-system.svc.eqx.dal.prod.kubernetes/
Falco: Centralized Service Management
● Efficient for ephemeral instances
● Stores tagged data
● Easy to have many smaller instances (per team or complex system)
● Prometheus Operator runs everything in Kubernetes!
Kubernetes Monitoring & Alerting
● Alerts are defined with the application code!
● Easy to define SLA alerts
● Routing is still difficult
Kubernetes Monitoring & Alerting
Prometheus Operator
Kubernetes Monitoring & Alerting
● Sensu checks for all core components
● Sent to PagerDuty
Kubernetes Monitoring & Alerting
http://prometheus.kube-system.svc.eqx.dal.prod.kubernetes:9090/alerts
Kubernetes in a datacenter?
Kubernetes Architecture
Kubernetes Networking
Spine and Leaf Layer 3 Clos Topology
● All work is performed at the leaf/ToR switch
● Each leaf switch is separate Layer 3 domain
● Each leaf is a separate BGP domain (ASN)
● No Spanning Tree Protocol issues seen in L2 networks (convergence
time, loops)
Leaf Leaf Leaf Leaf
Spine Spine
Spine and Leaf Layer 3 Clos Topology
● Simple to understand
● Easy to scale
● Predictable and consistent latency (hops = 2)
● Allows for Anycast IPs
Leaf Leaf Leaf Leaf
Spine Spine
Calico Networking
● No network overlay required!
○ No nasty MTU issues
○ No performance impact
● Communicates directly with existing L3 network
● BGP Peering with Top of Rack switch
Calico Networking
● Engineers can think of Pod IPs as normal hosts
(they’re not)
○ Ping works
○ Consul works normally
○ Browser communication works
○ Shell sorta works (kubectl exec -it pod sh)
Calico Networking
● Each worker announces it’s pod IP ranges
○ Aggregated to /26
● Each master announces an External Anycast IP
○ Used for component communication
● Each ingress tier announces the Service IP range
ip addr add 10.123.0.0/17 dev lo
etcdctl set
/calico/bgp/v1/global/custom_filters/v4/services
'if ( net = 10.123.0.0/17 ) then { accept; }'
Calico Networking: Firewalls
● Calico supports NetworkPolicy firewall rules
○ We aren’t using this yet!
● Add DefaultDeny to block traffic into namespace
● Add Ingress rules for whitelisted communication
○ Works across namespaces
○ Works with raw IP ranges
QUESTIONS?
Thank you!
squarespace.com/careers
Kevin Lynch
klynch@squarespace.com
Future Work: Security!
● PodSecurityPolicy
● Mutual TLS in our environment:
○ Kubernetes WG Components WG
○ SIG Auth
○ SPIFFE
○ ISTIO
How do I connect to the cluster?
● Look at Getting Started guide on the wiki
● Generate a kubeconfig file
○ curl --user $(whoami) https://kubeconfig-generator.squarespace.net
● Uses KeyCloak OIDC to authenticate users
● Automatically refreshes credentials!
● Audit Logs are sent to logs.squarespace.net
Squarespace Clusters
Dallas Production
8 nodes
512 cores
2 TB RAM
NJ Production
4 nodes
256 cores
1 TB RAM
Dallas Staging
6 nodes
384 cores
1.5 TB RAM
NJ Staging
Coming soon!
Dallas Corp
8 nodes
416 cores
1.5 TB RAM
NJ Corp
Coming soon!
Testbed
6 Mac Minis :-p
Kube-Proxy: Internal Networking
● Runs on every host
● Routes service IPs to pods
● Watches for changes
● Updates IPTables rules
Communication With External Services
● Environment specific services should not be encoded in application
● Single deployment for all environments and datacenters
● Federation API expects same deployment
● Not all applications are using consul
Communication With External Services
Communication With External Services
apiVersion: v1
kind: Service
metadata:
name: kafka
namespace: elk
spec:
type: ClusterIP
clusterIP: None
sessionAffinity: None
ports:
- port: 9092
protocol: TCP
targetPort: 9092
apiVersion: v1
kind: Endpoints
metadata:
name: kafka
namespace: elk
subsets:
- addresses:
- ip: 10.120.201.33
- ip: 10.120.201.34
- ip: 10.120.201.35
...
ports:
- port: 9092
protocol: TCP

More Related Content

What's hot

3. Security Engineering
3. Security Engineering3. Security Engineering
3. Security EngineeringSam Bowne
 
IRJET- Blockchain based Certificate Issuing and Validation
IRJET-  	  Blockchain based Certificate Issuing and ValidationIRJET-  	  Blockchain based Certificate Issuing and Validation
IRJET- Blockchain based Certificate Issuing and ValidationIRJET Journal
 
Gartner Magic Quadrant for Secure Email Gateways 2014
Gartner Magic Quadrant for Secure Email Gateways 2014Gartner Magic Quadrant for Secure Email Gateways 2014
Gartner Magic Quadrant for Secure Email Gateways 2014Michael Bunn
 
CMACs and MACS based on block ciphers, Digital signature
CMACs and MACS based on block ciphers, Digital signatureCMACs and MACS based on block ciphers, Digital signature
CMACs and MACS based on block ciphers, Digital signatureAdarsh Patel
 
Content centric networks
Content centric networksContent centric networks
Content centric networksMeshingo Jack
 
Overview of Spanning Tree Protocol (STP & RSTP)
Overview of Spanning Tree Protocol (STP & RSTP)Overview of Spanning Tree Protocol (STP & RSTP)
Overview of Spanning Tree Protocol (STP & RSTP)Peter R. Egli
 
Java Card 2.x FAQ (2001)
Java Card 2.x FAQ (2001)Java Card 2.x FAQ (2001)
Java Card 2.x FAQ (2001)Julien SIMON
 
Block Cipher Modes of Operation And Cmac For Authentication
Block Cipher Modes of Operation And Cmac For AuthenticationBlock Cipher Modes of Operation And Cmac For Authentication
Block Cipher Modes of Operation And Cmac For AuthenticationVittorio Giovara
 
Introduction to Blockchain and Smart Contracts
Introduction to Blockchain and Smart ContractsIntroduction to Blockchain and Smart Contracts
Introduction to Blockchain and Smart ContractsTechracers
 
Digital Signatures
Digital SignaturesDigital Signatures
Digital SignaturesEhtisham Ali
 
Network security
Network securityNetwork security
Network securityhajra azam
 
Chapter 7 security tools i
Chapter 7   security tools iChapter 7   security tools i
Chapter 7 security tools iSyaiful Ahdan
 
Mesh Topology Design with Cisco Packet Tracer
Mesh Topology Design with Cisco Packet TracerMesh Topology Design with Cisco Packet Tracer
Mesh Topology Design with Cisco Packet TracerMaksudujjaman
 
Blockchain ecosystem and evolution
Blockchain ecosystem and evolutionBlockchain ecosystem and evolution
Blockchain ecosystem and evolutionChandra Sekhar AKNR
 
CCNA 1 Routing and Switching v5.0 Chapter 4
CCNA 1 Routing and Switching v5.0 Chapter 4CCNA 1 Routing and Switching v5.0 Chapter 4
CCNA 1 Routing and Switching v5.0 Chapter 4Nil Menon
 

What's hot (20)

3. Security Engineering
3. Security Engineering3. Security Engineering
3. Security Engineering
 
IRJET- Blockchain based Certificate Issuing and Validation
IRJET-  	  Blockchain based Certificate Issuing and ValidationIRJET-  	  Blockchain based Certificate Issuing and Validation
IRJET- Blockchain based Certificate Issuing and Validation
 
Gartner Magic Quadrant for Secure Email Gateways 2014
Gartner Magic Quadrant for Secure Email Gateways 2014Gartner Magic Quadrant for Secure Email Gateways 2014
Gartner Magic Quadrant for Secure Email Gateways 2014
 
Blockchain in healthcare
Blockchain in healthcareBlockchain in healthcare
Blockchain in healthcare
 
CMACs and MACS based on block ciphers, Digital signature
CMACs and MACS based on block ciphers, Digital signatureCMACs and MACS based on block ciphers, Digital signature
CMACs and MACS based on block ciphers, Digital signature
 
Content centric networks
Content centric networksContent centric networks
Content centric networks
 
Rsa
RsaRsa
Rsa
 
Overview of Spanning Tree Protocol (STP & RSTP)
Overview of Spanning Tree Protocol (STP & RSTP)Overview of Spanning Tree Protocol (STP & RSTP)
Overview of Spanning Tree Protocol (STP & RSTP)
 
Java Card 2.x FAQ (2001)
Java Card 2.x FAQ (2001)Java Card 2.x FAQ (2001)
Java Card 2.x FAQ (2001)
 
Block Cipher Modes of Operation And Cmac For Authentication
Block Cipher Modes of Operation And Cmac For AuthenticationBlock Cipher Modes of Operation And Cmac For Authentication
Block Cipher Modes of Operation And Cmac For Authentication
 
Introduction to Blockchain and Smart Contracts
Introduction to Blockchain and Smart ContractsIntroduction to Blockchain and Smart Contracts
Introduction to Blockchain and Smart Contracts
 
Digital Signatures
Digital SignaturesDigital Signatures
Digital Signatures
 
Hash Function
Hash FunctionHash Function
Hash Function
 
Network security
Network securityNetwork security
Network security
 
Chapter 7 security tools i
Chapter 7   security tools iChapter 7   security tools i
Chapter 7 security tools i
 
Cryptography and Network Security William Stallings Lawrie Brown
Cryptography and Network Security William Stallings Lawrie BrownCryptography and Network Security William Stallings Lawrie Brown
Cryptography and Network Security William Stallings Lawrie Brown
 
Mesh Topology Design with Cisco Packet Tracer
Mesh Topology Design with Cisco Packet TracerMesh Topology Design with Cisco Packet Tracer
Mesh Topology Design with Cisco Packet Tracer
 
Digital signature
Digital signatureDigital signature
Digital signature
 
Blockchain ecosystem and evolution
Blockchain ecosystem and evolutionBlockchain ecosystem and evolution
Blockchain ecosystem and evolution
 
CCNA 1 Routing and Switching v5.0 Chapter 4
CCNA 1 Routing and Switching v5.0 Chapter 4CCNA 1 Routing and Switching v5.0 Chapter 4
CCNA 1 Routing and Switching v5.0 Chapter 4
 

Similar to Kubernetes @ Squarespace: Kubernetes in the Datacenter

Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for BeginnersDigitalOcean
 
Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016aspyker
 
Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016Sharma Podila
 
Kubernetes from scratch at veepee sysadmins days 2019
Kubernetes from scratch at veepee   sysadmins days 2019Kubernetes from scratch at veepee   sysadmins days 2019
Kubernetes from scratch at veepee sysadmins days 2019🔧 Loïc BLOT
 
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...Ambassador Labs
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 introTerry Cho
 
Docker on docker leveraging kubernetes in docker ee
Docker on docker leveraging kubernetes in docker eeDocker on docker leveraging kubernetes in docker ee
Docker on docker leveraging kubernetes in docker eeDocker, Inc.
 
Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Idan Atias
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
Container orchestration and microservices world
Container orchestration and microservices worldContainer orchestration and microservices world
Container orchestration and microservices worldKarol Chrapek
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsAmbassador Labs
 
Containerizing MongoDB with kubernetes
Containerizing MongoDB with kubernetesContainerizing MongoDB with kubernetes
Containerizing MongoDB with kubernetesBrian McNamara
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsWeaveworks
 
Composing services with Kubernetes
Composing services with KubernetesComposing services with Kubernetes
Composing services with KubernetesBart Spaans
 
Kubernetes at (Organizational) Scale
Kubernetes at (Organizational) ScaleKubernetes at (Organizational) Scale
Kubernetes at (Organizational) ScaleJeff Zellner
 
Intro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps WorkshopIntro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps WorkshopWeaveworks
 
Introduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopIntroduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopBob Killen
 
Method of NUMA-Aware Resource Management for Kubernetes 5G NFV Cluster
Method of NUMA-Aware Resource Management for Kubernetes 5G NFV ClusterMethod of NUMA-Aware Resource Management for Kubernetes 5G NFV Cluster
Method of NUMA-Aware Resource Management for Kubernetes 5G NFV Clusterbyonggon chun
 
Comparison of existing cni plugins for kubernetes
Comparison of existing cni plugins for kubernetesComparison of existing cni plugins for kubernetes
Comparison of existing cni plugins for kubernetesAdam Hamsik
 

Similar to Kubernetes @ Squarespace: Kubernetes in the Datacenter (20)

Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for Beginners
 
Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016
 
Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016
 
Kubernetes from scratch at veepee sysadmins days 2019
Kubernetes from scratch at veepee   sysadmins days 2019Kubernetes from scratch at veepee   sysadmins days 2019
Kubernetes from scratch at veepee sysadmins days 2019
 
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 intro
 
Docker on docker leveraging kubernetes in docker ee
Docker on docker leveraging kubernetes in docker eeDocker on docker leveraging kubernetes in docker ee
Docker on docker leveraging kubernetes in docker ee
 
Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Container orchestration and microservices world
Container orchestration and microservices worldContainer orchestration and microservices world
Container orchestration and microservices world
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
 
Containerizing MongoDB with kubernetes
Containerizing MongoDB with kubernetesContainerizing MongoDB with kubernetes
Containerizing MongoDB with kubernetes
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOps
 
Composing services with Kubernetes
Composing services with KubernetesComposing services with Kubernetes
Composing services with Kubernetes
 
Kubernetes at (Organizational) Scale
Kubernetes at (Organizational) ScaleKubernetes at (Organizational) Scale
Kubernetes at (Organizational) Scale
 
Intro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps WorkshopIntro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps Workshop
 
Introduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopIntroduction to Kubernetes Workshop
Introduction to Kubernetes Workshop
 
Method of NUMA-Aware Resource Management for Kubernetes 5G NFV Cluster
Method of NUMA-Aware Resource Management for Kubernetes 5G NFV ClusterMethod of NUMA-Aware Resource Management for Kubernetes 5G NFV Cluster
Method of NUMA-Aware Resource Management for Kubernetes 5G NFV Cluster
 
Comparison of existing cni plugins for kubernetes
Comparison of existing cni plugins for kubernetesComparison of existing cni plugins for kubernetes
Comparison of existing cni plugins for kubernetes
 

Recently uploaded

Lect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptxLect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptxMonirHossain707319
 
Furniture showroom management system project.pdf
Furniture showroom management system project.pdfFurniture showroom management system project.pdf
Furniture showroom management system project.pdfKamal Acharya
 
İTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopİTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopEmre Günaydın
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdfKamal Acharya
 
Lecture_8-Digital implementation of analog controller design.pdf
Lecture_8-Digital implementation of analog controller design.pdfLecture_8-Digital implementation of analog controller design.pdf
Lecture_8-Digital implementation of analog controller design.pdfmohamedsamy9878
 
Online book store management system project.pdf
Online book store management system project.pdfOnline book store management system project.pdf
Online book store management system project.pdfKamal Acharya
 
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...Amil baba
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Prakhyath Rai
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edgePaco Orozco
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxwendy cai
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringC Sai Kiran
 
Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientistgettygaming1
 
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdfONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdfKamal Acharya
 
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfRESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfKamal Acharya
 
internship exam ppt.pptx on embedded system and IOT
internship exam ppt.pptx on embedded system and IOTinternship exam ppt.pptx on embedded system and IOT
internship exam ppt.pptx on embedded system and IOTNavyashreeS6
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2T.D. Shashikala
 
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWINGBRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWINGKOUSTAV SARKAR
 
Electrical shop management system project report.pdf
Electrical shop management system project report.pdfElectrical shop management system project report.pdf
Electrical shop management system project report.pdfKamal Acharya
 
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamKIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamDr. Radhey Shyam
 
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单tuuww
 

Recently uploaded (20)

Lect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptxLect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptx
 
Furniture showroom management system project.pdf
Furniture showroom management system project.pdfFurniture showroom management system project.pdf
Furniture showroom management system project.pdf
 
İTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopİTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering Workshop
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdf
 
Lecture_8-Digital implementation of analog controller design.pdf
Lecture_8-Digital implementation of analog controller design.pdfLecture_8-Digital implementation of analog controller design.pdf
Lecture_8-Digital implementation of analog controller design.pdf
 
Online book store management system project.pdf
Online book store management system project.pdfOnline book store management system project.pdf
Online book store management system project.pdf
 
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptx
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
 
Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientist
 
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdfONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
 
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfRESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
 
internship exam ppt.pptx on embedded system and IOT
internship exam ppt.pptx on embedded system and IOTinternship exam ppt.pptx on embedded system and IOT
internship exam ppt.pptx on embedded system and IOT
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2
 
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWINGBRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
 
Electrical shop management system project report.pdf
Electrical shop management system project report.pdfElectrical shop management system project report.pdf
Electrical shop management system project report.pdf
 
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamKIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
 
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
 

Kubernetes @ Squarespace: Kubernetes in the Datacenter

  • 1. Kubernetes @ Squarespace Microservices on Kubernetes in a Datacenter Kevin Lynch klynch@squarespace.com
  • 2. Agenda 01 The problem with static infrastructure 02 Kubernetes Fundamentals 03 Adapting Microservices to Kubernetes 04 Kubernetes in a datacenter?
  • 3. Microservices Journey: A Story of Growth 2013: small (< 50 engineers) build product & grow customer base whatever works 2014: medium (< 100 engineers) we have a lot of customers now! whatever works doesn't work anymore 2016: large (100+ engineers) architect for scalability and reliability organizational structures ?: XL (200+ engineers)
  • 4. Challenges with a Monolith ● Reliability ● Performance ● Engineering agility/speed, cross-team coupling ● Engineering time spent fire fighting rather than building new functionality What were the increasingly difficult challenges with a monolith?
  • 5. Challenges with a Monolith ● Minimize failure domains ● Developers are more confident in their changes ● Squarespace can move faster Solution: Microservices!
  • 6. Operational Challenges ● Engineering org grows… ● More features... ● More services… ● More infrastructure to spin up… ● Ops becomes a blocker... Stuck in a loop
  • 7. Traditional Provisioning Process ● Pick ESX with available resources ● Pick IP ● Register host to Cobbler ● Register DNS entry ● Create new VM on ESX ● PXE boot VM and install OS and base configuration ● Install system dependencies (LDAP, NTP, CollectD, Sensu…) ● Install app dependencies (Java, FluentD/Filebeat, Consul, Mongo- S…) ● Install the app ● App registers with discovery system and begins receiving traffic
  • 8. Containerization & Kubernetes Orchestration ● Difficult to find resources ● Slow to provision and scale ● Discovery is a must ● Metrics system must support short lived metrics ● Alerts are usually per instance Static infrastructure and microservices do not mix!
  • 9. Kubernetes Provisioning Process ● kubectl apply -f app.yaml
  • 10. Kubernetes Fundamentals ● ApiVersion & Kind ○ type of object ● Metadata ○ Names, annotations, labels ● Spec & Status ○ What you want to happen... ○ … versus reality apiVersion: v1 kind: Pod metadata: name: nginx namespace: default annotations: squarespace.net/build: nginx-42 labels: app: frontend ... spec: containers: - name: nginx image: nginx:latest ... status: hostIP: 10.122.1.201 podIP: 10.123.185.9 phase: Running qosClass: BestEffort startTime: 2017-07-31T02:08:25Z ...
  • 11. Kubernetes Fundamentals ● Labels ○ KV pairs used for identification ○ Indexed for efficient querying ● Annotations ○ Non identifying information ○ Can be unstructured apiVersion: v1 kind: Pod metadata: name: nginx namespace: default annotations: squarespace.net/build: nginx-42 labels: app: frontend ... spec: containers: - name: nginx image: nginx:1.8.1 ... status: hostIP: 10.122.1.201 podIP: 10.123.185.9 phase: Running qosClass: BestEffort startTime: 2017-07-31T02:08:25Z ...
  • 12. Common Objects: Pods ● Basic deployable workload ● Group of 1+ containers ● Define resource requirements ● Defines storage volumes ○ Ephemeral storage ○ Shared storage (NFS, CephFS) ○ Block storage (RBD) ○ Secrets ○ ConfigMaps ○ more... spec: containers: - name: location image: .../location:master-269 ports: ... resources: limits: cpu: 2 memory: 4Gi requests: cpu: 2 memory: 4Gi volumeMounts: - name: config mountPath: /service/config - name: log-dir mountPath: /data/logs volumes: - name: config configMap: name: location-config - name: log-dir emptyDir: {}
  • 13. Common Objects: Deployments ● Declarative ● Defines a type of pod to run ● Defines desired # ● Supports basic operations ○ Can be rolled back quickly! ○ Can be scaled up/down ● Meant to be stateless apps! kind: Deployment spec: replicas: 3 selector: matchLabels: service: location strategy: rollingUpdate: maxSurge: 100% maxUnavailable: 0 type: RollingUpdate template: ... pod info here ...
  • 14. Common Objects: Services ● Make pods addressable ● Assigned an IP ● Addressable DNS entries! apiVersion: v1 kind: Service metadata: name: location namespace: core-services spec: type: ClusterIP clusterIP: 10.123.79.211 selector: service: location ports: - name: traffic port: 8080 - name: admin port: 8081
  • 15. Common Objects: Namespaces ● Namespaces ○ Isolates groups of objects ■ Developer ■ Team ■ System or Service ○ Good for permission boundaries ○ Good for network boundaries ● Most objects are namespaced apiVersion: v1 kind: Namespace metadata: name: core-services annotations: squarespace.net/contact: | team@squarespace.com creationTimestamp: 2017-06-14T.. spec: finalizers: - kubernetes status: phase: Active
  • 16. Microservice Pod Definition resources: requests: cpu: 2 memory: 4Gi limits: cpu: 2 memory: 4Gi Microservice Pod Java Microservice fluentd consul
  • 17. Future Work: Updating Common Dependencies ● Custom Initializers ○ Inject container dependencies into deployments (consul, fluentd) ○ Configure Prometheus instances for each namespace ● Trigger rescheduling of pods when dependencies need updating apiVersion: extensions/v1beta1 kind: Deployment metadata: name: location namespace: core-services annotations: initializer.squarespace.net/consul: "true"
  • 18. Future Work: Enforce Squarespace Standards ● Custom Admission Controller requires all services, deployments, etc. meet certain standards ○ Resource requests/limits ○ Owner annotations ○ Service labels
  • 19. Quality of Service Classes resources: requests: cpu: 2 memory: 4Gi limits: cpu: 2 memory: 4Gi ● BestEffort ○ No resource constraints ○ First to be killed under pressure ● Guaranteed ○ Requests == Limits ○ Last to kill under pressure ○ Easier to reason about resources ● Burstable ○ Take advantage of unused resources! ○ Can be tricky with some languages
  • 20. Microservice Pod Definition resources: requests: cpu: 2 memory: 4Gi limits: cpu: 2 memory: 4Gi ● Kubernetes assumes no other processes are consuming significant resources ● Completely Fair Scheduler (CFS) ○ Schedules a task based on CPU Shares ○ Throttles a task once it hits CPU Quota ● OOM Killed when memory limit exceeded
  • 21. Microservice Pod Definition resources: requests: cpu: 2 memory: 4Gi limits: cpu: 2 memory: 4Gi ● Shares = CPU Request * 1024 ● Total Kubernetes Shares = # Cores * 1024 ● Quota = CPU Limit * 100ms ● Period = 100ms
  • 22. Java in a Container ● JVM is able to detect # of cores via sysconf(_SC_NPROCESSORS_ONLN) ● Many libraries rely on Runtime.getRuntime.availableProcessors() ○ Jetty ○ ForkJoinPool ○ GC Threads ○ That mystery dependency...
  • 23. Java in a Container ● Provide a base container that calculates the container’s resources! ● Detect # of “cores” assigned ○ /sys/fs/cgroup/cpu/cpu.cfs_quota_us divided by /sys/fs/cgroup/cpu/cpu.cfs_period_us ● Automatically tune the JVM: ○ -XX:ParallelGCThreads=${core_limit} ○ -XX:ConcGCThreads=${core_limit} ○ -Djava.util.concurrent.ForkJoinPool.common.parallelism=${core_limit}
  • 24. Java in a Container ● Use Linux preloading to override availableProcessors() #include <stdlib.h> #include <unistd.h> int JVM_ActiveProcessorCount(void) { char* val = getenv("CONTAINER_CORE_LIMIT"); return val != NULL ? atoi(val) : sysconf(_SC_NPROCESSORS_ONLN); } https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling
  • 25. Service Lifecycle ● How do we observe the health of services? ● How do we handle rollbacks?
  • 27. ● Graphite does not scale well with ephemeral instances ● Easy to have combinatoric explosion of metrics Traditional Monitoring & Alerting ● Application and system alerts are tightly coupled ● Difficult to create alerts on SLAs ● Difficult to route alerts
  • 29. Falco: Centralized Service Management ● Kubernetes Dashboard is too complex and powerful ● Centralized deployment status and history ● Manual rollbacks of deploys ● Quick access to scaling controls
  • 32. ● Efficient for ephemeral instances ● Stores tagged data ● Easy to have many smaller instances (per team or complex system) ● Prometheus Operator runs everything in Kubernetes! Kubernetes Monitoring & Alerting ● Alerts are defined with the application code! ● Easy to define SLA alerts ● Routing is still difficult
  • 35. Kubernetes Monitoring & Alerting ● Sensu checks for all core components ● Sent to PagerDuty
  • 36. Kubernetes Monitoring & Alerting http://prometheus.kube-system.svc.eqx.dal.prod.kubernetes:9090/alerts
  • 37. Kubernetes in a datacenter?
  • 40. Spine and Leaf Layer 3 Clos Topology ● All work is performed at the leaf/ToR switch ● Each leaf switch is separate Layer 3 domain ● Each leaf is a separate BGP domain (ASN) ● No Spanning Tree Protocol issues seen in L2 networks (convergence time, loops) Leaf Leaf Leaf Leaf Spine Spine
  • 41. Spine and Leaf Layer 3 Clos Topology ● Simple to understand ● Easy to scale ● Predictable and consistent latency (hops = 2) ● Allows for Anycast IPs Leaf Leaf Leaf Leaf Spine Spine
  • 42. Calico Networking ● No network overlay required! ○ No nasty MTU issues ○ No performance impact ● Communicates directly with existing L3 network ● BGP Peering with Top of Rack switch
  • 43. Calico Networking ● Engineers can think of Pod IPs as normal hosts (they’re not) ○ Ping works ○ Consul works normally ○ Browser communication works ○ Shell sorta works (kubectl exec -it pod sh)
  • 44. Calico Networking ● Each worker announces it’s pod IP ranges ○ Aggregated to /26 ● Each master announces an External Anycast IP ○ Used for component communication ● Each ingress tier announces the Service IP range ip addr add 10.123.0.0/17 dev lo etcdctl set /calico/bgp/v1/global/custom_filters/v4/services 'if ( net = 10.123.0.0/17 ) then { accept; }'
  • 45. Calico Networking: Firewalls ● Calico supports NetworkPolicy firewall rules ○ We aren’t using this yet! ● Add DefaultDeny to block traffic into namespace ● Add Ingress rules for whitelisted communication ○ Works across namespaces ○ Works with raw IP ranges
  • 47. Future Work: Security! ● PodSecurityPolicy ● Mutual TLS in our environment: ○ Kubernetes WG Components WG ○ SIG Auth ○ SPIFFE ○ ISTIO
  • 48. How do I connect to the cluster? ● Look at Getting Started guide on the wiki ● Generate a kubeconfig file ○ curl --user $(whoami) https://kubeconfig-generator.squarespace.net ● Uses KeyCloak OIDC to authenticate users ● Automatically refreshes credentials! ● Audit Logs are sent to logs.squarespace.net
  • 49. Squarespace Clusters Dallas Production 8 nodes 512 cores 2 TB RAM NJ Production 4 nodes 256 cores 1 TB RAM Dallas Staging 6 nodes 384 cores 1.5 TB RAM NJ Staging Coming soon! Dallas Corp 8 nodes 416 cores 1.5 TB RAM NJ Corp Coming soon! Testbed 6 Mac Minis :-p
  • 50. Kube-Proxy: Internal Networking ● Runs on every host ● Routes service IPs to pods ● Watches for changes ● Updates IPTables rules
  • 51. Communication With External Services ● Environment specific services should not be encoded in application ● Single deployment for all environments and datacenters ● Federation API expects same deployment ● Not all applications are using consul
  • 53. Communication With External Services apiVersion: v1 kind: Service metadata: name: kafka namespace: elk spec: type: ClusterIP clusterIP: None sessionAffinity: None ports: - port: 9092 protocol: TCP targetPort: 9092 apiVersion: v1 kind: Endpoints metadata: name: kafka namespace: elk subsets: - addresses: - ip: 10.120.201.33 - ip: 10.120.201.34 - ip: 10.120.201.35 ... ports: - port: 9092 protocol: TCP

Editor's Notes

  1. Not so great for operations
  2. last year less than a dozen services existed, today more than 50 are in production or actively developed
  3. Typical workflow for provisioning a VM at Squarespace Currently takes about 15 minutes to provision a VM There are definitily some optimizations to be made here: Use VM templates (hard to generalize space constraints in general, but not so much of a problem for microservices) Use VMware vMotion and other tools for auto migrating and finding free resources
  4. The big takeaway Requires a robust discovery mechanism for services; can’t easily get by with static names This can be as simple DNS or load balancers or something more complex (zookeeper, etcd, Consul) Each has tradeoffs Metrics: Graphite metrics are not meant to be ephemeral long lived metrics that are expensive to create, and are not efficiently aggregated (no tagging support!) Difficult to control where data is coming from and how much data is coming in Easy to blow out disk, or send faulty metrics Centralized metrics can lead to Alerts Sensu alerts are per instance; system
  5. A bit simplified, as there are a lot of moving parts Declarative Infrastructure
  6. All objects are represented by YAML descriptions
  7. Kubernetes resource constraints aren’t enough Need an understand of CGroups
  8. Kubernetes resource constraints aren’t enough Need an understand of CGroups
  9. Kubernetes resource constraints aren’t enough
  10. Push vs Pull metrics Same Grafana Same ELK
  11. Sensu: app and system alerts are tightly coupled Overwhelming & confusing to everyone except the guy who designed the system does not present a sense of ownership Hard to get a single view: graphite checks vs instance checks
  12. Alerts are defined with code Encourages developer ownership only relevent alerts are defined: active requests, error rates, response times, # of instances up
  13. Deployment logic is not colocated with code
  14. Depends on Networking
  15. Very SIMPLE Each leaf is a Top of Rack switch All devices are exactly the same number of segments away
  16. Calico is backed by Etcd… It’s super easy to leverage this
  17. TODO: add graphic of KeyCloak interaction
  18. We’re not moving all infrastructure to Kubernetes anytime soon stateful systems and hardware dependent services like Databases, Kafka, ELK will remain statically provisioned We need a way to automatically update these endpoints
  19. Solution: encode external endpoints into services
  20. Headless services return A records