DevOps as a Contract
Subhas Dandapani
- Travel ticket search and booking platform
- 15 countries
- 100k+ destinations
- 800+ partners and providers
- 20m+ monthly visitors
GoEuro
50 to 150+ engineers
10 to 300+ services
How many changes can
we put in the user’s
hands in a week?
Centralization
bottlenecks
“Throw over the wall”
from dev to qa to ops
Exponential growth
Huge infrastructure QoS
requirements
Making best use of
available resources
Doing changes fast
Scaling without breaking
Log/metrics explosion
Challenges - circa 2017
Infrastructure Architecture
All forms of integration:
REST, SOAP, gRPC,
Event-driven, DB-driven,
File-driven,
Metrics-driven
Scala, Java, NodeJS,
Golang, Python, ...
Geo, Routing, Scheduling,
Ticketing, Big Data, ...
Delivery / DevOps
Probably only way forward
Distribute infrastructure ownership
But...
[Figure: Infrastructure Maturity per service, ServiceA through ServiceK, …]
Continuous Delivery is for infrastructure too!
[Figure: Infrastructure Maturity per service, ServiceA through ServiceK, …]
CI as a Tool
- Before: Jenkins-as-a-Tool
- Huge, complex CI jobs
- Partially configured by application teams and DevOps, no clear boundaries
- Several dedicated release managers who needed to maintain a "mind map" of
releases
- Engineers having to ping DevOps teams for releases
- Many CI plugins were installed for different teams
- Every Jenkins or Jenkins-plugin upgrade broke random jobs
- Agent configuration, auto-scaling, and job execution were also problematic
- Tried Jenkinsfile, but it was still Jenkins as a tool!
- Lots of copy-pasted config, shared functions, inability to change things as a whole from outside, code injection?, etc.
Jenkinsfile
- Inability to instrument jobs as a whole and add global shared behavior
- Cannot parse the code and modify its AST
- Lots of copy-pasted config, shared functions
- Inability to do continuous changes on those functions
- Inability to prevent tie-in with internal plugins
- It’s still Jenkins-as-a-tool
- Should be semantically understandable and instrumentable
- Adopt a job definition contract
- Dots (Isolated Jobs), Pipelines (Lists), or Graphs, just pick one and adopt a YAML contract
CI as a Contract
image: jenkins-plain:latest
unit-tests:
  stage: test
  script: test.sh
master:
  script: release.sh
deploy-qa:
  environment:
    name: qa
  script: deploy.sh
deploy-preprod:
deploy-prod:
...
CI as a Contract
- Specify pipeline contract with container image, scripts, checkpoints, etc.
- We take care of the implementation
- Team adds everything else on top of this file
- Build notifications, Caching, Agent allocation, Autoscaling, Analytics,
Organizational context, Auditing, etc.
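A minimal sketch of why a declarative contract is easier to instrument than a Jenkinsfile, assuming the contract file has already been parsed into a dictionary. The function names `validate_pipeline` and `instrument`, the `after_script` key, and the `notify.sh` step are hypothetical and not taken from the talk.

```python
# Hypothetical sketch: the pipeline contract as plain data that the platform
# team can validate and instrument as a whole. All names are illustrative.
from copy import deepcopy

RESERVED_KEYS = {"image"}  # top-level keys that are not jobs

pipeline = {
    "image": "jenkins-plain:latest",
    "unit-tests": {"stage": "test", "script": "test.sh"},
    "master": {"script": "release.sh"},
    "deploy-qa": {"environment": {"name": "qa"}, "script": "deploy.sh"},
}


def validate_pipeline(contract):
    """Return a list of contract violations (an empty list means valid)."""
    errors = []
    if "image" not in contract:
        errors.append("missing top-level 'image'")
    for name, job in contract.items():
        if name in RESERVED_KEYS:
            continue
        if not isinstance(job, dict) or "script" not in job:
            errors.append(f"job '{name}' has no script")
    return errors


def instrument(contract, after_script="notify.sh"):
    """Add a shared step to every job without editing the team's file."""
    result = deepcopy(contract)
    for name, job in result.items():
        if name not in RESERVED_KEYS:
            job["after_script"] = after_script
    return result


errors = validate_pipeline(pipeline)
instrumented = instrument(pipeline)
print(errors)
print(instrumented["deploy-qa"]["after_script"])
```

Because the contract is data rather than code, global behavior such as notifications or auditing can be layered on from outside, which is the property the Jenkinsfile slide says was missing.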
CI as a Contract
- Minimal stack that we needed for every service
- Artifact (JAR/WAR/Docker image/etc)
- Service (shell script/initscript/systemd unit/docker container/etc)
- + Supporting services (watchdogs, ancillary utilities, etc)
- Configuration for different environments
- Multiple Instances of the stack
- Connected to traffic (networking, firewall, load balancer)
- Stack as a unit
- Let’s worry about post-deployment activities next
Configuration Management
- Fulfilling the minimal package
- Distribute chef cookbooks + librarian chef + chef apply
- Or ansible playbooks + ansible galaxy + ansible apply
- Or puppet/salt/...
- Distribute terraform with strict credentials + terraform apply
- Distribute <X> container orchestrator + apply
- Distribute Kubernetes resources + kubectl apply
- Core difference between these tools?
Configuration Management
Kubernetes as a Contract
Instance = Podspec
Running unit = Container spec
Multiple Instances = Deployment spec
Configuration = ConfigMap spec
Traffic = Service, Ingress spec
All modeled as JSON or YAML, but has
a standard contract/spec
kind: Deployment
metadata:
  name: {{ .Values.name }}
  namespace: {{ .Values.name }}
spec:
  replicas: 5
  … … …
      containers:
        - name: my-app
          image: my-container:latest
          resources:
            requests:
              cpu: …
              memory: …
            limits:
              cpu: …
              memory: …
          livenessProbe:
            httpGet:
              path: /_system/health
          env: ...
Application Stack
Kubernetes API Server
Open, Secure HTTPS
Protocol
Kubernetes responds to fulfil
what has been applied
Artifact = Docker image
Instance = Pod
Running unit = Container
Multiple Instances = Deployment
Configuration = ConfigMap
Traffic = Service, Ingress
All modeled as JSON or YAML, but
has a standard Resource spec
kubectl apply
API Proxy
Validation, Linting, Org-wide
standards, etc.
Application Stack
Stack spec kubectl apply
Kubernetes API
Kubernetes as a Contract
- Health checks must exist
- CPU/memory must exist
- Images must not be external
- Entrypoint is for us, Script is for app
- No alpha/beta stuff
- Whitelisted resources, separate stateful & stateless clusters
- Minimum 2 replicas
- Similar for all resources
kind: Deployment
metadata:
  name: {{ .Values.name }}
  namespace: {{ .Values.name }}
spec:
  replicas: 5
  … … …
      containers:
        - name: my-app
          image: my-container:latest
          resources:
            requests:
              cpu: …
              memory: …
            limits:
              cpu: …
              memory: …
          livenessProbe:
            httpGet:
              path: /_system/health
          env: ...
API Proxy
Validation, Linting, Org-wide
standards, etc.
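The rules above could be enforced by the API proxy roughly as follows. This is only a sketch over an already-parsed Deployment manifest: the registry prefix `registry.example.internal/`, the function name `check_deployment`, and the violation messages are assumptions, and the real proxy is not shown in the slides.

```python
# Hypothetical admission-style checks over a parsed Deployment manifest.
# The registry prefix and rule wording are assumptions for illustration.
ORG_REGISTRY = "registry.example.internal/"
MIN_REPLICAS = 2


def check_deployment(manifest):
    """Return a list of violated org-wide rules for one Deployment."""
    violations = []
    api_version = manifest.get("apiVersion", "")
    if "alpha" in api_version or "beta" in api_version:
        violations.append("alpha/beta API versions are not allowed")
    spec = manifest.get("spec", {})
    if spec.get("replicas", 1) < MIN_REPLICAS:
        violations.append(f"at least {MIN_REPLICAS} replicas required")
    pod_spec = spec.get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if not container.get("image", "").startswith(ORG_REGISTRY):
            violations.append(f"{name}: image must come from the org registry")
        if "livenessProbe" not in container:
            violations.append(f"{name}: health check missing")
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            values = resources.get(section, {})
            if "cpu" not in values or "memory" not in values:
                violations.append(f"{name}: {section} need cpu and memory")
    return violations


good = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {
        "replicas": 5,
        "template": {"spec": {"containers": [{
            "name": "my-app",
            "image": ORG_REGISTRY + "my-container:1.0",
            "resources": {
                "requests": {"cpu": "100m", "memory": "128Mi"},
                "limits": {"cpu": "500m", "memory": "256Mi"},
            },
            "livenessProbe": {"httpGet": {"path": "/_system/health"}},
        }]}},
    },
}
bad = {
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [
            {"name": "my-app", "image": "my-container:latest"},
        ]}},
    },
}

for violation in check_deployment(bad):
    print(violation)
```

Returning every violation at once, rather than failing on the first, gives an application team the full list of fixes in a single `kubectl apply` attempt.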
Cloud Resources as a Contract
Model your own contract
e.g. Cloud Bucket as a YAML
kubectl apply
Kubernetes API / Custom
controllers
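A rough sketch of "Cloud Bucket as a YAML", assuming a custom resource of kind `Bucket` and a controller that compares applied resources with what exists in the cloud. The API group `cloud.example.org/v1`, the `spec` fields, and both function names are invented for illustration; the talk does not show its actual resource schema.

```python
# Hypothetical custom resource: group, kind, and fields are illustrative.
def bucket_resource(name, namespace, location, versioning=False):
    """Build the manifest a team would submit with kubectl apply."""
    return {
        "apiVersion": "cloud.example.org/v1",
        "kind": "Bucket",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"location": location, "versioning": versioning},
    }


def plan(desired, existing_names):
    """Compare applied Bucket resources with buckets that already exist."""
    wanted = {bucket["metadata"]["name"] for bucket in desired}
    existing = set(existing_names)
    actions = [("create", name) for name in sorted(wanted - existing)]
    actions += [("delete", name) for name in sorted(existing - wanted)]
    return actions


desired = [bucket_resource("search-exports", "search", "europe-west1")]
actions = plan(desired, ["old-reports"])
print(actions)
```

A real controller would call the cloud provider's API for each planned action; the point here is only that the team-facing contract stays a small declarative document.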
Application Stack
- Using Kubernetes since 1.2, and on 1.10 right now
- Avoiding the Kubernetes API maze
- “src” and “ops” in every repository, completely self-contained
- Kubernetes upgrades go exactly as planned, as we know what workloads are running and how to orchestrate/change them
- Multiple features and standards rolled out to everyone who uses the Kubernetes clusters
- Stateful and stateless clusters separate from day 1
- Not using Kubernetes/Helm as a tool, but as a contract
Application Stack
Logging as a Tool
- Logstash as a tool
- Hundreds of custom Logstash transformations, pipelines, ports
- Snowflake configs for different fields for each service
- Slow, ticketed bootstrap process for new services
- Explosion of indices
- Multiple different ways to push logs from applications
Logging as a Contract
- Print logs on STDOUT, and they will show up in Kibana
- If it’s JSON, you get structured fields
- If it’s plain, you get a plain message
- Standard enrichment rules to avoid stepping on each other’s toes:
- <field>_i = integer, <field>_s = string, <field>_geo = geo with lat/long, <field>_txt = text, etc.
- No application-specific code in Logstash anymore
- Everything else is taken care of by the team
- Scaling, rotation, retention, etc.
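The logging contract can be sketched in a few lines, assuming each STDOUT line is handled independently. The function name `parse_log_line`, the `message` key for plain lines, and the `lat`/`long` key names inside a geo value are assumptions; the slides give only the suffix convention.

```python
import json


def _geo(value):
    # Assumed shape of a geo field: an object with "lat" and "long" keys.
    return {"lat": float(value["lat"]), "long": float(value["long"])}


# Suffix-to-type rules from the slide; the converters are illustrative.
SUFFIX_RULES = {"_i": int, "_s": str, "_txt": str, "_geo": _geo}


def parse_log_line(line):
    """Turn one STDOUT line into an event dictionary."""
    try:
        payload = json.loads(line)
    except ValueError:
        return {"message": line}  # plain text: keep it as the message
    if not isinstance(payload, dict):
        return {"message": line}  # valid JSON but not an object
    event = {}
    for key, value in payload.items():
        for suffix, convert in SUFFIX_RULES.items():
            if key.endswith(suffix):
                value = convert(value)
                break
        event[key] = value
    return event


print(parse_log_line("service started"))
print(parse_log_line('{"latency_i": "42", "user_s": 7}'))
```

Because the type is carried in the field name, two services cannot index the same field with conflicting types, which is what lets the pipeline run without per-application rules.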
Routing as a Tool
- Started with one router handling all traffic, with hardcoded service discovery
- As the number of services grew, routing rules became more custom and inconsistent
- Nested rewrites and redirects, and randomly captured URL paths
- Most services had to carry custom nginx forwarders
- Proxies, then proxies-inside-proxies, etc.
- Unmanageable routing graph
- Complicated procedure to set up a new application
- Monitoring/logs were a different problem altogether
- Adopted Ingress as a Contract
- Every service was assigned a fixed route based on its namespace
- Consistent and predictable routing
- Team adds value on top of the contract
- Automatic logging and monitoring
- Global health and SLA checks
- Extensive instrumentation and tracing
- Scalability
- Load balancing
- Edge gateways
- Cross-zone failovers
- Dashboards and network policies
Routing as a Contract
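A sketch of a fixed, namespace-derived route, assuming the route is a hostname under a shared base domain. The domain `example.internal`, the function name `ingress_for`, and the convention that the backing Service is named after the namespace are assumptions; the manifest follows the `networking.k8s.io/v1` Ingress shape, which is newer than the Kubernetes version used at the time of the talk.

```python
import re

BASE_DOMAIN = "example.internal"  # assumption: not from the talk
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def ingress_for(namespace, service_port=80):
    """Build an Ingress whose host is derived only from the namespace."""
    if not _DNS_LABEL.fullmatch(namespace):
        raise ValueError(f"invalid namespace: {namespace!r}")
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": namespace, "namespace": namespace},
        "spec": {
            "rules": [{
                "host": f"{namespace}.{BASE_DOMAIN}",
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {
                        "name": namespace,
                        "port": {"number": service_port},
                    }},
                }]},
            }],
        },
    }


ingress = ingress_for("search")
print(ingress["spec"]["rules"][0]["host"])
```

With the route computed rather than configured, there are no per-service rewrite rules to maintain, and logging or health checks can be attached to every route in the same way.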
Monitoring as a Contract
- The only tool-driven area in the organization
- Prometheus as a Contract is great
- But we have some way to go (clustering, sharding, etc)
- All containers auto-injected with secrets
- Non-org images blacklisted in the API proxy
- GDPR as a contract
- ...
- In all cases, the tool is secondary; the contract is discussed and agreed upon first
Security as a Contract
Looking back...
Continuous delivery for operations
[Figure: Infrastructure Maturity per service, ServiceA through ServiceK, …]
Lessons
- DevOps Team is another core engineering team providing services that
applications integrate with
- We provide contracts, and service implementations that fulfil that contract
- Have time to innovate and add value on top instead of handling tickets
- Wherever we added heavy tests around the contract with mock applications, infrastructure
quality went up
- Wherever possible, we dogfood the contract to ourselves
- Discuss hard before agreeing on a contract, and then go deliver
- Keep it simple/small, semantic, instrumentable and usable from dev machines to prod
- Adopt and reduce API surface of a mature industry contract wherever possible to avoid
re-design from scratch
- Not universally applicable, implementation of tool still matters
- Continuously upgrade infrastructure and provide good UX for engineers
Feedback/Queries
@rdsubhas

 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 

DevOps as a Contract

  • 1. DevOps as a Contract Subhas Dandapani
  • 2. GoEuro - Travel ticket search and booking platform - 15 countries - 100k+ destinations - 800+ partners and providers - 20m+ monthly visitors
  • 3. Challenges - circa 2017 (Infrastructure, Architecture, Delivery / DevOps) - 50 to 150+ engineers - 10 to 300+ services - How many changes can we put in the user’s hands in a week? - Centralization bottlenecks - “Throw over the wall” from dev to qa to ops - Exponential growth - Huge infrastructure QoS requirements - Making best use of available resources - Doing changes fast - Scaling without breaking - Log/metrics explosion - All forms of integration: REST, SOAP, gRPC, event-driven, DB-driven, file-driven, metrics-driven - Scala, Java, NodeJS, Golang, Python, ... - Geo, Routing, Scheduling, Ticketing, Big Data, ...
  • 4. Distribute infrastructure ownership: probably the only way forward
  • 6. Continuous Delivery is for infrastructure too! [Chart: infrastructure maturity per service, ServiceA through ServiceK, …]
  • 7. CI as a Tool - Before: Jenkins-as-a-Tool - Huge, complex CI jobs - Partially configured by application teams and DevOps, no clear boundaries - Several dedicated release managers who needed to maintain a "mind map" of releases - Engineers having to ping DevOps teams for releases - Many CI plugins were installed for different teams - Every Jenkins or Jenkins-plugin upgrade broke random jobs - Agent configuration, auto-scaling, and job execution were also problematic - Tried Jenkinsfile - but still, Jenkins as a tool! - Lots of copy-pasted config, shared functions, inability to change things as a whole from outside, code injection?, etc.
  • 8. Jenkinsfile - Inability to instrument jobs as a whole and add global shared behavior - Cannot parse the code and modify its AST - Lots of copy-pasted config, shared functions - Inability to make continuous changes to those functions - Inability to prevent tie-in with internal plugins - It’s still Jenkins-as-a-tool
  • 9. CI as a Contract - Should be semantically understandable and instrumentable - Adopt a job definition contract - Dots (isolated jobs), Pipelines (lists), or Graphs: just pick one and adopt a YAML contract
  • 10. CI as a Contract
        image: jenkins-plain:latest
        unit-tests:
          stage: test
          script: test.sh
        master:
          script: release.sh
        deploy-qa:
          environment:
            name: qa
          script: deploy.sh
        deploy-preprod:
        deploy-prod:
        ...
  • 11. CI as a Contract - Application teams specify the pipeline contract: container image, scripts, checkpoints, etc. - We take care of the implementation - The DevOps team adds everything else on top of this file - Build notifications, caching, agent allocation, autoscaling, analytics, organizational context, auditing, etc.
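As a rough illustration of what "semantically understandable and instrumentable" could mean for the pipeline file on slide 10, the sketch below checks a parsed pipeline against two rules. The rules themselves (a top-level `image`, and a `script` for every job) and the function name `validate_pipeline` are assumptions made for this example, not the checks GoEuro actually ran; the pipeline is given as an already-parsed dict so the snippet needs no YAML library.

```python
# Hypothetical sketch: the rules and names below are illustrative assumptions,
# not the implementation described in the talk.
RESERVED_KEYS = {"image"}  # top-level keys that are not jobs


def validate_pipeline(pipeline: dict) -> list[str]:
    """Return a list of contract violations for a parsed pipeline file."""
    errors = []
    if not isinstance(pipeline.get("image"), str):
        errors.append("top-level 'image' must be a string")
    for name, job in pipeline.items():
        if name in RESERVED_KEYS:
            continue
        # Every remaining top-level key is treated as a job definition.
        if not isinstance(job, dict) or not isinstance(job.get("script"), str):
            errors.append(f"job '{name}' must define a 'script'")
    return errors


# The fully specified jobs from slide 10, as a parsed dict.
PIPELINE = {
    "image": "jenkins-plain:latest",
    "unit-tests": {"stage": "test", "script": "test.sh"},
    "master": {"script": "release.sh"},
    "deploy-qa": {"environment": {"name": "qa"}, "script": "deploy.sh"},
}

if __name__ == "__main__":
    print(validate_pipeline(PIPELINE))
```

Because the file is plain data rather than code, a central team can run checks like this, or inject shared behavior, without parsing anyone's build scripts.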
  • 12. CI as a Contract
  • 13. Configuration Management - Minimal stack that we needed for every service - Artifact (JAR/WAR/Docker image/etc.) - Service (shell script/initscript/systemd unit/Docker container/etc.) - + Supporting services (watchdogs, ancillary utilities, etc.) - Configuration for different environments - Multiple instances of the stack - Connected to traffic (networking, firewall, load balancer) - Stack as a unit - Let’s worry about post-deployment activities next
  • 14. Configuration Management - Fulfilling the minimal package - Distribute chef cookbooks + librarian chef + chef apply - Or ansible playbooks + ansible galaxy + ansible apply - Or puppet/salt/... - Distribute terraform with strict credentials + terraform apply - Distribute <X> container orchestrator + apply - Distribute Kubernetes resources + kubectl apply - Core difference between these tools?
  • 15. Kubernetes as a Contract - Instance = Podspec - Running unit = Container spec - Multiple Instances = Deployment spec - Configuration = ConfigMap spec - Traffic = Service, Ingress spec - All modeled as JSON or YAML, with a standard contract/spec
        kind: Deployment
        metadata:
          name: {{ .Values.name }}
          namespace: {{ .Values.name }}
        spec:
          replicas: 5
          … … …
          containers:
          - name: my-app
            image: my-container:latest
            resources:
              requests:
                cpu: …
                memory: …
              limits:
                cpu: …
                memory: …
            livenessProbe:
              httpGet:
                path: /_system/health
            env: ...
  • 16. Application Stack -> kubectl apply -> Kubernetes API Server (open, secure HTTPS protocol) - Kubernetes responds to fulfil what has been applied - Artifact = Docker image - Instance = Pod - Running unit = Container - Multiple Instances = Deployment - Configuration = ConfigMap - Traffic = Service, Ingress - All modeled as JSON or YAML, with a standard Resource spec
  • 17. Application Stack (stack spec) -> kubectl apply -> API Proxy (validation, linting, org-wide standards, etc.) -> Kubernetes API
  • 18. Kubernetes as a Contract - Health checks must exist - CPU/memory must exist - Images must not be external - Entrypoint is for us, Script is for app - No alpha/beta stuff - Whitelisted resources, separate stateful & stateless clusters - Minimum 2 replicas - Similar for all resources
        kind: Deployment
        metadata:
          name: {{ .Values.name }}
          namespace: {{ .Values.name }}
        spec:
          replicas: 5
          … … …
          containers:
          - name: my-app
            image: my-container:latest
            resources:
              requests:
                cpu: …
                memory: …
              limits:
                cpu: …
                memory: …
            livenessProbe:
              httpGet:
                path: /_system/health
            env: ...
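A minimal sketch of how an API proxy might enforce four of the rules on slide 18 (health checks, CPU/memory, internal images, minimum replicas) against a Deployment manifest that has already been parsed into a dict. The registry prefix `registry.example.internal/`, the function name `check_deployment`, and the choice to require both `requests` and `limits` are assumptions for illustration; the talk does not describe how the proxy was implemented.

```python
# Hypothetical sketch of contract checks an API proxy could run before
# forwarding a Deployment to the Kubernetes API. Names are illustrative.
INTERNAL_REGISTRY = "registry.example.internal/"  # assumed org registry prefix
MIN_REPLICAS = 2


def check_deployment(manifest: dict) -> list[str]:
    """Return contract violations for a parsed apps/v1 Deployment manifest."""
    violations = []
    spec = manifest.get("spec", {})
    # Kubernetes defaults replicas to 1 when the field is omitted.
    if spec.get("replicas", 1) < MIN_REPLICAS:
        violations.append("minimum 2 replicas")
    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        if "livenessProbe" not in container:
            violations.append(f"{name}: health check must exist")
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for key in ("cpu", "memory"):
                if key not in resources.get(section, {}):
                    violations.append(f"{name}: resources.{section}.{key} must exist")
        if not container.get("image", "").startswith(INTERNAL_REGISTRY):
            violations.append(f"{name}: image must not be external")
    return violations


GOOD = {
    "kind": "Deployment",
    "spec": {
        "replicas": 5,
        "template": {"spec": {"containers": [{
            "name": "my-app",
            "image": INTERNAL_REGISTRY + "my-container:latest",
            "resources": {
                "requests": {"cpu": "100m", "memory": "128Mi"},
                "limits": {"cpu": "500m", "memory": "256Mi"},
            },
            "livenessProbe": {"httpGet": {"path": "/_system/health"}},
        }]}},
    },
}

BAD = {
    "kind": "Deployment",
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [
            {"name": "web", "image": "nginx:latest"},
        ]}},
    },
}

if __name__ == "__main__":
    for violation in check_deployment(BAD):
        print(violation)
```

Keeping the checks as pure functions over the manifest makes them easy to test with mock applications, which the talk later credits for improved infrastructure quality.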
  • 19. Cloud Resources as a Contract - Model your own contract, e.g. Cloud Bucket as a YAML - kubectl apply -> API Proxy (validation, linting, org-wide standards, etc.) -> Kubernetes API / Custom controllers
  • 20. Application Stack - Using Kubernetes since 1.2, and on 1.10 right now - Avoiding the Kubernetes API maze - “src” and “ops” in every repository, completely self-contained - Kubernetes upgrades go exactly as planned because we know what workloads are running and how to orchestrate/change them - Multiple features and standards rolled out to everyone who uses the Kubernetes clusters - Stateful and stateless clusters separate from day 1 - Using Kubernetes/Helm not as a tool, but as a contract
  • 22. Logging as a Tool - Logstash as a tool - Hundreds of custom logstash transformations, pipelines, ports - Snowflake configs for different fields for each service - Slow, ticketed bootstrap process for new services - Explosion of indices - Multiple different ways to push logs from applications
  • 23. Logging as a Contract - Print logs on STDOUT, and they will show up in Kibana - If it’s JSON, you get structured fields - If it’s plain, you get a plain message - Standard enrichment rules to avoid stepping on each other’s toes: - <field>_i = integer, <field>_s = string, <field>_geo = geo with lat/long, <field>_txt = text, etc. - No application-specific code in Logstash anymore - Everything else is taken care of by the team - Scaling, rotation, retention, etc.
  • 24. Routing as a Tool - Started with one router handling all traffic, hardcoded service discovery - As more services were added, more custom and inconsistent routing rules - Nested rewrites and redirects, and randomly captured URL paths - Most services had to carry custom nginx forwarders - Proxies, then proxies-inside-proxies, etc. - Unmanageable routing graph - Complicated procedure to set up a new application - Monitoring/logs were a different problem altogether
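The logging contract on slide 23 can be sketched in a few lines: a JSON line becomes structured fields, anything else becomes a plain message, and field-name suffixes declare the expected type. The function names, the `message` key, and the `lat`/`lon` key names for geo fields are assumptions made for this example; the slide only says "geo with lat/long".

```python
import json


def _is_geo(value) -> bool:
    # Assumed shape for <field>_geo: a dict carrying "lat" and "lon" keys.
    return isinstance(value, dict) and {"lat", "lon"} <= value.keys()


# Suffix -> type check, following the enrichment rules on the slide.
SUFFIX_CHECKS = {
    "_i": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "_s": lambda v: isinstance(v, str),
    "_txt": lambda v: isinstance(v, str),
    "_geo": _is_geo,
}


def parse_log_line(line: str) -> dict:
    """JSON objects become structured fields; anything else is a plain message."""
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        return {"message": line}
    if not isinstance(parsed, dict):
        return {"message": line}
    return parsed


def suffix_violations(fields: dict) -> list[str]:
    """Return names of fields whose value does not match their suffix."""
    bad = []
    for name, value in fields.items():
        for suffix, is_valid in SUFFIX_CHECKS.items():
            if name.endswith(suffix) and not is_valid(value):
                bad.append(name)
    return bad


if __name__ == "__main__":
    print(parse_log_line('{"status_i": 200, "route_s": "/search"}'))
    print(parse_log_line("service started"))
```

Because the type is carried in the field name, two services cannot index the same field with conflicting types, which is the "stepping on each other's toes" problem the slide refers to.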
  • 25. Routing as a Contract - Adopted Ingress as a Contract - Every service was assigned a fixed route based on namespace - Consistent and predictable routing - Team adds value on top of the contract - Automatic logging and monitoring - Global health and SLA checks - Extensive instrumentation and tracing - Scalability - Load balancing - Edge gateways - Cross-zone failovers - Dashboards and network policies
  • 26. Monitoring as a Contract - Only tool-driven area in organization - Prometheus as a Contract is great - But we have some way to go (clustering, sharding, etc)
  • 27. Security as a Contract - All containers auto-injected with secrets - Non-org images blacklisted in the API proxy - GDPR as a contract - ... - In all the cases, the tool is secondary; the contract is discussed and agreed upon first
  • 29. Continuous delivery for operations [Chart: infrastructure maturity per service, ServiceA through ServiceK, …]
  • 30. Lessons - The DevOps team is another core engineering team, providing services that applications integrate with - We provide contracts, and service implementations that fulfil those contracts - Have time to innovate and add value on top instead of handling tickets - Wherever we added heavy tests around the contract with mock applications, infrastructure quality went up - Wherever possible, we dogfood the contract ourselves - Discuss hard before agreeing on a contract, and then go deliver - Keep it simple/small, semantic, instrumentable and usable from dev machines to prod - Adopt a mature industry contract and reduce its API surface wherever possible, to avoid re-designing from scratch - Not universally applicable; the implementation of the tool still matters - Continuously upgrade infrastructure and provide good UX for engineers