SlideShare a Scribd company logo
Introducing Envoy-based
service mesh at
Booking.com
Ivan Kruglov
KubeCon Europe 2018
02.05.2018
based in Amsterdam
28M listings
1,5M room nights per day
1700 IT staff
Agenda
• what is service mesh?
• why did we start the project?
• our setup
• learnings
• conclusion
What is
service mesh*?
* the way I understand it
application provider
service consumer service provider
application provider
service consumer service provider
communication infra
application provider
service consumer service provider
proxy proxy
control plane
Why did we start
service mesh
project?
monolith
service
service
service service service
service
service
service
service
service
service
Consistency & Visibility
in communications
Our setup
application
provider
proxy
service consumer service provider
data plane
HTTP(S)
HTTP
• graceful restart
• TCP proxy
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
control plane
• in-house (v1/v2 API)
• started in August 2017; Istio is around 0.2
• one complex system at a time
• start with bare-metal support
• minimal abstraction
• yup, just write Envoy config (partially)
• fine with Envoy’s set of features
• self-service for service owners
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
ZooKeeper
(service discovery)
Kubernetes
(service discovery)
ZooKeeper
(configuration)
configuration
• Envoy configuration is quite rich
• power vs. usability:
• cluster specification
• virtual host specification
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
envoy cluster spec envoy virtual host spec
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
envoy cluster spec envoy virtual host spec
bootstrap wizard
control plane
ZooKeeper
VC VC VC
service A
expose to service D
service C
expose to service Dservice B
envoy
I’m service D
here is service A
and service C specs
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
ZooKeeper
(service discovery)
Kubernetes
(service discovery)
ZooKeeper
(configuration)
configuration
changes
Deploying changes
• integration with existent infrastructure
• git-deploy
• vanguard (canary) deployment
• syntax and semantic validation
• fast rollback
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
ZooKeeper
(service discovery)
ZooKeeper
(configuration)
configuration
changes
metrics
Kubernetes
(service discovery)
Monitoring
• approach #1 – graphite
• excellent in-house facilities
• aggregations is too heavy at scale
• approach #2 – prometheus
• statsd support
• statsd_exporter
• still iterating
• standard dashboard
envoy
statsd_exporter
envoy…
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
ZooKeeper
(service discovery)
ZooKeeper
(configuration)
configuration
changes
metrics
Kubernetes
(service discovery)
Production
Some numbers
• ~ 6 months in production
• ~ 30 projects
• ~ 10K servers
• hundreds of thousands RPS
• overheads…
p50
p75 p90
p95
p99
HTTP
perceived latency
0ms
5ms
10ms
15ms
+ 1ms
HTTPS
perceived latency
p50
p75 p90
p95
p99
10ms
+ 1ms
15ms
20ms
25ms
30ms
40ms
Some findings
• graceful restart works, but not always!
• (major) Envoy upgrades may not be graceful
• ”x-envoy-retry-on = connect-failure” – too few
“x-envoy-retry-on = 5xx” – too much
• gateway-error – HTTP 502, 503, 504
• absence of TCP keepalive => stale connections
• coming in 1.7
• idle_timeout in TCP proxy in 1.6
• cluster_name (1.5) -> cluster_names (1.6)
Conclusion
service mesh* is
a building block on the path to SOA
technically & organizationally
* the way I understand it
Andrei Vereha
Envoy production stories
Friday, May 4th 14:00
Envoy Deep Dive
Thank you!
Ivan Kruglov
ivan.kruglov@booking.com

More Related Content

What's hot

Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Databricks
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
Rishabh Indoria
 
IBM RedHat OCP Vs xKS.pptx
IBM RedHat OCP Vs xKS.pptxIBM RedHat OCP Vs xKS.pptx
IBM RedHat OCP Vs xKS.pptx
ssuser666667
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Skillspeed
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture Forum
Christopher Spring
 
Introducing MongoDB Atlas
Introducing MongoDB AtlasIntroducing MongoDB Atlas
Introducing MongoDB Atlas
MongoDB
 
Redis 101
Redis 101Redis 101
Redis 101
Geoff Hoffman
 
OpenStack @ Workday - CI/CD
OpenStack @ Workday - CI/CDOpenStack @ Workday - CI/CD
OpenStack @ Workday - CI/CD
Edgar Magana
 
Kubernetes Architecture and Introduction
Kubernetes Architecture and IntroductionKubernetes Architecture and Introduction
Kubernetes Architecture and Introduction
Stefan Schimanski
 
k8s practice 2023.pptx
k8s practice 2023.pptxk8s practice 2023.pptx
k8s practice 2023.pptx
wonyong hwang
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
Mydbops
 
Rancher MasterClass - Avoiding-configuration-drift.pptx
Rancher  MasterClass - Avoiding-configuration-drift.pptxRancher  MasterClass - Avoiding-configuration-drift.pptx
Rancher MasterClass - Avoiding-configuration-drift.pptx
LibbySchulze
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
Red Hat Developers
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
lucenerevolution
 
Docker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshopDocker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshop
Sathish VJ
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
Gabriel Carro
 
Using Rook to Manage Kubernetes Storage with Ceph
Using Rook to Manage Kubernetes Storage with CephUsing Rook to Manage Kubernetes Storage with Ceph
Using Rook to Manage Kubernetes Storage with Ceph
CloudOps2005
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding Kubernetes
Tu Pham
 
Kubernetes
KubernetesKubernetes
Kubernetes
erialc_w
 
Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
Learning Rust the Hard Way for a Production Kafka + ScyllaDB PipelineLearning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
ScyllaDB
 

What's hot (20)

Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
IBM RedHat OCP Vs xKS.pptx
IBM RedHat OCP Vs xKS.pptxIBM RedHat OCP Vs xKS.pptx
IBM RedHat OCP Vs xKS.pptx
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture Forum
 
Introducing MongoDB Atlas
Introducing MongoDB AtlasIntroducing MongoDB Atlas
Introducing MongoDB Atlas
 
Redis 101
Redis 101Redis 101
Redis 101
 
OpenStack @ Workday - CI/CD
OpenStack @ Workday - CI/CDOpenStack @ Workday - CI/CD
OpenStack @ Workday - CI/CD
 
Kubernetes Architecture and Introduction
Kubernetes Architecture and IntroductionKubernetes Architecture and Introduction
Kubernetes Architecture and Introduction
 
k8s practice 2023.pptx
k8s practice 2023.pptxk8s practice 2023.pptx
k8s practice 2023.pptx
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
 
Rancher MasterClass - Avoiding-configuration-drift.pptx
Rancher  MasterClass - Avoiding-configuration-drift.pptxRancher  MasterClass - Avoiding-configuration-drift.pptx
Rancher MasterClass - Avoiding-configuration-drift.pptx
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Docker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshopDocker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshop
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Using Rook to Manage Kubernetes Storage with Ceph
Using Rook to Manage Kubernetes Storage with CephUsing Rook to Manage Kubernetes Storage with Ceph
Using Rook to Manage Kubernetes Storage with Ceph
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding Kubernetes
 
Kubernetes
KubernetesKubernetes
Kubernetes
 
Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
Learning Rust the Hard Way for a Production Kafka + ScyllaDB PipelineLearning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
 

Similar to Introducing envoy-based service mesh at Booking.com

Managing Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on KubernetesManaging Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on Kubernetes
Iftach Schonbaum
 
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
Daniel Bryant
 
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & MobileIVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
Amazon Web Services Japan
 
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Amazon Web Services
 
F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017
Guy Brown
 
Load Balancing in the Cloud using Nginx & Kubernetes
Load Balancing in the Cloud using Nginx & KubernetesLoad Balancing in the Cloud using Nginx & Kubernetes
Load Balancing in the Cloud using Nginx & Kubernetes
Lee Calcote
 
DCEU 18: Docker Container Networking
DCEU 18: Docker Container NetworkingDCEU 18: Docker Container Networking
DCEU 18: Docker Container Networking
Docker, Inc.
 
Exploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservicesExploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservices
💡 Tomasz Kogut
 
Working with PowerVC via its REST APIs
Working with PowerVC via its REST APIsWorking with PowerVC via its REST APIs
Working with PowerVC via its REST APIs
Joe Cropper
 
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless OverviewOpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
María Angélica Bracho
 
REST APIs
REST APIsREST APIs
Microservice bus tutorial
Microservice bus tutorialMicroservice bus tutorial
Microservice bus tutorial
Huabing Zhao
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECS
Amazon Web Services
 
DEVNET-1128 Cisco Intercloud Fabric NB Api's for Business & Providers
DEVNET-1128	Cisco Intercloud Fabric NB Api's for Business & ProvidersDEVNET-1128	Cisco Intercloud Fabric NB Api's for Business & Providers
DEVNET-1128 Cisco Intercloud Fabric NB Api's for Business & Providers
Cisco DevNet
 
Service Discovery using etcd, Consul and Kubernetes
Service Discovery using etcd, Consul and KubernetesService Discovery using etcd, Consul and Kubernetes
Service Discovery using etcd, Consul and Kubernetes
Sreenivas Makam
 
DCUS17 : Docker networking deep dive
DCUS17 : Docker networking deep diveDCUS17 : Docker networking deep dive
DCUS17 : Docker networking deep dive
Madhu Venugopal
 
Web services - A Practical Approach
Web services - A Practical ApproachWeb services - A Practical Approach
Web services - A Practical Approach
Madhaiyan Muthu
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
Josef Adersberger
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
QAware GmbH
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015
Pavel Chunyayev
 

Similar to Introducing envoy-based service mesh at Booking.com (20)

Managing Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on KubernetesManaging Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on Kubernetes
 
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
 
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & MobileIVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
 
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
 
F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017
 
Load Balancing in the Cloud using Nginx & Kubernetes
Load Balancing in the Cloud using Nginx & KubernetesLoad Balancing in the Cloud using Nginx & Kubernetes
Load Balancing in the Cloud using Nginx & Kubernetes
 
DCEU 18: Docker Container Networking
DCEU 18: Docker Container NetworkingDCEU 18: Docker Container Networking
DCEU 18: Docker Container Networking
 
Exploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservicesExploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservices
 
Working with PowerVC via its REST APIs
Working with PowerVC via its REST APIsWorking with PowerVC via its REST APIs
Working with PowerVC via its REST APIs
 
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless OverviewOpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
 
REST APIs
REST APIsREST APIs
REST APIs
 
Microservice bus tutorial
Microservice bus tutorialMicroservice bus tutorial
Microservice bus tutorial
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECS
 
DEVNET-1128 Cisco Intercloud Fabric NB Api's for Business & Providers
DEVNET-1128	Cisco Intercloud Fabric NB Api's for Business & ProvidersDEVNET-1128	Cisco Intercloud Fabric NB Api's for Business & Providers
DEVNET-1128 Cisco Intercloud Fabric NB Api's for Business & Providers
 
Service Discovery using etcd, Consul and Kubernetes
Service Discovery using etcd, Consul and KubernetesService Discovery using etcd, Consul and Kubernetes
Service Discovery using etcd, Consul and Kubernetes
 
DCUS17 : Docker networking deep dive
DCUS17 : Docker networking deep diveDCUS17 : Docker networking deep dive
DCUS17 : Docker networking deep dive
 
Web services - A Practical Approach
Web services - A Practical ApproachWeb services - A Practical Approach
Web services - A Practical Approach
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015
 

More from Ivan Kruglov

SRE: Site Reliability Engineering
SRE: Site Reliability EngineeringSRE: Site Reliability Engineering
SRE: Site Reliability Engineering
Ivan Kruglov
 
Blue-green & canary deployments
Blue-green & canary deploymentsBlue-green & canary deployments
Blue-green & canary deployments
Ivan Kruglov
 
Обратная сторона сервис-ориентированной архитектуры
Обратная сторона сервис-ориентированной архитектурыОбратная сторона сервис-ориентированной архитектуры
Обратная сторона сервис-ориентированной архитектуры
Ivan Kruglov
 
Kubernetes в Booking.com
Kubernetes в Booking.comKubernetes в Booking.com
Kubernetes в Booking.com
Ivan Kruglov
 
Тернии контейнеризованных приложений и микросервисов
Тернии контейнеризованных приложений и микросервисовТернии контейнеризованных приложений и микросервисов
Тернии контейнеризованных приложений и микросервисов
Ivan Kruglov
 
Service mesh для микросервисов
Service mesh для микросервисовService mesh для микросервисов
Service mesh для микросервисов
Ivan Kruglov
 
SOA: Строим свой service mesh
SOA: Строим свой service meshSOA: Строим свой service mesh
SOA: Строим свой service mesh
Ivan Kruglov
 
Solving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.comSolving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.com
Ivan Kruglov
 
Sereal: a view from inside
Sereal: a view from insideSereal: a view from inside
Sereal: a view from inside
Ivan Kruglov
 
SOA: послать запрос на сервер? Что может быть проще?!
SOA: послать запрос на сервер? Что может быть проще?!SOA: послать запрос на сервер? Что может быть проще?!
SOA: послать запрос на сервер? Что может быть проще?!
Ivan Kruglov
 
Мониторинг, когда не тестируешь
Мониторинг, когда не тестируешьМониторинг, когда не тестируешь
Мониторинг, когда не тестируешь
Ivan Kruglov
 
Архитектура поиска в Booking.com
Архитектура поиска в Booking.comАрхитектура поиска в Booking.com
Архитектура поиска в Booking.com
Ivan Kruglov
 
Processing JSON messages in highspeed
Processing JSON messages in highspeedProcessing JSON messages in highspeed
Processing JSON messages in highspeed
Ivan Kruglov
 
Bringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searchesBringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searches
Ivan Kruglov
 
Optimize sereal
Optimize serealOptimize sereal
Optimize sereal
Ivan Kruglov
 
Sereal and its tooling
Sereal and its toolingSereal and its tooling
Sereal and its tooling
Ivan Kruglov
 

More from Ivan Kruglov (16)

SRE: Site Reliability Engineering
SRE: Site Reliability EngineeringSRE: Site Reliability Engineering
SRE: Site Reliability Engineering
 
Blue-green & canary deployments
Blue-green & canary deploymentsBlue-green & canary deployments
Blue-green & canary deployments
 
Обратная сторона сервис-ориентированной архитектуры
Обратная сторона сервис-ориентированной архитектурыОбратная сторона сервис-ориентированной архитектуры
Обратная сторона сервис-ориентированной архитектуры
 
Kubernetes в Booking.com
Kubernetes в Booking.comKubernetes в Booking.com
Kubernetes в Booking.com
 
Тернии контейнеризованных приложений и микросервисов
Тернии контейнеризованных приложений и микросервисовТернии контейнеризованных приложений и микросервисов
Тернии контейнеризованных приложений и микросервисов
 
Service mesh для микросервисов
Service mesh для микросервисовService mesh для микросервисов
Service mesh для микросервисов
 
SOA: Строим свой service mesh
SOA: Строим свой service meshSOA: Строим свой service mesh
SOA: Строим свой service mesh
 
Solving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.comSolving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.com
 
Sereal: a view from inside
Sereal: a view from insideSereal: a view from inside
Sereal: a view from inside
 
SOA: послать запрос на сервер? Что может быть проще?!
SOA: послать запрос на сервер? Что может быть проще?!SOA: послать запрос на сервер? Что может быть проще?!
SOA: послать запрос на сервер? Что может быть проще?!
 
Мониторинг, когда не тестируешь
Мониторинг, когда не тестируешьМониторинг, когда не тестируешь
Мониторинг, когда не тестируешь
 
Архитектура поиска в Booking.com
Архитектура поиска в Booking.comАрхитектура поиска в Booking.com
Архитектура поиска в Booking.com
 
Processing JSON messages in highspeed
Processing JSON messages in highspeedProcessing JSON messages in highspeed
Processing JSON messages in highspeed
 
Bringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searchesBringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searches
 
Optimize sereal
Optimize serealOptimize sereal
Optimize sereal
 
Sereal and its tooling
Sereal and its toolingSereal and its tooling
Sereal and its tooling
 

Recently uploaded

How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 

Recently uploaded (20)

How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 

Introducing envoy-based service mesh at Booking.com

  • 1. Introducing Envoy-based service mesh at Booking.com Ivan Kruglov KubeCon Europe 2018 02.05.2018
  • 2. based in Amsterdam 28M listings 1,5M room nights per day 1700 IT staff
  • 3. Agenda • what is service mesh? • why did we start the project? • our setup • learnings • conclusion
  • 4. What is service mesh*? * the way I understand it
  • 6. application provider service consumer service provider communication infra
  • 7. application provider service consumer service provider proxy proxy control plane
  • 8. Why did we start service mesh project?
  • 11.
  • 12.
  • 13. Consistency & Visibility in communications
  • 15. application provider proxy service consumer service provider data plane HTTP(S) HTTP
  • 17. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery
  • 18. control plane • in-house (v1/v2 API) • started in August 2017; Istio is around 0.2 • one complex system at a time • start with bare-metal support • minimal abstraction • yup, just write Envoy config (partially) • fine with Envoy’s set of features • self-service for service owners
  • 19. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery ZooKeeper (service discovery) Kubernetes (service discovery) ZooKeeper (configuration)
  • 20. configuration • Envoy configuration is quite rich • power vs. usability: • cluster specification • virtual host specification
  • 21. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true
  • 22. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true
  • 23. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true
  • 24. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true
  • 25. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true envoy cluster spec envoy virtual host spec
  • 26. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true envoy cluster spec envoy virtual host spec bootstrap wizard
  • 27. control plane ZooKeeper VC VC VC service A expose to service D service C expose to service Dservice B envoy I’m service D here is service A and service C specs
  • 28. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery ZooKeeper (service discovery) Kubernetes (service discovery) ZooKeeper (configuration) configuration changes
  • 29. Deploying changes • integration with existent infrastructure • git-deploy • vanguard (canary) deployment • syntax and semantic validation • fast rollback
  • 30. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery ZooKeeper (service discovery) ZooKeeper (configuration) configuration changes metrics Kubernetes (service discovery)
  • 31. Monitoring • approach #1 – graphite • excellent in-house facilities • aggregations is too heavy at scale • approach #2 – prometheus • statsd support • statsd_exporter • still iterating • standard dashboard envoy statsd_exporter envoy…
  • 32.
  • 33. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery ZooKeeper (service discovery) ZooKeeper (configuration) configuration changes metrics Kubernetes (service discovery)
  • 35. Some numbers • ~ 6 months in production • ~ 30 projects • ~ 10K servers • hundreds of thousands RPS • overheads…
  • 38. Some findings • graceful restart works, but not always! • (major) Envoy upgrades may not be graceful • ”x-envoy-retry-on = connect-failure” – too few “x-envoy-retry-on = 5xx” – too much • gateway-error – HTTP 502, 503, 504 • absence of TCP keepalive => stale connections • coming in 1.7 • idle_timeout in TCP proxy in 1.6 • cluster_name (1.5) -> cluster_names (1.6)
  • 40. service mesh* is a building block on the path to SOA technically & organizationally * the way I understand it
  • 41. Andrei Vereha Envoy production stories Friday, May 4th 14:00 Envoy Deep Dive