Giacomo Tirabassi
Istio at InfluxData
© 2020 InfluxData. All rights reserved. 2
Who am I?
Italian
SRE @ InfluxData (keep Cloud2 running)
Running Kubernetes since version 1.8 (weird flex but OK)
Travel, eat, cook, repeat
Agenda
1. What is a service mesh?
2. Why do we need a service mesh?
3. How do we run it?
4. Outcome
5. Roadmap
What is a service mesh?
Volume 1
Alternatives
● Istio (Google)
● Linkerd (CNCF, Buoyant)
● Consul Connect (HashiCorp)
What is a Service Mesh?
Extract functionality from the application into the platform
● Networking: retries, timeouts, rate limits, circuit breakers, canary releases
● Security: mTLS or JWT for authentication, authorization policies
● Observability: automatic tracing, access logs, protocol-specific metrics
Kubernetes : Linux Process = Istio : HTTP Request
Why does it make sense?
Extract functionality from the application into the platform
● Unix philosophy: “do one thing and do it well”
● Polyglot platforms: library-based approaches like Hystrix don’t work when
multiple languages are used
● Zero-trust networks: just because it's in your VPC doesn't mean it's
secure; reduce the blast radius
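To make the polyglot point concrete, here is a minimal Python sketch of the kind of retry logic a resilience library embeds in every application, and that a mesh moves into the sidecar instead. The names `call_with_retries` and `flaky_upstream` are hypothetical and exist only for this illustration; every other language in the platform would need its own equivalent.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.0,
) -> T:
    """Call fn, retrying on ConnectionError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff between tries: base, 2*base, 4*base, ...
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Hypothetical flaky upstream: fails twice, then succeeds.
calls = {"n": 0}


def flaky_upstream() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream unavailable")
    return "ok"


result = call_with_retries(flaky_upstream, attempts=3)
```

With a mesh, the same behaviour is declared once as routing configuration and applied by the proxy, regardless of the language the service is written in.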
Let it sink in
Platform All Apps
Why do we need a service mesh?
Volume 2
Initial reasons
● Canary deployments
● Defining better SLAs and SLOs
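As an illustration of a canary deployment in Istio, the sketch below builds a VirtualService that splits traffic by weight between two subsets. It assumes Istio 1.5 or later (for the `networking.istio.io/v1beta1` API) and a DestinationRule, not shown, that defines the subsets; the subset names `stable` and `canary`, the service name, and the helper `canary_virtual_service` are assumptions made for this example, not the configuration used at InfluxData.

```python
import json


def canary_virtual_service(name: str, host: str, canary_percent: int) -> dict:
    """Build an Istio VirtualService that splits traffic between two subsets."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    "route": [
                        {
                            # Subsets must be declared in a DestinationRule.
                            "destination": {"host": host, "subset": "stable"},
                            "weight": 100 - canary_percent,
                        },
                        {
                            "destination": {"host": host, "subset": "canary"},
                            "weight": canary_percent,
                        },
                    ]
                }
            ],
        },
    }


# Hypothetical service: send 10% of requests to the canary.
vs = canary_virtual_service("api", "api.default.svc.cluster.local", 10)
print(json.dumps(vs, indent=2))
```

Shifting more traffic to the canary is then a matter of changing one number and re-applying the resource.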
Challenges
Moooar complexity: one more layer on top of the stack
● Service Mesh
● CNI
● Kubernetes
● Containers
● Linux OS
● Cloud Provider
Migrating production services...
How do we install and run Istio?
Volume 3
Istio Deployment
From version 1.2 to 1.4
● We kept our own fork of the Helm installation, in Jsonnet
● Configuration mess
Version 1.5 and 1.6
● Istio operator with istiod: single control plane component
● https://github.com/influxdata/helm-charts/tree/master/istio
Prometheus-less deployment
● With istio-mixer (metrics aggregator)
○ Telegraf sidecar on istio-mixer to scrape its metrics
● Without istio-mixer
○ Telegraf sidecar on istiod, injected by telegraf-operator
○ telegraf-operator also adds a Telegraf sidecar to every pod in the mesh to
scrape `http://127.0.0.1:15090/stats/prometheus`
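For a sense of what the sidecar reads from that endpoint, here is a minimal sketch of parsing the Prometheus text format. In the real setup Telegraf does the scraping; the parser below is a simplified stand-in that assumes samples carry no trailing timestamp, and the sample payload is made up for illustration.

```python
def parse_prometheus_text(text: str) -> dict:
    """Parse `series value` lines, skipping comments and blank lines.

    Assumes each sample line ends with the value (no trailing timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values may contain spaces.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Made-up payload in the shape served at /stats/prometheus.
SAMPLE = """\
# TYPE envoy_server_uptime gauge
envoy_server_uptime{} 120
# TYPE envoy_cluster_upstream_rq_total counter
envoy_cluster_upstream_rq_total{cluster_name="xds-grpc"} 42
"""

samples = parse_prometheus_text(SAMPLE)
```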
Istio caused outage
● On Istio 1.3, defining a Kubernetes Service with port name
`http` and port number `443` broke all communication from applications in
the mesh to external services exposed over HTTPS
● Upstream bug: https://github.com/istio/istio/issues/16458
Istio caused outage
● Fixed upstream in Istio 1.4
● On our side, we solved it with conftest (policy tests for configuration files)
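The rule behind that check can be sketched as follows. conftest policies are written in Rego; this Python version is only a stand-in for the logic, not the actual policy, and the function name and example manifests are hypothetical. It relies on Istio's port naming convention, where a name of `http` or `http-<suffix>` declares plain HTTP.

```python
def plain_http_on_443(service: dict) -> list:
    """Return one message per Service port named as HTTP but numbered 443."""
    violations = []
    svc_name = service.get("metadata", {}).get("name", "<unnamed>")
    for port in service.get("spec", {}).get("ports", []):
        port_name = port.get("name", "")
        # Istio reads `http` or `http-<suffix>` as the plain HTTP protocol.
        declares_http = port_name == "http" or port_name.startswith("http-")
        if declares_http and port.get("port") == 443:
            violations.append(
                f"Service {svc_name}: port {port_name!r} declares HTTP on 443"
            )
    return violations


# Hypothetical manifests, already loaded from YAML into dicts.
bad = {
    "kind": "Service",
    "metadata": {"name": "web"},
    "spec": {"ports": [{"name": "http", "port": 443}]},
}
good = {
    "kind": "Service",
    "metadata": {"name": "web"},
    "spec": {"ports": [{"name": "https", "port": 443}]},
}

bad_result = plain_http_on_443(bad)
good_result = plain_http_on_443(good)
```

Run in CI against every manifest, a check like this rejects the offending Service before it reaches the cluster.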
Benefits
Volume 4
Dashboards soon to be in
https://github.com/influxdata/community-templates
Next steps
Volume 5
Roadmap (1 of 3)
● Switch to mixer-less monitoring
● Enroll more services in the mesh
● Ingress gateway
● Configure tracing (flip the switch)
● Enable access logs
Roadmap (2 of 3)
● Set the mTLS policy to STRICT: Envoy will refuse all connections from
within the mesh that are not over mTLS
● Set outbound traffic to REGISTRY_ONLY: an app can reach only endpoints
inside the mesh and those declared in a ServiceEntry
● Start using an egress gateway for outbound alert traffic
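The first two items map to small pieces of Istio configuration, sketched here as Python dicts that could be serialized to YAML. This assumes Istio 1.5 or later with the default root namespace `istio-system`; the external host `alerts.example.com` and the helper name are hypothetical.

```python
# Mesh-wide strict mTLS: a PeerAuthentication in the root namespace.
strict_mtls = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "PeerAuthentication",
    "metadata": {"name": "default", "namespace": "istio-system"},
    "spec": {"mtls": {"mode": "STRICT"}},
}

# Outbound policy: a fragment of an IstioOperator resource.
registry_only = {
    "apiVersion": "install.istio.io/v1alpha1",
    "kind": "IstioOperator",
    "spec": {"meshConfig": {"outboundTrafficPolicy": {"mode": "REGISTRY_ONLY"}}},
}


def external_https_service_entry(name: str, host: str) -> dict:
    """Register an external HTTPS host so it stays reachable under REGISTRY_ONLY."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "ServiceEntry",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "location": "MESH_EXTERNAL",
            "ports": [{"number": 443, "name": "https", "protocol": "HTTPS"}],
            "resolution": "DNS",
        },
    }


# Hypothetical external alerting endpoint.
alerts_entry = external_https_service_entry("alerts", "alerts.example.com")
```

Under REGISTRY_ONLY, any external host without such an entry becomes unreachable from the mesh, so entries need to be in place before the switch is flipped.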
Roadmap (3 of 3)
● Sidecar CRD for all workloads (reduce the number of cross-namespace
connections)
● Start reducing cross service access with PeerAuthentication
● Cross Region Failover with multi-cluster configuration
● Contribute to Kiali to work with InfluxDB
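The Sidecar item in the list above could look roughly like this: a per-namespace Sidecar resource that limits which services each proxy is configured to reach. This is a sketch assuming Istio 1.5 or later; the namespace names and the helper `namespace_sidecar` are hypothetical.

```python
def namespace_sidecar(namespace: str, extra_namespaces: tuple = ()) -> dict:
    """Build a Sidecar limiting egress to the own namespace, istio-system,
    and any explicitly listed namespaces."""
    hosts = ["./*", "istio-system/*"] + [f"{ns}/*" for ns in extra_namespaces]
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "Sidecar",
        # A Sidecar without a workloadSelector applies to the whole namespace.
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"egress": [{"hosts": hosts}]},
    }


# Hypothetical namespaces: `gateway` workloads may also reach `storage`.
sidecar = namespace_sidecar("gateway", extra_namespaces=("storage",))
```

Each proxy then only receives configuration for the listed hosts instead of for every service in the mesh.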
The End
Twitter / LinkedIn / GitHub
@gitirabassi

Giacomo Tirabassi [InfluxData] | Istio at InfluxData | InfluxDays Virtual Experience London 2020
