Giacomo Tirabassi [InfluxData] | Istio at InfluxData | InfluxDays Virtual Experience London 2020

This talk covers how and why we run Istio on Kubernetes at InfluxData. Buzzwords can't substitute for engineering. It also covers our experience, roadmap, and challenges running Istio without Prometheus, with telegraf-operator collecting the metrics.

Published in: Technology

  1. Giacomo Tirabassi | Istio at InfluxData
  2. © 2020 InfluxData. All rights reserved. Who am I? ● Italian SRE @ InfluxData (keep Cloud2 running) ● Running Kubernetes since version 1.8 (weird flex but OK) ● Travel, eat, cook, repeat
  3. Agenda 1. What is a service mesh? 2. Why do we need a service mesh? 3. How do we run it? 4. Outcome 5. Roadmap
  4. What is a service mesh? Volume 1
  5. Alternatives ● Istio (Google) ● Linkerd (CNCF, Buoyant) ● Consul Connect (HashiCorp)
  6. What is a Service Mesh? Extract functionality from the application into the platform ● Networking: retries, timeouts, rate limits, circuit breakers, canary ● Security: mTLS or JWT for authentication, authorization policies ● Observability: automatic tracing, access logs, protocol-specific metrics
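
To make "extract functionality from the application into the platform" concrete, here is a minimal sketch of the retry logic that each application would otherwise carry in-process, once per language. The helper name `call_with_retries` and its parameters are hypothetical and only for illustration; a mesh sidecar applies the equivalent policy outside the application.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.1) -> T:
    """Call fn, retrying on any exception with exponential backoff.

    Hypothetical helper: this is the kind of code a service mesh lets
    you delete from every application.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as err:  # a real client would narrow this
            last_error = err
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, 4x, ... before the next try.
                time.sleep(base_delay * (2 ** attempt))
    raise last_error
```

In a polyglot platform this helper would have to be rewritten and kept consistent in every language, which is the argument for moving it into the sidecar.
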
  7. Kubernetes : Linux Process = Istio : HTTP Request
  8. Why does it make sense? Extract functionality from the application into the platform ● Unix philosophy: “do one thing and do it well” ● Polyglot platforms: attempts like Hystrix don’t work if multiple languages are used ● Zero trust networks: just because it's in your VPC doesn't mean it's secure; reduce the blast radius
  9. Let it sink in ● Platform ● All Apps
  10. Why do we need a service mesh? Volume 2
  11. Initial reasons ● Canary deployments ● Defining better SLAs and SLOs
  12. Challenges
  13. Moooar complexity: Service Mesh, CNI, Kubernetes, Containers, Linux OS, Cloud Provider
  14. Migrating production services...
  15. How do we install and run Istio? Volume 3
  16. Istio Deployment. From version 1.2 to 1.4: ● We kept our fork of the Helm installation in Jsonnet ● Configuration mess. Version 1.5 and 1.6: ● Istio operator with istiod: a single control plane component ● https://github.com/influxdata/helm-charts/tree/master/istio
  17. Prometheus-less deployment ● With istio-mixer (metrics aggregator) ○ Telegraf sidecar for istio-mixer to scrape metrics ● Without istio-mixer ○ Sidecar with telegraf-operator for istiod ○ telegraf-operator is used to add a sidecar to every pod using Istio to scrape `http://127.0.0.1:15090/stats/prometheus`
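
The endpoint on the slide above serves metrics in the Prometheus text exposition format, which is what the Telegraf sidecar scrapes. As a rough illustration of what that scrape involves, the sketch below fetches the URL and parses each sample line into a `(series, value)` pair. It is not how Telegraf is implemented; the function names are hypothetical, the parser ignores timestamps and assumes no `}` appears after the label set, and `scrape` only works from inside a pod that has the Envoy sidecar.

```python
from urllib.request import urlopen


def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into (series, value) pairs.

    Simplified sketch: comment lines (# HELP / # TYPE) are skipped and
    optional timestamps are ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Series with labels: split after the closing brace.
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            # Series without labels: split at the first space.
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        samples.append((series, float(fields[0])))
    return samples


def scrape(url="http://127.0.0.1:15090/stats/prometheus", timeout=2.0):
    """Fetch and parse the sidecar metrics endpoint named on the slide."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Because the endpoint listens on `127.0.0.1`, the scraper has to run in the same pod, which is why a per-pod sidecar injected by telegraf-operator fits this setup.
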
  18. Istio-caused outage ● Running Istio 1.3 and defining a Kubernetes Service with port name `http` and port number `443` broke all communication from applications in the mesh to external services exposed over HTTPS ● Upstream bug: https://github.com/istio/istio/issues/16458
  19. Istio-caused outage ● Solved in 1.4 upstream ● We solved it with conftest
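
The slides do not show the conftest policy itself, and conftest policies are written in Rego. As a stand-in, the Python sketch below expresses the same kind of check: given a parsed Kubernetes manifest, flag any Service port whose name selects the `http` protocol while its number is `443`. The function name and the exact rule are assumptions for illustration, not the policy InfluxData used.

```python
def find_risky_ports(manifest):
    """Return names of Service ports named http[-suffix] on port 443.

    Assumes Istio's port naming convention, where the protocol is the
    part of the port name before the first dash.
    """
    if manifest.get("kind") != "Service":
        return []
    risky = []
    for port in manifest.get("spec", {}).get("ports", []):
        name = port.get("name", "")
        protocol = name.split("-", 1)[0]
        if protocol == "http" and port.get("port") == 443:
            risky.append(name)
    return risky
```

Running a check like this in CI against every rendered manifest rejects the problematic combination before it reaches the cluster, independent of the Istio version deployed.
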
  20. Benefits Volume 4
  22. sensitive
  25. Dashboards soon to be in https://github.com/influxdata/community-templates
  26. Next steps Volume 5
  27. Roadmap (1 of 3) ● Switch to mixer-less monitoring ● More services enrolled in the mesh ● Ingress gateway ● Configure tracing (flip the switch) ● Enable access logs
  28. Roadmap (2 of 3) ● Set the mTLS policy to STRICT, which means that Envoy will refuse all connections from within the mesh that are not over mTLS ● Set outbound traffic to REGISTRY_ONLY, which means that only endpoints inside the mesh and those declared in a ServiceEntry are reachable by an app ● Start using an Egress Gateway for egress traffic going out for alerts
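
For readers unfamiliar with the two settings on the slide above, the sketch below builds the corresponding Istio resources as plain Python dictionaries and prints them as JSON. The resource kinds and the `STRICT` and `REGISTRY_ONLY` values are standard Istio configuration; the resource names, the `istio-system` root namespace, and the `alerts.example.com` host are assumptions chosen for illustration, not InfluxData's actual configuration.

```python
import json


def mesh_wide_strict_mtls(root_namespace="istio-system"):
    """PeerAuthentication in the root namespace: sidecars accept only mTLS."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": root_namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


def registry_only_operator(name="example-istiocontrolplane"):
    """IstioOperator overlay: block egress to hosts unknown to the mesh."""
    return {
        "apiVersion": "install.istio.io/v1alpha1",
        "kind": "IstioOperator",
        "metadata": {"name": name},  # hypothetical name
        "spec": {
            "meshConfig": {"outboundTrafficPolicy": {"mode": "REGISTRY_ONLY"}}
        },
    }


def external_service_entry(name, host, port=443):
    """ServiceEntry that keeps one external host reachable under REGISTRY_ONLY."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "ServiceEntry",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "location": "MESH_EXTERNAL",
            "resolution": "DNS",
            "ports": [{"number": port, "name": "tls", "protocol": "TLS"}],
        },
    }


if __name__ == "__main__":
    resources = [
        mesh_wide_strict_mtls(),
        registry_only_operator(),
        external_service_entry("alerts", "alerts.example.com"),  # hypothetical host
    ]
    for resource in resources:
        print(json.dumps(resource, indent=2))
```

With `REGISTRY_ONLY` in place, every external dependency needs its own ServiceEntry, so the set of ServiceEntry resources doubles as an inventory of what the mesh is allowed to talk to.
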
  29. Roadmap (3 of 3) ● Sidecar CRD for all workloads (reduce the number of cross-namespace connections) ● Start reducing cross-service access with PeerAuthentication ● Cross-region failover with multi-cluster configuration ● Contribute to Kiali so it works with InfluxDB
  30. The End ● Twitter / LinkedIn / GitHub: @gitirabassi
