What is a Service Mesh?
And Do I Need One When Developing “Cloud Native” Microservices?
Daniel Bryant
@danielbryantuk
Cloud Native Apps: Expectations versus reality
“DevOps”
tl;dr – Service Meshes
• A service mesh is a dedicated infrastructure layer for making service-to-service communication safe, reliable, observable and configurable
• Valuable as we move from deployment of complicated monoliths/services to orchestration of complex “cloud native” microservices and functions
• But take care, as this is very new technology implementing an old pattern!
@danielbryantuk
• Independent Technical Consultant
• Architecture, DevOps, Java, microservices, cloud, containers
• Continuous Delivery (CI/CD) advocate
• Leading change through technology and teams
bit.ly/2jWDSF7
Setting the Scene
23/03/2018 @danielbryantuk
Cynefin domains
• Simple (Sense, Categorise, Respond)
• Complicated (Sense, Analyse, Respond)
• Complex (Probe, Sense, Respond)
• Chaotic (Act, Sense, Respond)
1990s
• Monoliths
• In-process comms, custom wire protocols
• Single language
• In-house hardware (servers, SAN, networks)
• Manual config and scripting
• Optimise for Stability (MTBF)
• Specialist staff/departments
2000s
• Monoliths, Coarse-grained SOA, SaaS
• Smart pipes (ESB, MQ), centralised, BPM
• Frontend/backend language
• “Co-lo” or private datacenters
• Configuration management
• Optimise for Recovery (MTTR)
• Generalist teams (Full Stack and “DevOps”)
2010s
• Microservices, functions, SaaS-all-the-things
• Dumb pipes (HTTP, Kafka), de-centralised
• Polyglot languages
• Cloud and containers (Datacenter as a Computer)
• Software-Defined Everything
• Optimise for innovation (and MTTR)
• Business teams (“FinDev”, SRE and Platform Team)
“Cloud Native”
Eight Fallacies of Distributed Computing / Cloud Native
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
What do “cloud native” comms look like?
• Services communicate over an (unreliable) network
• These interactions are non-trivial
• Lots of value in understanding the network
• The application is ultimately responsible
blog.christianposta.com/microservices/application-network-functions-with-esbs-api-management-and-now-service-mesh/
But we’ve been here before…
blog.christianposta.com/microservices/application-network-functions-with-esbs-api-management-and-now-service-mesh/
www.slideshare.net/dbryant_uk/goto-chicagocraftconf-2017-the-seven-more-deadly-sins-of-microservices
Avoiding the ESB: My first “service mesh”
http://techblog.poppulo.com/microservices-service-discovery-with-smartstack-and-docker/
http://philcalcado.com/2017/08/03/pattern_service_mesh.html
Service Meshes
Service Mesh Functionality
Service mesh features
• Normalises naming and adds logical routing
  • user-service -> AWS-us-east-1a/prod/users/v4
• Adds traffic shaping and traffic shifting
  • Load balancing
  • Deploy control
  • Per-request routing (shadowing, fault injection, debug)
• Adds baseline reliability
  • Health checks, timeouts/deadlines, circuit breaking, and retry (budgets)
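To make the "retry (budgets)" item concrete, here is a minimal sketch of a retry budget: retries are only allowed while they stay below a fixed ratio of original requests, so a failing dependency is not hit with a retry storm. The names `RetryBudget` and `call_with_budget` and the default values are assumptions for illustration, not any mesh's API. Real implementations typically track the ratio over a sliding time window, which this sketch omits.

```python
class RetryBudget:
    """Permit retries only while they stay under a ratio of original requests."""

    def __init__(self, ratio: float = 0.2, min_retries: int = 1) -> None:
        self.ratio = ratio              # retries allowed per original request
        self.min_retries = min_retries  # small floor so low-traffic callers can retry
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def can_retry(self) -> bool:
        allowed = self.min_retries + self.ratio * self.requests
        return self.retries < allowed

    def record_retry(self) -> None:
        self.retries += 1


def call_with_budget(budget: RetryBudget, operation, max_attempts: int = 3):
    """Run operation, retrying on ConnectionError while the budget allows it."""
    budget.record_request()
    attempt = 1
    while True:
        try:
            return operation()
        except ConnectionError:
            # Give up when attempts run out or the shared budget is spent.
            if attempt >= max_attempts or not budget.can_retry():
                raise
            budget.record_retry()
            attempt += 1
```

Because the budget is shared across calls, a burst of failures exhausts it quickly and later calls fail fast instead of multiplying load.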
Service mesh features
• Increased security
  • Transparent mutual TLS
  • Policies (service Access Control Lists - ACL)
• Observability / monitoring
  • Top-line metrics like request volume, success rates and latencies
  • Distributed tracing
• Sane defaults (to protect the system)
  • With options to tune
Naming and load balancing
https://buoyant.io/2016/03/16/beyond-round-robin-load-balancing-for-latency/
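As a rough illustration of latency-aware load balancing (as opposed to plain round robin), the sketch below picks two endpoints at random and routes to the one with the lower cost, where cost combines a moving average of observed latency with the number of in-flight requests. This is a generic "power of two choices" sketch, not the algorithm from the linked article; `Endpoint`, `observe`, `cost` and `pick` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    address: str
    ewma_ms: float = 0.0   # exponentially weighted moving average of latency
    in_flight: int = 0     # requests currently outstanding

    def observe(self, latency_ms: float, alpha: float = 0.3) -> None:
        """Fold a new latency sample into the moving average."""
        if self.ewma_ms == 0.0:
            self.ewma_ms = latency_ms
        else:
            self.ewma_ms = alpha * latency_ms + (1 - alpha) * self.ewma_ms

    def cost(self) -> float:
        """Lower is better: slow or busy endpoints look more expensive."""
        return self.ewma_ms * (self.in_flight + 1)


def pick(endpoints: List[Endpoint], rng=random) -> Endpoint:
    """Sample two distinct endpoints and return the cheaper one."""
    a, b = rng.sample(endpoints, 2)
    return a if a.cost() <= b.cost() else b
```

Sampling only two candidates keeps the choice cheap while still steering traffic away from slow or overloaded instances.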
Traffic control
https://istio.io/docs/concepts/traffic-management/request-routing.html
https://www.youtube.com/watch?v=s4qasWn_mFc
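Traffic shifting for a canary release boils down to weighted selection between service versions. The sketch below shows the idea only; in Istio this is declared in routing configuration rather than written as application code, and the `choose_version` function and the `canary_rule` weights are assumptions made up for illustration.

```python
import random
from typing import Dict, Optional


def choose_version(weights: Dict[str, int],
                   rng: Optional[random.Random] = None) -> str:
    """Pick a service version in proportion to its traffic weight."""
    rng = rng or random.Random()
    versions = list(weights)
    return rng.choices(versions,
                       weights=[weights[v] for v in versions],
                       k=1)[0]


# Hypothetical rule: 90% of requests go to v1, 10% to the canary v2.
canary_rule = {"v1": 90, "v2": 10}
```

Rolling the canary forward is then a matter of changing the weights, with no redeploy of either version.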
Per-request routing: shadow, fault inject, debug
https://buoyant.io/2017/01/06/a-service-mesh-for-kubernetes-part-vi-staging-microservices-without-the-tears/
Timeouts / deadlines
William Morgan Introduction to Linkerd: https://www.youtube.com/watch?v=0xYSy6OmjUM
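The difference between a timeout and a deadline can be sketched briefly: a timeout is set per hop, whereas a deadline is one absolute expiry that is passed along the call chain, so each hop only gets whatever time remains. The `Deadline` class below is a hypothetical illustration, not Linkerd's implementation; the injectable `clock` exists only to make the behaviour easy to check.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when no time budget is left for the request."""


class Deadline:
    """An absolute expiry shared by every hop that serves one request."""

    def __init__(self, timeout_s: float, clock=time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self) -> float:
        """Seconds left; use this as the timeout for the next downstream call."""
        return max(0.0, self._expires_at - self._clock())

    def check(self) -> None:
        """Fail fast instead of doing work nobody is waiting for."""
        if self.remaining() <= 0.0:
            raise DeadlineExceeded("deadline exceeded")
```

A caller would create one `Deadline` at the edge, call `check()` before each unit of work, and pass `remaining()` as the timeout of each outgoing request.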
Circuit breaking (out of process, not Hystrix)
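For readers new to the pattern, here is a minimal in-process sketch of the circuit breaker state machine: closed, open (fail fast), then half-open after a cool-down. A service mesh applies the same idea in the proxy, outside the application process; the class below is only an illustration, and its name, thresholds and `clock` parameter are assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3,
                 reset_timeout_s: float = 30.0,
                 clock=time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self._clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # Re-open after a failed trial, or open once the threshold is hit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Moving this logic out of process means every service gets it regardless of language, which is the contrast with a library such as Hystrix.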
Mutual TLS (transparent protocol upgrade)
https://istio.io/blog/istio-auth-for-microservices.html
Communication policies
https://istio.io/docs/concepts/policy-and-control/mixer-config.html#aspects
https://www.projectcalico.org/network-policy-and-istio-deep-dive/
Small digression: Open Policy Agent
http://www.openpolicyagent.org/
Visibility
Service Mesh Implementations
https://www.infoq.com/news/2018/03/cilium-linux-bpf
https://www.infoq.com/news/2018/01/conduit-service-mesh
https://www.infoq.com/news/2017/09/nginx-platform-service-mesh
https://www.infoq.com/news/2017/10/cncf-notary-envoy-jaeger
https://github.com/fabiolb/fabio
https://verizon.github.io/nelson/
Putting it all together: Istio
• “Istio” is an open platform
  • Connect, manage, secure services
• Proxies are the data plane / mesh
• Proxies are (in theory) swappable
  • But in reality there are different feature sets, security, performance
Control Plane / Data Plane (Istio example)
https://istio.io/docs/concepts/what-is-istio/overview.html
Control plane
Data plane
Istio control plane: Pilot and Mixer
Precondition checking
Quota management
Telemetry reporting
Linkerd and NGINX control plane
www.infoq.com/news/2017/09/nginx-platform-service-mesh
https://www.envoyproxy.io/
https://www.getambassador.io/
https://gloo.solo.io/
Getting started
• Articles:
  • Linkerd + Kubernetes:
    • https://buoyant.io/2016/10/04/a-service-mesh-for-kubernetes-part-i-top-line-service-metrics/
  • Installing Istio:
    • https://istio.io/docs/tasks/installing-istio.html
  • Tim Perrett: Envoy with Nomad and Consul
    • http://timperrett.com/2017/05/13/nomad-with-envoy-and-consul
  • NGINX Fabric Model:
    • https://www.nginx.com/blog/microservices-reference-architecture-nginx-fabric-model/
• Videos:
  • William Morgan - Linkerd:
    • https://www.youtube.com/watch?v=0xYSy6OmjUM
  • Christian Posta - Envoy/Istio:
    • http://blog.christianposta.com/microservices/00-microservices-patterns-with-envoy-proxy-series/
  • Matt Klein - Envoy:
    • https://www.youtube.com/watch?v=RVZX4CwKhGE
  • Kelsey Hightower - Istio:
    • https://www.youtube.com/watch?v=s4qasWn_mFc
https://www.katacoda.com/courses/istio/deploy-istio-on-kubernetes
Use cases
Use cases for Service Meshes
• Self-service configuration and observability
• Evolution from complicated to complex systems
• Monolith-to-service migration
• All components can use the same communication fabric
• Routing (shadow traffic, A/B, canarying etc)
• Chaos Engineering
Self-service config and observability (IaC)
Handling complexity: Debugging Microservices
https://www.infoq.com/news/2017/11/debug-container-microservices
Migrations
https://blog.christianposta.com/microservices/traffic-shadowing-with-istio-reduce-the-risk-of-code-release/
https://github.com/github/scientist
https://blog.twitter.com/engineering/en_us/a/2015/diffy-testing-services-without-writing-tests.html
Chaos Engineering
https://istio.io/docs/concepts/traffic-management/rules-configuration.html#injecting-faults-in-the-request-path
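Fault injection for chaos experiments usually means delaying or aborting a percentage of requests before they reach the service. The wrapper below sketches that behaviour in plain Python so the mechanics are visible; in Istio the equivalent is declared in route rules and applied by the proxy. The `inject_faults` function, its parameters and `InjectedAbort` are hypothetical names for illustration.

```python
import random
import time


class InjectedAbort(Exception):
    """Raised in place of a real response when an abort fault fires."""


def inject_faults(handler, abort_percent: float = 0.0,
                  delay_percent: float = 0.0, delay_s: float = 0.0,
                  rng=None, sleep=time.sleep):
    """Wrap handler so a share of requests is delayed and/or aborted."""
    rng = rng or random.Random()

    def wrapped(request):
        # Delay fault: simulates a slow network or an overloaded upstream.
        if rng.random() * 100 < delay_percent:
            sleep(delay_s)
        # Abort fault: simulates an upstream failure.
        if rng.random() * 100 < abort_percent:
            raise InjectedAbort("injected fault")
        return handler(request)

    return wrapped
```

Starting with a small percentage in a test environment shows whether callers' timeouts, retries and circuit breakers behave as intended.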
The Future
Wardley Mapping Service Meshes
https://medium.com/wardleymaps/on-being-lost-2ef5f05eb1ec
So, maybe a few words of caution!
https://xkcd.com/927/
https://twitter.com/jpilbeam/status/588693310610485248
Wrapping Up
In conclusion…
• A service mesh is a dedicated infrastructure layer for making service-to-service communication safe, reliable, observable and configurable
  • Homogenise all RPC and (potentially) messaging
• Moving from complicated monoliths/services to orchestration of complex “cloud native” microservices and functions
  • Can provide hooks for observability, testing and debugging
• New technology implementing an old pattern!
  • Know the risks, analyse your bottlenecks and determine your ROI
Massive thanks to everyone who has helped!
• William Morgan @ Buoyant
• Owen Garrett @ NGINX
• Christian Posta @ Red Hat
• Matt Klein @ Lyft
• Shriram Rajagopalan (Istio-users)
• Louis Ryan (Istio-users)
• Varun Talwar @ Google
• Many more from the community
Thanks for listening…
Twitter: @danielbryantuk
Email: daniel.bryant@tai-dev.co.uk
Writing: www.infoq.com/profile/Daniel-Bryant
Talks: www.youtube.com/playlist?list=PLoVYf_0qOYNeBmrpjuBOOAqJnQb3QAEtM
Available Q3 2018!
bit.ly/2jWDSF7
Common Questions
“But what about…”
How do service meshes relate to (Edge/API) gateways?
• Gateways primarily sit on the edge of your network
  • Handle ingress cross-cutting concerns (authn/z, rate limiting, logging etc)
• My experience
  • NGINX
  • Cloud implementations
  • Traefik and Datawire’s Ambassador (based on Envoy)
• Some are vying to act as the communication backbone too
  • Kong API
  • Mulesoft
  • NGINX
Isn’t this just ESB 2.0 or “web scale” ESB?
• No
  • At least not yet…
• ESB development was vendor-driven
• Overly centralised/coupled/conflated
  • Process choreography
  • Document transformation
  • Tight integration with vendor products
https://en.wikipedia.org/wiki/Enterprise_service_bus#/media/File:ESB_Component_Hive.png
Isn’t this just adding more network hops?
• Maybe… It depends on your network config
• …but good (infrastructure) architecture is all about
  • Choosing the right abstraction
  • Making trade-offs
  • Separation of concerns
• Make an educated choice with your platform, and make it explicitly
Shouldn’t this be part of the “platform”?
• Yep…
• And it probably will be in the near future
• But expect much innovation (and change) over the next 6-12 months
• Assess if it will be beneficial for your organisation to leverage this now
Who owns the Service Mesh? Dev, SREs, Ops?
• Yes…
• As mentioned earlier
  • We work with a sociotechnical system when delivering value/software
  • Everything is context dependent (on your organisation)
• But deployment descriptor and service mesh config can provide good dev/ops collaboration zones as part of the “platform”
• Make a decision, communicate it, and regularly retrospect
So, Service Mesh all-the-things… right?
• No…
• It’s all about context and trade-offs
• Service meshes are great for point-to-point RPC
• Messaging is useful to decouple services in space and time
  • Async work queues, pub/sub, topics e.g. RabbitMQ
  • Distributed txn logs and stream processing e.g. Kafka
Bonus content
Let’s go unicorn spotting…
• Netflix
  • Karyon + HTTP/JSON or RxNetty RPC + Eureka + Hystrix + …
• Twitter
  • Finagle + Thrift + ZooKeeper + Zipkin
• Google
  • Stubby (gRPC) + GSLB + GFE + Dapper
• AirBnB
  • HTTP/JSON + SmartStack + ZooKeeper + Charon/Dyno
Copying Netflix for “cloud native” comms
• Many of us have no single mechanism for RPC / messaging
  • Unlike Google, Twitter
• Instead, we can implement comms handling via libraries
  • Ribbon, Eureka, Hystrix
  • Predominantly JVM-based
• Potentially use a “sidecar” (Prana)
https://www.voxxed.com/2015/01/use-container-sidecar-microservices/

microXchg 2018: "What is a Service Mesh? And Do I Need One When Developing 'Cloud Native' Microservices?"
