Reliable application rollout and
operations with Istio
Lin Sun, IBM @linsun_unc
Mandar Jog, Google @mandarjog
Common DevOps Challenge 1
• How do I roll out a newer version of my
microservice without downtime?
• How do I ensure traffic continues to go
to the current version until the newer
version is tested and ready?
Common DevOps Challenge 2
• How do I do A/B testing?
• Release a new version to a
subset of users in a precise way
• I have launched B in the dark,
but how can I keep B to myself
or a small testing group?
Common DevOps Challenge 3
• How do I do canary testing?
• I want to leverage crowdsourced
testing. How do I test the new
version with a subset of users?
• How do I proceed to a full rollout
after satisfactory testing of the new
version?
Other Common DevOps Challenges
• Things don’t always go correctly in production…
How do I inject faults into my microservices to
prepare myself?
• Our team knows different languages and our
services are written in different languages.
• My services can only handle a certain rate. How
can I limit the rate for some of my services?
• I need to view what is going on with each of my
services when a crisis arises.
Introduce Istio
http://istio.io
Intelligent Routing and Load Balancing
Resilience Across Languages and Platforms
Secure Access with Fleet Wide Policy Enforcement
In-Depth Telemetry and Reporting
Components of Istio
• Envoy proxy mediates all inbound and outbound traffic for all services in the service mesh.
Istio leverages Envoy features such as dynamic service discovery, load balancing, TLS
termination, HTTP/2 & gRPC proxying, circuit breakers, health checks, staged rollouts with
percentage-based traffic split, fault injection, and rich metrics.

• Pilot programs the Envoy proxies and is responsible for service discovery, registration, and load
balancing.

• Istio-Security provides strong service-to-service and end-user authentication using mutual
TLS, with built-in identity and credential management.

• Mixer is responsible for enforcing access control and usage policies across the service mesh
and collecting telemetry data from the Envoy proxy and other services.
Our sidecar of choice
Putting it all together
Traffic Control
# A simple traffic control rule
destination:
  name: serviceB.example.cluster.local
match:
  source: serviceA.example.cluster.local
route:
- labels:
    version: v1.5
    env: us-prod
  weight: 100
Challenge 1: How can I roll out a new version without
downtime or changing code?
Traffic Steering
# Content-based traffic steering rule
destination: serviceB.example.cluster.local
match:
  httpHeaders:
    user-agent:
      regex: ^(.*?;)?(iPhone)(;.*)?$
precedence: 2
route:
- labels:
    version: v2
Challenge 2: How do I do A/B testing?
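To see which requests the steering rule above would select, the header match can be tried out locally. This is a minimal sketch, assuming Python's `re` semantics are close enough to the proxy's regex engine for this pattern; the `pick_version` helper and the `v1` default are hypothetical and not part of Istio.

```python
import re

# Pattern copied from the steering rule above.
IPHONE_UA = re.compile(r"^(.*?;)?(iPhone)(;.*)?$")

def pick_version(headers, default="v1"):
    """Return "v2" when the user-agent header matches the rule, else the default.

    `default` is a hypothetical fallback version used only for illustration.
    """
    user_agent = headers.get("user-agent", "")
    return "v2" if IPHONE_UA.match(user_agent) else default
```

Note that the pattern only matches when `iPhone` is a whole semicolon-delimited token, so a header such as `myiPhone` falls through to the default.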
Traffic Splitting
# A simple traffic splitting rule
destination: serviceB.example.cluster.local
match:
  source: serviceA.example.cluster.local
route:
- labels:
    version: v1.5
    env: us-prod
  weight: 90
- labels:
    version: v2.0-alpha
    env: us-staging
  weight: 10
Challenge 3: How do I do canary testing?
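The effect of the 90/10 weights can be illustrated with a small simulation. This is only a sketch of weighted selection, not how the proxy implements it; `choose_route` and `simulate` are hypothetical helpers, and each route is reduced to its version label.

```python
import random

# Weights copied from the splitting rule above; each route is reduced to its version label.
ROUTES = [("v1.5", 90), ("v2.0-alpha", 10)]

def choose_route(routes, rng):
    """Pick one version with probability proportional to its weight."""
    versions = [version for version, _ in routes]
    weights = [weight for _, weight in routes]
    return rng.choices(versions, weights=weights, k=1)[0]

def simulate(n, routes=ROUTES, seed=0):
    """Route n simulated requests and count how many land on each version."""
    rng = random.Random(seed)
    counts = {version: 0 for version, _ in routes}
    for _ in range(n):
        counts[choose_route(routes, rng)] += 1
    return counts
```

Under this model, proceeding to a full rollout amounts to shifting the weights step by step until the new version carries 100.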
Resiliency
# Circuit breakers
destination: serviceB.example.cluster.local
policy:
- labels:
    version: v1
  circuitBreaker:
    simpleCb:
      maxConnections: 100
      httpMaxRequests: 1000
      httpMaxRequestsPerConnection: 10
      httpConsecutiveErrors: 7
      sleepWindow: 15m
      httpDetectionInterval: 5m
Istio adds fault tolerance to your application
without any changes to code
Resilience features
❖ Timeouts
❖ Retries with timeout budget
❖ Circuit breakers
❖ Health checks
❖ AZ-aware load balancing w/ automatic
failover
❖ Control connection pool size and request
load
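The circuit breaker settings above can be read as a small state machine. The following is a rough sketch, assuming `httpConsecutiveErrors` counts 5xx responses in a row and `sleepWindow` is how long a host stays ejected; the class and its method names are hypothetical, and the detection interval is left out for brevity.

```python
class ConsecutiveErrorBreaker:
    """Eject a host after N consecutive 5xx responses, then readmit it after a sleep window."""

    def __init__(self, consecutive_errors=7, sleep_window_s=15 * 60):
        # Defaults mirror httpConsecutiveErrors: 7 and sleepWindow: 15m from the slide.
        self.consecutive_errors = consecutive_errors
        self.sleep_window_s = sleep_window_s
        self.error_streak = 0
        self.ejected_until = None

    def allows(self, now):
        """Return True if requests may be sent to the host at time `now` (seconds)."""
        if self.ejected_until is not None and now >= self.ejected_until:
            # Sleep window elapsed: readmit the host with a clean streak.
            self.ejected_until = None
            self.error_streak = 0
        return self.ejected_until is None

    def record(self, status, now):
        """Record one response status; any non-5xx response resets the streak."""
        if 500 <= status < 600:
            self.error_streak += 1
            if self.error_streak >= self.consecutive_errors:
                self.ejected_until = now + self.sleep_window_s
        else:
            self.error_streak = 0
```

The application code never sees this logic; in the mesh it would live in the sidecar, which is what lets the slide claim fault tolerance without code changes.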
Resiliency Testing
Systematic fault injection to identify weaknesses in failure recovery
policies
❖ HTTP/gRPC error codes 
❖ Delay injection
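The two fault types listed above can be sketched as a wrapper around a request handler. This is an illustration only: the percentage knobs, the `with_faults` name, and the injectable `sleep` function are assumptions made for the example, not Istio configuration fields.

```python
import random
import time

def with_faults(handler, rng, abort_percent=0.0, abort_status=503,
                delay_percent=0.0, delay_s=0.0, sleep=time.sleep):
    """Wrap `handler` so a share of calls is delayed and/or aborted with an error status."""
    def wrapped(request):
        # Delay injection: stall the chosen share of requests before handling them.
        if rng.random() * 100 < delay_percent:
            sleep(delay_s)
        # Abort injection: return an error status without calling the real handler.
        if rng.random() * 100 < abort_percent:
            return abort_status
        return handler(request)
    return wrapped
```

For example, `with_faults(handler, random.Random(0), delay_percent=10, delay_s=5)` would stall roughly one request in ten, which is one way to check whether callers' timeouts and retries behave as intended.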
Rate Limiting
Istio protects your application from rogue
actors by imposing rate limits
Rate limit
❖ Configurable limits with overrides
❖ Multiple rate limiting backends
❖ Conditional rate limiting
Quotas:
- name: requestcount.quota.istio-system
  maxAmount: 5000
  validDuration: 1s
  overrides:
  - dimensions:
      destination: ratings
      source: reviews
      sourceVersion: v3
    maxAmount: 1
    validDuration: 1s
  - dimensions:
      destination: ratings
    maxAmount: 100
    validDuration: 1s
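How the overrides interact with the default limit can be sketched in a few lines. This assumes the first override whose dimensions all match wins, and it uses a fixed one-second window as a simplification; `limit_for` and `FixedWindowLimiter` are hypothetical names, not part of Mixer.

```python
# Values copied from the quota configuration above.
QUOTA = {
    "maxAmount": 5000,
    "overrides": [
        {"dimensions": {"destination": "ratings", "source": "reviews",
                        "sourceVersion": "v3"}, "maxAmount": 1},
        {"dimensions": {"destination": "ratings"}, "maxAmount": 100},
    ],
}

def limit_for(request, quota=QUOTA):
    """Return the maxAmount of the first override whose dimensions all match the request."""
    for override in quota["overrides"]:
        if all(request.get(k) == v for k, v in override["dimensions"].items()):
            return override["maxAmount"]
    return quota["maxAmount"]

class FixedWindowLimiter:
    """Count requests per dimension set within fixed windows (assumed simplification)."""

    def __init__(self, quota=QUOTA, window_s=1.0):
        self.quota = quota
        self.window_s = window_s
        self.counts = {}  # never pruned; acceptable for a sketch only

    def allow(self, request, now):
        """Return True and consume one unit if the request is within its limit."""
        window = int(now // self.window_s)
        key = (tuple(sorted(request.items())), window)
        used = self.counts.get(key, 0)
        if used >= limit_for(request, self.quota):
            return False
        self.counts[key] = used + 1
        return True
```

With these values, traffic from `reviews` v3 to `ratings` is held to one request per second, other callers of `ratings` get 100, and everything else falls back to 5000.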
Telemetry
Monitoring & tracing should not be an
afterthought in the infrastructure
Goals
● Metrics without instrumenting apps
● Consistent metrics across fleet
● Trace flow of requests across services
● Portable across metric backend
providers
Proposed Istio Deployment Controller
Istio Analytics
Demo