Reactive Microservices using
the Netflix Stack on AWS
Diego Pacheco
Principal Software Architect at ilegra.com
@diego_pacheco
www.ilegra.com
NetflixOSS
Stack
Why Netflix?
 Billions of Requests Per Day
 1/3 of US Internet Bandwidth
 ~10k EC2 Instances
 Multi-Region
 100s of Microservices
 Innovation + Solid Service
 SOA, Microservices and DevOps Benchmark
 Social Product
 Social Network
 Video
 Docs
 Apps
 Chat
Scalability
Distributed Teams
Can reach web scale
Netflix vs. My Problem
AWS
Cloud Native
Principles
 Stateless Services
 Ephemeral Instances
 Everything Fails All the Time
 Auto Scaling / Down Scaling
 Multi-AZ and Multi-Region
 No SPOF
 Design for Failure (Expected)
 SOA
 Microservices
 No Central Database
 NoSQL
 Lightweight Serializable Objects
 Latency-Tolerant Protocols
 DevOps Enabler
 Immutable Infrastructure
 Anti-Fragility
Right Set of Assumptions
Microservices
Reactive
Java Drivers vs. REST
Simple View of the
Architecture
Diagram: UI → Zuul → Microservice → Cassandra Cluster
Stack
OSS
Zuul
Zuul
Karyon: in biology, the nucleus of a cell
 Reactive Extensions + Netty Server
 Lower Latency under Heavy Load
 Fewer Locks, Fewer Thread Migrations
 Consumes Less CPU
 Lower Object Allocation Rate
RxNetty
Karyon:
CODE
Karyon:
Reactive
Karyon:
Reactive
Eureka and Service
Discovery
http://microservices.io/patterns/server-side-discovery.html
Eureka
 AWS Service Registry for Mid-tier
 Load balancing and Failover
 REST based
 Karyon and Ribbon Integration
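Because the registry is REST-based, clients without native bindings can talk to it directly. The core operations below are the ones documented in the Eureka wiki (the base path varies by deployment):

```
POST   /eureka/v2/apps/{appID}              register a new instance (JSON/XML body)
DELETE /eureka/v2/apps/{appID}/{instanceID} de-register an instance
PUT    /eureka/v2/apps/{appID}/{instanceID} heartbeat (renew lease)
GET    /eureka/v2/apps/{appID}              query all instances of an app
```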
Eureka
Eureka and Service
Discovery
Availability
Ribbon
 IPC Library
 Client Side Load Balancing
 Multi-Protocol (HTTP, TCP, UDP)
 Caching*
 Batching
 Reactive
Ribbon
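At its core, client-side load balancing means the client holds the server list (normally fetched from Eureka) and picks the next target itself, instead of routing through a central load balancer. A minimal stand-alone sketch of that idea follows; the class and method names are illustrative, not Ribbon's actual API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual sketch of Ribbon-style round-robin server choice.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    // Rotate through the list; AtomicInteger keeps this thread-safe,
    // and floorMod stays correct even after the counter overflows.
    public String choose() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```

Ribbon layers retries, server stats and protocol support on top of this basic rotation.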
Ribbon
CODE
Ribbon
CODE
 Reactive Extensions for the JVM
 Async / Event-Based Programming
 Observer Pattern
 Less than 1 MB
 Heavy Usage by the NetflixOSS Stack
RX-Java
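The Observer pattern the slide mentions is the heart of RxJava: a source pushes items to subscribers instead of them pulling. A hand-rolled miniature of that push model, using only the standard library (this is an illustration of the idea, not RxJava's API, where you would write `rx.Observable.just(...).subscribe(...)`):

```java
import java.util.List;
import java.util.function.Consumer;

// Miniature Observable: holds items and pushes them to a subscriber.
public class MiniObservable<T> {
    private final List<T> items;

    private MiniObservable(List<T> items) {
        this.items = items;
    }

    @SafeVarargs
    public static <T> MiniObservable<T> just(T... values) {
        return new MiniObservable<>(List.of(values));
    }

    // Push every item into the subscriber's callback.
    public void subscribe(Consumer<T> onNext) {
        for (T item : items) {
            onNext.accept(item);
        }
    }
}
```

Real RxJava adds operators (map, flatMap, retry), schedulers and error/completion signals on top of this push contract.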
Archaius
 Configuration Management Solution
 Dynamic and Typed Properties
 High Throughput and Thread Safety
 Callbacks: Notifications of config changes
 JMX Beans
 Dynamic Config Sources: File, DB, DynamoDB, ZooKeeper
 Based on Apache Commons Configuration
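Concretely, Archaius polls a config source and exposes its values as typed dynamic properties, so a changed file takes effect without a restart (on the Java side via e.g. `DynamicPropertyFactory.getInstance().getIntProperty(name, default)`). An illustrative properties file; the property names are examples, not a fixed schema:

```properties
# Read as a DynamicIntProperty; picked up when the file changes.
server.request.timeout.ms=2000
# Read as a DynamicBooleanProperty; acts as a runtime feature flag.
feature.recommendations.enabled=true
```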
Archaius + Git
Diagram: each microservice runs with a slave sidecar that syncs property files from a central internal Git repo to the local file system (shown for three microservices).
Asgard
Asgard
Diagram: Asgard triggers a Packer job to create and bake/provision an AMI, then launches it and deploys.
Dynomite: Distributed Cache
https://github.com/Netflix/dynomite
Dynomite
 Implements Amazon's Dynamo Paper
 Similar to Cassandra, Riak and DynamoDB
 Strong Consistency – Quorum-Like – No Data Loss
 Pluggable
 Scalable
 Redis / Memcached Backends
 Multi-Language Clients with Dyno
 Supports Most Redis Commands
 Integrated with Eureka via Prana
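Each Dynomite node is configured with its own token plus the seeds of its peers across racks and regions. A sketch of a node config, with field names taken from the example in the dynomite README and all values illustrative:

```yaml
dyn_o_mite:
  datacenter: us-west-2
  rack: rack1
  listen: 0.0.0.0:8102        # client-facing port
  dyn_listen: 0.0.0.0:8101    # peer-to-peer gossip/replication port
  dyn_seed_provider: simple_provider
  dyn_seeds:
  - peer-host:8101:rack2:us-west-2:12345   # host:port:rack:dc:token
  tokens: '12345'
  servers:
  - 127.0.0.1:6379:1          # local Redis backend
  data_store: 0               # 0 = Redis
```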
Dynomite:
Internals
Diagram: multi-region Dynomite cluster – nodes D1 and D2 in Oregon, D3 in N. California – each node with a Prana sidecar registering it in the Eureka servers.
Dynomite:
CODE
Dynomite
Contributions
https://github.com/Netflix/dynomite
https://github.com/Netflix/dynomite/pull/207
https://github.com/Netflix/dynomite/pull/200
Chaos
Engineering
 Isolate Failure – Avoid cascading
 Redundancy – NO SPOF
 Auto-Scaling
 Fault Tolerance and Isolation
 Recovery
 Fallbacks and Degraded Experience
 Protect Customers from Failures – Don't Surface Raw Failures (Failures vs. Errors)
Chaos / Failure
Gatling
 Stress Testing Tool
 Scala DSL
 Runs on top of Akka
 Simple to use
Chaos
Arch
Diagram: ELB → Zuul → Microservices N1 and N2 → Cassandra Cluster, with Eureka for service discovery.
Running…
Chaos Results and Learnings
 Retry Configuration and Timeouts in Ribbon
 Right Class in Zuul 1.x (default retries only on SocketException)
 RequestSpecificRetryHandler (HttpClient exceptions)
 zuul.client.ribbon.MaxAutoRetries=1
 zuul.client.ribbon.MaxAutoRetriesNextServer=1
 zuul.client.ribbon.OkToRetryOnAllOperations=true
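The same lesson applies to any named Ribbon client, not only Zuul's: retries are worthless without matching timeouts. An illustrative client block (the client name "api" is an example; the keys follow Ribbon's `<client>.ribbon.*` naming convention):

```properties
api.ribbon.ConnectTimeout=1000
api.ribbon.ReadTimeout=2000
api.ribbon.MaxAutoRetries=1
api.ribbon.MaxAutoRetriesNextServer=1
# Only retry non-idempotent operations if they are safe to replay.
api.ribbon.OkToRetryOnAllOperations=false
```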
Eureka Timeouts
 It Works
 Everything needs to have redundancy
 ASG is your friend :-)
 Stateless Service FTW
Microservice Producer
Kafka / Storm :: Event System
Chaos Results and Learnings
Before:
 Data was not in Elasticsearch
 Producers were losing data
After:
 No data loss
 It works
Changes:
 No logging on the microservice :( (log was added)
 Code that publishes events wrapped in a try-catch
 Kafka producer retry config raised from 0 to 5
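On the Kafka side, the retry change above is plain producer configuration. A sketch of the relevant settings (broker addresses illustrative; `retries` and `acks` are standard Kafka producer keys):

```properties
bootstrap.servers=broker1:9092,broker2:9092
# Was 0: any transient broker failure silently lost the send.
retries=5
# Wait for all in-sync replicas before acknowledging a write.
acks=all
```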
Main
Challenges
Hacker
Mindset
Next Steps
 IPC
 Spinnaker
 Containers
 Client side Aggregation
 DevOps 2.0 -> Remediation / Skynet
Pocs
https://github.com/diegopacheco/netflixoss-pocs
http://diego-pacheco.blogspot.com.br/search/label/netflix?max-results=30
Reactive Microservices using
the Netflix Stack on AWS
Diego Pacheco
Principal Software Architect at ilegra.com
@diego_pacheco
Thank you!
