Reactive mistakes reactive nyc

Petr Zapletal @petr_zapletal
#ReactiveNYC
@cakesolutions
Top Mistakes When Writing Reactive
Applications

Agenda
● Motivation
● Actors vs Futures
● Serialization
● Flat Actor Hierarchies
● Graceful Shutdown
● Distributed Transactions
● Longtail Latencies
● Quick Tips

Actors vs Futures
Constraints Liberate, Liberties Constrain

Pick the Right Tool for The Job
Scala
Future[T]
Akka
ACTORS
Power
Constraints
Akka
Stream

Scala
Future[T]
Akka
ACTORS
Power
Constraints
Akka
TYPED

Scala
Future[T] Akka
TYPED
Akka
ACTORS
Power
Constraints
Akka
Stream

Scala
Future[T]
Local Abstractions Distribution
Akka
TYPED
Akka
ACTORS
Power
Constraints
Akka
Stream

Actor Use Cases
● State management
● Location transparency
● Resilience mechanisms
● Single writer
● In-memory lock-free cache
● Sharding
Akka
ACTOR

Future Use Cases
● Local Concurrency
● Simplicity
● Composition
● Typesafety
Scala
Future[T]

Avoid Java Serialization
Java Serialization is the default in Akka, since
it is easy to start with it, but is very slow and
footprint heavy

Akka
ACTOR
Sending Data Through Network
Serialization Serialization
Akka
ACTOR

Persisting Data
Akka
ACTOR
Serialization

Java Serialization - Round Trip

Java Serialization - Footprint

Java Serialization - Footprint
case class Order (id: Long, description: String, totalCost: BigDecimal, orderLines: ArrayList[OrderLine], customer: Customer)
Java Serialization:
----sr--model.Order----h#-----J--idL--customert--Lmodel/Customer;L--descriptiont--Ljava/lang/String;L--orderLinest--Ljava/util
/List;L--totalCostt--Ljava/math/BigDecimal;xp--------ppsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine--
&-1-S----I--lineNumberL--costq-~--L--descriptionq-~--L--ordert--Lmodel/Order;xp----sr--java.math.BigDecimalT--W--(O---I--s
caleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLe
ngthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpq-~--x
q-~--
XML:
<order id="0" totalCost="0"><orderLines lineNumber="1" cost="0"><order>0</order></orderLines></order>
JSON:
{"order":{"id":0,"totalCost":0,"orderLines":[{"lineNumber":1,"cost":0,"order":0}]}}

Java Serialization Implementation
● Serializes
○ Data
○ Entire class definition
○ Definitions of all referenced classes
● It just “works”
○ Serializes almost everything (what implements Serializable)
○ Works with different JVMs
● Performance was not the main requirement

Points of Interest
● Performance
● Footprint
● Schema evolution
● Implementation effort
● Human readability
● Language bindings
● Backwards & forwards compatibility
● ...

JSON
● Advantages:
○ Human readability
○ Simple & well known
○ Many good libraries
for all platforms
● Disadvantages:
○ Slow
○ Large
○ Object names included
○ No schema (except e.g. json
schema)
○ Format and precision issues
● json4s, circe, µPickle, spray-json, argonaut, rapture-json, play-json, …

Binary formats [Schema-less]
● Metadata send together with data
● Advantages:
○ Implementation effort
○ Performance
○ Footprint *
● Disadvantages:
○ No human readability
● Kryo, Binary JSON (MessagePack, BSON, ... )

Binary formats [Schema]
● Schema defined by some kind of DSL
● Advantages:
○ Performance
○ Footprint
○ Schema evolution
● Disadvantages:
○ Implementation effort
○ No human readability
● Protobuf (+ projects like Flatbuffers, Cap’n Proto, etc.), Thrift, Avro

Summary
● Should be always changed
● Depends on particular use case
● Quick tips:
○ json4s
○ kryo
○ protobuf

Flat Actor Hierarchies
Errors should be handled out of band in a
parallel process - they are not part of the
main app

Top Level Actors
The Actor Hierarchy
/a1 /a2

Top Level Actors
The Actor Hierarchy
/a1 /a2
Root Actor
/user

Top Level Actors
The Actor Hierarchy
/a1 /a2
/b1 /b2
Root Actor
/c4/c3/c2/c1
/user

Top Level Actors
The Actor Hierarchy
/a1 /a2
/b1 /b2
Root Actor
/c4/c3/c2/c1
/user
/
/system

Two Different Battles to Win
● Separate business logic and failure handling
○ Less complexity
○ Better supportability
● Getting our application back to life after something bad happened
○ Failure isolation
○ Recovery
○ No more midnight calls :)
---> no more midnight calls :)

Errors & Failures
Errors
● Common events
● The current request is affected
● Will be communicated with the client/caller
● Incorrect requests, errors during validations, ...
Failures
● Unexpected events
● Service/actor is not able to operate normally
● Reports to supervisor
● Client can’t do anything, might be notified
● Database failures, network partitions, hardware
malfunctions, ...

Error Kernel Pattern
● Actor’s state is lost during restart and may not be recovered
● Delegating dangerous tasks to child actors and supervise them
/user/
a1
/user/
a1
/user/
a1/w1
/user/
a1
/user/
a1/w1

Summary
● Create rich actor hierarchies
● Separate business logic and failure handling
● Properly written, your application will be self-healing and incredibly
resilient

Graceful Shutdown
We have thousands of sharded actors on
multiple nodes and we want to shut one of
them down

High-level Procedure
1. JVM gets the shutdown signal

2. Coordinator tells all local ShardRegions to shut down gracefully

3. Node leaves cluster

4. Coordinator gives singletons a grace period to migrate

4. Coordinator gives singletons a grace period to migrate
5. Actor System & JVM Termination

Adding Shutdown Hook
val nodeShutdownCoordinatorActor = system.actorOf(Props(
new NodeGracefulShutdownCoordinator(...)))
sys.addShutdownHook {
nodeShutdownCoordinatorActor ! StartNodeShutdown(shardRegions)
}

Tell Local Regions to Shutdown
when(AwaitNodeShutdownInitiation) {
case Event(StartNodeShutdown(shardRegions), _) =>
if (shardRegions.nonEmpty) {
// starts watching of every shard region and sends GracefulShutdown msg to them
stopShardRegions(shardRegions)
goto(AwaitShardRegionsShutdown) using ManagedRegions(shardRegions)
} else {
// registers OnMemberRemoved and leaves the cluster
leaveCluster()
goto(AwaitClusterExit)
}
}

Node Leaves the Cluster
when(AwaitShardRegionsShutdown, stateTimeout = ... ){
case Event(Terminated(actor), ManagedRegions(regions)) =>
if (regions.contains(actor)) {
val remainingRegions = regions - actor
if (remainingRegions.isEmpty) {
leaveCluster()
goto(AwaitClusterExit)
} else {
goto(AwaitShardRegionsShutdown) using ManagedRegions(remainingRegions)
}
} else {
stay()
}
case Event(StateTimeout, _) =>
leaveCluster()
goto(AwaitNodeTerminationSignal)
}

Wait for Singletons to Migrate
when(AwaitClusterExit, stateTimeout = ...) {
case Event(NodeLeftCluster | StateTimeout, _) =>
// Waiting on cluster singleton migration
goto(AwaitClusterSingletonMigration)
}
when(AwaitClusterSingletonMigration, stateTimeout = ... ) {
case Event(StateTimeout, _) =>
goto(AwaitNodeTerminationSignal)
}
onTransition {
case AwaitClusterSingletonMigration -> AwaitNodeTerminationSignal =>
self ! TerminateNode
}

Actor System & JVM Termination
when(AwaitNodeTerminationSignal, stateTimeout = ...) {
case Event(TerminateNode | StateTimeout, _) =>
// This is NOT an Akka thread-pool (since we're shutting those down)
val ec = scala.concurrent.ExecutionContext.global
// Calls context.system.terminate with registered onComplete block
terminateSystem {
case Success(ex) =>
System.exit(...)
case Failure(ex) =>
System.exit(...)
}(ec)
stop(Shutdown)
}

Integration with Sharded Actors
● Handling of added messages
○ Passivate() message for graceful stop
○ Context.stop() for immediate stop
● Priority mailbox
○ Priority message handling
○ Message retrying support

Summary
● We don’t want to lose data (usually)
● Shutdown coordinator on every node
● Integration with sharded actors

Distributed Transactions
Any situation where a single event results in
the mutation of two separate sources of data
which cannot be committed atomically

What’s Wrong With Them
● Simple happy paths
● 7 Fallacies of Distributed Programming
○ The network is reliable.
○ Latency is zero.
○ Bandwidth is infinite.
○ The network is secure.
○ Topology doesn't change.
○ There is one administrator.
○ Transport cost is zero.
○ The network is homogeneous.

Two-phase commit (2PC)
Stage 1 - Prepare Stage 2 - Commit
Prepare
Prepared
Prepare
Prepared
Com
m
it
Com
m
itted
Commit
Committed
Resource
Manager
Resource
Manager
Transaction
Manager
Resource
Manager
Resource
Manager
Transaction
Manager

Saga Pattern
T1 T2 T3 T4
C1 C2 C3 C4

The Big Trade-Off
● Distributed transactions can be usually avoided
○ Hard, expensive, fragile and do not scale
● Every business event needs to result in a single synchronous commit
● Other data sources should be updated asynchronously
● Introducing eventual consistency

Longtail Latencies
Consider a system where each service
typically responds in 10ms but with a 99th
percentile latency of one second

Longtail Latencies
Latency Normal vs. Longtail
Legend:
Normal
Longtail
50
40
30
20
10
0
25 50 75 90 99 99.9
Latency(ms)
Percentile

Longtails really matter
● Latency accumulation
● Not just noise
● Don’t have to be power users
● Real problem

Investigating Longtail Latencies
● Narrow the problem
● Isolate in a test environment
● Measure & monitor everything
● Tackle the problem
● Pretty hard job

Tolerating Longtail Latencies
● Hedging your bet

● Tied requests

● Tied requests
● Selectively increase replication factors

● Tied requests
● Put slow machines on probation

● Tied requests
● Consider ‘good enough’ responses

● Tied requests
● Consider ‘good enough’ responses
● Hardware update

Quick Tips
● Monitoring
● Network partitions & Split Brain Resolver

Quick Tips
● Monitoring
● Blocking

Quick Tips
● Monitoring
● Blocking
● Too many actor systems

Questions
MANCHESTER LONDON NEW YORK

@petr_zapletal @cakesolutions
347 708 1518
petrz@cakesolutions.net
We are hiring
http://www.cakesolutions.net/careers

References
● http://www.reactivemanifesto.org/
● http://www.slideshare.net/ktoso/zen-of-akka
● http://eishay.github.io/jvm-serializers/prototype-results-page/
● http://java-persistence-performance.blogspot.com/2013/08/optimizing-java-serialization-java-vs.html
● https://github.com/romix/akka-kryo-serialization
● http://gotocon.com/dl/goto-chicago-2015/slides/CaitieMcCaffrey_ApplyingTheSagaPattern.pdf
● http://www.grahamlea.com/2016/08/distributed-transactions-microservices-icebergs/
● http://www.cs.duke.edu/courses/cps296.4/fall13/838-CloudPapers/dean_longtail.pdf
● https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency
● http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html
● http://manuel.bernhardt.io/2016/08/09/akka-anti-patterns-flat-actor-hierarchies-or-mixing-business-logic-a
nd-failure-handling/

Backup Slides

Reactive mistakes reactive nyc

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Reactive mistakes reactive nyc

Similar to Reactive mistakes reactive nyc (20)

More from Petr Zapletal

More from Petr Zapletal (7)

Recently uploaded

Recently uploaded (20)

Reactive mistakes reactive nyc