MANCHESTER LONDON NEW YORK
Petr Zapletal @petr_zapletal
#scaladays
@cakesolutions
Top Mistakes When Writing Reactive Applications
Agenda
● Motivation
● Actors vs Futures
● Serialization
● Flat Actor Hierarchies
● Graceful Shutdown
● Distributed Transactions
● Longtail Latencies
● Quick Tips
Actors vs Futures
Constraints Liberate, Liberties Constrain
Pick the Right Tool for The Job
[Diagram: tools arranged along two axes, Constraints → Power and Local Abstractions → Distribution: Scala Future[T], Akka Stream, Akka Typed, Akka Actors]
Actor Use Cases
● State management
● Location transparency
● Resilience mechanisms
● Single writer
● In-memory lock-free cache
● Sharding
Future Use Cases
● Local Concurrency
● Simplicity
● Composition
● Type safety
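A minimal sketch of the composition point, assuming two hypothetical async calls fetchUser and fetchOrders: Futures give typed local concurrency that composes with a plain for-comprehension.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical async calls, simulated here with immediately completed Futures
def fetchUser(id: Long): Future[String] = Future(s"user-$id")
def fetchOrders(user: String): Future[Int] = Future(user.length)

// Composition: the for-comprehension sequences the calls and stays typed
val summary: Future[String] =
  for {
    user   <- fetchUser(42L)
    orders <- fetchOrders(user)
  } yield s"$user has $orders orders"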
Avoid Java Serialization
Java serialization is the default in Akka because it is easy to start with, but it is very slow and has a heavy footprint
Sending Data Through Network
[Diagram: two actors exchanging messages across the network, with serialization and deserialization on each side]
Persisting Data
[Diagram: an actor persisting its data through serialization]
Java Serialization - Round Trip
[Benchmark chart: round-trip serialization times across JVM serializers]
Java Serialization - Footprint
[Benchmark chart: serialized payload sizes across JVM serializers]
Java Serialization - Footprint
case class Order (id: Long, description: String, totalCost: BigDecimal, orderLines: ArrayList[OrderLine], customer: Customer)
Java Serialization:
----sr--model.Order----h#-----J--idL--customert--Lmodel/Customer;L--descriptiont--Ljava/lang/String;L--orderLinest--Ljava/util
/List;L--totalCostt--Ljava/math/BigDecimal;xp--------ppsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine--
&-1-S----I--lineNumberL--costq-~--L--descriptionq-~--L--ordert--Lmodel/Order;xp----sr--java.math.BigDecimalT--W--(O---I--s
caleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLe
ngthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpq-~--x
q-~--
XML:
<order id="0" totalCost="0"><orderLines lineNumber="1" cost="0"><order>0</order></orderLines></order>
JSON:
{"order":{"id":0,"totalCost":0,"orderLines":[{"lineNumber":1,"cost":0,"order":0}]}}
Java Serialization Implementation
● Serializes
○ Data
○ Entire class definition
○ Definitions of all referenced classes
● It just “works”
○ Serializes almost everything (anything that implements Serializable)
○ Works with different JVMs
● Performance was not the main requirement
Points of Interest
● Performance
● Footprint
● Schema evolution
● Implementation effort
● Human readability
● Language bindings
● Backwards & forwards compatibility
● ...
JSON
● Advantages:
○ Human readability
○ Simple & well known
○ Many good libraries for all platforms
● Disadvantages:
○ Slow
○ Large
○ Object names included
○ No schema (except e.g. JSON Schema)
○ Format and precision issues
● json4s, circe, µPickle, spray-json, argonaut, rapture-json, play-json, …
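A minimal sketch with json4s (one of the libraries above), assuming a simplified Order model without the circular order back-reference from the earlier slide:

import org.json4s.{Formats, NoTypeHints}
import org.json4s.native.Serialization

final case class OrderLine(lineNumber: Int, cost: BigDecimal)
final case class Order(id: Long, totalCost: BigDecimal, orderLines: List[OrderLine])

// Reflection-based (de)serialization of case classes, no hand-written codecs
implicit val formats: Formats = Serialization.formats(NoTypeHints)

val json = Serialization.write(Order(0L, 0, List(OrderLine(1, 0))))
// e.g. {"id":0,"totalCost":0,"orderLines":[{"lineNumber":1,"cost":0}]}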
Binary formats [Schema-less]
● Metadata is sent together with the data
● Advantages:
○ Implementation effort
○ Performance
○ Footprint *
● Disadvantages:
○ No human readability
● Kryo, Binary JSON (MessagePack, BSON, ... )
Binary formats [Schema]
● Schema defined by some kind of DSL
● Advantages:
○ Performance
○ Footprint
○ Schema evolution
● Disadvantages:
○ Implementation effort
○ No human readability
● Protobuf (+ projects like Flatbuffers, Cap’n Proto, etc.), Thrift, Avro
Summary
● The default (Java serialization) should always be changed
● The right choice depends on the particular use case
● Quick tips:
○ json4s
○ kryo
○ protobuf
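A minimal sketch of actually swapping the default, assuming the akka-kryo-serialization library from the references; the binding below routes Serializable messages away from Java serialization:

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseString(
  """
  akka.actor {
    serializers {
      kryo = "com.romix.akka.serialization.kryo.KryoSerializer"
    }
    serialization-bindings {
      # route anything Serializable to Kryo instead of Java serialization
      "java.io.Serializable" = kryo
    }
  }
  """).withFallback(ConfigFactory.load())

val system = ActorSystem("app", config)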
Flat Actor Hierarchies
Errors should be handled out of band in a
parallel process - they are not part of the
main app
Top Level Actors
The Actor Hierarchy
[Diagram: the root guardian / at the top, with /system and /user beneath it; top-level actors /a1 and /a2 live under /user, with child actors /b1, /b2 and /c1, /c2, /c3, /c4 below them]
Two Different Battles to Win
● Separate business logic and failure handling
○ Less complexity
○ Better supportability
● Getting our application back to life after something bad happened
○ Failure isolation
○ Recovery
○ No more midnight calls :)
Errors & Failures
Errors
● Common events
● The current request is affected
● Will be communicated to the client/caller
● Incorrect requests, errors during validations, ...
Failures
● Unexpected events
● Service/actor is not able to operate normally
● Reported to the supervisor
● Client can’t do anything, might be notified
● Database failures, network partitions, hardware
malfunctions, ...
Error Kernel Pattern
● Actor’s state is lost during restart and may not be recovered
● Delegate dangerous tasks to child actors and supervise them
[Diagram: /user/a1 delegates risky work to a child /user/a1/w1; when w1 fails it is restarted by a1, whose own state survives]
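A minimal sketch of the pattern, with hypothetical names (A1, Worker, RiskyWork): the parent keeps the valuable state and supervises a disposable child that does the dangerous work.

import akka.actor.{Actor, ActorRef, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.Restart

case class RiskyWork(payload: String)

// Disposable child: may crash, holds no state worth protecting
class Worker extends Actor {
  def receive = {
    case RiskyWork(payload) => sender() ! payload.toUpperCase // may throw
  }
}

// Error kernel: keeps the state, delegates the danger
class A1 extends Actor {
  override val supervisorStrategy = OneForOneStrategy() {
    case _: Exception => Restart // w1 restarts; a1's state survives
  }
  private val worker: ActorRef = context.actorOf(Props[Worker], "w1")
  private var processed = 0 // state that must not be lost

  def receive = {
    case work: RiskyWork =>
      processed += 1
      worker forward work
  }
}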
Backoff Supervisor
● Restarts the actor each time with a growing delay between restarts

import akka.pattern.{Backoff, BackoffSupervisor}
import scala.concurrent.duration._

BackoffSupervisor.props(
  Backoff.onFailure(
    childProps,
    childName = "foo",
    minBackoff = 3.seconds,
    maxBackoff = 30.seconds,
    randomFactor = 0.2
  ))
Summary
● Create rich actor hierarchies
● Separate business logic and failure handling
● Backoff Supervisor
Graceful Shutdown
We have thousands of sharded actors on
multiple nodes and we want to shut one of
them down
High-level Procedure
1. JVM gets the shutdown signal
2. Coordinator tells all local ShardRegions to shut down gracefully
3. Node leaves cluster
4. Coordinator gives singletons a grace period to migrate
5. Actor System & JVM Termination
Integration with Sharded Actors
● Handling of the added shutdown messages
○ Passivate() message for graceful stop
○ context.stop() for immediate stop
● Priority mailbox
○ Priority message handling
○ Message retrying support
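A minimal sketch of the graceful-stop side, assuming a sharded entity that passivates itself when idle (the 2-minute threshold is an assumption); the parent shard buffers incoming messages while the entity stops:

import akka.actor.{Actor, PoisonPill, ReceiveTimeout}
import akka.cluster.sharding.ShardRegion.Passivate
import scala.concurrent.duration._

class Entity extends Actor {
  context.setReceiveTimeout(2.minutes) // idle threshold

  def receive = {
    case ReceiveTimeout =>
      // Graceful stop: ask the parent shard to stop us; it buffers new
      // messages until we are gone (context.stop(self) would lose them)
      context.parent ! Passivate(stopMessage = PoisonPill)
    case _ => // handle business messages
  }
}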
CoordinatedShutdown Extension
● Stops actors/services in a specific order
● Lets you register tasks to be executed during shutdown
● More generic approach
● Added in Akka 2.5 (~ a week ago)
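A minimal sketch of the extension, assuming Akka 2.5: tasks are registered into predefined phases, which the extension runs in order when shutdown starts.

import akka.Done
import akka.actor.{ActorSystem, CoordinatedShutdown}
import scala.concurrent.Future

val system = ActorSystem("app")

// Runs in the before-service-unbind phase of the coordinated shutdown
CoordinatedShutdown(system).addTask(
  CoordinatedShutdown.PhaseBeforeServiceUnbind, "log-shutdown") { () =>
  system.log.info("Shutting down...")
  Future.successful(Done)
}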
Summary
● We don’t want to lose data (usually)
● Shutdown coordinator on every node & integration with sharded actors
● Akka’s CoordinatedShutdown
Distributed Transactions
Any situation where a single event results in
the mutation of two separate sources of data
which cannot be committed atomically
What’s Wrong With Them
● Only the happy paths are simple
● Fallacies of Distributed Computing
○ The network is reliable.
○ Latency is zero.
○ Bandwidth is infinite.
○ The network is secure.
○ Topology doesn't change.
○ There is one administrator.
○ Transport cost is zero.
○ The network is homogeneous.
Two-phase commit (2PC)
[Diagram: a Transaction Manager coordinating two Resource Managers. Stage 1 - Prepare: the Transaction Manager sends Prepare to each Resource Manager and collects Prepared replies. Stage 2 - Commit: it sends Commit and collects Committed replies]
Saga Pattern
[Diagram: a sequence of transactions T1, T2, T3, T4, each paired with a compensating transaction C1, C2, C3, C4]
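A minimal sketch of the pattern (not a production saga runner), assuming Scala 2.12: each transaction Ti is paired with a compensation Ci, and on failure the compensations of the already-completed steps run in reverse order.

import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

final case class SagaStep(name: String,
                          transaction: () => Future[Unit],   // Ti
                          compensation: () => Future[Unit])  // Ci

def runSaga(steps: List[SagaStep])(implicit ec: ExecutionContext): Future[Unit] = {
  def loop(remaining: List[SagaStep], completed: List[SagaStep]): Future[Unit] =
    remaining match {
      case Nil => Future.unit
      case step :: rest =>
        step.transaction().transformWith {
          case Success(_)   => loop(rest, step :: completed)
          case Failure(err) =>
            // completed is already newest-first, so this runs Ci in reverse order
            completed
              .foldLeft(Future.unit)((acc, s) => acc.flatMap(_ => s.compensation()))
              .flatMap(_ => Future.failed(err))
        }
    }
  loop(steps, Nil)
}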
The Big Trade-Off
● Distributed transactions can usually be avoided
○ They are hard, expensive, fragile and do not scale
● Every business event needs to result in a single synchronous commit
● Other data sources should be updated asynchronously
● Introducing eventual consistency
Longtail Latencies
Consider a system where each service
typically responds in 10ms but with a 99th
percentile latency of one second
Longtail Latencies
[Chart: latency (ms) vs. percentile (25, 50, 75, 90, 99, 99.9) for a normal and a longtail distribution; the longtail curve climbs steeply at the highest percentiles]
Longtails really matter
● Latency accumulation
● Not just noise
● Don’t have to be power users
● Real problem
Investigating Longtail Latencies
● Narrow the problem
● Isolate in a test environment
● Measure & monitor everything
● Tackle the problem
● Pretty hard job
Tolerating Longtail Latencies
● Hedging your bet (see the sketch below)
● Tied requests
● Selectively increase replication factors
● Put slow machines on probation
● Consider ‘good enough’ responses
● Hardware upgrades
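A minimal sketch of the first technique, hedging your bet: fire the request, and if it has not completed within a small delay, fire a backup and take whichever finishes first (call is a hypothetical request function).

import akka.actor.ActorSystem
import akka.pattern.after
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._

def hedged[T](call: () => Future[T], hedgeAfter: FiniteDuration)
             (implicit system: ActorSystem, ec: ExecutionContext): Future[T] = {
  val primary = call()                                      // first attempt
  val backup  = after(hedgeAfter, system.scheduler)(call()) // delayed second attempt
  Future.firstCompletedOf(Seq(primary, backup))             // first answer wins
}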
Quick Tips
● Monitoring
● Network partitions
○ Split Brain Resolver
● Blocking (see the sketch below)
● Too many actor systems
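For the blocking tip, a minimal sketch: isolate blocking calls on a dedicated dispatcher instead of starving the default one ("blocking-dispatcher" is a hypothetical dispatcher defined in application.conf).

import akka.actor.ActorSystem
import scala.concurrent.Future

val system = ActorSystem("app")

// Dedicated thread pool for blocking work, configured in application.conf
val blockingEc = system.dispatchers.lookup("blocking-dispatcher")

def readFromLegacyDb(): String = { Thread.sleep(1000); "row" } // blocking call

// Runs on the blocking pool; actors and streams keep their threads
val result: Future[String] = Future(readFromLegacyDb())(blockingEc)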
Questions
MANCHESTER LONDON NEW YORK
@petr_zapletal @cakesolutions
347 708 1518
petrz@cakesolutions.net
We are hiring
http://www.cakesolutions.net/careers
References
● http://www.reactivemanifesto.org/
● http://www.slideshare.net/ktoso/zen-of-akka
● http://eishay.github.io/jvm-serializers/prototype-results-page/
● http://java-persistence-performance.blogspot.com/2013/08/optimizing-java-serialization-java-vs.html
● https://github.com/romix/akka-kryo-serialization
● http://gotocon.com/dl/goto-chicago-2015/slides/CaitieMcCaffrey_ApplyingTheSagaPattern.pdf
● http://www.grahamlea.com/2016/08/distributed-transactions-microservices-icebergs/
● http://www.cs.duke.edu/courses/cps296.4/fall13/838-CloudPapers/dean_longtail.pdf
● https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency
● http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html
● http://manuel.bernhardt.io/2016/08/09/akka-anti-patterns-flat-actor-hierarchies-or-mixing-business-logic-and-failure-handling/
Backup Slides
MANCHESTER LONDON NEW YORK
Adding Shutdown Hook
// Coordinator FSM that drives the node's graceful shutdown sequence
val nodeShutdownCoordinatorActor = system.actorOf(Props(
  new NodeGracefulShutdownCoordinator(...)))

// The JVM shutdown hook only kicks off the coordinated sequence
sys.addShutdownHook {
  nodeShutdownCoordinatorActor ! StartNodeShutdown(shardRegions)
}
Tell Local Regions to Shutdown
when(AwaitNodeShutdownInitiation) {
  case Event(StartNodeShutdown(shardRegions), _) =>
    if (shardRegions.nonEmpty) {
      // starts watching every shard region and sends it the GracefulShutdown msg
      stopShardRegions(shardRegions)
      goto(AwaitShardRegionsShutdown) using ManagedRegions(shardRegions)
    } else {
      // registers OnMemberRemoved and leaves the cluster
      leaveCluster()
      goto(AwaitClusterExit)
    }
}
Node Leaves the Cluster
when(AwaitShardRegionsShutdown, stateTimeout = ...) {
  case Event(Terminated(actor), ManagedRegions(regions)) =>
    if (regions.contains(actor)) {
      val remainingRegions = regions - actor
      if (remainingRegions.isEmpty) {
        leaveCluster()
        goto(AwaitClusterExit)
      } else {
        goto(AwaitShardRegionsShutdown) using ManagedRegions(remainingRegions)
      }
    } else {
      stay()
    }
  case Event(StateTimeout, _) =>
    leaveCluster()
    goto(AwaitNodeTerminationSignal)
}
Wait for Singletons to Migrate
when(AwaitClusterExit, stateTimeout = ...) {
  case Event(NodeLeftCluster | StateTimeout, _) =>
    // Waiting on cluster singleton migration
    goto(AwaitClusterSingletonMigration)
}
when(AwaitClusterSingletonMigration, stateTimeout = ...) {
  case Event(StateTimeout, _) =>
    goto(AwaitNodeTerminationSignal)
}
onTransition {
  case AwaitClusterSingletonMigration -> AwaitNodeTerminationSignal =>
    self ! TerminateNode
}
Actor System & JVM Termination
when(AwaitNodeTerminationSignal, stateTimeout = ...) {
  case Event(TerminateNode | StateTimeout, _) =>
    // This is NOT an Akka thread-pool (since we're shutting those down)
    val ec = scala.concurrent.ExecutionContext.global
    // Calls context.system.terminate with the registered onComplete block
    terminateSystem {
      case Success(_) =>
        System.exit(...)
      case Failure(ex) =>
        System.exit(...)
    }(ec)
    stop(Shutdown)
}
