## Introducing a reactive Scala-Akka based system in a Java centric company

Published in: Software

  1. 1. Introducing a reactive Scala-Akka based system in a Java centric company. Basware Belgium NV. Jeroen Verellen (@jeroen_v_), Milan Aleksić (@milanaleksic)
  2. 2. Basware Metrics system and dashboard. A journey through Akka and spray covering: actor development, testing, spray routing, acceptance testing, build, micro benchmarking and more...
  3. 3. Agenda: Business case; Requirements; Concepts; Technologies; Evolution through commit log; Future changes; Q&A
  4. 4. Business case: We want a real-time dashboard that shows the number of documents coming in and going out (per channel), and that also lists the number per document type. All of this should be visible in a per-hour view.
  5. 5. Requirements
  6. 6. Requirements: Basically, a lightweight replacement for a data warehouse / reporting tool. Highly concurrent and non-blocking: no influence on the other systems that generate the metrics. The aggregated metrics should be stored so that the system can recover its state after a restart. The store should be simple.
  7. 7. Requirements: The system should run on the Java 8 runtime, like all our other (newer) components. A simple API that allows us to show metrics in Dashing, since that is the dashboard technology of choice.
  8. 8. Mockup
  9. 9. Some concepts we liked and planned on using
  10. 10. Pre-Aggregate: Calculate a number of statistics while the metrics are coming in. The system does not store raw data but calculates, e.g., how many times a service was called in the last hour. We didn't actually use MongoDB in our case; this was just a design pattern we liked.
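
The pre-aggregation idea can be sketched in plain Scala. This is only an illustration under assumptions: `HourKey` and `HourlyCounters` are hypothetical names, not taken from the Basware code, and the real system keeps these counts inside actors.

```scala
import scala.collection.mutable

// Hypothetical pre-aggregating counter: it keeps one running total per
// (metric name, hour) instead of storing every raw measurement.
object PreAggregation {
  final case class HourKey(metric: String, year: Int, month: Int, day: Int, hour: Int)

  final class HourlyCounters {
    private val totals = mutable.Map.empty[HourKey, Long].withDefaultValue(0L)

    // Called for every incoming metric; only the running total is kept.
    def record(key: HourKey, increment: Long = 1L): Unit =
      totals(key) = totals(key) + increment

    // Answers "how many times in this hour?" without scanning raw data.
    def total(key: HourKey): Long = totals(key)
  }
}
```

The trade-off is that any question not anticipated by the key (here: per metric, per hour) cannot be answered later, because the raw data is gone.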
  11. 11. Event sourcing: Capture all changes to an application state as a sequence of events (Fowler). Instead of storing the latest application state, the system stores the events that change the state. Upon a query for the application state, the state is rebuilt from the events. Advantages: temporal query support; event replay in case of bugs, code changes, or a complete rebuild of state; reverse / undo events. Difficulties: mind shift; interaction with other, non-event-sourced applications.
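
A minimal sketch of rebuilding state from events, assuming a hypothetical `CounterIncremented` event (the talk does not show its event types). State is a pure fold over the event log, which is what makes temporal queries and replay cheap to express.

```scala
object EventSourcingSketch {
  // Hypothetical event: the counter for `metric` was incremented by `by`.
  final case class CounterIncremented(metric: String, by: Long)

  // Application state: current total per metric name.
  type State = Map[String, Long]

  // Pure transition: applying one event to a state yields the next state.
  def applyEvent(state: State, event: CounterIncremented): State =
    state.updated(event.metric, state.getOrElse(event.metric, 0L) + event.by)

  // The state is never stored directly; it is rebuilt by replaying the log.
  def replay(events: Seq[CounterIncremented]): State =
    events.foldLeft(Map.empty[String, Long])((s, e) => applyEvent(s, e))

  // Temporal query: the state as it was after the first `n` events.
  def stateAfter(events: Seq[CounterIncremented], n: Int): State =
    replay(events.take(n))
}
```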
  12. 12. CQRS (Command Query Responsibility Segregation): "At its heart is the notion that you can use a different model to update information than the model you use to read information" (Martin Fowler). System interactions arrive in the form of commands: some commands can be rejected (e.g. validation failure); successful commands are stored in the persistence layer. Another part of the system can receive non-rejectable events, replayed in order on the query object side; to avoid the cost of replaying all events, we use snapshots.
  13. 13. CQRS: It composes well with event sourcing and the Actor Model. It is not an architecture, it's a pattern!
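
The command/event split and the snapshot shortcut described above might look roughly like this. All names (`AddMetricCmd`, `MetricAdded`, `ReadModel`) and the validation rules are assumptions for illustration; the sketch also assumes the journal is complete and that the event at index `i` has sequence number `i + 1`.

```scala
object CqrsSketch {
  // Command side: a request that may still be rejected.
  final case class AddMetricCmd(metric: String, increment: Long)
  // Event: a fact that already happened and cannot be rejected.
  final case class MetricAdded(metric: String, increment: Long)

  // Validate a command; only accepted commands become events.
  def handle(cmd: AddMetricCmd): Either[String, MetricAdded] =
    if (cmd.metric.trim.isEmpty) Left("metric name must not be empty")
    else if (cmd.increment <= 0) Left("increment must be positive")
    else Right(MetricAdded(cmd.metric, cmd.increment))

  // Query side: a read model built from events, tagged with the
  // sequence number of the last event it has seen.
  final case class ReadModel(totals: Map[String, Long], lastSeqNr: Long) {
    def applyEvent(e: MetricAdded): ReadModel =
      ReadModel(
        totals.updated(e.metric, totals.getOrElse(e.metric, 0L) + e.increment),
        lastSeqNr + 1)
  }

  // Recovery starts from a snapshot and replays only the events after it.
  def recover(snapshot: ReadModel, journal: Vector[MetricAdded]): ReadModel =
    journal.drop(snapshot.lastSeqNr.toInt).foldLeft(snapshot)((m, e) => m.applyEvent(e))
}
```

Recovering from an empty snapshot and recovering from a later one give the same read model; the snapshot only shortens the replay.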
  14. 14. Some technologies we liked and planned on using
  15. 15. Many reasons why we like Scala, just to list some: modern; functional / OOP mix; stable, but moving faster than Java; traits > interfaces; chaining and composing Futures; case classes; pattern matching. All in all, more expressive in less code.
  16. 16. Akka is a toolkit that promises scalability via Actors: each actor can have internal state, which can be changed only via messages, which are executed in order. Supervision strategies, event buses, remote actors, cluster... Vertical and horizontal scalability using a single paradigm. Cons: paradigm switch, takes time to learn; a general concurrency pattern, but it doesn't fit every use case.
  17. 17. Some of the good sides of Spray: builds on Akka and Akka IO; both client-side and server-side APIs; case classes for requests and status codes; declarative routing DSL. The Spray library is going to be republished as akka-http; most of the things we did could (and should) be migrated to akka-http as this new library becomes GA. Important thing it currently can't do: WebSockets. The DSL can cause compile issues in IDEA. JSON support relies heavily on implicits, making it hard to debug.
  18. 18. Pure Java client: We (obviously) needed a way to push ("report") the metrics data to the server. What we came up with is simple: an in-memory bounded queue of sending tasks. The client side needs to be lightweight and non-intrusive. We would also batch increments before sending them. We delegated the implementation of this to Joeri, our colleague.
  19. 19. ScalaTest: Functional. Many test styles: FlatSpec for unit testing; FeatureSpec for AT; many more available (WordSpec, ...).
  20. 20. Typesafe Config: Configuration library for JVM languages. No dependencies. Java properties, JSON, and a human-friendly JSON superset. Support for nesting.
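
The client described above was written in Java; to stay in the language used in the rest of this deck, here is a hedged Scala sketch of the same idea on top of `java.util.concurrent.ArrayBlockingQueue`. `MetricsBuffer`, `report` and `drainBatch` are hypothetical names, and dropping on overflow is an assumption about what "non-intrusive" means here.

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.collection.mutable

object ClientQueueSketch {
  final case class Increment(metric: String, by: Long)

  final class MetricsBuffer(capacity: Int) {
    private val queue = new ArrayBlockingQueue[Increment](capacity)

    // Non-blocking: returns false and drops the increment when the queue
    // is full, so the reporting application is never slowed down.
    def report(metric: String, by: Long = 1L): Boolean =
      queue.offer(Increment(metric, by))

    // Drains everything queued so far and merges the increments per
    // metric, so one request can carry a whole batch.
    def drainBatch(): Map[String, Long] = {
      val drained = new java.util.ArrayList[Increment]()
      queue.drainTo(drained)
      val batch = mutable.Map.empty[String, Long]
      val it = drained.iterator()
      while (it.hasNext) {
        val i = it.next()
        batch(i.metric) = batch.getOrElse(i.metric, 0L) + i.by
      }
      batch.toMap
    }
  }
}
```

A background sender would call `drainBatch()` periodically and post the result to the server; that part is left out of the sketch.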
  21. 21. Evolution through commit log
  22. 22. Contract first: define JSON in/out of the API. Use a simple .sbt file. Commits: dc597f5a959d844c1c98b459ce5db5192b3dfc9a (Mon Oct 20 16:47:04 2014 +0200) "project structure + schemas"; 81956d778e1bafc0e0decd2a74dd7e8d0e9ce5cf (Mon Oct 20 17:15:36 2014 +0200) "allow to post multiple metrics".
  23. 23. New libraries: %% notation of dependencies; Typesafe Config; Spray (spray-can, spray-routing, spray-json). Commits: 87d3f08716a29922a26df82ec1f23e552bd18ed1 (Wed Oct 22 11:19:32 2014 +0200) "use base unit test class - DRY"; 1896d9d7229bbe5fd7a038c1c6661fe3c5ae8740 (Wed Oct 22 11:04:39 2014 +0200) "first implementation of a route for handling metrics posts".
  24. 24. Unit and acceptance testing: TDD from here on. Use the API example in the test (test the documentation). ScalaTest, spray-client, Akka TestKit. Re-use spray-json on the client side. Server and client run in the same actor system. Commits: ddb2d5a8fd47b9042d906ca75c26eab9f0483d88 (Wed Oct 22 17:51:32 2014 +0200) "add first acceptance test"; 0ada7ce978f28ecd58d077e3f10785e0ad8e559c (Thu Oct 23 13:29:20 2014 +0200) "add test for invalid metrics"; 657264839cdb948df508b0fefb955ef2a89422a2 (Thu Oct 23 12:33:21 2014 +0200) "split of the AT tests, using example from API def in test".
  25. 25. SBT setup remodeling: Script -> Scala. Stolen from Spray. Clearer separation of modules, dependencies, build settings. Commit: b204e81c6c42837792e6292f4d6a494a5efa1cfa (Thu Oct 23 11:28:33 2014 +0200) "improve sbt setup, stole from spray setup". Diagram labels: JSON; Add Metric; Spray REST route; JSON support; unit and acceptance testing.
  26. 26. Dashing: Re-use the company standard for dashboards, for easier adoption in the company. Ruby gem (Sinatra, Batman.js, CoffeeScript, SCSS... full hipster). Currently running on our Dashing server. It's Ruby, but simple enough.
  27. 27. Actor tree overview (diagram): CounterMetricsActor at the root; below it one CounterMetricsActorName per metric (e.g. "outgoing-AS2", "incoming-HTTP", "message-type-invoice"); below each of those a chain of CounterMetricsIntervalActor levels: YEAR (e.g. 2014, 2015) -> MONTH (e.g. 10, 11, 12) -> DAY (e.g. 10, 11) -> HOUR (e.g. 01, 02, 03), where each HOUR actor holds BUCKETS=[00..59].
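
To make the tree concrete, the following sketch computes where a single measurement would land: the chain of levels from the per-metric node down to the hour node, plus the minute bucket inside it. The `Location` type and the `year-2015` style segment names are made up for illustration; the talk does not show how the actors are actually named.

```scala
import java.time.LocalDateTime

object ActorPathSketch {
  // Path from the per-metric level down to the hour level, plus the
  // minute bucket (0..59) kept inside the hour level.
  final case class Location(path: List[String], minuteBucket: Int)

  def locate(metric: String, at: LocalDateTime): Location =
    Location(
      List(
        metric,
        s"year-${at.getYear}",
        s"month-${at.getMonthValue}",
        s"day-${at.getDayOfMonth}",
        s"hour-${at.getHour}"),
      at.getMinute)
}
```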
  28. 28. Shaping the Actor System: Documentation, testing. "Add metrics" / "Report metrics" calls introduced. Commit: 182b3394db9c628a2ab26a62c6eeb351a360657c (Thu Oct 23 20:39:35 2014 +0200) "add post method for getting metric reports".
  29. 29. Shaping the Actor System: We put a thin facade between the API and the actors: case classes / model on InternalAPI, DTO objects on ExternalAPI. We used Cmd and Query as suffixes for commands and queries. Starting from this commit, Milan gets more involved with the server-side Scala. Commits: 6326978260986f90dd8cb0796ef105cd945aded2 (Mon Oct 27 12:43:32 2014 +0100) "Introducing internal API class. Replacing Request/Response case classes with command/view case classes. Using ScalaMock to test the api entry point"; 1601c2de3a0d8b69fc683930d7f1e3235ba09604 (Mon Oct 27 14:03:30 2014 +0100) "CR-based improvements"; 3fc2363817c6b013a26feaef783835fc89dd48ba (Mon Oct 27 15:40:48 2014 +0100) "making first Command & Query case classes in the place of incoming DTO objects". CR-1615, CR-1618.
  30. 30. Shaping the Actor System, child instantiation & caching: First actor tree. Keep your own reference cache; create children; watch children; act on "Terminated". Commits: a82584263da9d0dae8a54bc1dd510a95bbc45790 (Mon Oct 27 17:36:39 2014 +0100) "start with actor tree"; ac3167bf9b9d491235e9a8c97ee89b24e20d92f3 (Tue Oct 28 11:18:22 2014 +0100) "handle unknown messages, check cache is empty"; c776d5fcf814372350f3d9d9121e2858342efe58 (Tue Oct 28 13:11:01 2014 +0100) "add child factory with default impl"; 43cbe21a418cda4d01d767cd2aeccbfc4b263107 (Tue Oct 28 17:07:48 2014 +0100) "add support for query reports".
  31. 31. Akka Persistence Design Overview (diagram). MetricsExternalApi (addMetrics, getReports) exposes a REST/JSON interface towards metrics clients. AkkaInternalApi implements InternalApi. Each HTTP request can contain multiple metric queries; each metric query makes a single "TimeScopeQuery" tree structure that gets partially processed by the appropriate child actors through the GetCounterMetricReport message. The results of all queries are processed asynchronously and gathered into a single HTTP response. CounterMetricsActor: there is only one; it caches actor children per metric name. Below it: CounterMetricsActorName (e.g. metric "bw-incoming-ONP"), then CounterMetricsIntervalActor levels YEAR = 2014 -> MONTH = 11 -> DAY = 11 -> HOUR = 01 with BUCKETS=[00..59]. Recording flow: (1) AddMetricsCmd (no reply) into CounterMetricsActor, then (2) to (6) RecordCounterMetric passed down the tree level by level. Query flow: (1) ReportMetricsQuery into CounterMetricsActor, (2) to (6) GetCounterMetricReport passed down the tree, (7) to (11) CounterMetricReport passed back up, (12) ReportMetricsQueryResult returned. Both Actor and Interval actors have Interval actor children (cached by their value, e.g. hour or minute number). These child actors can die if they are not queried for a long enough period; a parent actor can die only if it has no more children. Actors can be revived to their previous state via akka-persistence. A scheduled message tells children to SNAPSHOT their current state (an optimization in CQRS systems to make "reviving" faster). The HOUR actor is special because it keeps the minutes' information inside "buckets" (a hashmap), making the minute our finest possible precision. Persistent actors keep state between restarts: store events, delete old events; create snapshot, delete old snapshots.
  32. 32. Journal Plugin (in memory): keeps track of the last few events, the rest is thrown away. Snapshot Plugin (file system): keeps the last snapshot, throws away older snapshots. Persistent framework.
  33. 33. Introducing Akka Persistence: Choosing plugins: journal in memory, snapshots on disk. With PersistentActor, receive becomes receiveRecover (handles a SnapshotOffer and/or journal entries) and receiveCommand (does the work and calls persist). Commit: c67826ade4b8b32dd7b67cd5c05e06529b0cfedb (Tue Oct 28 10:40:05 2014 +0100) "adding akka persistence dependency".
  34. 34. Introducing Akka Persistence: Initial commit where we tried to split actors into name-based and time-based ones: the root actor in the tree decides how to delegate based on the name (of the metric); the second level (and deeper) decide based on time. There was still a lot of work to be done. Commit: 5dbff87119593cba7db6eef32b595135317bc17f (Tue Oct 28 16:00:56 2014 +0100) "time scope actors introduced".
  35. 35. Run localhost: Main class: easier startup from the IDE; local testing front end. Client simulator: easier local testing; used for load testing later on. Commits: c907474d7846865ddc277550b3a6cdaaf36c4ee8 (Wed Oct 29 09:02:52 2014 +0100) "have our own main: easier for standalone or IDE usage"; 2954db34ae44b52d9b4087e7ba74118871ea1f70 (Wed Oct 29 10:03:10 2014 +0100) "utilizing localhost from a running project, correcting URLs"; 463499a493f575b3817313640014a1c45431da09 (Wed Oct 29 10:23:36 2014 +0100) "add client simulator".
  36. 36. Extend report/query functionality: Still waging war with the journal: too much logging. Stabilizing the number of "with"s we are using, via the common trait BaseMetricsActor. Pre-calculation of the "query tree" when a query comes in; delegation to children only when needed. BucketScope vs RangedScope vs FullScope: still to find a better, Scala-idiomatic way to do it. Commit: 3d92414b3c972b4b91c608287929e76a63e84a38 (Fri Oct 31 11:52:04 2014 +0100) "query recording runs across year/month/date actors" ... and many others.
  37. 37. Packaging. First idea / settled for: uber-jar, Akka micro kernel. Commit: 1c0c8cebd6cf00cd6c748bd9d1c5fddb4a0f7ab3 (Mon Nov 3 13:21:24 2014 +0100) "allow assembly and dependency-tree plugins".
  38. 38. Akka Persistence Part II: System scheduler: triggers snapshot creation. A "Make snapshot" message is sent from the root actor; each actor in the tree is able to fan out to its children. Each actor sends a PoisonPill to itself after making a snapshot (this decision will have performance consequences). Keep your own children's references: interesting bugfix by Jeroen. Commits: 5ad1d4e36fcc46919c125950ea7399256635e989 (Mon Nov 3 17:00:04 2014 +0000) "snapshots should work from this point on"; 7d588f29fa6f222bd1fa54a82fde3e932106c22f (Tue Nov 4 12:52:08 2014 +0100) "Fix fanout of MakeSnapshot use the cache to get the children since in testing the children are not registered".
  39. 39. Akka Persistence and Acceptance testing: Random snapshot directory for acceptance testing. Remove the snapshot directory after the test. Execute cleanup from SBT. Commit: 8b577a80dbb4c0a23281bde25673d1927a563365 (Tue Nov 4 11:55:26 2014 +0000) "some randomization introduced into the system and snapshot directory removal moved to a SBT task".
  40. 40. Fixing memory issues: Related to a performance issue: the actor got killed on every snapshot; revival of an actor is IO intensive; once-a-minute peaks in VisualVM. Make sure actors die after 30 minutes. Revive and restore state when needed. Commit: a8750fa7a3ac3557a4028f580b216a9383d32610 (Wed Nov 5 12:00:26 2014 +0100) "Make sure actors die if not used for 30 minutes: constrain memory usage".
  41. 41. Fixing storage issues: A snapshot was made even for actors with empty state: check whether the state is dirty before saving a snapshot. On SaveSnapshotSuccess: clean old snapshots based on metadata; delete journal entries. Commits: 2016179d8b43698444511ebdcbf3e81fadf8f140 (Wed Nov 5 18:00:53 2014 +0100) "first go at cleanup of snapshots and journal entries"; 2b8f101c04d0cb87b585c63e94cdb779d15f15ce (Thu Nov 6 08:57:25 2014 +0100) "improved snapshot cleanup"; 7fdae48fbc5ca7f2be5a4ad349521b065b0fca81 (Thu Nov 6 10:34:21 2014 +0100) "added test for snapshot cleanup".
  42. 42. Actor supervision: Poison pill / suicide turned into massacre ;-) Stop exceptions from bubbling up. Stop the actor in case of failure. Revive and restore state when needed. Commits: dea72b1fa0db5e05220496224c5e3c62db477c3c (Fri Nov 7 09:36:46 2014 +0100) "added supervisor strategy in actors, changed inmemory journal plugin, tweaking of journal message deletion Need to add tests for all of this!!"; a12aea367e847825aba64603ae1e872a7183aa65 (Wed Nov 12 12:33:45 2014 +0100) "added test to make sure snapshotting only happens when state is dirty"; 7c3175ece5be1560e3774d55ec478818b77ab3ff (Wed Nov 12 15:01:26 2014 +0100) "added test: try to simulate error just after the hour".
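
The "only snapshot when dirty" rule can be shown without Akka. In this sketch the `save` callback stands in for whatever actually writes the snapshot (in the real system, Akka Persistence); the `Counter` class and its methods are hypothetical.

```scala
object DirtySnapshotSketch {
  // `save` is a stand-in for the real snapshot store.
  final class Counter(save: Map[Int, Long] => Unit) {
    private var buckets = Map.empty[Int, Long]
    private var dirty = false

    // Any state change marks the counter as dirty.
    def record(minute: Int, by: Long = 1L): Unit = {
      buckets = buckets.updated(minute, buckets.getOrElse(minute, 0L) + by)
      dirty = true
    }

    // Writes a snapshot only if something changed since the last one;
    // returns true when a snapshot was actually written.
    def makeSnapshot(): Boolean =
      if (dirty) {
        save(buckets)
        dirty = false
        true
      } else false
  }
}
```

With this check, an actor that was revived only to answer a query does not produce a new snapshot file when the scheduled snapshot message arrives.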
  43. 43. JVM tuning: Use the simple client simulator at 100 metrics / second. Monitor the metrics server. Run over lunch break / overnight. Large young generation for Scala. GC options for shorter pauses and better response times. Production is on Solaris. Options: -Xms128M -Xmx512M -XX:MaxMetaspaceSize=128m -XX:NewSize=450M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -server
  44. 44. If we could choose how to improve it...
  45. 45. Possible improvements on front-end: Replace the Dashing.io framework with D3 + Spray.io: the former is a powerful visualization library, the latter we already have for serving the API. The idea: serve the static JS files, which would call the REST API. We didn't really think about an MVVC framework; why not pure WebComponents + ES6?
  46. 46. Possible improvements on front-end: Use WebSockets for real-time information. Dashing.io already uses SSE, but the server side has occasional hiccups.
  47. 47. Possible improvements on back-end: Introduce a somewhat more serious snapshot plugin (snapshot plugins: http://akka.io/community/): MongoDB, PostgreSQL... This was hurting our eyes from the start, but the funny thing is... it works even as is.
  48. 48. Possible improvements on back-end: Unify the actor implementation; we should be using buckets everywhere. Current state: sealed trait ActorCache[T] { val nameRefMap: mutable.Map[T, ActorRef] }; trait MetricActorCache extends ActorCache[Metric] {...} (root one); trait TimeScopeActorCache extends ActorCache[TimeScope] {...} (keeping per hour / minute); class CounterMetricsIntervalActor(val metric: Metric, val scope: TimeScope) { private var buckets: mutable.Map[Int, Int] } (lowest level).
  49. 49. Possible improvements on back-end: Randomization of the children's sleeping pill: at this time our GC spikes on round hours; this would allow the pressure on the snapshot store to breathe a bit; it would also give us extra stability, since sometimes incoming messages get lost.
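
One way the randomization could look, as a sketch only: each actor gets the base idle timeout plus a random extra delay, so that they do not all stop and snapshot at the same moment. The function name and the choice of a uniform jitter are assumptions; `maxJitter` must be positive and below roughly 24 days for the `Int` conversion to be safe.

```scala
import scala.concurrent.duration._
import scala.util.Random

object JitterSketch {
  // Base idle timeout plus a uniformly random extra delay in [0, maxJitter).
  def idleTimeout(base: FiniteDuration, maxJitter: FiniteDuration, rnd: Random): FiniteDuration =
    base + rnd.nextInt(maxJitter.toMillis.toInt).millis
}
```

For example, `JitterSketch.idleTimeout(30.minutes, 5.minutes, new Random())` yields a timeout somewhere between 30 and 35 minutes, spreading the load that currently peaks on round hours.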
  50. 50. If we had even more time... Clustering: how scalable would it be? what would the throughput be? how would we handle node failure? Functional improvements: cohort analysis; dynamic time selection. Akka Persistence: use a persistent view instead of a persistent actor; we only care about the query; original commands are thrown away.
  51. 51. Q&A Thank you!