Developing a Real-time Engine
with Akka, Cassandra, and Spray
Jacob Park
What is Paytm Labs and Paytm?
• Paytm Labs is a data-driven lab focusing on tackling very
difficult problems involving the topics of fraud,
recommendations, ratings, and platforms for Paytm.
• Paytm is the world's fastest growing mobile-first
marketplace and payment ecosystem that serves over 100
million people who make over 1.5 million business
transactions representing $1.7 billion of goods and
services exchanged annually.
What is Akka?
• Akka (http://akka.io/):
• “Akka is a toolkit and runtime for building highly
concurrent, distributed, and resilient message-driven
applications on the JVM.”
• Packages: “akka-actor”, “akka-remote”, “akka-cluster”,
“akka-persistence”, “akka-http”, and “akka-stream”.
What is Cassandra?
• Cassandra (http://cassandra.apache.org/):
• “The Apache Cassandra database is the right choice
when you need scalability and high availability without
compromising performance.”
What is Spray?
• Spray (http://spray.io/):
• “Spray is an open-source toolkit for building REST/HTTP-
based integration layers on top of Scala and Akka.”
• Packages: “spray-caching”, “spray-can”, “spray-http”,
“spray-httpx”, “spray-io”, “spray-json”, “spray-routing”,
“spray-servlet”.
What is Maquette?
• A real-time fraud rule-engine which enables synchronous
calls for core operational platforms to evaluate fraud.
• Its core technologies include Akka, Cassandra, and Spray.
Why Akka, Cassandra, and Spray?
• Akka, Cassandra, and Spray are highly performant and
developer-friendly, treat failures as a first-class concept,
and provide great support for clustering to ensure
responsiveness, resiliency, and elasticity when building
Reactive Systems.
Maquette In a Nutshell
[Diagram: HTTP, Environment, and Executor layers]
[Diagram: Maquette Actor System]
HTTP Layer
• Utilize Spray-Can for a fast HTTP endpoint.
• Utilize Jackson for JSON deserialization/serialization.
• Utilize a separate dispatcher for the Bulkhead Pattern.
• Expose a normalized yet flexible schema for integration.
• Request Handling: Worst → Best
• Cameo Pattern (Per-request Actor),
• Ask Pattern (Future),
• RequestHandlerPool (Akka Router Pool).
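In Spray and Akka, the Bulkhead Pattern is applied by giving the HTTP layer its own dispatcher. The sketch below shows the same idea with plain Scala `ExecutionContext`s rather than Akka dispatcher configuration: request unmarshalling and rule evaluation run on two separate, fixed-size thread pools, so a burst of slow evaluations cannot occupy the threads that accept requests. `BulkheadSketch`, `namedPool`, and `handle` are hypothetical names, and the pool sizes are arbitrary.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BulkheadSketch {
  // Hypothetical helper: a fixed-size pool of daemon threads that share a name prefix.
  def namedPool(prefix: String, size: Int): ExecutionContext = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val thread = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
        thread.setDaemon(true) // do not keep the JVM alive once main returns
        thread
      }
    }
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(size, factory))
  }

  // Two isolated compartments: saturating one pool cannot starve the other.
  val httpPool: ExecutionContext = namedPool("http", 2)
  val evaluationPool: ExecutionContext = namedPool("evaluation", 4)

  // Hypothetical handler: unmarshal on the HTTP pool, evaluate on the evaluation pool.
  def handle(request: String): Future[String] =
    Future(request.trim)(httpPool).flatMap { parsed =>
      Future(s"evaluated:$parsed on ${Thread.currentThread().getName}")(evaluationPool)
    }(httpPool)

  def main(args: Array[String]): Unit =
    println(Await.result(handle(" request-1 "), 5.seconds))
}
```

In an Akka deployment the equivalent is a dedicated dispatcher entry in the configuration plus `withDispatcher(...)` on the relevant `Props`.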
HTTP Layer
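import akka.actor.{Actor, ActorLogging, ActorRef}
import spray.http.{HttpMethods, HttpRequest}
import spray.http.Uri.Path

// BaseRoute, SprayJacksonSupportUtils, and unmarshalHttpEntityAndDelegateRequest
// are defined elsewhere in Maquette (not shown).
trait FraudRoute extends BaseRoute with ActorLogging {
  this: Actor =>
  import SprayJacksonSupportUtils._

  override protected def receiveRequest(
    delegateActorRef: ActorRef, parentUriPath: Path
  ): Actor.Receive = {
    // Accept only POST requests under this route's parent path.
    case incomingHttpRequest @ HttpRequest(
      HttpMethods.POST, requestUri, requestHeaders, requestEntity, requestProtocol
    ) if requestUri.path startsWith parentUriPath =>
      // Capture sender() eagerly so it is safe to hand to asynchronous code.
      val senderActorRef = sender()
      unmarshalHttpEntityAndDelegateRequest(
        requestEntity, delegateActorRef, senderActorRef
      )
  }
}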
Environment Layer
• A tree of actors which are responsible for managing a
cache or pool of Contexts and Dependencies required to
evaluate incoming requests.
• A Context is a Document Message which wraps
configurations for evaluating requests.
• A Dependency is a Document Message which wraps
optimized queries to Cassandra.
Environment Layer
• Map incoming requests to a Context by forking a template
with .copy().
• Forward the forked Context to the Executor Layer, in the
same or a different JVM, with an Akka Router.
• Consider implementing a custom router to favour locality
of execution on the same JVM until responsiveness
requires distribution.
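A locality-favouring router could be sketched as a pure selection function. In Akka the decision would live in a custom `akka.routing.RoutingLogic`; here, to stay dependency-free, `Candidate`, `LocalityFirstRouting`, and the `localThreshold` parameter are hypothetical, and queue depth is assumed to be a usable proxy for responsiveness.

```scala
// Hypothetical view of a routee: where it runs and how much work is queued for it.
final case class Candidate(name: String, isLocal: Boolean, pendingMessages: Int)

object LocalityFirstRouting {
  // Prefer the least-loaded local candidate below the threshold; otherwise use the
  // least-loaded remote one; with no remote candidates, fall back to the
  // least-loaded local one anyway.
  def select(candidates: Seq[Candidate], localThreshold: Int): Option[Candidate] = {
    val (local, remote) = candidates.partition(_.isLocal)
    val healthyLocal = local.filter(_.pendingMessages < localThreshold)
    if (healthyLocal.nonEmpty) Some(healthyLocal.minBy(_.pendingMessages))
    else if (remote.nonEmpty) Some(remote.minBy(_.pendingMessages))
    else local.sortBy(_.pendingMessages).headOption
  }
}
```

Execution stays on the same JVM until every local routee is above the threshold, which is the point at which distribution starts to pay for its network cost.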
Environment Layer
• Always pre-compute and pre-optimize the Environment
Layer as a whole.
• Support remotely pre-computing and updating Contexts.
• Design Contexts and Dependencies to be optimizable,
e.g. by supporting arithmetic reduction or sorting.
• Pairing each EnvironmentActor with a ProxyActor and a
StateActor is preferred: the StateActor caches the whole
environment so that a failed EnvironmentActor can recover
quickly.
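A minimal sketch of pre-computing an environment once and forking per-request Contexts from it with `.copy()`, assuming Contexts and Dependencies are immutable case classes. The `Context` and `Dependency` shapes and the de-duplicate-and-sort optimization are illustrative stand-ins, not Maquette's actual types.

```scala
// Hypothetical shapes: a Dependency is a Cassandra lookup, a Context is a request template.
final case class Dependency(table: String, column: String)
final case class Context(
  evaluationType: String,
  dependencies: Vector[Dependency],
  request: Option[Map[String, String]] = None
)

object EnvironmentSketch {
  // Pre-compute once (at boot or on a remote update): drop duplicate lookups and sort
  // them, so every Context forked later starts from an already-optimized plan.
  def precompute(templates: Seq[Context]): Map[String, Context] =
    templates.map { template =>
      val optimized = template.dependencies.distinct.sortBy(d => (d.table, d.column))
      template.evaluationType -> template.copy(dependencies = optimized)
    }.toMap

  // Per request: fork the immutable template with .copy(); the cached template is untouched.
  def fork(
    environment: Map[String, Context],
    evaluationType: String,
    request: Map[String, String]
  ): Option[Context] =
    environment.get(evaluationType).map(_.copy(request = Some(request)))
}
```

Because templates are immutable, the cached map can be shared freely and a restarted actor only needs the map back, not any per-request state.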
Environment Layer
// Injected factories let the children be swapped out, e.g. for test probes.
type EnvironmentStateActorRefFactory =
  (EnvironmentProxyActorContext, EnvironmentProxyActorSelf) => ActorRef
type EnvironmentActorRefFactory =
  (EnvironmentProxyActorContext, EnvironmentProxyActorSelf) => ActorRef

class EnvironmentProxyActor(
  environmentStateActorRefFactory: EnvironmentStateActorRefFactory,
  environmentActorRefFactory: EnvironmentActorRefFactory
) extends Actor with ActorLogging {
  // The proxy owns both children: the state cache and the request-serving actor.
  val environmentStateActorRef = environmentStateActorRefFactory(context, self)
  val environmentActorRef = environmentActorRefFactory(context, self)

  // The receive* partial functions are defined elsewhere (not shown).
  override def receive: Receive =
    receiveEnvironmentState orElse
    receiveFraudRequest orElse
    receiveEnvironmentLocalCommand orElse
    receiveEnvironmentRemoteCommand
}
Environment Layer
class EnvironmentStateActor(
  environmentProxyActorRef: ActorRef, databaseInstance: Database
) extends Actor with ActorLogging {
  import EnvironmentStateActor._
  import EnvironmentStateFactory._
  import EnvironmentStateLifecycleStrategy._
  import EnvironmentStateRepository._

  var environmentState: Option[EnvironmentState] = None

  override def receive: Receive =
    receiveLocalCommand orElse
    receiveRemoteCommand

  object EnvironmentStateLifecycleStrategy { ... }
  object EnvironmentStateFactory { ... }
  object EnvironmentStateRepository { ... }
}
Environment Layer
class EnvironmentActor(
  environmentProxyActor: ActorRef, executorActorRef: ActorRef, bootActorRef: ActorRef
) extends Actor with ActorLogging {
  import EnvironmentActor._
  import EnvironmentLifecycleStrategy._

  var environmentState: Option[EnvironmentState] = None

  override def receive: Receive =
    receiveEnvironmentState orElse
    receiveFraudRequest

  // Look up the pre-computed template for the request's evaluation type and fork it
  // with .copy(); yields None if no environment is loaded or no template matches.
  def forkedMaquetteContext(fraudRequest: FraudRequest): Option[MaquetteContext] = {
    val forkedMaquetteContextOption = for {
      actualEnvironmentState <- environmentState
      actualBaseMaquetteContext <-
        actualEnvironmentState.maquetteContextMap.get(fraudRequest.evaluationType)
      actualForkMaquetteContext = actualBaseMaquetteContext.copy(fraudRequest = fraudRequest)
    } yield actualForkMaquetteContext
    forkedMaquetteContextOption
  }
}
Executor Layer
• A pipeline of actors responsible for scheduling execution of
Tasks defined within a Context with the specified
Dependencies, executing the Tasks, and coordinating the
results of the Tasks to provide a response.
• A Task is an optimized set of executable rules.
Executor Layer
• Ideally, the Executor Layer should be stateless to allow
easy recovery from failures.
• Ideally, keep the Executor Layer available across the
cluster.
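A stateless executor can be sketched as a function from its inputs to a `Future` result: it schedules every rule concurrently and coordinates the partial scores into one response, keeping nothing between calls. `Rule`, `ExecutionResult`, and `StatelessExecutor` are hypothetical, and summing scores is only an assumed way of combining results.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical types: a Rule scores a request; an ExecutionResult collects the scores.
final case class Rule(name: String, score: Map[String, Double] => Double)
final case class ExecutionResult(scores: Map[String, Double]) {
  def total: Double = scores.values.sum
}

object StatelessExecutor {
  // No fields and no vars: everything arrives as arguments, so a failed executor can be
  // restarted, or the call retried on another node, with no state to recover.
  def execute(rules: Seq[Rule], request: Map[String, Double])(implicit ec: ExecutionContext): Future[ExecutionResult] = {
    // Schedule: start every rule concurrently.
    val scheduled = rules.map(rule => Future(rule.name -> rule.score(request)))
    // Coordinate: gather the partial scores into a single response.
    Future.sequence(scheduled).map(pairs => ExecutionResult(pairs.toMap))
  }
}
```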
Executor Layer
type ExecutorRouterActorRefFactory =
  (ExecutorActorContext, ExecutorActorSelf) => ActorRef
type ExecutorCoordinatorActorRefFactory =
  (ExecutorActorContext, ExecutorActorSender, ExecutorActorNext, MaquetteContext, Timeout) =>
    ActorRef

class ExecutorActor(
  executorRouterActorRefFactory: ExecutorRouterActorRefFactory,
  executorCoordinatorActorRefFactory: ExecutorCoordinatorActorRefFactory,
  actionActorRef: ActorRef
) extends Actor with ActorLogging {
  import ExecutorActor._
  import ExecutorSchedulerStrategy._

  val executorRouterActorRef: ActorRef = executorRouterActorRefFactory(context, self)

  override def receive: Receive =
    receiveMaquetteContext orElse
    receiveMaquetteResult

  object ExecutorSchedulerStrategy {
    def scheduleExecution(maquetteContext: MaquetteContext): Unit = { ... }
  }
}
Executor Layer
• Design a Task as a functional, monadic data structure.
• Following functional programming, a Task should isolate
side effects from pure functions.
• As a monad, a Task is easy to optimize: its composition
and reduction properties allow high parallelization.
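One reading of "functional and monadic" for a Task, assuming a Task is modelled as a lazily evaluated value: building and composing it performs no side effects, and `map2` with an associative function lets many Tasks be reduced in any grouping. The `Task` type below is a hypothetical, minimal stand-in, not Maquette's Rule object.

```scala
// A hypothetical, minimal Task: a description of a computation that runs only when
// run() is called, so building and composing Tasks performs no side effects.
final case class Task[A](run: () => A) {
  def map[B](f: A => B): Task[B] = Task(() => f(run()))
  def flatMap[B](f: A => Task[B]): Task[B] = Task(() => f(run()).run())
  // Combine two independent Tasks; with an associative f, many Tasks can be
  // reduced in any grouping, which is what makes the work splittable.
  def map2[B, C](that: Task[B])(f: (A, B) => C): Task[C] =
    flatMap(a => that.map(b => f(a, b)))
}

object Task {
  def pure[A](a: A): Task[A] = Task(() => a)
}
```

Because nothing executes until `run()` is called, an optimizer is free to merge or batch independent Tasks beforehand.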
Executor Layer
case class Query(
  selectComponent: Select, fromComponent: From, whereComponent: Where
) {
  // Merge: select the union of both queries' columns (FROM and WHERE come from this).
  def + (that: Query): Query = {
    this.copy(selectComponent =
      Select(this.selectComponent.columnNames union that.selectComponent.columnNames)
    )
  }

  // Subtract: keep only the columns that the other query does not already select.
  def - (that: Query): Query = {
    this.copy(selectComponent =
      Select(this.selectComponent.columnNames diff that.selectComponent.columnNames)
    )
  }
}
Note: An example of a Rule object is not shown as it is a trade secret.
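To show the kind of arithmetic reduction the `+` and `-` operators enable, here is a self-contained version with minimal, hypothetical `Select`, `From`, and `Where` case classes (the real components are not shown in the deck). It assumes that only queries sharing the same FROM and WHERE clauses may be merged, because `+` keeps the left operand's FROM and WHERE.

```scala
// Minimal, hypothetical stand-ins for the slide's Select/From/Where components.
final case class Select(columnNames: Set[String])
final case class From(tableName: String)
final case class Where(predicate: String)
object Where { val Empty: Where = Where("") }

final case class Query(selectComponent: Select, fromComponent: From, whereComponent: Where) {
  // Union of selected columns; FROM and WHERE are taken from the left operand.
  def +(that: Query): Query =
    copy(selectComponent = Select(selectComponent.columnNames union that.selectComponent.columnNames))
  // Columns of this query that the other query does not already fetch.
  def -(that: Query): Query =
    copy(selectComponent = Select(selectComponent.columnNames diff that.selectComponent.columnNames))
}

object QueryReduction {
  // Only queries with the same FROM and WHERE can be merged safely, so group first,
  // then reduce each group with the associative + operator.
  def reduceAll(queries: Seq[Query]): Seq[Query] =
    queries
      .groupBy(q => (q.fromComponent, q.whereComponent))
      .values
      .map(_.reduce(_ + _))
      .toSeq
}
```

Since set union is associative and commutative, each group can be reduced in any order or in parallel, and many rules' lookups collapse into one round trip per table and predicate.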
Executor Layer
• For a Task object, consider an external DSL that is
interpreted into executable, immutable graphs or even
compiled to Java bytecode.
• Scala Parser Combinators:
https://github.com/scala/scala-parser-combinators
• Parboiled2: https://github.com/sirthias/parboiled2
• ANTLR: http://www.antlr.org/
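To illustrate interpreting an external DSL into an immutable graph, here is a small hand-written recursive-descent parser. It deliberately uses no parsing library so that it runs on the standard library alone; the rule syntax, `RuleExpr`, and `RuleParser` are hypothetical and far simpler than a production grammar would be.

```scala
import scala.util.Try

// Hypothetical rule AST: an immutable graph that can be inspected, optimized, or evaluated.
sealed trait RuleExpr {
  def eval(facts: Map[String, Double]): Boolean = this match {
    case Compare(field, ">", v) => facts.get(field).exists(_ > v)
    case Compare(field, "<", v) => facts.get(field).exists(_ < v)
    case Compare(field, _, v)   => facts.get(field).contains(v) // "=" is the only other operator
    case And(l, r)              => l.eval(facts) && r.eval(facts)
    case Or(l, r)               => l.eval(facts) || r.eval(facts)
  }
}
final case class Compare(field: String, op: String, value: Double) extends RuleExpr
final case class And(left: RuleExpr, right: RuleExpr) extends RuleExpr
final case class Or(left: RuleExpr, right: RuleExpr) extends RuleExpr

// Grammar (tokens separated by whitespace; "and" binds tighter than "or"):
//   expr := term ("or" term)*
//   term := cmp ("and" cmp)*
//   cmp  := IDENT (">" | "<" | "=") NUMBER
object RuleParser {
  private type P[A] = List[String] => (A, List[String])

  def parse(input: String): Try[RuleExpr] = Try {
    val (rule, rest) = expr(input.trim.split("\\s+").toList)
    if (rest.nonEmpty) throw new IllegalArgumentException(s"Unexpected tokens: ${rest.mkString(" ")}")
    rule
  }

  // Left-associative chain of `sub` separated by `keyword`.
  private def chain(sub: P[RuleExpr], keyword: String, combine: (RuleExpr, RuleExpr) => RuleExpr): P[RuleExpr] =
    tokens => {
      val (first, afterFirst) = sub(tokens)
      var acc = first
      var rest = afterFirst
      while (rest.headOption.contains(keyword)) {
        val (next, remaining) = sub(rest.tail)
        acc = combine(acc, next)
        rest = remaining
      }
      (acc, rest)
    }

  private def cmp: P[RuleExpr] = {
    case field :: op :: number :: rest if Set(">", "<", "=").contains(op) =>
      (Compare(field, op, number.toDouble), rest)
    case other =>
      throw new IllegalArgumentException(s"Expected comparison at: ${other.mkString(" ")}")
  }

  private def term: P[RuleExpr] = chain(cmp, "and", And(_, _))
  private def expr: P[RuleExpr] = chain(term, "or", Or(_, _))
}
```

The parser only builds the graph; evaluating, optimizing, or compiling it are separate passes over the same immutable value.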
Executor Layer
object QueryParser extends JavaTokenParsers {
  def parseQuery(queryString: String): Try[Query] = {
    parseAll(queryStatement, queryString) ...
  }

  object QueryGrammar {
    // opt(whereClause) makes WHERE optional; Where.Empty stands in when it is absent.
    lazy val queryStatement: Parser[Query] =
      selectClause ~ fromClause ~ opt(whereClause) ~ ";" ^^ {
        case selectComponent ~ fromComponent ~ whereComponent ~ ";" =>
          Query(selectComponent, fromComponent, whereComponent.getOrElse(Where.Empty))
      }
  }

  object SelectGrammar { ... }
  object FromGrammar { ... }
  object WhereGrammar { ... }
  object StaticClauseGrammar { ... }
  object DynamicClauseGrammar { ... }
  object InterpolationTypeGrammar { ... }
  object DataTypeGrammar { ... }
  object LexicalGrammar { ... }
}
Note: An example of a Rule parser is not shown as it is a trade secret.
Abstracting Concurrency for High Parallelism Tasks
• Scala Futures.
• Scala Parallel Collections.
• Akka Router Pool.
• Akka Streams.
Scala Futures
• “A Future is an object holding a value which may become
available at some point.”
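import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val f = for {
  a <- Future(10 / 2) // 5
  b <- Future(a + 1)  // 6
  c <- Future(a - 1)  // 4
  if c > 3
} yield b * c

f foreach println // prints 24 once f completes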
Scala Futures
• Advantages: Efficient, Highly Parallel, Simple Monadic
Abstraction.
• Disadvantages: Lacks Communication, Lacks Low-Level
Concurrency Control, JVM-Bound.
• Note: Monadic Futures Enqueue All Operations to the
ExecutionContext ⇒ Lack of Control over Context-Switching.
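The note about monadic Futures is easiest to see side by side. In a for-comprehension each step is created inside the previous `flatMap`, so the steps run one after another; independent Futures must be started before they are composed if they are to run in parallel. A guard that evaluates to false fails the whole Future with `NoSuchElementException`. `FutureComposition` and its methods are illustrative names.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureComposition {
  // Sequential: each Future is created inside the previous flatMap,
  // so every step waits for the one before it.
  def sequential(): Future[Int] = for {
    a <- Future(10 / 2)
    b <- Future(a + 1)
    c <- Future(a - 1)
  } yield b * c

  // Parallel: independent Futures are started first, then composed.
  def parallel(x: Int, y: Int): Future[Int] = {
    val fx = Future(x * 2)
    val fy = Future(y * 3)
    for { a <- fx; b <- fy } yield a + b
  }

  // A guard that evaluates to false fails the Future with NoSuchElementException.
  def guarded(): Future[Int] = for {
    a <- Future(10 / 2)
    if a > 100
  } yield a

  def main(args: Array[String]): Unit =
    println(Await.result(sequential(), 5.seconds))
}
```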
Scala Parallel Collections
• Scala Parallel Collections is a package in the Scala
standard library which allows collections to execute
operations in parallel.
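// Scala 2.13+ needs: import scala.collection.parallel.CollectionConverters._
(0 until 100000).par
  .filter(x => x.toString == x.toString.reverse) // keep palindromic numbers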
Scala Parallel Collections
• Advantages: Very Efficient, Highly Parallel, Control of
Parallelism Level.
• Disadvantages: Lacks Communication, Sequential-only
Operations (e.g. foldLeft(); aggregate() is the parallel
alternative), Non-determinism and Side-Effect Issues
(operations must be associative and side-effect free),
JVM-Bound.
Akka Router Pool
• An Akka Router Pool maintains a pool of child actors
(routees) and forwards messages to them.
• If an Akka Router Pool is configured with an appropriate
dispatcher, mailbox, supervisor, and routing logic, it allows
a highly parallel yet elastic construct to execute tasks.
Akka Router Pool
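import akka.actor.{OneForOneStrategy, SupervisorStrategy}
import akka.routing.FromConfig

// ExecutorWorkerActor, accessLayer, DispatcherConfigPath, and RouterName
// are defined elsewhere (not shown).
// Restart a failed routee instead of escalating the failure.
val routerSupervisionStrategy = OneForOneStrategy() {
  case _ => SupervisorStrategy.Restart
}
// Router type, pool size, and resizer are read from the deployment configuration.
val routerPool = FromConfig.withSupervisorStrategy(routerSupervisionStrategy)
val routerProps = routerPool.props(
  ExecutorWorkerActor.props(accessLayer).withDispatcher(DispatcherConfigPath)
)
context.actorOf(
  props = routerProps,
  name = RouterName
)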
Akka Router Pool
• Advantages:
• Work-Pull Pattern = Rate Limiting.
• Bounded Mailbox = Backpressure.
• SupervisionStrategy = Failure.
• Scheduler = Timeout.
• Router Resizer = Predictive Parallelism & Scaling.
• Dispatcher Throughput = Predictive Context Switching.
• Location Transparency = JVM Unbound.
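Several of these advantages come from configuration rather than code. As a rough JVM analogue (not Akka's implementation), the sketch below models a bounded mailbox that pushes back on the producer and workers that pull the next job only when idle. `WorkPullSketch`, the 50 ms offer timeout, and the simulated 1 ms of work are assumptions chosen for illustration.

```scala
import java.util.concurrent.{ArrayBlockingQueue, CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

object WorkPullSketch {
  final case class Job(id: Int)

  // Returns (processed, rejected) job counts.
  def run(jobs: Int, workers: Int, mailboxCapacity: Int): (Int, Int) = {
    val mailbox = new ArrayBlockingQueue[Job](mailboxCapacity) // bounded mailbox
    val processed = new AtomicInteger(0)
    val producerDone = new AtomicBoolean(false)
    val finished = new CountDownLatch(workers)

    (1 to workers).foreach { i =>
      val pullLoop: Runnable = () => {
        var running = true
        while (running) {
          // Read the flag before polling: if it was already set, an empty poll means no more work.
          val done = producerDone.get()
          // Pull: a worker asks for the next job only once it is idle.
          val job = mailbox.poll(10, TimeUnit.MILLISECONDS)
          if (job != null) {
            Thread.sleep(1) // simulated work
            processed.incrementAndGet()
          } else if (done) {
            running = false
          }
        }
        finished.countDown()
      }
      new Thread(pullLoop, s"worker-$i").start()
    }

    // Backpressure: wait briefly for mailbox space, then shed the job rather than queue without bound.
    var rejected = 0
    (1 to jobs).foreach { id =>
      if (!mailbox.offer(Job(id), 50, TimeUnit.MILLISECONDS)) rejected += 1
    }
    producerDone.set(true)
    finished.await()
    (processed.get(), rejected)
  }
}
```

In an Akka router the same roles are played by a bounded mailbox type and routees that request work from their parent, with supervision and resizing layered on top.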
Akka Router Pool
• Disadvantages:
• Complex optimizations or implementation required.
• Actors with state potentially lead to issues regarding
mutability and lack of idempotence.
• Actors which require communication beyond parent-child
trees lead to potentially complex graphs.
Akka Streams
• “Akka Streams is an implementation of Reactive Streams,
which is a standard for asynchronous stream processing
with non-blocking backpressure.”
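import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
// Adapted from the Akka Streams "reactive tweets" example: tweets is a
// Source[Tweet, Unit] and akka is the #akka Hashtag, both defined elsewhere.
implicit val system = ActorSystem("reactive-tweets")
implicit val materializer = ActorMaterializer()

val authors: Source[Author, Unit] =
  tweets
    .filter(_.hashtags.contains(akka))
    .map(_.author)
authors.runWith(Sink.foreach(println))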
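Akka Streams implements the Reactive Streams interfaces, which JDK 9 and later also ship as `java.util.concurrent.Flow`. As a dependency-free illustration of demand-driven backpressure (this is not Akka Streams), the sketch runs the same filter-and-map pipeline through a `SubmissionPublisher` and a subscriber that requests one element at a time. `FlowBackpressureSketch`, `Tweet`, and `OneAtATime` are hypothetical names.

```scala
import java.util.concurrent.{CountDownLatch, Flow, SubmissionPublisher, TimeUnit}
import scala.collection.mutable.ListBuffer

object FlowBackpressureSketch {
  final case class Tweet(author: String, hashtags: Set[String])

  // A subscriber that never has more than one element outstanding: it requests
  // the next element only after it has handled the current one.
  final class OneAtATime(done: CountDownLatch) extends Flow.Subscriber[String] {
    private var subscription: Flow.Subscription = null
    val received: ListBuffer[String] = ListBuffer.empty[String]

    override def onSubscribe(s: Flow.Subscription): Unit = { subscription = s; s.request(1) }
    override def onNext(item: String): Unit = { received += item; subscription.request(1) }
    override def onError(t: Throwable): Unit = done.countDown()
    override def onComplete(): Unit = done.countDown()
  }

  def authorsTagged(tweets: Seq[Tweet], tag: String): List[String] = {
    val done = new CountDownLatch(1)
    val subscriber = new OneAtATime(done)
    val publisher = new SubmissionPublisher[String]()
    publisher.subscribe(subscriber)
    // Same pipeline as the slide: filter by hashtag, then map to the author.
    // submit() blocks while the subscriber's buffer is full, i.e. the producer is back-pressured.
    tweets.filter(_.hashtags.contains(tag)).map(_.author).foreach(author => publisher.submit(author))
    publisher.close() // onComplete is signalled after the buffered items
    done.await(5, TimeUnit.SECONDS)
    subscriber.received.toList
  }
}
```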
Akka Streams
• Advantages: Backpressure and Failure as First-class
Concepts, Concurrency Control, Simple Monadic
Abstraction, Graph API, Bi-directional Channels.
• Disadvantages: Too New = Risk for Production.
• Current: JVM-Bound; Potentially: Distributed
Streaming.
• Current: No Graph Optimization; Potentially:
Macro-based Optimization.
Maquette Performance
• With 10 Cassandra nodes, 4 Maquette nodes, and HAProxy
as a staging environment, Maquette handled ~40 000 requests
per second with a mean response time of 10 milliseconds
while evaluating 50 rules.
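As a rough reading of these figures, Little's law relates throughput \(\lambda\) and mean response time \(W\) to the mean number of requests in flight \(L\). Assuming steady state and load spread evenly across the 4 Maquette nodes by HAProxy:

```latex
\begin{aligned}
L &= \lambda W \approx 40\,000\ \text{req/s} \times 0.010\ \text{s} = 400\ \text{requests in flight} \\
L_{\text{node}} &= \frac{400}{4} = 100\ \text{requests in flight per node} \quad (\approx 10\,000\ \text{req/s per node})
\end{aligned}
```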
Tips
• Investigate Akka Streams for Akka HTTP.
• Investigate CPU usage and memory consumption: YourKit
or VisualVM and Eclipse MAT.
• Utilize Kamon to send real-time metrics to StatsD or a
third-party service like Datadog.
• If implementing a DSL or a complex actor-based graph,
remember to utilize ScalaTest and Akka TestKit properly.
• Utilize Gatling (gatling.io) for load and scenario-based testing.
Tips
• We used Cassandra 2.1.6 as our main data store for
Maquette, and operating Cassandra caused us considerable pain.
• Mastering Apache Cassandra (2nd Edition):
http://www.amazon.com/Mastering-Apache-Cassandra-Second-Edition-ebook/dp/B00VAG2WZO
Tips
• Investigate the Play Framework with Akka Cluster to create
a web application for operations.
• Commands to operate instances in the cluster.
• Commands to configure instances in real-time.
• A GUI for data scientists and business analysts to
easily define and configure rules.
Tips
• Utilize Kafka to publish audits, which can be used to
monitor rules through a Logstash, Elasticsearch, and
Kibana pipeline and archived in HDFS.
• Consider using Kafka to replay audits as requests,
running the real-time engine offline to tune rules.
Resources
• The Reactive Manifesto:
• http://www.reactivemanifesto.org/
• Reactive Messaging Patterns with the Actor Model:
• http://www.amazon.ca/Reactive-Messaging-Patterns-Actor-Model/dp/0133846830
• Learning Concurrent Programming in Scala:
• http://www.amazon.com/Learning-Concurrent-Programming-Aleksandar-Prokopec/dp/1783281413
• Akka Concurrency:
• http://www.amazon.ca/Akka-Concurrency-Derek-Wyatt/dp/0981531660
Thank you!
Jacob Park
Phone Number Removed
jacob@paytm.com
park.jacob.96@gmail.com
