Developing a Real-time Engine with Akka, Cassandra, and Spray

My presentation at the Toronto Scala and Typesafe User Group: http://www.meetup.com/Toronto-Scala-Typesafe-User-Group/events/224034596/.

  1. Developing a Real-time Engine with Akka, Cassandra, and Spray, by Jacob Park
  2. What are Paytm Labs and Paytm?
     • Paytm Labs is a data-driven lab that tackles difficult problems in fraud, recommendations, ratings, and platforms for Paytm.
     • Paytm is the world's fastest growing mobile-first marketplace and payment ecosystem. It serves over 100 million people, who make over 1.5 million business transactions representing $1.7 billion of goods and services exchanged annually.
  3. What is Akka?
     • Akka (http://akka.io/): “Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.”
     • Packages: “akka-actor”, “akka-remote”, “akka-cluster”, “akka-persistence”, “akka-http”, and “akka-stream”.
  4. What is Cassandra?
     • Cassandra (http://cassandra.apache.org/): “The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance.”
  5. What is Spray?
     • Spray (http://spray.io/): “Spray is an open-source toolkit for building REST/HTTP-based integration layers on top of Scala and Akka.”
     • Packages: “spray-caching”, “spray-can”, “spray-http”, “spray-httpx”, “spray-io”, “spray-json”, “spray-routing”, and “spray-servlet”.
  6. What is Maquette?
     • A real-time fraud rule engine that core operational platforms call synchronously to evaluate fraud.
     • Its core technologies include Akka, Cassandra, and Spray.
  7. Why Akka, Cassandra, and Spray?
     • Akka, Cassandra, and Spray are highly performant and developer-friendly, treat failures as a first-class concept, and provide strong clustering support, which ensures responsiveness, resiliency, and elasticity when building Reactive Systems.
  8. Maquette in a Nutshell (diagram: HTTP, Environment, and Executor layers)
  9. Maquette Actor System (diagram)
  10. HTTP Layer
      • Utilize Spray-Can for a fast HTTP endpoint.
      • Utilize Jackson for JSON deserialization/serialization.
      • Utilize a separate dispatcher for the Bulkhead Pattern.
      • Expose a normalized yet flexible schema for integration.
      • Request handling, ranked from worst to best:
        • Cameo Pattern (per-request actor),
        • Ask Pattern (Future),
        • RequestHandlerPool (Akka Router Pool).
  11. HTTP Layer

      trait FraudRoute extends BaseRoute with ActorLogging { this: Actor =>

        import SprayJacksonSupportUtils._

        override protected def receiveRequest(
          delegateActorRef: ActorRef,
          parentUriPath: Path
        ): Actor.Receive = {
          // Only POST requests under the parent URI path are handled here.
          case incomingHttpRequest @ HttpRequest(
                HttpMethods.POST,
                requestUri,
                requestHeaders,
                requestEntity,
                requestProtocol
              ) if requestUri.path startsWith parentUriPath =>
            // Capture sender() now: it is only valid while this message is being processed.
            val senderActorRef = sender()
            unmarshalHttpEntityAndDelegateRequest(
              requestEntity,
              delegateActorRef,
              senderActorRef
            )
        }
      }
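The Bulkhead Pattern mentioned for the HTTP Layer can be illustrated without Akka. The sketch below is not Maquette's code: it swaps the Akka dispatcher for two plain thread pools wrapped as `ExecutionContext`s, and the names `BulkheadSketch`, `httpPool`, `rulePool`, `parse`, and `evaluate`, the pool sizes, and the threshold are all hypothetical. The point is only that slow rule evaluation cannot exhaust the threads that accept and parse requests.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BulkheadSketch {
  // Hypothetical pools: each plays the role of one dedicated dispatcher.
  private val httpExecutor = Executors.newFixedThreadPool(2)
  private val ruleExecutor = Executors.newFixedThreadPool(4)

  val httpPool: ExecutionContext = ExecutionContext.fromExecutor(httpExecutor)
  val rulePool: ExecutionContext = ExecutionContext.fromExecutor(ruleExecutor)

  // Request parsing runs on the HTTP pool; bad input fails the Future.
  def parse(body: String): Future[Int] =
    Future(body.trim.toInt)(httpPool)

  // Rule evaluation runs on its own pool (the threshold is made up).
  def evaluate(amount: Int): Future[Boolean] =
    Future(amount > 1000)(rulePool)

  // The hand-off between the two stages is scheduled on the HTTP pool.
  def handle(body: String): Future[Boolean] =
    parse(body).flatMap(evaluate)(httpPool)

  // The pools use non-daemon threads, so they must be shut down explicitly.
  def shutdown(): Unit = {
    httpExecutor.shutdown()
    ruleExecutor.shutdown()
  }

  def main(args: Array[String]): Unit = {
    println(Await.result(handle(" 2500 "), 5.seconds))
    shutdown()
  }
}
```

In an actor system the same isolation would come from assigning the HTTP actors and the worker actors to different dispatchers in configuration rather than from hand-built pools.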
  12. Environment Layer
      • A tree of actors responsible for managing a cache or pool of the Contexts and Dependencies required to evaluate incoming requests.
      • A Context is a Document Message which wraps configurations for evaluating requests.
      • A Dependency is a Document Message which wraps optimized queries to Cassandra.
  13. Environment Layer
      • Map incoming requests to a Context by forking a template with .copy().
      • Forward the forked Context to the Executor Layer, in the same or a different JVM, with an Akka Router.
      • Consider implementing a custom router that favours execution on the same JVM until responsiveness requires distribution.
  14. Environment Layer
      • Always pre-compute and pre-optimize the Environment Layer as a whole.
      • Allow Contexts to be pre-computed and updated remotely.
      • Design Contexts and Dependencies for optimization by allowing arithmetic reduction or sorting.
      • Prefer pairing each EnvironmentActor with a ProxyActor and a StateActor, so the whole environment stays cached and recovery from failures is fast.
  15. Environment Layer

      type EnvironmentStateActorRefFactory =
        (EnvironmentProxyActorContext, EnvironmentProxyActorSelf) => ActorRef
      type EnvironmentActorRefFactory =
        (EnvironmentProxyActorContext, EnvironmentProxyActorSelf) => ActorRef

      class EnvironmentProxyActor(
        environmentStateActorRefFactory: EnvironmentStateActorRefFactory,
        environmentActorRefFactory: EnvironmentActorRefFactory
      ) extends Actor with ActorLogging {

        // Both children are created through injected factories.
        val environmentStateActorRef = environmentStateActorRefFactory(context, self)
        val environmentActorRef = environmentActorRefFactory(context, self)

        override def receive: Receive =
          receiveEnvironmentState orElse
            receiveFraudRequest orElse
            receiveEnvironmentLocalCommand orElse
            receiveEnvironmentRemoteCommand
      }

  16. Environment Layer

      class EnvironmentStateActor(
        environmentProxyActorRef: ActorRef,
        databaseInstance: Database
      ) extends Actor with ActorLogging {

        import EnvironmentStateActor._
        import EnvironmentStateFactory._
        import EnvironmentStateLifecycleStrategy._
        import EnvironmentStateRepository._

        var environmentState: Option[EnvironmentState] = None

        override def receive: Receive =
          receiveLocalCommand orElse receiveRemoteCommand

        object EnvironmentStateLifecycleStrategy { ... }
        object EnvironmentStateFactory { ... }
        object EnvironmentStateRepository { ... }
      }

  17. Environment Layer

      class EnvironmentActor(
        environmentProxyActor: ActorRef,
        executorActorRef: ActorRef,
        bootActorRef: ActorRef
      ) extends Actor with ActorLogging {

        import EnvironmentActor._
        import EnvironmentLifecycleStrategy._

        var environmentState: Option[EnvironmentState] = None

        override def receive: Receive =
          receiveEnvironmentState orElse receiveFraudRequest

        // Fork the template Context for this request's evaluation type, if one is cached.
        def forkedMaquetteContext(fraudRequest: FraudRequest): Option[MaquetteContext] = {
          val forkedMaquetteContextOption = for {
            actualEnvironmentState <- environmentState
            actualBaseMaquetteContext <-
              actualEnvironmentState.maquetteContextMap.get(fraudRequest.evaluationType)
            actualForkMaquetteContext =
              actualBaseMaquetteContext.copy(fraudRequest = fraudRequest)
          } yield actualForkMaquetteContext
          forkedMaquetteContextOption
        }
      }
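The idea of forking a template Context with .copy() can be tried in isolation. This is a minimal sketch under stated assumptions, not the real Maquette types: `FraudRequest` and `MaquetteContext` here are hypothetical stand-ins with made-up fields, the request is stored as an `Option` so the template can exist without one, and `templates` plays the role of the cached environment state.

```scala
object ForkContextSketch {
  // Hypothetical stand-ins; the real message types are not shown in the slides.
  final case class FraudRequest(evaluationType: String, amount: BigDecimal)
  final case class MaquetteContext(
    ruleNames: Vector[String],
    fraudRequest: Option[FraudRequest]
  )

  // Pre-computed templates, keyed by evaluation type (made-up content).
  val templates: Map[String, MaquetteContext] = Map(
    "payment" -> MaquetteContext(Vector("velocity", "blacklist"), None)
  )

  // copy() returns a new immutable Context, so the template is never mutated
  // and can be shared safely between concurrent requests.
  def fork(request: FraudRequest): Option[MaquetteContext] =
    templates
      .get(request.evaluationType)
      .map(_.copy(fraudRequest = Some(request)))

  def main(args: Array[String]): Unit = {
    println(fork(FraudRequest("payment", BigDecimal(2500))))
    println(fork(FraudRequest("unknown", BigDecimal(1))))
  }
}
```

Because case classes are immutable, the fork is cheap: the rule list is shared by reference and only the request field differs.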
  18. Executor Layer
      • A pipeline of actors responsible for scheduling the Tasks defined within a Context with the specified Dependencies, executing those Tasks, and coordinating their results to provide a response.
      • A Task is an optimized set of executable rules.
  19. Executor Layer
      • Ideally, the Executor Layer should be stateless to allow easy recovery from failures.
      • Ideally, keep the Executor Layer available across the cluster.
  20. Executor Layer

      type ExecutorRouterActorRefFactory =
        (ExecutorActorContext, ExecutorActorSelf) => ActorRef
      type ExecutorCoordinatorActorRefFactory =
        (ExecutorActorContext, ExecutorActorSender, ExecutorActorNext, MaquetteContext, Timeout) => ActorRef

      class ExecutorActor(
        executorRouterActorRefFactory: ExecutorRouterActorRefFactory,
        executorCoordinatorActorRefFactory: ExecutorCoordinatorActorRefFactory,
        actionActorRef: ActorRef
      ) extends Actor with ActorLogging {

        import ExecutorActor._
        import ExecutorSchedulerStrategy._

        val executorRouterActorRef: ActorRef = executorRouterActorRefFactory(context, self)

        override def receive: Receive =
          receiveMaquetteContext orElse receiveMaquetteResult

        object ExecutorSchedulerStrategy {
          def scheduleExecution(maquetteContext: MaquetteContext): Unit = { ... }
        }
      }
  21. Executor Layer
      • Design a Task as a functional and monadic data structure.
      • Using functional programming, a Task should isolate side effects from pure functions.
      • As a monad, a Task is easy to optimize: its composition and reduction properties allow a high degree of parallelization.
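What "functional and monadic" could mean for a Task is sketched below. This is an assumption-heavy illustration, not the Maquette Task, whose Rule type is not public: `Task`, `pure`, `bothHold`, the threshold, and the country check are all hypothetical. It shows only two properties: a Task is a description that does nothing until `run()` is called, and Tasks compose with `map` and `flatMap`, so they work in a for-comprehension.

```scala
object TaskSketch {
  // Hypothetical Task: wraps a deferred computation; building one has no effect.
  final case class Task[A](run: () => A) {
    def map[B](f: A => B): Task[B] =
      Task(() => f(run()))
    def flatMap[B](f: A => Task[B]): Task[B] =
      Task(() => f(run()).run())
  }

  object Task {
    // Lift an already-computed value into a Task.
    def pure[A](value: A): Task[A] = Task(() => value)
  }

  // Two made-up rules combined into one Task; nothing is evaluated here.
  def bothHold(amount: Int, country: String): Task[Boolean] =
    for {
      large   <- Task(() => amount > 1000)
      foreign <- Task(() => country != "IN")
    } yield large && foreign

  def main(args: Array[String]): Unit = {
    val task = bothHold(2500, "CA") // only a description so far
    println(task.run())             // evaluation happens here
  }
}
```

Parallel execution is deliberately left out; a scheduler would decide where and when `run()` is called, which is what makes the description form useful.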
  22. Executor Layer

      case class Query(
        selectComponent: Select,
        fromComponent: From,
        whereComponent: Where
      ) {
        // Merge: select the union of both queries' columns.
        def + (that: Query): Query = {
          this.copy(selectComponent =
            Select(this.selectComponent.columnNames union that.selectComponent.columnNames)
          )
        }
        // Subtract: drop the columns the other query selects.
        def - (that: Query): Query = {
          this.copy(selectComponent =
            Select(this.selectComponent.columnNames diff that.selectComponent.columnNames)
          )
        }
      }

      Note: An example of a Rule object is not shown as it is a trade secret.
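To see why a `+` operator on Query helps with reduction, the sketch below folds several queries into one. The `Select`, `From`, and `Where` definitions are hypothetical stand-ins (the slides do not show them), as are the `query` helper, the table name, and the column names. It also assumes every merged query targets the same table with the same WHERE clause, because `+` keeps the left operand's `From` and `Where`.

```scala
object QueryReductionSketch {
  // Hypothetical component types; the real ones are not shown in the slides.
  final case class Select(columnNames: Set[String])
  final case class From(tableName: String)
  final case class Where(clause: String)

  final case class Query(
    selectComponent: Select,
    fromComponent: From,
    whereComponent: Where
  ) {
    // Same shape as the slide's operator: union the selected columns.
    def +(that: Query): Query =
      copy(selectComponent =
        Select(selectComponent.columnNames union that.selectComponent.columnNames))
  }

  // Helper for the example: all queries hit one made-up table and key.
  def query(columns: String*): Query =
    Query(Select(columns.toSet), From("transactions"), Where("user_id = ?"))

  // Many per-rule queries collapse into a single query, or None if empty.
  def merge(queries: Seq[Query]): Option[Query] =
    queries.reduceOption(_ + _)

  def main(args: Array[String]): Unit = {
    val merged = merge(Seq(query("user_id", "amount"), query("amount", "merchant_id")))
    println(merged.map(_.selectComponent.columnNames))
  }
}
```

Since set union is associative and commutative, the reduction gives the same columns in any grouping or order, which is the property that makes this kind of merge safe to pre-compute.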
  23. Executor Layer
      • For a Task object, consider an external DSL that is interpreted into executable, immutable graphs, or even compiled to Java bytecode.
      • Scala Parser Combinators: https://github.com/scala/scala-parser-combinators
      • Parboiled2: https://github.com/sirthias/parboiled2
      • ANTLR: http://www.antlr.org/
  24. Executor Layer

      object QueryParser extends JavaTokenParsers {

        def parseQuery(queryString: String): Try[Query] = {
          parseAll(queryStatement, queryString) ...
        }

        object QueryGrammar {
          // SELECT ... FROM ... [WHERE ...] ;
          lazy val queryStatement: Parser[Query] =
            selectClause ~ fromClause ~ opt(whereClause) ~ ";" ^^ {
              case selectComponent ~ fromComponent ~ whereComponent ~ ";" =>
                Query(selectComponent, fromComponent, whereComponent.getOrElse(Where.Empty))
            }
        }

        object SelectGrammar { ... }
        object FromGrammar { ... }
        object WhereGrammar { ... }
        object StaticClauseGrammar { ... }
        object DynamicClauseGrammar { ... }
        object InterpolationTypeGrammar { ... }
        object DataTypeGrammar { ... }
        object LexicalGrammar { ... }
      }

      Note: An example of a Rule parser is not shown as it is a trade secret.
  25. Abstracting Concurrency for High-Parallelism Tasks
      • Scala Futures.
      • Scala Parallel Collections.
      • Akka Router Pool.
      • Akka Streams.
  26. Scala Futures
      • “A Future is an object holding a value which may become available at some point.”

      val f = for {
        a <- Future(10 / 2)
        b <- Future(a + 1)
        c <- Future(a - 1)
        if c > 3
      } yield b * c

      f foreach println
  27. Scala Futures
      • Advantages: efficient, highly parallel, simple monadic abstraction.
      • Disadvantages: lacks communication, lacks low-level concurrency control, JVM-bound.
      • Note: monadic Futures enqueue every operation on the ExecutionContext, so there is little control over context switching.
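The Future example from the slides needs an `ExecutionContext` in scope before it compiles. The sketch below makes that explicit; the object name `FutureSketch`, the `compute` wrapper, and its `divisor` parameter are additions for illustration. Each generator and the `if` guard is a separate callback scheduled on the implicit context, which is the behaviour the note about context switching refers to.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureSketch {
  // Every step of the for-comprehension is scheduled on this context.
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Same shape as the slide's example, with the divisor made a parameter.
  def compute(divisor: Int): Future[Int] =
    for {
      a <- Future(10 / divisor)
      b <- Future(a + 1) // starts only after `a` completes
      c <- Future(a - 1) // starts only after `b` completes
      if c > 3           // a failed guard fails the Future
    } yield b * c

  def main(args: Array[String]): Unit = {
    // Blocking with Await is for the demo only.
    println(Await.result(compute(2), 5.seconds))
  }
}
```

With a divisor of 2 the steps give a = 5, b = 6, c = 4, so the guard passes. With a divisor of 5 the guard fails and the resulting Future fails instead of yielding a value.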
  28. Scala Parallel Collections
      • Scala Parallel Collections is a package in the Scala standard library which allows collections to execute operations in parallel.

      (0 until 100000).par
        .filter(x => x.toString == x.toString.reverse)
  29. Scala Parallel Collections
      • Advantages: very efficient, highly parallel, control of the parallelism level.
      • Disadvantages: lacks communication, some operations cannot be parallelized (such as foldLeft() and foldRight(), which run sequentially), non-determinism and side-effect issues hidden by the level of abstraction, JVM-bound.
  30. Akka Router Pool
      • An Akka Router Pool maintains a pool of child actors to which it forwards messages.
      • If an Akka Router Pool is configured with an appropriate dispatcher, mailbox, supervisor, and routing logic, it provides a highly parallel yet elastic construct for executing tasks.
  31. Akka Router Pool

      // Restart a failed worker regardless of the failure type.
      val routerSupervisionStrategy = OneForOneStrategy() {
        case _ => SupervisorStrategy.Restart
      }

      // Pool size and routing logic come from configuration.
      val routerPool = FromConfig.
        withSupervisorStrategy(routerSupervisionStrategy)

      val routerProps = routerPool.props(
        ExecutorWorkerActor.props(accessLayer).
          withDispatcher(DispatcherConfigPath)
      )

      context.actorOf(
        props = routerProps,
        name = RouterName
      )
  32. Akka Router Pool
      • Advantages:
        • Work-Pull Pattern = Rate Limiting.
        • Bounded Mailbox = Backpressure.
        • SupervisionStrategy = Failure.
        • Scheduler = Timeout.
        • Router Resizer = Predictive Parallelism & Scaling.
        • Dispatcher Throughput = Predictive Context Switching.
        • Location Transparency = JVM Unbound.
  33. Akka Router Pool
      • Disadvantages:
        • Complex optimizations or implementation required.
        • Actors with state potentially lead to issues regarding mutability and lack of idempotence.
        • Actors which require communication beyond parent-child trees lead to potentially complex graphs.
  34. Akka Streams
      • “Akka Streams is an implementation of Reactive Streams, which is a standard for asynchronous stream processing with non-blocking backpressure.”

      implicit val system = ActorSystem("reactive-tweets")
      implicit val materializer = ActorMaterializer()

      val authors: Source[Author, Unit] =
        tweets
          .filter(_.hashtags.contains(akka))
          .map(_.author)

      authors.runWith(Sink.foreach(println))

  35. Akka Streams
      • Advantages: backpressure and failure as first-class concepts, concurrency control, simple monadic abstraction, Graph API, bi-directional channels.
      • Disadvantages: too new, so a risk for production.
      • Current: JVM-bound; potentially: distributed streaming.
      • Current: no graph optimization; potentially: macro-based optimization.
  36. Maquette Performance
      • In a staging environment with 10 Cassandra nodes, 4 Maquette nodes, and an HAProxy load balancer: ~40,000 requests per second at a mean response time of 10 milliseconds, with 50 rules.
  37. Tips
      • Investigate Akka Streams for Akka HTTP.
      • Investigate CPU usage and memory consumption: YourKit or VisualVM and Eclipse MAT.
      • Utilize Kamon for real-time metrics to StatsD or a third-party service like Datadog.
      • If implementing a DSL or a complex actor-based graph, remember to utilize ScalaTest and Akka TestKit properly.
      • Utilize Gatling.io for load and scenario based testing.
  38. Tips
      • We used Cassandra 2.1.6 as our main data store for Maquette. We experienced many pains with operating Cassandra.
      • Mastering Apache Cassandra (2nd Edition): http://www.amazon.com/Mastering-Apache-Cassandra-Second-Edition-ebook/dp/B00VAG2WZO
  39. Tips
      • Investigate the Play Framework with Akka Cluster to create a web application for operations:
        • Commands to operate instances in the cluster.
        • Commands to configure instances in real-time.
        • GUI for data scientists and business analysts to easily define and configure rules.
  40. Tips
      • Utilize Kafka to publish audits, which can be used to monitor rules through a Logstash, Elasticsearch, and Kibana flow and archived in HDFS.
      • Consider using Kafka to replay audits as requests, so the real-time engine can be run offline for tuning rules.
  41. Resources
      • The Reactive Manifesto: http://www.reactivemanifesto.org/
      • Reactive Messaging Patterns with the Actor Model: http://www.amazon.ca/Reactive-Messaging-Patterns-Actor-Model/dp/0133846830
      • Learning Concurrent Programming in Scala: http://www.amazon.com/Learning-Concurrent-Programming-Aleksandar-Prokopec/dp/1783281413
      • Akka Concurrency: http://www.amazon.ca/Akka-Concurrency-Derek-Wyatt/dp/0981531660
  42. Thank you! Jacob Park, jacob@paytm.com, park.jacob.96@gmail.com
