Streaming Microservices With Akka Streams And Kafka Streams

2. Check out these resources: Dean’s book Webinars etc. Fast Data Architectures   for Streaming Applications Getting Answers Now from Data Sets that Never End By Dean Wampler, Ph. D., VP of Fast Data Engineering LIGHTBEND.COM/LEARN 2

3. Mesos, YARN, Cloud, … Logs Sockets REST ZooKeeper Cluster ZK Mini-batch Spark Streaming Batch Spark … Low Latency Flink Ka9a Streams Akka Streams Beam … Persistence S3 HDFS DiskDiskDisk SQL/ NoSQL Search 1 5 6 3 11 KaEa Cluster Ka9a Microservices RP Go Node.js … 2 4 7 8 9 10 Beam Streaming architecture, from the book

4. Mesos, YARN, Cloud, … Logs Sockets REST ZooKeeper Cluster ZK Mini-batch Spark Streaming Batch Spark … Low Latency Flink Ka9a Streams Akka Streams Beam … Persistence S3 HDFS DiskDiskDisk SQL/ NoSQL Search 1 5 6 3 11 KaEa Cluster Ka9a Microservices RP Go Node.js … 2 4 7 8 9 10 Beam Today’s focus: •Kafka - the data backplane •Akka Streams and Kafka Streams - streaming microservices

5. Service 1 Log & Other Files Internet Services Service 2 Service 3 Services Services N * M links ConsumersProducers Before: Why Kafka?

6. Service 1 Log & Other Files Internet Services Service 2 Service 3 Services Services N * M links ConsumersProducers Before: Service 1 Log & Other Files Internet Services Service 2 Service 3 Services Services N + M links ConsumersProducers After: Why Kafka?

7. Service 1 Log & Other Files Internet Services Service 2 Service 3 Services Services N + M links ConsumersProducers After: Why Kafka? Kafka: • Simplify dependencies between services • Reduce data loss when a service crashes • M producers, N consumers • Simplicity of one “API” for communication

8. ogs ckets REST Mini-batch Spark Streaming Batch Spark … Low Latency Flink Ka9a Streams Akka Streams Beam … Persistence S3 HDFS DiskDiskDisk SQL/ NoSQL Search 1 5 6 KaEa Cluster Ka9a 2 4 7 8 9 10 Beam Four Streaming Engines: Spark, Flink - services to which you submit work. Large scale, automatic data partitioning.

9. Four Streaming Engines: Spark, Flink - services to which you submit work. Large scale, automatic data partitioning.

10. ogs ckets REST Mini-batch Spark Streaming Batch Spark … Low Latency Flink Ka9a Streams Akka Streams Beam … Persistence S3 HDFS DiskDiskDisk SQL/ NoSQL Search 1 5 6 KaEa Cluster Ka9a 2 4 7 8 9 10 Beam Four Streaming Engines: Spark, Flink - services to which you submit work. Large scale, automatic data partitioning. Akka Streams, Kafka Streams - libraries for “data-centric microservices”. Smaller scale, but great flexibility

11. “Record-centric” μ-services Events Records A Spectrum of Microservices Event-driven μ-services … Browse REST AccountOrders Shopping Cart API Gateway Inventory storage Data Model Training Model Serving Other Logic

12. Events Records Event-driven μ-services … Browse REST AccountOrders Shopping Cart API Gateway Inventory Akka emerged from the left-hand side of the spectrum, the world of highly Reactive microservices. Akka Streams pushes to the right, more data-centric. A Spectrum of Microservices

13. “Record-centric” μ-services Events Records A Spectrum of Microservices storage Data Model Training Model Serving Other Logic Emerged from the right-hand side. Kafka Streams pushes to the left, supporting many event- processing scenarios.

14. ogs ckets REST Mini-batch Spark Streaming Batch Spark … Low Latency Flink Ka9a Streams Akka Streams Beam … Persistence S3 HDFS DiskDiskDisk SQL/ NoSQL Search 1 5 6 KaEa Cluster Ka9a 2 4 7 8 9 10 Beam Kafka Streams • Low overhead • Parallelism <=> Kafka partitions • Read/write Kafka topics • Java and Scala (coming) API • Feels similar to Spark, Scala collections • SQL interface • Query in-memory state

15. ogs ckets REST Mini-batch Spark Streaming Batch Spark … Low Latency Flink Ka9a Streams Akka Streams Beam … Persistence S3 HDFS DiskDiskDisk SQL/ NoSQL Search 1 5 6 KaEa Cluster Ka9a 2 4 7 8 9 10 Beam Kafka Streams • Ideal for • ETL (“KStreams”) • Aggregations (“KTables”) • Joins • “Eﬀectively once” requirements

16. Kafka Streams Example Data Model Training Model Serving Other Logic Raw Data Model Params Scored Records Final Records storage Data Model Training Model Serving Other Logic

17. Data Model Training Model Serving Other Logic Raw Data Model Params Scored Records Final Records Kafka Streams Example A new Scala Kafka Streams API: • https://github.com/lightbend/kafka-streams-scala • Adheres very closely to the semantics of the Java API • Lightbend is contributing this API to Apache Kafka • Developed by Debasish Ghosh, Boris Lublinsky, Sean Glover, with contributions from other Fast Data Platform team members

18. val builder = new StreamsBuilderS // New Scala Wrapper API. val data = builder.stream[Array[Byte], Array[Byte]](rawDataTopic) val model = builder.stream[Array[Byte], Array[Byte]](modelTopic) val modelProcessor = new ModelProcessor val scorer = new Scorer(modelProcessor) model.mapValues(bytes => Model.parseBytes(bytes)) // array => record .filter((key, model) => model.valid) // Successful? .mapValues(model => ModelImpl.findModel(model)) .process(() => modelProcessor, …) // Set up actual model data.mapValues(bytes => DataRecord.parseBytes(bytes)) .filter((key, record) => record.valid) .mapValues(record => new ScoredRecord(scorer.score(record),record)) .to(scoredRecordsTopic) val streams = new KafkaStreams( builder.build, streamsConfiguration) streams.start() Data Model Training Model Serving Raw Data Model Params Scored Records

26. Data Model Training Model Serving Other Logic Raw Data Model Params Scored Records Final Records What’s Missing? Kafka Streams is a powerful library, but you’ll need to provide the rest of the microservice support telling through other means. (Some are provided if you run the support services for the SQL interface.) You would embed your KS code in microservices written with more comprehensive toolkits, such as the Lightbend Reactive Platform! We’ll return to this point…

27. ogs ckets REST Mini-batch Spark Streaming Batch Spark … Low Latency Flink Ka9a Streams Akka Streams Beam … Persistence S3 HDFS DiskDiskDisk SQL/ NoSQL Search 1 5 6 KaEa Cluster Ka9a 2 4 7 8 9 10 Beam Akka Streams • Low latency, mid-volume • Graphs of processing nodes • Based on Akka Actors: • Complex event processing • Eﬀicient, per-event processing

28. ogs ckets REST Mini-batch Spark Streaming Batch Spark … Low Latency Flink Ka9a Streams Akka Streams Beam … Persistence S3 HDFS DiskDiskDisk SQL/ NoSQL Search 1 5 6 KaEa Cluster Ka9a 2 4 7 8 9 10 Beam Akka Streams • Rich ecosystem • Alpakka - connectivity for DBs, Kafka, etc. (like Camel) • Akka Cluster - scale across nodes • Akka Persistence - Akka state durability

29. Akka Cluster Data Model Training Model Serving Other Logic Raw Data Model Params Final Records Alpakka Alpakka storage Akka Streams Example Data Model Training Model Serving Other Logic Raw Data Model Params Scored Records Final Records

30. Akka Cluster Data Model Training Model Serving Raw Data Model Params Alpakka implicit val system = ActorSystem("ModelServing") implicit val materializer = ActorMaterializer() implicit val executionContext = system.dispatcher val modelProcessor = new ModelProcessor val scorer = new Scorer(modelProcessor) val modelStream: Source[ModelImpl, Consumer.Control] = Consumer.atMostOnceSource(modelConsumerSettings, Subscriptions.topics(modelTopic)) .map(input => Model.parseBytes(input.value())) .filter(model => model.valid).map(_.get) .map(model => ModelImpl.findModel(model)) .filter(model => model.valid).map(_.get) val dataStream: Source[Record, Consumer.Control] = Consumer.atMostOnceSource(dataConsumerSettings, Subscriptions.topics(rawDataTopic)) .map(input => DataRecord.parseBytes(input.value())) .filter(record => record.valid).map(_.get)

31. implicit val system = ActorSystem("ModelServing") implicit val materializer = ActorMaterializer() implicit val executionContext = system.dispatcher val modelProcessor = new ModelProcessor val scorer = new Scorer(modelProcessor) val modelStream: Source[ModelImpl, Consumer.Control] = Consumer.atMostOnceSource(modelConsumerSettings, Subscriptions.topics(modelTopic)) .map(input => Model.parseBytes(input.value())) .filter(model => model.valid).map(_.get) .map(model => ModelImpl.findModel(model)) .filter(model => model.valid).map(_.get) val dataStream: Source[Record, Consumer.Control] = Consumer.atMostOnceSource(dataConsumerSettings, Subscriptions.topics(rawDataTopic)) .map(input => DataRecord.parseBytes(input.value())) .filter(record => record.valid).map(_.get) Akka Cluster Data Model Training Model Serving Raw Data Model Params Alpakka

35. val model = ModelStage(modelProcessor) def keepModelMaterializedValue[M1,M2,M3](m1:M1, m2:M2, m3:M3):M3 = m3 val modelPredictions: Source[Option[Double], ReadableModelStateStore] = Source.fromGraph { GraphDSL.create(dataStream, modelStream, model)( keepModelMaterializedValue) { implicit builder => (d, m, w) => import GraphDSL.Implicits._ // Wire the input streams with the model stage (2 in, 1 out) // dataStream --> | | // | model | -> predictions // modelStream -> | | d ~> w.dataRecordIn m ~> w.modelRecordIn SourceShape(w.scoringResultOut) } } Akka Cluster Data Model Training Model Serving Raw Data Model Params Alpakka

36. val model = ModelStage(modelProcessor) def keepModelMaterializedValue[M1,M2,M3](m1:M1, m2:M2, m3:M3):M3 = m3 val modelPredictions: Source[Option[Double], ReadableModelStateStore] = Source.fromGraph { GraphDSL.create(dataStream, modelStream, model)( keepModelMaterializedValue) { implicit builder => (d, m, w) => import GraphDSL.Implicits._ // Wire the input streams with the model stage (2 in, 1 out) // dataStream --> | | // | model | -> predictions // modelStream -> | | d ~> w.dataRecordIn m ~> w.modelRecordIn SourceShape(w.scoringResultOut) } } Akka Cluster Data Model Training Model Serving Raw Data Model Params Alpakka case class ModelStage(modelProcessor: …) extends GraphStageWithMaterializedValue[…, …] { val scorer = new Scorer(modelProcessor) val dataRecordIn = Inlet[Record]("dataRecordIn") val modelRecordIn = Inlet[ModelImpl]("modelRecordIn") val scoringResultOut = Outlet[ScoredRecord]("scoringOut") … setHandler(dataRecordIn, new InHandler { override def onPush(): Unit = { val record = grab(dataRecordIn) val newRecord = new ScoredRecord(scorer.score(record),record)) push(scoringResultOut, Some(newRecord)) pull(dataRecordIn) }) … }

40. d ~> w.dataRecordIn m ~> w.modelRecordIn SourceShape(w.scoringResultOut) } } val materializedReadableModelStateStore: ReadableModelStateStore = modelPredictions .to(Sink.ignore) // we do not read the results directly .run() // we run the stream, materializing the stage's StateStore Akka Cluster Data Model Training Model Serving Raw Data Model Params Alpakka

41. •Scale scoring with workers and routers, across a cluster •Persist actor state with Akka Persistence •Connect to almost anything with Alpakka • Enterprise Suite •for production Other Concerns Akka Cluster Model Serving Other Logic Alpakka Alpakka Router Worker Worker Worker Worker Stateful Logic Persistence actor state storage Akka Cluster Data Model Training Model Serving Other Logic Raw Data Model Params Final Records Alpakka Alpakka storage

42. •Extremely low latency •Minimal I/O and memory overhead •Reactive Streams backpressure •M producers, N consumers, but directly connected (sort of) •Use Akka Persistence for durable state Go Direct or Through Kafka? Akka Cluster Model Serving Other Logic Alpakka Alpakka Model Serving Other Logic Scored Records vs. ? •Higher latency (including queue depth) •Higher I/O overhead •Very large buﬀer (disk size) •M producers, N consumers, completely disconnected •Automatic durability (topics on disk)

43. •Use for smaller, faster messaging between “components”. •Watch for consumer “backup” •Use Akka Persistence for important state! Akka Cluster Model Serving Other Logic Alpakka Alpakka Model Serving Other Logic Scored Records vs. ? •Use for larger volumes, more course-grained service interactions •Plan partitioning and replication carefully Go Direct or Through Kafka?

44. Lightbend Fast Data Platform

45. lightbend.com/fast-data-platform

46. Check out these resources: Dean’s book Webinars etc. Fast Data Architectures   for Streaming Applications Getting Answers Now from Data Sets that Never End By Dean Wampler, Ph. D., VP of Fast Data Engineering 46 LIGHTBEND.COM/LEARN

47. Serving Machine Learning Models A Guide to Architecture, Stream Processing Engines,   and Frameworks By Boris Lublinsky, Fast Data Platform Architect 47 LIGHTBEND.COM/LEARN

48. For even more information: Tutorial - Building Streaming Applications with Kafka:  Software Architecture Conference New York Strata Data Conference San Jose Strata Data Conference London My talk:  Strata Data Conference San Jose 48

Streaming Microservices With Akka Streams And Kafka Streams

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Streaming Microservices With Akka Streams And Kafka Streams

Similar to Streaming Microservices With Akka Streams And Kafka Streams (20)

More from Lightbend

More from Lightbend (20)

Recently uploaded

Recently uploaded (20)

Streaming Microservices With Akka Streams And Kafka Streams