
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache Kafka


Things were easier when all our data used to be offline, analyzed overnight in batches. Now our data is online, in motion, and generated constantly. For architects, developers and their businesses, this means that there is an urgent need for tools and applications that can deliver real-time (or near real-time) streaming ETL capabilities.

In this session by Konrad Malawski, author, speaker and Senior Akka Engineer at Lightbend, you will learn how to build these streaming ETL pipelines with Akka Streams, Alpakka and Apache Kafka, and why they matter to enterprises that are increasingly turning to streaming Fast Data applications.

Published in: Software

  1. 1. Konrad `ktoso` Malawski Akka Team, Reactive Streams TCK, Persistence, HTTP, Remoting / Cluster
  2. 2. Make building powerful concurrent & distributed applications simple. Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM
  3. 3. What’s in the toolkit? Actors – simple & high performance concurrency. Cluster / Remoting – location transparency, resilience. Cluster tools – and more prepackaged patterns. Streams – back-pressured stream processing. Persistence – Event Sourcing. HTTP – complete, fully async and reactive HTTP Server. Official Kafka, Cassandra, DynamoDB integrations, tons more in the community. Complete Java & Scala APIs for all features.
  4. 4. “Stream” has many meanings
  5. 5. akka streams: Asynchronous back pressured stream processing. [Diagram: Source → Flow → Sink]
  6. 6. akka streams: Asynchronous back pressured stream processing. [Diagram: Source → Flow → Sink, with (possible) asynchronous boundaries between stages]
  7. 7. akka streams: Asynchronous back pressured stream processing. [Diagram: Source → Flow → Sink; Source at 10 msg/s, Sink at 1 msg/s: OutOfMemoryError!!]
  8. 8. akka streams: Asynchronous back pressured stream processing. [Diagram: Source → Flow → Sink; Source at 10 msg/s, Sink at 1 msg/s; each stage tells the one before it “hand me 3 more”, so data flows at 1 msg/s]
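The “hand me 3 more” signal on the slide above can be sketched with the JDK's own `java.util.concurrent.Flow` interfaces. This is a minimal illustration, not Akka Streams itself: the class names `BackPressureDemo` and `BatchingSubscriber` and the batch size constant are made up for the example, and the JDK's `SubmissionPublisher` stands in for the fast source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class BackPressureDemo {

    /** Hypothetical subscriber that signals demand in batches of three. */
    public static final class BatchingSubscriber implements Flow.Subscriber<Integer> {
        private static final int BATCH = 3;
        public final List<Integer> received = new ArrayList<>();
        public int requests;                       // how many times demand was signalled
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;
        private int outstanding;                   // items still owed from the last request

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            requestBatch();
        }

        @Override public void onNext(Integer item) {
            received.add(item);
            if (--outstanding == 0) {
                requestBatch();                    // "hand me 3 more"
            }
        }

        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }

        private void requestBatch() {
            outstanding = BATCH;
            requests++;
            subscription.request(BATCH);
        }
    }

    /** Publishes 1..count and waits until the subscriber has seen onComplete. */
    public static BatchingSubscriber run(int count) {
        BatchingSubscriber sub = new BatchingSubscriber();
        try (SubmissionPublisher<Integer> pub = new SubmissionPublisher<>()) {
            pub.subscribe(sub);
            for (int i = 1; i <= count; i++) {
                pub.submit(i);                     // blocks if the subscriber's buffer is full
            }
        }                                          // close() signals onComplete after buffered items
        try {
            sub.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sub;
    }

    public static void main(String[] args) {
        BatchingSubscriber sub = run(7);
        System.out.println("received=" + sub.received + " requests=" + sub.requests);
    }
}
```

The publisher never pushes more than the subscriber has asked for, so a slow consumer bounds memory use instead of causing the overflow shown on slide 7.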
  9. 9. akka streams: Not only linear streams. [Diagram: a graph combining multiple Source, Flow and Sink elements]
  10. 10. And the many meanings it carries. Reactive
  11. 11. The many meanings of Reactive reactivemanifesto.org
  12. 12. The many meanings of Reactive
  13. 13. “Not-quite-Reactive-System” The reason we started researching flow control that is transparent to users.
  14. 14. I’ll “just-slap-a-Kafka” in there and problem solved!
  15. 15. If only there was a way… complete sources on github
  16. 16. If only there was a way… complete sources on github
  17. 17. If only there was a way… complete sources on github
  18. 18. If only there was a way… complete sources on github
  19. 19. Reactive Streams Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure. This encompasses efforts aimed at runtime environments as well as network protocols http://www.reactive-streams.org
  20. 20. Reactive Streams: A building block of Reactive Systems, not the “entire story”.
  21. 21. Reactive Streams Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure. This encompasses efforts aimed at runtime environments as well as network protocols http://www.reactive-streams.org
  22. 22. Part of JDK 9 (!) java.util.concurrent.Flow http://openjdk.java.net/projects/jdk9/
  23. 23. Part of JDK 9 (!) java.util.concurrent.Flow http://openjdk.java.net/projects/jdk9/ java.util.concurrent.Flow.* is exactly Reactive Streams. It follows the RS specification 1:1, and implementations must be verified using the RS TCK.
  24. 24. Reactive Streams RS Library A RS library B async boundary
  25. 25. JEP-266 – NOW RELEASED - IN JDK9!
 public final class Flow {
     private Flow() {} // uninstantiable

     @FunctionalInterface
     public static interface Publisher<T> {
         public void subscribe(Subscriber<? super T> subscriber);
     }

     public static interface Subscriber<T> {
         public void onSubscribe(Subscription subscription);
         public void onNext(T item);
         public void onError(Throwable throwable);
         public void onComplete();
     }

     public static interface Subscription {
         public void request(long n);
         public void cancel();
     }

     public static interface Processor<T,R> extends Subscriber<T>, Publisher<R> {
     }
 }
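To make the four interfaces above concrete, here is a small sketch that wires a publisher, a mapping `Processor`, and a collecting subscriber together, roughly mirroring the Source → Flow → Sink shape from the earlier slides. The names `FlowPipelineDemo`, `MapProcessor` and `CollectingSubscriber` are invented for this example; the processor follows the pattern of building on the JDK's `SubmissionPublisher`, and none of it has been run against the Reactive Streams TCK, so treat it as an illustration rather than a compliant implementation.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Function;

public class FlowPipelineDemo {

    /** Hypothetical Processor: subscribes upstream, republishes mapped items downstream. */
    static final class MapProcessor<T, R> extends SubmissionPublisher<R>
            implements Flow.Processor<T, R> {
        private final Function<? super T, ? extends R> fn;
        private Flow.Subscription upstream;

        MapProcessor(Function<? super T, ? extends R> fn) { this.fn = fn; }

        @Override public void onSubscribe(Flow.Subscription s) {
            upstream = s;
            s.request(1);                       // pull one element at a time
        }
        @Override public void onNext(T item) {
            submit(fn.apply(item));             // blocks while the downstream buffer is full
            upstream.request(1);
        }
        @Override public void onError(Throwable t) { closeExceptionally(t); }
        @Override public void onComplete() { close(); }
    }

    /** Hypothetical terminal subscriber that collects everything it receives. */
    static final class CollectingSubscriber<T> implements Flow.Subscriber<T> {
        final List<T> items = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
        @Override public void onNext(T item) { items.add(item); }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    /** Runs input through publisher -> processor -> subscriber and returns the results. */
    public static List<String> run(List<Integer> input) {
        CollectingSubscriber<String> sink = new CollectingSubscriber<>();
        MapProcessor<Integer, String> flow = new MapProcessor<>(n -> "#" + n);
        flow.subscribe(sink);
        try (SubmissionPublisher<Integer> source = new SubmissionPublisher<>()) {
            source.subscribe(flow);
            input.forEach(source::submit);
        }                                       // closing the source completes the chain
        try {
            sink.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return List.copyOf(sink.items);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3)));
    }
}
```

Writing even this much by hand shows why libraries such as Akka Streams exist: the interfaces are meant as an interop layer between implementations, not as an end-user API.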
  26. 26. Reactive Streams / j.u.c.Flow RS Library A RS library B async boundary “Make building powerful concurrent & distributed applications simple.”
  27. 27. Reactive Streams - explained in 1 slide
  28. 28. Akka Streams native JDK9 support (Akka 2.5.5) java.util.concurrent.Flow support is merged, and about to be released this week (before JavaOne next week). https://github.com/akka/akka/pull/23650
  29. 29. Complete and awesome Java and Scala APIs (Just like everything in Akka) Akka Streams
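  30. 30. Akka Streams in 20 seconds:
 import akka.Done;
 import akka.NotUsed;
 import akka.stream.javadsl.Flow;
 import akka.stream.javadsl.RunnableGraph;
 import akka.stream.javadsl.Sink;
 import akka.stream.javadsl.Source;
 import java.util.concurrent.CompletionStage;
 
 // Stand-in source emitting 1, 2, 3; the slide left this as null.
 Source<Integer, NotUsed> source = Source.range(1, 3);
 // A Flow transforms elements: here Integer to String.
 Flow<Integer, String, NotUsed> flow =
 Flow.<Integer>create().map((Integer n) -> n.toString());
 // A Sink consumes elements: here it prints each one.
 Sink<String, CompletionStage<Done>> sink =
 Sink.foreach(str -> System.out.println(str));
 // Connecting all three gives a blueprint; nothing runs yet.
 RunnableGraph<NotUsed> runnable = source.via(flow).to(sink);
 // Running needs a Materializer (assumed to be created elsewhere).
 runnable.run(materializer);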

  31. 31. Akka Streams in 20 seconds: CompletionStage<String> firstString =
 Source.single(1)
 .map(n -> n.toString())
 .runWith(Sink.head(), materializer);

  32. 32. Akka Streams in 20 seconds: Source.single(1).map(i -> i.toString()).runWith(Sink.head(), materializer) // types: Source<Integer, NotUsed>, Flow<Integer, String, NotUsed>, Sink<String, CompletionStage<String>>
  33. 33. Akka Streams in 20 seconds: Source.single(1).map(i -> i.toString()).runWith(Sink.head(), materializer) // types: Source<Integer, NotUsed>, Flow<Integer, String, NotUsed>, Sink<String, CompletionStage<String>>
  34. 34. Akka Streams in 20 seconds:
  35. 35. Akka Streams in 20 seconds:
  36. 36. Akka Streams core principles:
  37. 37. Akka Streams core principles:
  38. 38. Alpakka: A community for Streams connectors http://blog.akka.io/integrations/2016/08/23/intro-alpakka
  39. 39. Alpakka – a community for Stream connectors. Existing Alpakka: MQTT, AMQP/RabbitMQ, SSE, Cassandra, FTP/SFTP, JSON, XML, CSV, RecordIO, IronMq, Files, AWS DynamoDB, AWS SNS, SQS, S3, Kinesis, Lambda, JMS, Azure Storage Queue, TCP. In Akka: Actors, Reactive Streams, Java Streams, Basic File IO. External: Apache Geode, Eventuate, FS2, Akka Http, HBase, Google Cloud Pub/Sub, Camel, Kafka, MongoDB, Azure IoT. http://developer.lightbend.com/docs/alpakka/current/index.html
  40. 40. Alpakka – a community for Stream connectors http://developer.lightbend.com/docs/alpakka/current/ A few months ago we only had these…
  41. 41. Alpakka – a community for Stream connectors http://developer.lightbend.com/docs/alpakka/current/ Now we have these! And still growing!
  42. 42. Alpakka – a community for Stream connectors http://developer.lightbend.com/docs/alpakka/current/ Now we have these! And still growing!
  43. 43. Getting things DONE, with Alpakka http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code End-to-end Streaming from Kafka to Streaming HTTP endpoint in 5 minutes
  44. 44. Getting things DONE, with Alpakka http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code 1. Skim Alpakka docs, find the dependency (dependencies) you need
  45. 45. Getting things DONE, with Alpakka http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code 1. Skim Alpakka docs, find the dependency (dependencies) you need 2. “Pick your Source[T, …]” (from docs, or APIs) Example Kafka sources (different semantics):
  46. 46. Getting things DONE, with Alpakka http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code 1. Skim Alpakka docs, find the dependency (dependencies) you need 2. “Pick your Source[T, …]” (from docs, or APIs) 3. Connect that Source to some Sink Example Kafka sources (different semantics):
  47. 47. Getting things DONE, with Alpakka http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code 1. Skim Alpakka docs, find the dependency (dependencies) you need 2. “Pick your Source[T, …]” (from docs, or APIs) 3. Connect that Source to some Sink Example Kafka sources (different semantics): 4. Profit.
  48. 48. Getting things DONE, with Alpakka http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code 1. Skim Alpakka docs, find the dependency (dependencies) you need 2. “Pick your Source[T, …]” (“recursively walk FTP and import files”) 3. Connect that Source to some Sink 4. Profit. Example: .flatMapMerge(3, file => Ftp.fromPath(path, settings)…)
  49. 49. Getting things DONE, with Alpakka http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code 1. Skim Alpakka docs, find the dependency (dependencies) you need 2. “Pick your Source[T, …]” (“recursively walk FTP and import files”) 3. Connect that Source to some Sink 4. Profit. Example: .flatMapMerge(3, file => Ftp.fromPath(path, settings)…)
  50. 50. What When Why
  51. 51. Akka, a toolbox – when to use which tool? http://developer.lightbend.com/docs/alpakka/current/ [Chart: modeling power vs. complexity, placing threads and locks, (completable) futures, streams, actors, and typed actors (preview available); other labels: undefined complex concurrency models, blocking, non-reactive tech]
  52. 52. Actors ❤ Streams == true. Actors – perfect for managing state, things with lifecycles, restarting, running many separate instances in parallel; best for gathering data from multiple places and for resiliency.
 (Reactive) Streams – best suited for less-dynamic layouts, with a simple lifecycle (running, completed, failed); best for moving data from A to B. Excellent post explaining this (look up the author, Colin Breck): blog.colinbreck.com/integrating-akka-streams-and-akka-actors-part-i/
  53. 53. But, distributed systems? Akka Streams is a local abstraction. Combine with Akka Cluster for distributed superpowers! [Diagram: Shard A hosts Stream-in-Actor A-1; Shard B hosts Stream-in-Actor B-1 and Stream-in-Actor B-2; Shard C hosts Stream-in-Actor C-1]
  54. 54. Roadmap update
  55. 55. Next steps for Akka: Release stable new Akka Remoting (over 700,000 msg/s (!)), built using Akka Streams and Aeron. Even more integrations for Akka Streams stages, project Alpakka. Collaborating with IBM to deliver integrations with new tech (S3, JDBC). Continued polish of Reactive Kafka, an important part of Alpakka. Plans to expand beyond Kafka too! Akka Typed Cluster, Persistence, Streams integration preview: now. Preview of working Akka HTTP 2.0. Akka HTTP powering Play Framework by default. Streaming in Akka HTTP. Next up for Akka and Alpakka.
  56. 56. Multi Data Center support – coming soon More details soon… - Active + Active “Cluster Sharding” - Proximity aware routers? - Talk to us about your use cases :) - …?
  57. 57. Next steps for Akka / Streaming in Akka HTTP / Akka Monitoring & Tracing: developer.lightbend.com/docs/monitoring/latest/home.html + DataDog || StatsD || Graphite || …anything! Working Zipkin support. Working Jaeger support.
  58. 58. Ready to adopt on prod?
  59. 59. Totally, go for it. “If the JDK adopting Reactive Streams isn’t the best long-term endorsement of stability and commitment… …then, I don’t know what is.“
  60. 60. Read more tutorials and deep-dives: http://blog.akka.io/ http://developer.lightbend.com/guides Next up for Akka and Alpakka
  61. 61. Further reading: Akka docs: akka.io/docs Alpakka docs: Reactive Streams: reactive-streams.org Free O’Reilly report – bit.ly/why-reactive Example Sources: ktoso/akka-streams-alpakka-talk-demos Konrad Malawski, ktoso@lightbend.com, http://kto.so / @ktosopl
  62. 62. Thanks / Questions? Akka docs: akka.io/docs Alpakka docs: Reactive Streams: reactive-streams.org Free O’Reilly report – bit.ly/why-reactive Example Sources: ktoso/akka-streams-alpakka-talk-demos Konrad Malawski, ktoso@lightbend.com, http://kto.so / @ktosopl
