Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

1,577 views

Published on

Taking real-time processing to the mainstream

Published in: Data & Analytics
  • Hello! Get Your Professional Job-Winning Resume Here - Check our website! https://vk.cc/818RFv
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

  1. 1. 1Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Introducing Kafka’s Streams API Taking real-time processing to the mainstream Michael Noll <michael@confluent.io> Product manager, Confluent
  2. 2. 2Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  3. 3. 3Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  4. 4. 4Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  5. 5. 5Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Our Dream Our Reality
  6. 6. 6Confidential Kafka’s Streams API Taking real-time processing to the mainstream
  7. 7. 7Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Taking real-time processing to the mainstream Kafka’s Streams API • Powerful yet easy-to-use library to build stream processing apps • Apps are standard Java applications that run on client machines • Part of open source Apache Kafka, introduced in 0.10+ • https://github.com/apache/kafka/tree/trunk/streams Streams API Your App Kafka Cluster
  8. 8. 8Apache Kafka meetup, Munich, Germany, Jan 25, 2017 <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka-streams</artifactId> <version>0.10.1.1</version> </dependency> Build Applications, not Clusters!
  9. 9. 9Apache Kafka meetup, Munich, Germany, Jan 25, 2017 “Cluster to go”: elastic, scalable, distributed, fault-tolerant, secure apps
  10. 10. 10Apache Kafka meetup, Munich, Germany, Jan 25, 2017 ”Database to go”: tables, state management, interactive queries
  11. 11. 11Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Equally viable for S / M / L / XL / XXL use cases Ok. Ok. Ok.
  12. 12. 12Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Runs everywhere: from containers to cloud
  13. 13. 13Apache Kafka meetup, Munich, Germany, Jan 25, 2017 When to use Kafka’s Streams API Use case examples • Customer 360-degree view • Fleet or inventory management • Fraud detection • Real-time monitoring & intelligence • Location-based marketing • Virtual Reality (avatar replication) • <and more> To build real-time applications for your core business Scenarios • Microservices • Fast Data apps for small and big data • Reactive applications • Continuous queries and transformations • Event-triggered processes • The “T” in ETL • <and more>
  14. 14. 14Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Some public use cases in the wild & external articles • Why Kafka Streams: towards a real-time streaming architecture, by Sky Betting and Gaming • http://engineering.skybettingandgaming.com/2017/01/23/streaming-architectures/ • Applying Kafka’s Streams API for social messaging at LINE Corp. • http://developers.linecorp.com/blog/?p=3960 • Production pipeline at LINE, a social platform based in Japan with 220+ million users • Microservices and Reactive Applications at Capital One • https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams • Containerized Kafka Streams applications in Scala, by Hive Streaming • https://www.madewithtea.com/processing-tweets-with-kafka-streams.html • Geo-spatial data analysis • http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/ • Language classification with machine learning • https://dzone.com/articles/machine-learning-with-kafka-streams
  15. 15. 15Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Kafka Summit NYC, May 09 Here, the community will share latest Kafka Streams use cases. http://kafka-summit.org/
  16. 16. 16Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Do more with less
  17. 17. 17Confidential Architecture comparison: use case example Real-time dashboard for security monitoring “Which of my data centers are under attack?”
  18. 18. 18Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  19. 19. 19Apache Kafka meetup, Munich, Germany, Jan 25, 2017 With Streams API
  20. 20. 20Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Organizational benefits: decouple teams and roadmaps, scale people
  21. 21. 21Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Available APIs
  22. 22. 22Apache Kafka meetup, Munich, Germany, Jan 25, 2017 • API option 1: DSL (declarative) KStream<Integer, Integer> input = builder.stream("numbers-topic"); // Stateless computation KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2); // Stateful computation KTable<Integer, Integer> sumOfOdds = input .filter((k,v) -> v % 2 != 0) .selectKey((k, v) -> 1) .groupByKey() .reduce((v1, v2) -> v1 + v2, "sum-of-odds"); The preferred API for most use cases. 9 out of 10 users pick the DSL. Particularly appeals to: • Fans of Scala, functional programming • Users familiar with e.g. Spark
  23. 23. 23Apache Kafka meetup, Munich, Germany, Jan 25, 2017 • API option 2: Processor API (imperative) class PrintToConsoleProcessor implements Processor<K, V> { @Override public void init(ProcessorContext context) {} @Override void process(K key, V value) { System.out.println("Got value " + value); } @Override void punctuate(long timestamp) {} @Override void close() {} } Full flexibility but more manual work Appeals to: • Users who require functionality that is not yet available in the DSL • Users familiar with e.g. Storm, Samza • Still, check out the DSL!
  24. 24. 24Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Writing and running your first application • Preparation: Ensure Kafka cluster is accessible, has data to process • Step 1: Write the application code in Java or Scala, see next slide • Great starting point: https://github.com/confluentinc/examples • Documentation: http://docs.confluent.io/current/streams/ • Step 2: Run the application • During development: from your IDE, from CLI … (pro tip: Application Reset Tool is great for playing around) • In production: e.g. bundle as fat jar, then `java -cp my-fatjar.jar com.example.MyStreamsApp` • http://docs.confluent.io/current/streams/developer-guide.html#running-a-kafka-streams-application
  25. 25. 25Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Example: complete app, ready for production at large-scale Word Count App configuration Define processing (here: WordCount) Start processing
  26. 26. 26Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key concepts
  27. 27. 27Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key concepts
  28. 28. 28Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key concepts
  29. 29. 29Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key concepts Kafka’s data model Kafka’s Streams API
  30. 30. 30Confidential Streams and Tables Stream Processing meets Databases
  31. 31. 31Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  32. 32. 32Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  33. 33. 33Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key observation: close relationship between Streams and Tables http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
  34. 34. 34Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  35. 35. 35Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  36. 36. 36Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  37. 37. 37Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  38. 38. 38Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  39. 39. 39Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  40. 40. 40Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features
  41. 41. 41Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features in 0.10 • Native, 100%-compatible Kafka integration
  42. 42. 42Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Native, 100% compatible Kafka integration Read from Kafka Write to Kafka
  43. 43. 43Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features in 0.10 • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features
  44. 44. 44Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Secure stream processing with the Streams API • Your applications can leverage all client-side security features in Apache Kafka • Security features include: • Encrypting data-in-transit between applications and Kafka clusters • Authenticating applications against Kafka clusters (“only some apps may talk to the production cluster”) • Authorizing application against Kafka clusters (“only some apps may read data from sensitive topics”) Streams API Your App Kafka Cluster ”I’m the Payments app!” “Ok, you may read the Purchases topic.” Data encryption
  45. 45. 45Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features in 0.10 • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features • Elastic and highly scalable • Fault-tolerant
  46. 46. 46Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  47. 47. 47Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  48. 48. 48Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  49. 49. 49Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features in 0.10 • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features • Elastic and highly scalable • Fault-tolerant • Stateful and stateless computations
  50. 50. 50Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Stateful computations • Stateful computations include aggregations (e.g. counting), joins, and windowing • State stores are the backbone of state management • … are local for best performance • … are continuously backed up to Kafka to enable elasticity and fault-tolerance • ... are per stream task for isolation, think: share-nothing • Pluggable storage engines • Default: RocksDB (a key-value store) to allow for local state that is larger than available RAM • You can also use your own storage engine • From the user perspective • DSL: no need to worry about anything, state management is automatically being done for you • Processor API: direct access to state stores – very flexible but more manual work
  51. 51. 51Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  52. 52. 52Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  53. 53. 53Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  54. 54. 54Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  55. 55. 55Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Use case: real-time, distributed joins at large scale
  56. 56. 56Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Use case: real-time, distributed joins at large scale
  57. 57. 57Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Use case: real-time, distributed joins at large scale
  58. 58. 58Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features in 0.10 • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features • Elastic and highly scalable • Fault-tolerant • Stateful and stateless computations • Interactive queries
  59. 59. 59Apache Kafka meetup, Munich, Germany, Jan 25, 2017
  60. 60. 60Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features in 0.10 • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features • Elastic and highly scalable • Fault-tolerant • Stateful and stateless computations • Interactive queries • Time model
  61. 61. 61Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Time
  62. 62. 62Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Time A C B
  63. 63. 63Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features in 0.10 • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features • Elastic and highly scalable • Fault-tolerant • Stateful and stateless computations • Interactive queries • Time model • Supports late-arriving and out-of-order data
  64. 64. 64Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Out-of-order and late-arriving data: example Users with mobile phones enter airplane, lose Internet connectivity Emails are being written during the 8h flight Internet connectivity is restored, phones will send queued emails now, though with an 8h delay Bob writes Alice an email at 2 P.M. Bob’s email is finally being sent at 10 P.M.
  65. 65. 65Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features in 0.10 • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features • Elastic and highly scalable • Fault-tolerant • Stateful and stateless computations • Interactive queries • Time model • Supports late-arriving and out-of-order data • Windowing
  66. 66. 66Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Windowing • To group related events in a stream • Use case examples: • Time-based analysis of ad impressions (”number of ads clicked in the past hour”) • Monitoring statistics of telemetry data (“1min/5min/15min averages”) • Analyzing user browsing sessions on a news site Input data, where colors represent different users events Rectangles denote different event-time windows processing-time event-time windowing alice bob dave
  67. 67. 67Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Key features in 0.10 • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features • Elastic and highly scalable • Fault-tolerant • Stateful and stateless computations • Interactive queries • Time model • Supports late-arriving and out-of-order data • Windowing • Millisecond processing latency, no micro-batching • At-least-once processing guarantees (exactly-once is in the works as we speak)
  68. 68. 68Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Roadmap Outlook
  69. 69. 69Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Roadmap outlook for Kafka Streams Upcoming in Confluent 3.2 & Apache Kafka 0.10.2 • Sessionization aka “session windows” -- e.g. for analyzing user browsing behavior • Global KTables (vs. today’s partitioned KTables) – e.g. for convenient facts-to-dimensions joins • Now you can use newer versions of the Streams API against older clusters, too • Further operational metrics to improve monitoring and 24x7 operations of apps Feature highlight for 2017 • Exactly-Once processing semantics • But much more to come!
  70. 70. 70Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Wrapping Up
  71. 71. 71Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Where to go from here • Kafka’s Streams API is available in Confluent Platform 3.1 and in Apache Kafka 0.10.1 • http://www.confluent.io/download • Demo applications: https://github.com/confluentinc/examples • Interactive Queries, Joins, Security, Windowing, Avro integration, … • Confluent documentation: http://docs.confluent.io/current/streams/ • Quickstart, Concepts, Architecture, Developer Guide, FAQ • Recorded talks • Introduction to Kafka’s Streams API: http://www.youtube.com/watch?v=o7zSLNiTZbA • Application Development and Data in the Emerging World of Stream Processing (higher level talk): https://www.youtube.com/watch?v=JQnNHO5506w
  72. 72. 72Confidential Thank You

×