Introducing Kafka's Streams API

Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of information in real time? The answer is stream processing, and the technology that has become the core platform for streaming data is Apache Kafka. Among the thousands of companies that use Kafka to transform and reshape their industries are the likes of Netflix, Uber, PayPal, and Airbnb, but also established players such as Goldman Sachs, Cisco, and Oracle.
Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: many technologies must be stitched together and operated side by side, and each individual technology is often complex in its own right. This has led to a wide gap between how we, as engineers, would like to work and how we actually end up working in practice.
In this session we talk about how Apache Kafka helps you radically simplify your data processing architectures. We cover how you can now build normal applications to serve your real-time processing needs — rather than building clusters or similar special-purpose infrastructure — and still benefit from properties such as high scalability, distributed computing, and fault tolerance, which are typically associated exclusively with cluster technologies. In particular, we introduce Kafka’s Streams API, its abstractions for streams and tables, and its new Interactive Queries functionality. As we will see, Kafka makes such architectures equally viable for small, medium, and large-scale use cases.


  1. Introducing Kafka’s Streams API: stream processing made simple. Target audience: technical staff, developers, architects. Expected duration for the full deck: 45 minutes.
  2. Apache Kafka: birthed as a messaging system, now a streaming platform.
     • 0.7 (2012): cluster mirroring, data compression
     • 0.8 (2013): intra-cluster replication
     • 0.9 (2015): data integration (Connect API)
     • 0.10 (2016): data processing (Streams API)
  3. Kafka’s Streams API: the easiest way to process data in Apache Kafka.
     Key benefits of Apache Kafka’s Streams API:
     • Build apps, not clusters: no additional cluster required
     • Cluster to go: elastic, scalable, distributed, fault-tolerant, secure
     • Database to go: tables, local state, interactive queries
     • Equally viable for S / M / L / XL / XXL use cases
     • “Runs everywhere”: integrates with your existing deployment strategies, such as containers, automation, and cloud
     Part of open source Apache Kafka, introduced in 0.10+:
     • A powerful client library for building stream processing apps
     • Apps are standard Java applications that run on client machines
     • https://github.com/apache/kafka/tree/trunk/streams
  4. Kafka’s Streams API: Unix analogy.
     $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
     The Kafka cluster plays the role of the files and pipes, the Connect API covers getting data in and out (cat, redirection), and the Streams API covers the processing steps in between (grep, tr).
  5. The Streams API in the context of Kafka: other systems feed data into and out of the Kafka cluster through the Connect API, while your app processes that data with the Streams API.
  6. When to use Kafka’s Streams API:
     • Mainstream application development
     • To build core business applications
     • Microservices
     • Fast Data apps for small and big data
     • Reactive applications
     • Continuous queries and transformations
     • Event-triggered processes
     • The “T” in ETL
     • <and more>
     Use case examples:
     • Real-time monitoring and intelligence
     • Customer 360-degree view
     • Fraud detection
     • Location-based marketing
     • Fleet management
     • <and more>
  7. Some public use cases in the wild & external articles:
     • Applying Kafka’s Streams API to the internal message delivery pipeline at LINE Corp. (Kafka Streams in production at LINE, a social platform based in Japan with 220+ million users): http://developers.linecorp.com/blog/?p=3960
     • Microservices and reactive applications at Capital One: https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams
     • User behavior analysis: https://timothyrenner.github.io/engineering/2016/08/11/kafka-streams-not-looking-at-facebook.html
     • Containerized Kafka Streams applications in Scala: https://www.madewithtea.com/processing-tweets-with-kafka-streams.html
     • Geo-spatial data analysis: http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/
     • Language classification with machine learning: https://dzone.com/articles/machine-learning-with-kafka-streams
  8. Do more with less.
  9. Architecture comparison: use case example. A real-time dashboard for security monitoring: “Which of my data centers are under attack?”
  10. Architecture comparison: use case example.
      Before (undue complexity, heavy footprint, many technologies, split ownership with conflicting priorities):
      (1) capture business events in Kafka; (2) must process events with a separate cluster (e.g. Spark), your “job”; (3) must share the latest results through separate systems (e.g. MySQL); (4) other apps, including the dashboard frontend, access the latest results by querying these databases.
      With Kafka Streams (a simplified, app-centric architecture that puts app owners in control):
      (1) capture business events in Kafka; (2) process events with standard Java apps that use Kafka Streams; (3) now other apps can directly query the latest results.
  11.–12. (image-only slides)
  13. How do I install the Streams API?
      • There is, and there should be, no “installation”. Build apps, not clusters!
      • It’s a library. Add it to your app like any other library:

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-streams</artifactId>
            <version>0.10.1.1</version>
        </dependency>
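      Since a Streams app is just a Java application with this library on the classpath, getting started is a matter of writing a main() method. Below is a minimal sketch against the 0.10.1 API; the application ID, broker address, and topic names are placeholders:

        import java.util.Properties;
        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.KafkaStreams;
        import org.apache.kafka.streams.StreamsConfig;
        import org.apache.kafka.streams.kstream.KStreamBuilder;

        public class MinimalStreamsApp {
            public static void main(String[] args) {
                Properties props = new Properties();
                // Unique ID for this application; also used to name internal topics and state.
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "minimal-streams-app");
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
                props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

                KStreamBuilder builder = new KStreamBuilder();
                // Trivial topology: copy records from one topic to another.
                builder.stream("input-topic").to("output-topic");

                KafkaStreams streams = new KafkaStreams(builder, props);
                streams.start();

                // Stop the app cleanly on shutdown.
                Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
            }
        }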
  14. “But wait a minute – where’s THE CLUSTER to process the data?”
      • No cluster needed. Build apps, not clusters!
      • Unlearn bad habits: doing cool stuff with data ≠ must have a cluster
  15. Organizational benefits: decouple teams and roadmaps, scale people.
  16. Organizational benefits: decouple teams and roadmaps, scale people. The infrastructure team runs Kafka as a shared, multi-tenant service, while application teams own their own apps: a fraud detection app (payments team), a recommendations app (mobile team), a security alerts app (operations team), and more.
  17. How do I package, deploy, monitor my apps? How do I …?
      • Whatever works for you. Stick to what you and your company consider the best way.
      • No magic needed.
      • Why? Because an app that uses the Streams API is… a normal Java app.
  18. Available APIs.
  19. The API is but the tip of the iceberg: API and coding are what you see first, but Reality™ also includes architecture, deployment, operations, security, debugging, organizational processes, and more.
  20. API option 1: DSL (declarative). The preferred API for most use cases; it particularly appeals to fans of Scala and functional programming, and to users familiar with e.g. Spark.

        KStream<Integer, Integer> input =
            builder.stream("numbers-topic");

        // Stateless computation
        KStream<Integer, Integer> doubled =
            input.mapValues(v -> v * 2);

        // Stateful computation
        KTable<Integer, Integer> sumOfOdds = input
            .filter((k, v) -> v % 2 != 0)
            .selectKey((k, v) -> 1)
            .groupByKey()
            .reduce((v1, v2) -> v1 + v2, "sum-of-odds");
  21. API option 2: Processor API (imperative). Full flexibility, but more manual work. It appeals to users who require functionality that is not yet available in the DSL, and to users familiar with e.g. Storm or Samza. Still, check out the DSL!

        class PrintToConsoleProcessor<K, V> implements Processor<K, V> {
            @Override
            public void init(ProcessorContext context) {}

            @Override
            public void process(K key, V value) {
                System.out.println("Got value " + value);
            }

            @Override
            public void punctuate(long timestamp) {}

            @Override
            public void close() {}
        }
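      For context, a processor like this is wired into a topology via the lower-level TopologyBuilder. A minimal sketch (node and topic names are illustrative):

        import org.apache.kafka.streams.processor.TopologyBuilder;

        // Read from a source topic and hand every record to the processor above.
        TopologyBuilder builder = new TopologyBuilder();
        builder.addSource("Source", "input-topic")
               .addProcessor("Print", PrintToConsoleProcessor::new, "Source");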
  22. When to use Kafka Streams vs. Kafka’s “normal” consumer clients.
      Kafka Streams:
      • Basically all the time (the slide repeats this point eleven times for emphasis)
      Kafka consumer clients (Java, C/C++, Python, Go, …):
      • When you must interact with Kafka at a very low level and/or in a very special way
      • Example: when integrating your own stream processing tool (Spark, Storm) with Kafka
  23. Code comparison, featuring Kafka’s Streams API vs. Spark Streaming.
  24. “My WordCount is better than your WordCount” (?). These isolated Kafka and Spark code snippets are nice (and actually quite similar), but they are not very meaningful. In practice, we also need to read data from somewhere, write data back to somewhere, and so on, and we can see none of this here.
  25. WordCount in Kafka. (The slide shows the full Streams API word-count code.)
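      The code itself is not in the transcript; for reference, the canonical WordCount against the 0.10.1 Streams DSL looks roughly like this sketch (topic and store names are illustrative):

        import java.util.Arrays;
        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.kstream.KStream;
        import org.apache.kafka.streams.kstream.KStreamBuilder;
        import org.apache.kafka.streams.kstream.KTable;

        KStreamBuilder builder = new KStreamBuilder();

        KStream<String, String> textLines = builder.stream("TextLinesTopic");

        KTable<String, Long> wordCounts = textLines
            // Split each line of text into lowercase words.
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            // Re-key each record by the word so identical words are grouped together.
            .groupBy((key, word) -> word)
            // Count occurrences per word; "Counts" names the backing state store.
            .count("Counts");

        // Write the continuously updated counts back to Kafka.
        wordCounts.to(Serdes.String(), Serdes.Long(), "WordsWithCountsTopic");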
  26. Compared to: WordCount in Spark 2.0 (part 1 of 2). Callouts 1–3 on the slide mark where the runtime model leaks into the processing logic (here: interfacing from Spark with Kafka).
  27. Compared to: WordCount in Spark 2.0 (part 2 of 2). Callouts 4–5 mark where the runtime model leaks into the processing logic (driver vs. executors).
  28.–30. Key concepts (image-only build slides).
  31. Key concepts: how terms map between Kafka Core and Kafka Streams.
  32. Streams and Tables: stream processing meets databases.
  33.–34. (image-only slides)
  35. Key observation: there is a close relationship between streams and tables. See http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple and http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
  36. (image-only slide)
  37. Example: streams and tables in Kafka’s WordCount. The continuously updated count table at one point in time: hello → 2, kafka → 1, world → 1, …
  38.–41. (image-only slides)
  42. Example: continuously compute current users per geo-region. A real-time dashboard answers “How many users younger than 30y, per region?” from two input topics: user-locations (owned by the mobile team) and user-prefs (owned by the web team). When an update arrives, e.g. alice moves from Asia to Europe, the per-region counts adjust accordingly (Asia −1, Europe +1).
  43. Example: continuously compute current users per geo-region. Read both topics as continuously updated tables:

        KTable<UserId, Location> userLocations = builder.table("user-locations-topic");
        KTable<UserId, Prefs> userPrefs = builder.table("user-preferences-topic");
  44. Example (continued): merge the two tables into detailed user profiles.

        KTable<UserId, Location> userLocations = builder.table("user-locations-topic");
        KTable<UserId, Prefs> userPrefs = builder.table("user-preferences-topic");

        // Merge into detailed user profiles (continuously updated)
        KTable<UserId, UserProfile> userProfiles =
            userLocations.join(userPrefs, (loc, prefs) -> new UserProfile(loc, prefs));
  45. Example (continued): compute per-region statistics.

        // Compute per-region statistics (continuously updated)
        KTable<Location, Long> usersPerRegion = userProfiles
            .filter((userId, profile) -> profile.age < 30)
            .groupBy((userId, profile) -> profile.location)
            .count("users-per-region");

      As alice’s update flows in, the counts change continuously, e.g. Asia 8 → 7, Europe 5 → 6.
  46. Example (continued): the real-time dashboard updates the moment alice’s region changes (Asia −1, Europe +1).
  47. Streams meet tables: in the DSL.
  48. Streams meet tables:
      • Most use cases for stream processing require both streams and tables; this is essential for any stateful computations
      • Kafka ships with first-class support for streams and tables: scalability, fault tolerance, efficient joins and aggregations, …
      • Benefits include simplified architectures, fewer moving pieces, and less do-it-yourself work
  49. Key features.
  50. Key features in 0.10: native, 100%-compatible Kafka integration.
  51. Native, 100%-compatible Kafka integration: read from Kafka, write to Kafka.
  52. Key features in 0.10 (continued): secure stream processing using Kafka’s security features.
  53. Secure stream processing with the Streams API:
      • Your applications can leverage all client-side security features in Apache Kafka
      • Security features include:
        • Encrypting data in transit between applications and Kafka clusters
        • Authenticating applications against Kafka clusters (“only some apps may talk to the production cluster”)
        • Authorizing applications against Kafka clusters (“only some apps may read data from sensitive topics”)
  54. Configuring security settings: in general, you can configure both Kafka Streams and the underlying Kafka clients in your apps.
  55. Configuring security settings. Example: encrypting data in transit plus client authentication to the Kafka cluster. A full demo application is at https://github.com/confluentinc/examples
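      The slide’s configuration listing is not in the transcript; below is a sketch of what such settings look like, using the standard Kafka client SSL properties (file paths and passwords are placeholders):

        import java.util.Properties;
        import org.apache.kafka.clients.CommonClientConfigs;
        import org.apache.kafka.common.config.SslConfigs;
        import org.apache.kafka.streams.StreamsConfig;

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "secure-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9093");

        // Encrypt data in transit between this app and the Kafka cluster.
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
        props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/etc/security/kafka.client.truststore.jks");
        props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "truststore-password");

        // Authenticate this client to the brokers with its own certificate.
        props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "/etc/security/kafka.client.keystore.jks");
        props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "keystore-password");
        props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "key-password");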
  56. Key features in 0.10 (continued): elastic and highly scalable; fault-tolerant.
  57.–59. (image-only slides)
  60. Key features in 0.10 (continued): stateful and stateless computations.
  61. Stateful computations:
      • Stateful computations like aggregations (e.g. counting), joins, or windowing require state
      • State stores are the backbone of state management. They…
        • …are local for best performance
        • …are backed up to Kafka for elasticity and for fault tolerance
        • …are per stream task for isolation (think: share-nothing)
      • Pluggable storage engines:
        • Default: RocksDB (a key-value store), which allows local state to be larger than available RAM
        • You can also use your own, custom storage engine
      • From the user’s perspective:
        • DSL: no need to worry about anything; state management is automatically done for you
        • Processor API: direct access to state stores; very flexible, but more manual work
  62.–65. (image-only slides)
  66.–68. Use case: real-time, distributed joins at large scale (image-only build slides).
  69. Stateful computations: use the Processor API to interact directly with state stores. First get the store, then use the store.
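      The slide’s code is not in the transcript; here is a sketch of the pattern against the 0.10.1 Processor API (store, node, and topic names are illustrative):

        import org.apache.kafka.streams.processor.Processor;
        import org.apache.kafka.streams.processor.ProcessorContext;
        import org.apache.kafka.streams.processor.TopologyBuilder;
        import org.apache.kafka.streams.state.KeyValueStore;
        import org.apache.kafka.streams.state.Stores;

        class CountingProcessor implements Processor<String, String> {
            private KeyValueStore<String, Long> store;

            @Override
            @SuppressWarnings("unchecked")
            public void init(ProcessorContext context) {
                // Get the store, registered with the topology under the name "counts".
                store = (KeyValueStore<String, Long>) context.getStateStore("counts");
            }

            @Override
            public void process(String key, String value) {
                // Use the store: maintain a per-key counter.
                Long count = store.get(key);
                store.put(key, count == null ? 1L : count + 1);
            }

            @Override
            public void punctuate(long timestamp) {}

            @Override
            public void close() {}
        }

        // Register the store and attach it to the processor.
        TopologyBuilder builder = new TopologyBuilder();
        builder.addSource("Source", "input-topic")
               .addProcessor("Count", CountingProcessor::new, "Source")
               .addStateStore(
                   Stores.create("counts").withStringKeys().withLongValues().persistent().build(),
                   "Count");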
  70. Key features in 0.10 (continued): interactive queries.
  71. (image-only slide)
  72. Interactive Queries: architecture comparison.
      Before (0.10.0): (1) capture business events in Kafka; (2) process the events with Kafka Streams; (3) must use external systems to share the latest results; (4) other apps query those external systems for the latest results.
      After (0.10.1), simplified and more app-centric: (1) capture business events in Kafka; (2) process the events with Kafka Streams; (3) now other apps can directly query the latest results.
  73. Key features in 0.10 (continued): time model.
  74. Time.
  75. Time. (Image-only figure showing three events A, B, C.)
  76. Time:
      • You configure the desired time semantics through timestamp extractors
      • The default extractor yields event-time semantics: it extracts the embedded timestamps of Kafka messages (introduced in v0.10)
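      A sketch of a custom extractor against the 0.10 interface, assuming a hypothetical payload type that carries its own event timestamp:

        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.streams.processor.TimestampExtractor;

        // Hypothetical: derive the event time from the message payload
        // instead of the embedded Kafka message timestamp.
        public class PayloadTimestampExtractor implements TimestampExtractor {
            @Override
            public long extract(ConsumerRecord<Object, Object> record) {
                MyEvent event = (MyEvent) record.value(); // MyEvent is a placeholder type
                return event.getEventTimeMs();
            }
        }

        // Activated via configuration:
        // props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
        //           PayloadTimestampExtractor.class.getName());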
  77. Key features in 0.10 (continued): windowing.
  78. Windowing: group events in a stream using time-based windows.
      Use case examples:
      • Time-based analysis of ad impressions (“number of ads clicked in the past hour”)
      • Monitoring statistics of telemetry data (“1min/5min/15min averages”)
      (Figure: input events from different users, here alice, bob, and dave, arriving in processing time and assigned to event-time windows.)
  79. Windowing in the DSL: TimeWindows.of(3000) defines tumbling 3-second windows; TimeWindows.of(3000).advanceBy(1000) defines hopping windows of size 3 seconds that advance every second.
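      For instance, a windowed count over a keyed stream might look like this sketch (0.10.1 API; the input stream and store name are illustrative):

        import org.apache.kafka.streams.kstream.KStream;
        import org.apache.kafka.streams.kstream.KTable;
        import org.apache.kafka.streams.kstream.TimeWindows;
        import org.apache.kafka.streams.kstream.Windowed;

        KStream<String, String> pageViews = builder.stream("page-views-topic");

        // Count page views per user over hopping 3-second windows that advance every second.
        KTable<Windowed<String>, Long> viewsPerWindow = pageViews
            .groupByKey()
            .count(TimeWindows.of(3000).advanceBy(1000), "views-per-window");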
  80. Key features in 0.10 (continued): supports late-arriving and out-of-order data.
  81. Out-of-order and late-arriving data:
      • Very common in practice, not a rare corner case
      • Related to the time model discussion
  82. Out-of-order and late-arriving data: an example of when this will happen. Users with mobile phones board an airplane and lose Internet connectivity; emails are written during the 10-hour flight; once connectivity is restored, the phones send the queued emails.
  83. Out-of-order and late-arriving data:
      • Very common in practice, not a rare corner case
      • Related to the time model discussion
      • We want control over how out-of-order data is handled, and the handling must be efficient
      • Example: we process data in 5-minute windows, e.g. to compute statistics
        • Option A: when an event arrives 1 minute late, update the original result!
        • Option B: when an event arrives 2 hours late, discard it!
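      In the DSL this trade-off can be expressed through window retention: a window keeps accepting late updates until its retention period expires. A sketch, assuming a keyed input stream named events (the durations are illustrative):

        import java.util.concurrent.TimeUnit;
        import org.apache.kafka.streams.kstream.KTable;
        import org.apache.kafka.streams.kstream.TimeWindows;
        import org.apache.kafka.streams.kstream.Windowed;

        // 5-minute windows that remain updatable for 2 hours:
        // an event 1 minute late still updates its original window (Option A),
        // while an event arriving after 2 hours no longer does (Option B).
        KTable<Windowed<String>, Long> stats = events
            .groupByKey()
            .count(TimeWindows.of(TimeUnit.MINUTES.toMillis(5))
                              .until(TimeUnit.HOURS.toMillis(2)),
                   "stats-store");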
  84. Key features in 0.10 (continued): millisecond processing latency with no micro-batching, and at-least-once processing guarantees (exactly-once is in the works as we speak).
  85. Roadmap outlook.
  86. Roadmap outlook for Kafka Streams:
      • Exactly-once processing semantics
      • Unified API for real-time processing and “batch” processing
      • Global KTables
      • Session windows
      • … and more …
  87. Wrapping up.
  88. Where to go from here:
      • Kafka Streams is available in Confluent Platform 3.1 and in Apache Kafka 0.10.1: http://www.confluent.io/download
      • Kafka Streams demos: https://github.com/confluentinc/examples
        • Java 7, Java 8+ with lambdas, and Scala
        • WordCount, Interactive Queries, joins, security, windowing, Avro integration, …
      • Confluent documentation: http://docs.confluent.io/current/streams/
        • Quickstart, concepts, architecture, developer guide, FAQ
      • Recorded talks:
        • Introduction to Kafka Streams: http://www.youtube.com/watch?v=o7zSLNiTZbA
        • Application Development and Data in the Emerging World of Stream Processing (higher-level talk): https://www.youtube.com/watch?v=JQnNHO5506w
  89. Thank you.
  90. Appendix: Streams and Tables, a closer look.
  91. Motivating example: continuously compute current users per geo-region. A real-time dashboard answers “How many users younger than 30y, per region?” from the user-locations topic (mobile team) and the user-prefs topic (web team).
  92. Motivating example (continued): a new record arrives in user-locations: alice → Europe.
  93. Motivating example (continued): the profile data is updated accordingly, changing alice’s region from Asia to Europe.
  94. Motivating example (continued): the dashboard counts adjust in real time (Asia −1, Europe +1).
  95. Same data, but different use cases require different interpretations. Consider this sequence of records: alice → San Francisco, alice → New York City, alice → Rio de Janeiro, alice → Sydney, alice → Beijing, alice → Paris, alice → Berlin.
  96. Use case 1 asks “frequent traveler status?”; use case 2 asks “current location?”.
  97. Use case 1 reads the data as “Alice has been to SFO, NYC, Rio, Sydney, Beijing, Paris, and finally Berlin.” Use case 2 reads the same data as “Alice is in Berlin right now”, with each new record overwriting the previous location.
  98.–100. (build slides stepping through the two interpretations of the same records)
  101.–102. Streams meet tables. When you need all the values of a key (example: all the places Alice has ever been to), you read the Kafka topic into a KStream: the topic is interpreted as a record stream, with messages interpreted as INSERT (append). When you need the latest value of a key (example: where Alice is right now), you read the topic into a KTable: the topic is interpreted as a changelog stream, with messages interpreted as UPSERT (overwrite existing).
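      As a sketch of this duality in code (0.10.1 DSL, following the topic naming of the earlier slides; the two reads are shown as alternatives, since a topology would normally pick one interpretation per topic):

        import org.apache.kafka.streams.kstream.KStream;
        import org.apache.kafka.streams.kstream.KStreamBuilder;
        import org.apache.kafka.streams.kstream.KTable;

        KStreamBuilder builder = new KStreamBuilder();

        // Record stream: every update is kept (INSERT semantics);
        // answers "all the places Alice has ever been to".
        KStream<String, String> allLocations = builder.stream("user-locations-topic");

        // Changelog stream: later updates per key overwrite earlier ones (UPSERT semantics);
        // answers "where is Alice right now?".
        KTable<String, String> latestLocation = builder.table("user-locations-topic");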
  103. Same data, different interpretations: use case 1 (frequent traveler status) reads the topic as a KStream; use case 2 (current location) reads it as a KTable.
  104. Motivating example revisited: continuously compute current users per geo-region, with the dashboard adjusting as alice moves from Asia to Europe (Asia −1, Europe +1).
  105. Motivating example: read both topics as continuously updated tables.

        KTable<UserId, Location> userLocations = builder.table("user-locations-topic");
        KTable<UserId, Prefs> userPrefs = builder.table("user-preferences-topic");
  106. Motivating example (continued): merge into detailed user profiles.

        // Merge into detailed user profiles (continuously updated)
        KTable<UserId, UserProfile> userProfiles =
            userLocations.join(userPrefs, (loc, prefs) -> new UserProfile(loc, prefs));
  107. Motivating example (continued): compute per-region statistics.

        // Compute per-region statistics (continuously updated)
        KTable<Location, Long> usersPerRegion = userProfiles
            .filter((userId, profile) -> profile.age < 30)
            .groupBy((userId, profile) -> profile.location)
            .count("users-per-region");

      As alice’s update arrives, the counts change continuously, e.g. Asia 8 → 7, Europe 5 → 6.
  108. Motivating example (continued): the real-time dashboard reflects the new counts immediately.
  109. Another common use case: continuous transformations. Example: enrich an input stream (user clicks) with side data (current user profile). The user-clicks topic (at 1M msgs/s) is read as a KStream: the “facts”, e.g. alice → /rental/p8454vb, 06:59 PM PDT.
  110. Continuous transformations (continued): the user-profiles topic is read as a KTable: the “dimensions”, e.g. alice → Asia, 25y; bob → Europe, 46y.
  111. Continuous transformations (continued): stream.JOIN(table) enriches each click with the profile current at that moment, e.g. alice → /rental/p8454vb, 06:59 PDT, Asia, 25y.
  112. Continuous transformations (continued): when a new update for alice arrives from the user-locations topic (alice → Europe), the KTable is updated in place, and subsequent clicks are enriched with the new profile.
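      A sketch of such an enrichment join in the DSL (topic names follow the slides; the UserId, Click, UserProfile, and EnrichedClick types are illustrative):

        import org.apache.kafka.streams.kstream.KStream;
        import org.apache.kafka.streams.kstream.KTable;

        KStream<UserId, Click> userClicks = builder.stream("user-clicks-topic");          // "facts"
        KTable<UserId, UserProfile> userProfiles = builder.table("user-profiles-topic");  // "dimensions"

        // Enrich every click with the user's profile as of the moment the click is processed.
        KStream<UserId, EnrichedClick> enrichedClicks =
            userClicks.join(userProfiles, (click, profile) -> new EnrichedClick(click, profile));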
  113. Appendix: Interactive Queries, a closer look.
  114. Interactive Queries.
  115. Interactive Queries. (Figure: app instances, each holding part of the application’s state, e.g. alice → 2, bob → 5, charlie → 3.)
  116. Interactive Queries: a new API to access the local state stores of an app instance.
  117. Interactive Queries: a new API to discover the running instances of an app (e.g. “host1:4460”, “host5:5307”, “host3:4777”).
  118. Interactive Queries: you provide the inter-app communication (the RPC layer).
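      A sketch of both APIs as introduced in 0.10.1, assuming streams is the running KafkaStreams instance and the store name matches the WordCount example above; the RPC layer on top is yours to choose:

        import org.apache.kafka.streams.state.QueryableStoreTypes;
        import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
        import org.apache.kafka.streams.state.StreamsMetadata;

        // Access a local state store of this app instance (read-only).
        ReadOnlyKeyValueStore<String, Long> localStore =
            streams.store("Counts", QueryableStoreTypes.<String, Long>keyValueStore());
        Long aliceCount = localStore.get("alice");

        // Discover all running app instances that host the "Counts" store, so a
        // request for a key held elsewhere can be forwarded via your own RPC layer.
        for (StreamsMetadata metadata : streams.allMetadataForStore("Counts")) {
            System.out.println("Instance at " + metadata.host() + ":" + metadata.port());
        }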
