Kafka Streams Windowing
Behind the Curtain
Neil Buesing
Principal Solutions Architect, Rill
Confluent Meetup
July 15th, 2021
What Does Rill Do?
• Operational intelligence for data in motion
• Easy In - Easy Up - Easy Out
• Work with customers to build & modernize their pipelines
What Do I Do?
• Principal Solutions Architect
• Help customers with pipelines leveraging Apache Druid, Apache Kafka, Kafka Streams, Apache Beam, and other technologies
• Data Modeling and Governance
• Rill Data / Apache Druid
Takeaways
• Overview of the Windowing Options within Kafka Streams
• Windowing Use-Cases
• Examples of Aggregate Windowing
• What each windowing option does within RocksDB and the -changelog topics
• The key serialization of the -changelog topics
• Developer Tools & Ideas
Why Kafka Streams
• Stream / Table Duality
• Compacted Topics need stateful front-end
• Stateful Operations
• Finite datasets - tables
• Boundaries for unbounded data - windows
Windowing Options
Window Type | Time boundary | Examples                                            | # records for key @ point in time | Fixed Size
Tumbling    | Epoch         | [8:00, 8:30) [8:30, 9:00)                           | single                            | Yes
Hopping     | Epoch         | [8:00, 8:30) [8:15, 8:45) [8:30, 9:00) [8:45, 9:15) | constant                          | Yes
Sliding     | Record        | [8:02, 8:32] [8:20, 8:50] [8:21, 8:51]              | variable                          | Yes
Session     | Record        | [8:02, 8:02] [8:02, 8:10] [9:10, 12:56]             | single (by tombstoning)           | No
Tumbling Time Windows
[Diagram: records for keys 01, 02, and 03 on a 12:00 to 1:00 timeline (ticks every 15 minutes), with running aggregates per key in the windows keyed 001 / 12:00, 002 / 12:00, 003 / 12:00 and 001 / 12:30, 002 / 12:30, 003 / 12:30.]
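The "Epoch" time boundary of tumbling windows can be sketched in plain Java, without the Kafka Streams dependency. This is only an illustration: the class and method names are hypothetical, and the 30-minute size is assumed from the window keys on the slide (12:00, 12:30).

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative sketch; these names are hypothetical, not Kafka Streams API.
final class TumblingWindowSketch {

    // Start of the epoch-aligned window [start, start + size) containing the timestamp.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    public static void main(String[] args) {
        long size = Duration.ofMinutes(30).toMillis();
        long ts = Instant.parse("2021-07-15T12:12:00Z").toEpochMilli();
        long start = windowStart(ts, size);
        // A record at 12:12 falls into the window that starts at 12:00.
        System.out.println(Instant.ofEpochMilli(start) + " .. " + Instant.ofEpochMilli(start + size));
    }
}
```

Because the boundary depends only on the epoch and the window size, every key shares the same window edges, and each record lands in exactly one window.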
Hopping Time Windows
[Diagram: records for keys 01, 02, and 03 on a 12:00 to 1:00 timeline (ticks every 15 minutes), aggregated per key into overlapping windows keyed by start time: 11:45, 12:00, 12:15, 12:30, and 12:45.]
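A record in a hopping window belongs to several overlapping windows at once. The sketch below lists the start times of every window that contains a timestamp. It follows the arithmetic I recall from `TimeWindows`, but the class and method names here are hypothetical, and the 30-minute size with a 15-minute advance is an assumption read off the slide's window keys.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; these names are hypothetical, not Kafka Streams API.
final class HoppingWindowSketch {

    // Start times of all epoch-aligned windows [start, start + size) that contain the timestamp.
    static List<Long> windowStarts(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest aligned start whose window still covers the timestamp.
        long start = (Math.max(0, timestampMs - sizeMs + advanceMs) / advanceMs) * advanceMs;
        while (start <= timestampMs) {
            starts.add(start);
            start += advanceMs;
        }
        return starts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Minute 740 of the day (12:20) with 30-minute windows advancing by 15 minutes.
        System.out.println(windowStarts(740 * minute, 30 * minute, 15 * minute));
    }
}
```

With these settings each record falls into size / advance = 2 windows, which is why the table lists the number of records per key at a point in time as "constant".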
Sliding Time Windows
[Diagram: records for key 01 on a 12:00 to 1:00 timeline, with time labels 12:01, 12:03, 12:12, 12:48, and 12:55, and sliding windows keyed by start time at millisecond precision, for example 001 / 11:31:00.000 and 001 / 11:33:00.000.]
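Sliding window boundaries come from the records rather than the epoch. As I understand `SlidingWindows`, each record defines one window that ends at its timestamp and one that starts 1 ms after it, both with inclusive bounds; the second is only materialized once a later record falls inside it. The sketch below computes just those two candidate ranges. All names are hypothetical, and the behavior described is an assumption to verify against the Kafka version in use.

```java
import java.util.Arrays;

// Illustrative sketch; these names are hypothetical, not Kafka Streams API.
final class SlidingWindowSketch {

    // Window that ends at the record: [t - difference, t], both bounds inclusive.
    static long[] windowEndingAt(long timestampMs, long differenceMs) {
        return new long[] {Math.max(0, timestampMs - differenceMs), timestampMs};
    }

    // Window that starts just after the record: [t + 1, t + 1 + difference].
    static long[] windowStartingAfter(long timestampMs, long differenceMs) {
        return new long[] {timestampMs + 1, timestampMs + 1 + differenceMs};
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        long record = 721 * minute; // minute 721 of the day, i.e. 12:01
        System.out.println(Arrays.toString(windowEndingAt(record, 30 * minute)));
        System.out.println(Arrays.toString(windowStartingAfter(record, 30 * minute)));
    }
}
```

With a 30-minute time difference, a record at 12:01 yields a window starting at 11:31:00.000, which matches the first window key on the slide.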
Session Windows
[Diagram: records for keys 01, 02, and 03 on a 12:00 to 1:00 timeline (ticks every 15 minutes), with running aggregates per key as each session grows.]
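How a session grows can be sketched as a fold over one key's timestamps: a record within the inactivity gap of the current session extends it, and anything later opens a new session. This is a simplification under stated assumptions: input is already sorted (Kafka Streams also merges sessions when out-of-order records arrive), the gap is treated as inclusive, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; these names are hypothetical, not Kafka Streams API.
final class SessionWindowSketch {

    // Returns [start, end] pairs (both inclusive) for the sorted timestamps of a single key.
    static List<long[]> sessions(long[] sortedTimestampsMs, long gapMs) {
        List<long[]> result = new ArrayList<>();
        for (long ts : sortedTimestampsMs) {
            long[] current = result.isEmpty() ? null : result.get(result.size() - 1);
            if (current != null && ts - current[1] <= gapMs) {
                current[1] = ts;                  // within the gap: extend the current session
            } else {
                result.add(new long[] {ts, ts});  // otherwise: new session [ts, ts]
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Minutes of the day for 8:02, 8:10, and 9:10 with a 30-minute gap.
        for (long[] s : sessions(new long[] {482, 490, 550}, 30)) {
            System.out.println("[" + s[0] + ", " + s[1] + "]");
        }
    }
}
```

The first record creates [8:02, 8:02] and the second replaces it with [8:02, 8:10], which mirrors the session examples in the Windowing Options table and explains the "by tombstoning" note: the shorter session is deleted when the longer one is written.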
Tumbling Time Windows
• Good
  • Web Visitors
  • Products Purchased
  • Inventory Management
  • IoT Sensors
  • Ad Impressions*
• Bad
  • Fraud Detection
  • User Interactions
  • Composition*
* event timestamp & grace period
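The footnote on event timestamps and grace period comes down to one comparison: a record is still accepted into a window while the window end plus the grace period lies after the stream time observed so far. The sketch below reflects my reading of the windowed aggregation logic rather than a quote of it, and the names are hypothetical.

```java
// Illustrative sketch; these names are hypothetical, not Kafka Streams API.
final class GracePeriodSketch {

    // True while the window [.., windowEnd) can still be updated at the given stream time.
    static boolean isAccepted(long windowEndMs, long graceMs, long streamTimeMs) {
        return windowEndMs + graceMs > streamTimeMs;
    }

    public static void main(String[] args) {
        long windowEnd = 1_800_000L; // window closes 30 minutes after the epoch
        long grace = 60_000L;        // one minute of grace
        System.out.println(isAccepted(windowEnd, grace, 1_830_000L)); // 30 s past the window end
        System.out.println(isAccepted(windowEnd, grace, 1_860_000L)); // exactly at end + grace
    }
}
```

Stream time here is the highest event timestamp seen so far, not the wall clock, so a burst of late data is only dropped once newer records have moved stream time past the window end plus grace.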
Hopping Time Windows
• Good
  • Web Visitors
  • Products Purchased
  • Fraud Detection
  • IoT Sensors
• Bad
  • User Interactions
  • Inventory Management
  • Composition
  • Ad Impressions
Sliding Time Windows
• Good
  • User Interactions
  • Fraud Detection
  • Usage Changes
• Bad
  • Composition
  • IoT Sensors
Session Windows
• Good
  • User Interactions / Click Stream
  • User Behavior Analysis
  • IoT device - session oriented (running)
• Bad
  • Data Analytics (Generalizations)
  • IoT sensors - always on (pacemaker)
No Windows
• Good
  • Composition*
  • Finite Datasets
• Bad
  • Fraud Detection
  • Monitoring
  • Unbounded data*
* manual tombstoning
Demo Applications
[Diagram: the Order Processing and Order Analytics applications.
 Topics: orders-purchase, orders-pickup, pickup-order-handler-purchase-order-join-product-repartition, product-repartition, product-stats
 Steps: repartition, attach user & store, attach line item pricing, assemble, product analytics
 Also labeled: State]
// Type parameters are elided on the slide. Caching is disabled so that every update reaches the store and the changelog.
Materialized<> store = Materialized.as("po").withCachingDisabled();
builder.<String, PurchaseOrder>stream(opt.getTopic())
    .groupByKey(Grouped.as("groupByKey"))
    // Tumbling window: a size only, plus a grace period for late records.
    .windowedBy(TimeWindows.of(Duration.ofSeconds(opt.getWindowSize()))
        .grace(Duration.ofSeconds(opt.getGracePeriod())))
    .aggregate(Streams::initialize,
        Streams::aggregator,
        Named.as("aggregate"),
        store)
    .toStream(Named.as("toStream"))
    // Flatten the windowed key into a readable string key.
    .selectKey((k, v) -> k.key() + " " + toStr(k.window()) + "," + ")")
    .mapValues(Streams::minimize)
    .to(opt.topic(), Produced.as("to"));
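// Alternative arguments to windowedBy(...) for the other window types:
// Hopping: same size, advancing by half the window size
TimeWindows.of(Duration.ofSeconds(opt.getWindowSize()))
    .advanceBy(Duration.ofSeconds(opt.getWindowSize() / 2))
    .grace(Duration.ofSeconds(opt.getGracePeriod()))
// Sliding: the window size option is used as the time difference
SlidingWindows.withTimeDifferenceAndGrace(
    Duration.ofSeconds(opt.getWindowSize()),
    Duration.ofSeconds(opt.getGracePeriod()))
// Session: the window size option is used as the inactivity gap
SessionWindows.with(Duration.ofSeconds(opt.getWindowSize()))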
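Demo “Time”
[Diagram: the product analytics application between the product-repartition and product-stats topics, annotated with the inspection points used in the demo: rocksdb_ldb, console-consumer (shown twice), Metrics, and the -changelog topic.]
* Caching disabled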
Demo Tools
• Deserializers
    % ln -s {jar} /usr/local/confluent/share/java/kafka
  • WindowDeserializer
  • SessionDeserializer
• RocksDB
  • RocksDB’s rocksdb_ldb
    % brew install rocksdb
• Scripts
  • rocksdb_key_parser
  • rocksdb_window_parser
  • rocksdb_session_parser
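What the window and session parsers have to undo can be sketched with `ByteBuffer`. The layout here is an assumption based on my understanding of Kafka Streams internals and may differ between versions: a window store key is the serialized record key followed by an 8-byte big-endian window start and a 4-byte sequence number, and a session store key is the record key followed by the 8-byte session end and then the 8-byte session start. The class and method names are hypothetical and are not the deserializers or scripts listed on the slide.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of an ASSUMED binary layout; verify against your Kafka Streams version.
final class WindowKeyLayoutSketch {

    // Assumed window store key: [record key][window start: 8 bytes][sequence number: 4 bytes].
    static byte[] windowStoreKey(String key, long windowStartMs, int seqnum) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(k.length + 8 + 4).put(k).putLong(windowStartMs).putInt(seqnum).array();
    }

    static String recordKey(byte[] windowKey) {
        return new String(windowKey, 0, windowKey.length - 12, StandardCharsets.UTF_8);
    }

    static long windowStart(byte[] windowKey) {
        return ByteBuffer.wrap(windowKey).getLong(windowKey.length - 12);
    }

    // Assumed session store key: [record key][session end: 8 bytes][session start: 8 bytes].
    static byte[] sessionStoreKey(String key, long startMs, long endMs) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(k.length + 16).put(k).putLong(endMs).putLong(startMs).array();
    }

    static long sessionEnd(byte[] sessionKey) {
        return ByteBuffer.wrap(sessionKey).getLong(sessionKey.length - 16);
    }

    static long sessionStart(byte[] sessionKey) {
        return ByteBuffer.wrap(sessionKey).getLong(sessionKey.length - 8);
    }

    public static void main(String[] args) {
        byte[] raw = windowStoreKey("001", 1_626_350_400_000L, 0);
        System.out.println(recordKey(raw) + " / " + windowStart(raw));
    }
}
```

Because the timestamps are fixed-width suffixes, the record key length can be recovered from the total length, so no separator is needed inside the key.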
DEMO
What Next
• Emitting Results
• Suppression
• Commit Time
• Window Boundaries
• Epoch vs. Event
• Long Windows*
• Join Windowing
• RocksDB Tuning
• RocksDB state store instances…
Takeaways
• Overview of the Windowing Options within Kafka Streams
• Windowing Use-Cases
• Examples of Aggregate Windowing
• What each windowing option does within RocksDB and the -changelog topics
• The key serialization of the -changelog topics
• Advanced Considerations
Resources
• Demo Application
  https://github.com/nbuesing/kafka-streams-dashboards
• Kafka Summit Europe 2021
  https://www.confluent.io/events/kafka-summit-europe-2021/what-is-the-state-of-my-kafka-streams-application-unleashing-metrics/
Additional Resources
• Nick Dearden’s
  https://www.confluent.io/kafka-summit-ny19/zen-and-the-art-of-streaming-joins/
• Anna McDonald’s
  https://www.confluent.io/kafka-summit-san-francisco-2019/using-kafka-to-discover-events-hidden-in-your-database/
• Matthias Sax’s
  https://www.confluent.io/kafka-summit-san-francisco-2019/whats-the-time-and-why/
  https://www.confluent.io/resources/kafka-summit-2020/the-flux-capacitor-of-kafka-streams-and-ksqldb/
Blooper Reel