"In the stream-processing context, event-time processing means events are processed based on when they occurred, rather than when they are observed by the system (processing time). Apache Flink has a powerful framework for event-time processing, which plays a pivotal role in ensuring temporal order and result accuracy.
In this talk, we will introduce Flink's event-time semantics and demonstrate how watermarks, the mechanism for handling late-arriving events, are generated, propagated, and triggered using Flink SQL. We will explore operators such as windows and joins that are often used with event-time processing, and how different configurations can impact processing speed, cost, and correctness.
Join us for this exploration where event-time theory meets practical SQL implementation, providing you with the tools to make informed, optimal trade-offs."
3. What is Apache Flink
Stateful Computations over Data Streams
● Highly Scalable
● Exactly-once processing semantics
● Event time semantics and watermarks
● Layered APIs: Streaming SQL (easy to use) ↔ DataStream (expressive)
17. Idle source/partition
● If a partition is idle (no events), its watermark will not advance
● No results will be produced
● Solutions
○ Configure source idle timeout
■ SET 'table.exec.source.idle-timeout' = '1m';
○ Balance the partitions
18. Implications
● Trade-off between Correctness and Latency
● Latency
○ The results of a window are only seen after the window closes
● Correctness
○ Late-arriving events are discarded after the window is closed
20. But…can I have both?
● Yes! Flink can process & emit “updates” (changelog)
● No watermark is needed
● Downstream system must support “updates”
● It’s costly - need to store global state
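As a sketch of this changelog mode: a plain (non-windowed) aggregation makes Flink continuously emit updated results per key. The `clicks` table and its columns here are hypothetical.

```sql
-- Non-windowed aggregation: Flink emits an updated count for a key every
-- time a new event arrives, so no watermark is required -- but the per-key
-- state is kept indefinitely unless a state TTL is configured.
SELECT user_id, COUNT(*) AS clicks_so_far
FROM clicks            -- hypothetical source table
GROUP BY user_id;
```

The downstream sink must be able to consume updates (e.g. an upsert sink); an append-only sink cannot accept this changelog stream.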
22. Quick Summary
● Timely response & analytics are based on event time
● Flink uses watermarks to account for out-of-order events
● Watermarks allow a trade-off between accuracy and latency
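In Flink SQL, this watermark is declared as part of the table DDL. A minimal sketch, using the built-in `datagen` connector and a hypothetical `clicks` schema:

```sql
CREATE TABLE clicks (
  user_id    STRING,
  url        STRING,
  click_time TIMESTAMP(3),
  -- Declares click_time as the event-time attribute and tolerates
  -- up to 5 seconds of out-of-orderness.
  WATERMARK FOR click_time AS click_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'datagen'
);
```

The 5-second delay is the accuracy/latency knob: a larger delay waits longer for late events, a smaller one emits results sooner.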
26. Window Types - Tumble/Fixed
Ref: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-tvf/
● Fixed window size
● No overlapping
● Each event belongs to exactly 1 window
27. Flink SQL (Window TVF)
● TVF - Table-Valued Function
● Returns a new relation with all columns of the original stream plus 3 additional columns:
○ window_start, window_end, window_time
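A minimal tumbling-window aggregation with the TVF syntax (assuming a hypothetical `clicks` table with event-time column `click_time`):

```sql
-- 10-minute tumbling windows; window_start/window_end are added by the TVF.
SELECT window_start, window_end, COUNT(*) AS click_cnt
FROM TABLE(
  TUMBLE(TABLE clicks, DESCRIPTOR(click_time), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
```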
30. Window Types - Cumulative
Ref: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-tvf/
● Similar to the tumble window, but with early firing at the defined interval
● Defined by max window size and window step
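A cumulative-window sketch in the same TVF style (the `clicks` table is hypothetical; the CUMULATE function takes the step first, then the max size):

```sql
-- Fires every 10 minutes (step) within a 1-hour max window size;
-- each firing covers the window from window_start up to the firing time.
SELECT window_start, window_end, COUNT(*) AS click_cnt
FROM TABLE(
  CUMULATE(TABLE clicks, DESCRIPTOR(click_time),
           INTERVAL '10' MINUTES,  -- step (early-firing interval)
           INTERVAL '1' HOURS))    -- max window size
GROUP BY window_start, window_end;
```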
31. Window Types - Session
😃 Supported in Flink 1.19
● A new window is created when the gap between consecutive event times exceeds the session gap
Ref: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/sql/queries/window-tvf/#session
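A session-window sketch with the 1.19 TVF syntax (hypothetical `clicks` table, partitioned per user):

```sql
-- Per-user session windows: a 5-minute gap with no events closes the window.
SELECT user_id, window_start, window_end, COUNT(*) AS click_cnt
FROM TABLE(
  SESSION(TABLE clicks PARTITION BY user_id,
          DESCRIPTOR(click_time), INTERVAL '5' MINUTES))
GROUP BY user_id, window_start, window_end;
```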
32. Window Join
● A window join adds the dimension of time into the join criteria themselves.
● Use case: compute click-through events
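A sketch of the click-through use case, following the window-join pattern from the docs (both tables and their columns are hypothetical):

```sql
-- Join ad impressions with clicks that fall into the same 10-minute
-- tumbling window, keyed by user. Both sides are windowed first,
-- then joined on identical window bounds.
SELECT c.user_id, i.ad_id, c.click_time
FROM (
  SELECT * FROM TABLE(
    TUMBLE(TABLE clicks, DESCRIPTOR(click_time), INTERVAL '10' MINUTES))
) c
JOIN (
  SELECT * FROM TABLE(
    TUMBLE(TABLE impressions, DESCRIPTOR(imp_time), INTERVAL '10' MINUTES))
) i
ON c.window_start = i.window_start
AND c.window_end = i.window_end
AND c.user_id = i.user_id;
```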
35. Temporal Join
● Enrich a stream with the value of the joined record as of the event time.
● Example: continuously computing the price for each order based on the exchange rate in effect when the order was placed
36. Example - temporal join
Ref: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/joins/#temporal-joins
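Along the lines of the linked docs example (table and column names are illustrative; the versioned side, `currency_rates`, must declare a primary key and a watermark):

```sql
-- For each order, pick the exchange-rate row that was valid at the order's
-- event time: FOR SYSTEM_TIME AS OF performs the event-time temporal join.
SELECT
  o.order_id,
  o.price * r.conversion_rate AS converted_price,
  o.order_time
FROM orders AS o
LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF o.order_time AS r
ON o.currency = r.currency;
```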
38. Summary
● Event time is essential for timely response and analytics
● Watermark and windowing are the key concepts
● Flink SQL simplifies event-time processing