"Today's consumers have come to expect timely and accurate information from the companies they do business with. Whether it's being alerted that someone just used your credit card to rent a car in Prague, or checking on the balance of your mobile data plan, it's not good enough to learn about yesterday's information today. We all expect the companies managing our data to be able to provide fully up-to-the-moment reporting.
Apache Flink is a battle-hardened stream processor widely used for demanding applications like these. Its performance and robustness are the result of a handful of core design principles: a shared-nothing architecture with local state, event-time processing, and state snapshots (for recovery). During this talk, we'll bring these principles to life with real-world examples and demos."
3. Real-time
Data
A Sale
A Shipment
A Trade
Rich Front-End
Customer Experiences
A Customer
Experience
Real-Time Backend
Operations
Real-time services rely on stream processing
Real-time Stream Processing
4. Driving business value with Apache Flink
Real-time analytics Event-driven applications
Streaming data
pipelines
Continuously produce and
update results which are
displayed and delivered to users
as real-time data streams are
consumed
● Ad/campaign performance
● Content performance
● Quality monitoring of Telco
networks
● Usage metering and billing
Recognize patterns and react to
incoming events by triggering
computations, state updates, or
external actions
● Fraud detection
● Anomaly detection
● Business process
monitoring
● Geo-fencing
Real-time data pipelines that
continuously ingest, enrich,
and transform data streams,
loading them into destination
systems for timely action (vs.
batch processing)
● Continuous ETL
● Real-time search index
building
● ML pipelines
● Data lake ingestion
5. Developers are choosing Flink because of
Its performance and rich feature set
Scalability & high
performance
Flink supports stream
processing workloads at
tremendous scale
Flink supports Java,
Python, & SQL, enabling
developers to work in
their language of choice
Flink supports stream
processing, batch
processing, and ad-hoc
analytics through one
technology
Unified processing
Flink's checkpointing
mechanism provides
exactly-once guarantees
automatically
Fault tolerance &
high availability
Language flexibility
Flink is a top 5 Apache project and has a very active community
7. ● A stream is a sequence of events
● Business data is always a stream: bounded or unbounded
● For Flink, batch processing is just a special case in the runtime
now
past future
bounded stream
unbounded stream
Streaming
16. Stream processing with SQL
INSERT INTO results
SELECT color, COUNT(*)
FROM events
WHERE color <> orange
GROUP BY color;
GROUP BY color
results
COUNT
WHERE color <> orange
events
17. Stream processing with SQL
INSERT INTO results
SELECT color, COUNT(*)
FROM events
WHERE color <> orange
GROUP BY color;
GROUP BY color
results
COUNT
WHERE color <> orange
events
18. Stream processing with SQL
INSERT INTO results
SELECT color, COUNT(*)
FROM events
WHERE color <> orange
GROUP BY color;
GROUP BY color
events
results
COUNT
WHERE color <> orange
19. Stream processing with SQL
INSERT INTO results
SELECT color, COUNT(*)
FROM events
WHERE color <> orange
GROUP BY color;
GROUP BY color
results
COUNT
WHERE color <> orange
events
20. Stream processing with SQL
INSERT INTO results
SELECT color, COUNT(*)
FROM events
WHERE color <> orange
GROUP BY color;
GROUP BY color
results
COUNT
WHERE color <> orange
events
events
21. Flink’s APIs
Apache Flink Runtime
Low-Level
Stream Operator API
Optimizer / Planner
Table / SQL API
DataStream API
24. Flink supports streaming
● Bounded or unbounded streams
● Entire pipeline must always be running
● Input must be processed as it arrives
● Results are reported as they become ready
● Failure recovery resumes from a recent snapshot
● Flink guarantees effectively exactly-once results
despite out-of-order data and restarts due to
failures, etc.
● Only bounded streams
● Execution proceeds in stages, running as needed
● Input may be pre-sorted by time and key
● Results are reported at the end of the job
● Failure recovery does a reset and full restart
● Effectively exactly-once guarantees are more
straightforward
and batch
26. Stateful stream processing with Flink SQL
INSERT INTO results
SELECT color, COUNT(*)
FROM events
WHERE color <> orange
GROUP BY color;
GROUP BY color
events
results
COUNT
WHERE color <> orange
27. Stateful stream processing with Flink SQL
INSERT INTO results
SELECT color, COUNT(*)
FROM events
WHERE color <> orange
GROUP BY color;
GROUP BY color
events
results
COUNT
WHERE color <> orange
28. Stateful stream processing with Flink SQL
● Counting requires
state
GROUP BY color
events
results
COUNT
WHERE color <> orange
32. Time
• Synchronize
• Wait
• Timeout
09:05:44
When the event was created at its
original source.
Event time
09:08:01
When the event is being processed.
This time varies between applications.
Processing time
33. ● Streams are (roughly) ordered by time
Out-of-order event streams
10:10
10:14
10:10
10:14
34. Coping with out of order events
This event will
be read next
36. Coping with out of order events
Imagine a window counting events for the hour ending at 2:00. How long should this
window wait before producing its results?
37. Watermarks measure progress of event time
Watermark
● This watermark has been generated by assuming that the stream is at most 5 minutes
out-of-order
38. Watermarks measure progress of event time
● This watermark has been generated by assuming that the stream is at most 5 minutes
out-of-order
● The watermark is the max timestamp seen so far, minus this out-of-orderness estimate
1:50 - 5 = 1:45
39. Watermarks measure progress of event time
● This watermark has been generated by assuming that the stream is at most 5 minutes
out-of-order
● The watermark is the max timestamp seen so far, minus this out-of-orderness estimate
● A watermark is an assertion about the completeness of the stream
Now this stream is
complete up to 1:45
40. Watermarks measure progress of event time
Imagine a window counting events for the hour ending at 2:00. How long should this
window wait before producing its results?
It should wait for a watermark with a timestamp of at least 2:00.
42. The idle stream
problem
● Streams that are idle do not
advance the watermark
● This prevents windows from
producing results
43. The idle stream
problem
● Streams that are idle do not
advance the watermark
● This prevents windows from
producing results
Solutions
● Balance the partitions so none are
empty or idle, or
● Send keep-alive events, or
● Configure the watermarking to
use idleness detection
44. Watermarks
● Not needed for applications that only use wall-clock (processing) time
● Not needed for batch processing
● Are needed for triggering actions based on event-time, e.g., closing a
window
● Are generated based on an assumption of how out of order the data might
be
● Provide control over the tradeoff between completeness and latency
● Flink SQL drops late events; the DataStream API offers more control
● Allow for consistent, reproducible results
● Potentially idle sources require special attention
46. A checkpoint is an
automatic snapshot
created by Flink,
primarily for the
purpose of failure
recovery
47. A checkpoint is an
automatic snapshot
created by Flink,
primarily for the
purpose of failure
recovery
A savepoint is a
manual snapshot
created for some
operational purpose
(e.g., a stateful
upgrade)
50. Snapshots
Source Filter Count by color Sink
Offsets for some
partitions
Offsets for other
partitions
events
results
COUNT
FILTER
GROUP BY color
51. Snapshots
Source Filter Count by color Sink
Offsets for some
partitions
______________________
Offsets for other
partitions
______________________
events
results
COUNT
FILTER
GROUP BY color
52. Snapshots
Source Filter Count by color Sink
Offsets for some
partitions
______________________ Counters for some
colors
Offsets for other
partitions
______________________ Counters for other
colors
events
results
COUNT
FILTER
GROUP BY color
53. Snapshots
Source Filter Count by color Sink
Offsets for some
partitions
______________________ Counters for some
colors
Transaction ID
Offsets for other
partitions
______________________ Counters for other
colors
__________________
events
results
COUNT
FILTER
GROUP BY color
54. Taking a
snapshot
does NOT
stop the
world
Checkpoints and savepoints are created
asynchronously, while the job continues to
process events and produce results
57. Streaming
Unfamiliar to many
developers, but
ultimately
straightforward
Watermarks
encapsulate something
complex in one place –
the sources
● how out-of-order?
● can it be idle?
Transparent to
application developers
State snapshots for
recovery
Delightfully simple
● local
● key/value
● single-threaded
State Event time and
watermarks
58. Where is the
Flink community?
To subscribe to the mailing lists, or get an invite
to Slack, see https://flink.apache.org/community/