7. ▪ So, for a simple counting program:
• Custom logic for handling state
• Custom logic for handling time
• Custom logic for fault tolerance
The ol’ traditional batch way
Difficult, and it has nothing to do with what your program actually computes.
9. Why should we care?
▪ ...this is just for continuous data, right?
Most datasets are
continuously arriving streams.
15. A practical stream processor
● State: fault tolerance, scalability, efficiency
● Time: allows you to work in event time
16. A practical stream processor
A stateful stream processor that handles large distributed state and time / order / completeness consistently, robustly, and efficiently
● Stateful stream processing as a new paradigm to continuously process continuously arriving data
● Produce accurate results
● Real-time is only a natural consequence of the model
17. This is where Flink shines...
▪ Supports out-of-order streams
▪ Manages state transparently
• exactly-once processing
▪ Offers high throughput and low latency
▪ Scales to large deployments
• https://data-artisans.com/blog/blink-flink-alibaba-search
• https://data-artisans.com/blog/rbea-scalable-real-time-analytics-at-king
22. Event Time: Watermarks
● Special markers, called Watermarks
● Flow with the elements
● A watermark of timestamp t means that no records with timestamp < t should be expected
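To make the definition concrete, here is a minimal Python sketch (not Flink's API) that counts events per key in event-time tumbling windows. It assumes a watermark that trails the largest timestamp seen so far by a fixed bound `max_delay` (a hypothetical parameter). A window [start, start + size) is emitted once the watermark reaches its end, because by the definition above no record with a smaller timestamp is expected anymore. Records that arrive behind the watermark are simply dropped in this sketch.

```python
from collections import defaultdict


def count_per_window(events, window_size, max_delay):
    """Count events per (event-time tumbling window, key).

    events: (timestamp, key) pairs in arrival order, possibly out of order.
    max_delay: assumed bound on out-of-orderness (hypothetical parameter).
    Returns (window_start, key, count) tuples in the order they are emitted.
    """
    counts = defaultdict(int)      # (window_start, key) -> running count
    max_ts_seen = None
    watermark = float("-inf")
    results = []
    for ts, key in events:
        if ts < watermark:
            # The watermark already promised no records with timestamp < watermark:
            # this record is late and is dropped here for simplicity.
            continue
        window_start = ts - ts % window_size
        counts[(window_start, key)] += 1
        max_ts_seen = ts if max_ts_seen is None else max(max_ts_seen, ts)
        watermark = max_ts_seen - max_delay
        # Emit every window whose end the watermark has reached.
        for start, k in sorted(counts):
            if start + window_size <= watermark:
                results.append((start, k, counts.pop((start, k))))
    # End of input acts like a final watermark of +infinity: flush the rest.
    for start, k in sorted(counts):
        results.append((start, k, counts[(start, k)]))
    return results


events = [(1, "a"), (3, "a"), (2, "b"), (11, "a"), (8, "b"), (4, "a"), (25, "a"), (12, "b")]
print(count_per_window(events, window_size=10, max_delay=5))
# → [(0, 'a', 2), (0, 'b', 2), (10, 'a', 1), (20, 'a', 1)]
```

The out-of-order record (8, "b") still lands in its window because it is within the assumed bound, while (4, "a") and (12, "b") arrive after the watermark has passed them and are dropped.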
26. Fault tolerance simple case
Diagram: an event log feeding a single process that holds its state in main memory
Periodically take a snapshot of the memory
27. Fault tolerance simple case
Diagram: an event log feeding a single process that holds its state in main memory
The event log persists events (temporarily)
Recovery: restore the snapshot and replay the events since the snapshot
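The single-process scheme from these two slides can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical counting process and an event log modeled as an in-memory list: a snapshot stores a copy of the state together with the log offset it corresponds to, and recovery restores that copy and replays only the events after the offset. Because state and offset are captured together, every event is reflected in the recovered state exactly once.

```python
import copy


class CountingProcess:
    """Hypothetical single process that keeps per-event counts in main memory."""

    def __init__(self):
        self.counts = {}   # in-memory state
        self.offset = 0    # index in the event log of the next event to process

    def process(self, event):
        self.counts[event] = self.counts.get(event, 0) + 1
        self.offset += 1

    def take_snapshot(self):
        # Copy the state together with the log position it corresponds to.
        return {"counts": copy.deepcopy(self.counts), "offset": self.offset}

    @classmethod
    def recover(cls, snapshot, event_log):
        proc = cls()
        proc.counts = copy.deepcopy(snapshot["counts"])
        proc.offset = snapshot["offset"]
        # Replay only the events persisted after the snapshot was taken.
        for event in event_log[snapshot["offset"]:]:
            proc.process(event)
        return proc


# Usage: the event log persists events; the process snapshots after the third one.
event_log = ["a", "b", "a", "c", "a", "b"]
proc = CountingProcess()
snapshot = None
for i, event in enumerate(event_log):
    proc.process(event)
    if i == 2:
        snapshot = proc.take_snapshot()

# Simulated failure: the in-memory state is lost, so rebuild it from snapshot + log.
recovered = CountingProcess.recover(snapshot, event_log)
print(recovered.counts)  # → {'a': 3, 'b': 2, 'c': 1}
```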
28. Fault tolerance distributed
▪ How to create consistent snapshots of
distributed state?
▪ How to do it efficiently?
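For context, Flink answers these questions with checkpoint barriers: markers injected at the sources that flow with the records (a variant of the Chandy-Lamport snapshot algorithm). The Python sketch below is a simplified illustration, not Flink's implementation; the operator, its channels, and the `BARRIER` sentinel are hypothetical. An operator with several inputs holds back records that follow a barrier on one channel until the barrier has arrived on every channel, then snapshots its state, so the snapshot contains exactly the records that precede the barrier everywhere. It assumes at most one checkpoint is in flight at a time.

```python
from collections import deque

# Hypothetical marker for a single checkpoint barrier.
BARRIER = object()


class AligningSumOperator:
    """Sums numbers from several input channels and snapshots its state with
    aligned barriers. Simplified: at most one checkpoint in flight at a time."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.total = 0                      # the operator's state
        self.aligned = set()                # channels whose barrier has arrived
        self.buffered = [deque() for _ in range(num_inputs)]
        self.snapshots = []                 # completed snapshots of `total`

    def on_element(self, channel, element):
        if element is BARRIER:
            self.aligned.add(channel)
            if len(self.aligned) == self.num_inputs:
                # Barrier received on every input: the state now reflects exactly
                # the records that precede the barrier on all channels.
                self.snapshots.append(self.total)
                self.aligned.clear()
                # Alignment done: process the records held back in the meantime.
                for buf in self.buffered:
                    while buf:
                        self.total += buf.popleft()
        elif channel in self.aligned:
            # Record that follows the barrier on this channel: hold it back so
            # it does not leak into the snapshot.
            self.buffered[channel].append(element)
        else:
            self.total += element


op = AligningSumOperator(num_inputs=2)
for channel, element in [(0, 1), (1, 10), (0, BARRIER), (0, 2), (1, 20), (1, BARRIER), (0, 3)]:
    op.on_element(channel, element)
print(op.snapshots, op.total)  # → [31] 36
```

The snapshot value 31 is 1 + 10 + 20: the record 2 arrived after the barrier on channel 0, so it is excluded even though it was physically received before the barrier on channel 1.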
40. Apache Flink Stack
● Libraries
● DataStream API (stream processing) and DataSet API (batch processing)
● Runtime: distributed streaming data flow
Streaming and batch as first-class citizens.
41. Levels of abstraction
● Stream SQL: high-level language
● Table API (dynamic tables): declarative DSL
● DataStream API (streams, windows): stream processing & analytics
● Process Function (events, state, time): low-level (stateful stream processing)
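The lowest level combines the three ingredients the slide names: events, state, and time. As a rough illustration in plain Python (not Flink's actual `ProcessFunction` interface; every name here is hypothetical), the operator below keeps a count per key and registers an event-time timer for each event. When the watermark passes a timer, the key's count is emitted if that key has seen no newer event since. It is similar in spirit to the count-with-timeout example in Flink's ProcessFunction documentation.

```python
import heapq


class CountWithTimeout:
    """Hypothetical per-key counter that emits (key, count) once event time
    passes `timeout` units after the key's last event."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.state = {}     # key -> (count, timestamp of last event)
        self.timers = []    # min-heap of (fire_time, key)
        self.output = []

    def process_element(self, key, timestamp):
        count, _ = self.state.get(key, (0, None))
        self.state[key] = (count + 1, timestamp)
        # Register an event-time timer for this event.
        heapq.heappush(self.timers, (timestamp + self.timeout, key))

    def on_watermark(self, watermark):
        # Fire every timer whose time the watermark has reached.
        while self.timers and self.timers[0][0] <= watermark:
            fire_time, key = heapq.heappop(self.timers)
            self.on_timer(key, fire_time)

    def on_timer(self, key, fire_time):
        entry = self.state.get(key)
        # Only the timer of the key's latest event emits; older timers are stale.
        if entry is not None and entry[1] + self.timeout == fire_time:
            self.output.append((key, entry[0]))
            del self.state[key]


op = CountWithTimeout(timeout=10)
op.process_element("a", 1)
op.process_element("b", 2)
op.process_element("a", 5)
op.on_watermark(12)
print(op.output)  # → [('b', 1)]
op.on_watermark(20)
print(op.output)  # → [('b', 1), ('a', 2)]
```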
47. TL;DR
▪ Stateful stream processing as a paradigm for
continuous data processing
▪ Flink is a sophisticated and battle-tested stateful
stream processor
▪ Efficiency, management, and operational issues for
state are taken very seriously
49. Stream Processing and Apache Flink®'s approach to it
@StephanEwen
Apache Flink PMC
CTO @ data Artisans
FLINK FORWARD IS COMING BACK TO BERLIN
SEPTEMBER 11-13, 2017
BERLIN.FLINK-FORWARD.ORG
55. Hazelcast Pain Points
▪ Explicit load balancing, repartitioning
▪ Time handling:
• sync/reordering
▪ State handling:
• fault tolerance, shuffling
Flink handles all this for you transparently
...focus on your application logic