Aljoscha Krettek
aljoscha@apache.org
@aljoscha
Apache Flink™
A Next-Generation Stream Processor
2
1. What is streaming?
2. What’s the technological
landscape?
3. Why is Apache Flink
special?
3
This is a stream
4
This is not a stream
5
Infinite data
vs.
finite data
infinite data → streaming
finite data → batch
6
Sometimes it
can feel like
this…
7
Stream Processing in a
Nutshell
 Infinite stream of incoming data
 We want up-to-date results
 Don’t wait for the nightly batch job
8
Some examples…
9
Tracking
user
satisfaction
in web
shops
10
Financial
transactions
Fraud
detection
11
Mobile
carriers
Online
tracking of
gaming
stats
13
14
What do they all have in
common?
Counting things over
certain periods of time.
A (Parallel) Streaming
Architecture
15
16
Why would I need a
parallel stream processor?
(tweet, #hello moe) → (?)
17
Remember this guy?
18
Parallel Stream Processing
(tweet, #hello sue) → (?)
(tweet, #hello poe) → (?)
(tweet, #hello moe) → (?)
19
LOG
Stream
Processor
20
Stream
Processor
Kafka
21
Parallel Stream Processors
22
Stream
Processor
Kafka
23
What does Flink provide?
24
// create a stream from a Kafka source
DataStream<LogEvent> stream =
    env.addSource(new FlinkKafkaConsumer(...));

// group by country (keyBy returns a KeyedStream, which offers timeWindow)
KeyedStream<LogEvent, Tuple> keyedStream = stream.keyBy("country");

keyedStream
    .timeWindow(Time.minutes(60))          // window of size 1 hour
    .apply(new CountPerWindowFunction());  // do operations per window
Counting with the Flink API
25
From API to Topology
Job Graph: Kafka Source → “count” Operator → Kafka Sink
26
Master
Worker Worker Worker
A Flink Cluster
27
All Together, Parallel
Master + Workers
28
What Makes Flink Special?
A Next-Generation Stream Processor
29
30
Disclaimer
 Some of this stuff is in Flink right now
 Some will probably make it into the
next release
 We (dataArtisans) don’t control the
Flink Roadmap, the community does
32
A Streaming Pipeline
Kafka → “count” operator → Kafka
count user interactions per 1-hour window
33
Interlude: Stateful vs.
Stateless
(tweet, #hello moe) → operator → (?)
34
Stateless
(tweet, #hello moe) → “ciao” operator → (tweet, #ciao moe)
The operation can only look at one element at a time.
35
Stateful
(tweet, #hello moe) → “count” operator → (moe mentioned 5 times)
The operation can keep information about past elements.
36
Stateful
 Aggregation
 Complex Event Processing (CEP)
 Machine learning models
Stateless
 Ingestion
 Data cleansing
 Stateless transformations
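To make the stateful case concrete, here is a minimal sketch (not from the original slides) of a keyed count kept in Flink managed state; the (user, 1L) tuple input and the class and state names are illustrative.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Counts how often each user has been mentioned so far; the count lives in per-key state.
public class MentionCounter
        extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("mention-count", Long.class));
    }

    @Override
    public void flatMap(Tuple2<String, Long> mention,
                        Collector<Tuple2<String, Long>> out) throws Exception {
        long soFar = count.value() == null ? 0L : count.value();
        soFar += mention.f1;
        count.update(soFar);
        out.collect(Tuple2.of(mention.f0, soFar)); // e.g. ("moe", 5)
    }
}

// usage (illustrative): mentions.keyBy(m -> m.f0).flatMap(new MentionCounter());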
37
Back to the main story…
38
A Streaming Pipeline
(again)
Kafka → “count” operator → Kafka
count user interactions per 1-hour window
39
Problem?
Results only arrive by the
hour.
40
Solution:
Queryable State
41
Internal State
State: moe → 5, sue → 12, poe → 2
timer service
42
Queryable State
Kafka → “count” operator → Kafka
query: count for “moe”? → 5
Queryable State at Twitter: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
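A rough sketch of what exposing and querying that internal state can look like; it assumes the queryable state API as it later shipped in Flink (state made queryable from 1.2 on, client shown as of 1.4), and the state name "hourly-counts", the proxy host, and the job id are placeholders. Exception handling is omitted since this is a fragment.

import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.queryablestate.client.QueryableStateClient;

// --- Job side: mark the keyed count state as queryable under an external name ---
ValueStateDescriptor<Long> countDescriptor =
        new ValueStateDescriptor<>("count", Long.class);
countDescriptor.setQueryable("hourly-counts");   // external name (placeholder)
// inside a rich function on the keyed stream:
// ValueState<Long> count = getRuntimeContext().getState(countDescriptor);

// --- Client side: ask the running job for the current count of "moe" ---
QueryableStateClient client = new QueryableStateClient("proxy-host", 9069);
CompletableFuture<ValueState<Long>> future =
        client.getKvState(jobId, "hourly-counts", "moe",
                BasicTypeInfo.STRING_TYPE_INFO, countDescriptor);
Long countForMoe = future.get().value();         // e.g. 5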
43
Back to our Streaming
Pipeline (…again, really?)
Kafka → “count” operator → Kafka
count user interactions per 1-hour window
44
What happens if you
need/want to…
 change the number of workers
 migrate to a different cluster
 fix a bug in your code
 fix a bug in our code (Flink)
 test different versions of an
algorithm
45
Stateless job → easy
Just stop and restart the
job
46
Stateful job → tricky
State needs to be
re-loaded/re-distributed
47
Savepoints
“create savepoint”
“change program”
“restart from savepoint”
48
Sessionization*
* also called “session windows”
 Based on the timestamp of
events
 Turns out this is tricky to do
right
 Flink supports this out-of-the-box with version 1.1 (see the sketch below)
 ask me about this afterwards☺
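A minimal session-window sketch against the DataStream API; the 30-minute gap, the "userId" key field, and CountPerSessionFunction are illustrative placeholders, not code from the talk.

import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

stream
    .keyBy("userId")                                            // placeholder key field
    .window(EventTimeSessionWindows.withGap(Time.minutes(30)))  // a session closes after 30 min of inactivity
    .apply(new CountPerSessionFunction());                      // illustrative window function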
Closing
49
50
There is more cool stuff
 Dynamic rescaling of streaming jobs
 SQL on streams
 Windowing API improvements
 Running on Mesos
51
tl;dl*
 Stream processing is the cool new
thing
 Flink is already very good at it
 There is plenty of interesting stuff
coming up
* too long, didn’t listen
52
 Follow @ApacheFlink, @dataArtisans
 Read flink.apache.org/blog, data-artisans.com/blog
 Subscribe (news | user | dev) @ flink.apache.org
Join the Community!
We are hiring!
data-artisans.com/careers
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016
www.flink-forward.org
Appendix
Yahoo! Streaming Benchmark
56
57
Performance
• Performance always depends on your own use
cases, so test it yourself!
• We based our experiments on a recent
benchmark published by Yahoo!
• They benchmarked Storm, Spark Streaming
and Flink with a production use-case (counting
ad impressions)
Full Yahoo! article: https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
58
Yahoo! Benchmark
• Count ad impressions grouped by campaign
• Compute aggregates over a 10 second window
• Emit current value of window aggregates to
Redis every second for query
Full Yahoo! article: https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
59
Flink and Storm usually at sub-second latencies
Spark latency increases with throughput, at 8 sec
Results (lower is better)
60
• Benchmark stops at Storm’s throughput limits.
Where is Flink’s limit?
• How will Flink’s own window implementation
perform compared to Yahoo’s “state in redis
windowing” approach?
Full Yahoo! article: https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
Extending the benchmark
61
KafkaConsumer → map() → filter() → group → windowing & caching code → realtime queries
Windowing with State in Redis
62
KafkaConsumer → map() → filter() → group → Flink event time windows → realtime queries
Rewrite to use Flink’s own Windowing
63
[Bar chart, throughput in msgs/sec: Storm at 400k msgs/sec vs. Flink at roughly 3 million msgs/sec]
Results after Rewrite
64
KafkaConsumer → map() → filter() → group → Flink event time windows
Network link to the Kafka cluster is the bottleneck! (1 GigE)
Data Generator → map() → filter() → group → Flink event time windows
Solution: Move the data generator into the job (10 GigE)
Can we go further?
65
[Bar chart, throughput in msgs/sec: Storm 400k msgs/sec; Flink 3m msgs/sec; Flink with 10 GigE end-to-end 15m msgs/sec]
Results without Network Bottleneck
66
• Flink achieves throughput of 15 million
messages/second on 10 machines
• 35x higher throughput compared to Storm
(80x compared to Yahoo’s runs)
• Flink ran with exactly once guarantees, Storm
with at least once.
• Read the full report: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
Benchmark Summary
Appendix 2
67
Roadmap 2016
68
• SQL / StreamSQL
• CEP Library
• Dynamic Scaling
• Miscellaneous
Miscellaneous
• Support for Apache Mesos
• Security
– Over-the-wire encryption of RPC (akka) and data
transfers (netty)
• More connectors
– Apache Cassandra
– Amazon Kinesis
• Enhance metrics
– Throughput / Latencies
– Backpressure monitoring
– Spilling / Out of Core
69
Fault Tolerance and correctness
70
• How can we ensure the state is always in sync with the events?
[Diagram: per-operator event counters (4, 3, 4, 2) feeding a final operator]
Naïve state checkpointing approach
71
• Process some records:
• Stop everything,
store state:
• Continue processing …
[Diagram: counters advance from (0, 0, 0, 0) to (1, 1, 2, 2); the stored operator state maps a → 1, b → 1, c → 2, d → 2]
Distributed Snapshots
72
[Diagram: initial state (0, 0, 0, 0); processing starts and counters become (1, 1, 0, 0); a checkpoint is triggered; operator state so far: a → 1, b → 1]
Distributed Snapshots
73
[Diagram: the barrier flows with the events — counters (2, 1, 2, 0), operator state a → 1, b → 1, c → 2; checkpoint completed at (2, 1, 2, 2) with operator state a → 1, b → 1, c → 2, d → 2]
• Valid snapshot without stopping the topology
• Multiple checkpoints can be in-flight
Complete, consistent state snapshot
Analysis of naïve approach
 Introduces latency
 Reduces throughput
• Can we create a correct snapshot while
keeping the job running?
• Yes! By creating a distributed snapshot
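From the user’s point of view, turning on these distributed snapshots is a single configuration call. A minimal sketch; the 60-second interval is illustrative.

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);                            // draw a distributed snapshot every 60 s
env.getCheckpointConfig()
   .setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);   // the default mode, shown explicitly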
74
Handling Backpressure
75
An operator that is not able to process incoming data immediately has to slow down upstream operators.
Backpressure might occur when:
• Operators create checkpoints
• Windows are evaluated
• Operators depend on external resources
• JVMs do Garbage Collection
Handling Backpressure
76
[Diagram: senders and receivers exchanging full and empty network buffers — via network transfer (Netty) or local buffer exchange when sender and receiver are on the same machine]
• If a sender does not have any empty buffers available, it slows down
• Data sources slow down pulling data from their underlying system (Kafka or similar queues)
How do latency and throughput affect each
other?
[Chart: 30 machines, one repartition step]
[Diagram: senders send a buffer to receivers when it is full or when a timeout fires]
• High throughput by batching events in network buffers
• Filling the buffers introduces latency
• Configurable buffer timeout (see the sketch below)
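A sketch of how that buffer timeout is tuned on the execution environment; the values are illustrative (100 ms is the default), and the same knob can also be set on individual operators. It assumes the StreamExecutionEnvironment "env" from the earlier snippets.

env.setBufferTimeout(10);    // flush network buffers after at most 10 ms: lower latency
// env.setBufferTimeout(0);  // flush after every record: minimal latency, lower throughput
// env.setBufferTimeout(-1); // flush only when a buffer is full: maximal throughput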
Aggregate throughput for stream record
grouping
78
[Bar chart: aggregate throughput of stream record grouping on 30 machines, 120 cores (Google Compute) — Flink (no fault tolerance), Flink (exactly once), Storm (no fault tolerance), Storm (at least once); annotations: aggregate throughput of 83 million elements per second, 8.6 million elements/s, 309k elements/s]
 Flink achieves 260x higher throughput with fault tolerance
Performance: Summary
79
Continuous streaming + latency-bound buffering + distributed snapshots
→ High Throughput & Low Latency, with a configurable throughput/latency tradeoff
The building blocks: Summary
80
Low latency / High throughput
• Efficient, pipelined runtime
• No per-record operations
• Tunable latency / throughput tradeoff
• Async checkpoints
Windowing / Out-of-order events
• Tumbling / sliding windows
• Event time / processing time
• Low watermarks for out-of-order events
State handling
• Managed operator state for backup/recovery
• Large state with RocksDB (sketch below)
• Savepoints for operations
Fault tolerance and correctness
• Exactly-once semantics for managed operator state
• Lightweight, asynchronous distributed snapshotting algorithm
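As a concrete example of the “large state with RocksDB” bullet above, a minimal sketch of configuring the RocksDB state backend; the checkpoint URI is a placeholder, the flink-statebackend-rocksdb dependency is assumed, and exception handling is omitted.

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;

// keep keyed state in RocksDB on local disk and checkpoint it to a distributed filesystem
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints")); // URI is a placeholder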
Low Watermarks
• We periodically send low-watermarks through
the system to indicate the progression of
event time.
81
For more details: “MillWheel: Fault-Tolerant Stream Processing at Internet Scale” by T. Akidau et al.
[Diagram: a stream of events with timestamps and a low watermark of 5 — a guarantee that no event with time <= 5 will arrive afterwards; the window between 0 and 15 is evaluated when the watermark arrives]
Low Watermarks
82
For more details: “MillWheel: Fault-Tolerant Stream Processing at Internet Scale” by T. Akidau et al.
Operators with multiple inputs always forward the lowest watermark.
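A small sketch of how an application supplies event timestamps and periodic low watermarks in the DataStream API; the 10-second out-of-orderness bound and the LogEvent.getTimestamp() accessor are assumptions, and the extractor class shipped in later 1.x releases.

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<LogEvent> withTimestamps = stream.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<LogEvent>(Time.seconds(10)) {
        @Override
        public long extractTimestamp(LogEvent event) {
            return event.getTimestamp();   // event time in epoch milliseconds (assumed accessor)
        }
    });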
Bouygues Telecom
83
Bouygues Telecom
84
Bouygues Telecom
85
Capital One
86
Fault Tolerance in streaming
• Failure with “at least once”: replay
87
[Diagram — restore from: (4, 3, 4, 2); final result after replay: (7, 5, 9, 7)]
Fault Tolerance in streaming
• Failure with “exactly once”: state restore
88
[Diagram — restore from: (1, 1, 2, 2); final result: (4, 3, 7, 7)]
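The two failure behaviours above correspond to the two checkpointing modes a job can choose; a minimal sketch, with an illustrative interval and the "env" from the earlier snippets.

// cheaper (no barrier alignment), but results may be over-counted after a failure:
env.enableCheckpointing(60_000, CheckpointingMode.AT_LEAST_ONCE);
// the default: state is restored consistently, so counts stay correct:
// env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);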
Latency in stream record grouping
89
[Diagram: Data Generator feeding the job, Receiver measuring throughput / latency]
• Measure the time for a record to travel from source to sink
[Bar charts: median latency (annotations: 25 ms, 1 ms) and 99th percentile latency (annotation: 50 ms) for Flink (no fault tolerance), Flink (exactly once), and Storm (at least once)]
Savepoints: Simplifying Operations
• Streaming jobs usually run 24x7 (unlike batch).
• Application bug fixes: Replay your job from a
certain point in time (savepoint)
• Flink bug fixes
• Maintenance and system migration
• What-If simulations: Run different
implementations of your code against a
savepoint
90
Pipelining
91
Basic building block to “keep the data moving”
• Low latency
• Operators push data forward
• Data shipping as buffers, not tuple-wise
• Natural handling of back-pressure
Apache Flink(tm) - A Next-Generation Stream Processor

Editor's Notes

  • #79 Flink: 720,000 events per second per core; 690,000 with checkpointing activated. Storm with at-least-once: 2,600 events per second per core.
  • #80 People previously made the case that high throughput and low latency are mutually exclusive
  • #85 SLA
  • #90 Flink: 720,000 events per second per core; 690,000 with checkpointing activated. Storm with at-least-once: 2,600 events per second per core.