Upcoming Features:
Apache Flink™ 0.10
Aljoscha Krettek
aljoscha@apache.org
What to Expect
• High availability of the master node (JobManager)
• Live monitoring
• Event-time, watermarks, and windowing improvements
• Demo: fault tolerance
These are only the highlights, more stuff is being worked on!
High Availability
Status Quo
(Diagram: a single JobManager coordinates the TaskManagers. If the JobManager fails: PANIC! The whole job is lost.)
With High Availability
(Diagram: a stand-by JobManager, coordinated via Apache ZooKeeper™, takes over from a failed JobManager and the TaskManagers KEEP GOING.)
Some Details
• Flink uses ZooKeeper™ for two things:
  • Leader selection (in case of multiple JobManagers)
  • Reliable storage of the dataflow graph and checkpoint metadata (more on that later)
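The usual ZooKeeper leader-election recipe has each candidate create an ephemeral sequential node, with the lowest sequence number winning; when the leader's session dies, the next candidate takes over. The sketch below is a pure-Python simulation of that rule only, to illustrate how a stand-by JobManager becomes leader; the `ElectionRegistry` class and its method names are made up for illustration and are not ZooKeeper or Flink API.

```python
# Simulation of ZooKeeper-style leader election among JobManagers.
# Each candidate "creates an ephemeral sequential node"; the candidate
# holding the lowest sequence number is the leader.

class ElectionRegistry:
    """Stands in for a ZooKeeper election znode (hypothetical helper)."""

    def __init__(self):
        self._next_seq = 0
        self._nodes = {}  # candidate name -> sequence number

    def register(self, candidate):
        """Create an 'ephemeral sequential node' for a candidate."""
        self._nodes[candidate] = self._next_seq
        self._next_seq += 1

    def deregister(self, candidate):
        """Simulate the candidate's session expiring (e.g. a crash)."""
        del self._nodes[candidate]

    def leader(self):
        """The candidate with the lowest sequence number leads."""
        return min(self._nodes, key=self._nodes.get)


registry = ElectionRegistry()
registry.register("jobmanager-1")   # gets sequence 0 -> leader
registry.register("jobmanager-2")   # gets sequence 1 -> stand-by

assert registry.leader() == "jobmanager-1"
registry.deregister("jobmanager-1")          # leader crashes
assert registry.leader() == "jobmanager-2"   # stand-by takes over
```

Because the dataflow graph and checkpoint metadata live in ZooKeeper as well, the new leader can resume the job rather than restart it from scratch.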
Live Monitoring
Live Monitoring
• Before:
  • Accumulators only available after the job finishes
• Now:
  • Accumulators updated while the job is running
  • System accumulators (number of bytes/records processed, …)
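The idea is simply that a counter the job bumps per record is now visible to the monitoring side mid-run instead of only at the end. A minimal sketch of that behavior, with a counter modeled loosely after Flink's `LongCounter` accumulator (the Python class and the polling loop here are illustrative, not the Flink API):

```python
# Sketch of live-updating accumulators: a per-record counter that a
# monitoring poll can read while the "job" is still running.

class LongCounter:
    """Illustrative stand-in for an accumulator."""

    def __init__(self):
        self._count = 0

    def add(self, value):
        self._count += value

    def get_local_value(self):
        return self._count


records_processed = LongCounter()
snapshots = []
for i, _record in enumerate(["a", "b", "c", "d"]):
    records_processed.add(1)
    if i == 1:  # monitoring poll mid-job: the value is already visible
        snapshots.append(records_processed.get_local_value())
snapshots.append(records_processed.get_local_value())  # after the "job"

assert snapshots == [2, 4]  # readable during the run, not only after it
```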
Timestamps, Watermarks and
the Rest™
Why all the Fuss?
(Diagram: timestamped elements, e.g. payload 0x45FD with timestamp 13, flow into a window operator. Elements do not arrive ordered by timestamp, so which window should each element go to?)
Processing Time Windows
(Diagram: the window operator groups elements by arrival time. Since elements do not arrive ordered by timestamp, each window contains whatever happened to arrive during its time span.)
Event Time Windows
(Diagram: the window operator groups elements by their timestamps, even though elements do not arrive ordered by timestamp.)
Problem: How do you know when to process windows?
Watermarks to the Rescue
(Diagram: a source emits timestamped elements interleaved with watermarks; a watermark signals that no elements with a lower timestamp will follow.)
Some Details
• The window operator waits for watermarks
• Upon watermark arrival we can process elements with timestamps lower than the watermark
• Operators forward watermarks once they know they cannot emit elements with a lower timestamp
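The rule above can be sketched in a few lines: buffer elements per window, and fire a window only once a watermark at or past the window's end has arrived, because the watermark promises that no elements with lower timestamps remain. The class below is a conceptual simulation, not the Flink window operator:

```python
# Sketch of watermark-triggered firing: windows stay buffered until a
# watermark proves no lower-timestamped elements can still arrive.

from collections import defaultdict

class EventTimeWindowOperator:
    """Illustrative tumbling-window operator driven by watermarks."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.buffers = defaultdict(list)  # window start -> payloads

    def on_element(self, timestamp, payload):
        start = (timestamp // self.window_size) * self.window_size
        self.buffers[start].append(payload)

    def on_watermark(self, watermark):
        """Fire every buffered window that ends at or before the watermark."""
        fired = []
        for start in sorted(self.buffers):
            if start + self.window_size <= watermark:
                fired.append((start, self.buffers.pop(start)))
        return fired


op = EventTimeWindowOperator(window_size=10)
op.on_element(1, "a")
op.on_element(3, "c")
op.on_element(4, "d")
assert op.on_watermark(5) == []    # window [0, 10) may still get elements
op.on_element(12, "b")
assert op.on_watermark(14) == [(0, ["a", "c", "d"])]  # [0, 10) is complete
```

Forwarding works the same way: an operator passes a watermark downstream once it can no longer emit anything with a lower timestamp.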
Fault Tolerance
Streaming Fault Tolerance
• Ensure that operators see all events
  • "At least once"
  • Solved by replaying a stream from a checkpoint, e.g., from a past Kafka offset
• Ensure that operators do not perform duplicate updates to their state
  • "Exactly once"
  • Several solutions
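Replay-based "at least once" can be sketched very simply: checkpoint the read offset periodically, and after a failure resume from the last checkpoint, accepting that records between the checkpoint and the crash are seen twice. This is a toy model of the idea, not Kafka or Flink code; the function and its parameters are invented for illustration:

```python
# Sketch of "at least once" via replay from a checkpointed offset.

log = ["r0", "r1", "r2", "r3", "r4"]   # stands in for a Kafka partition

def consume(log, start_offset, crash_at=None, checkpoint_every=2):
    """Read the log from start_offset; returns (records seen, last checkpoint)."""
    seen, checkpointed = [], start_offset
    for offset in range(start_offset, len(log)):
        if offset == crash_at:
            return seen, checkpointed          # failure before finishing
        seen.append(log[offset])
        if (offset + 1) % checkpoint_every == 0:
            checkpointed = offset + 1          # durable checkpoint
    return seen, checkpointed

# First run crashes at offset 3; the checkpoint only covers offsets 0-1.
seen_1, ckpt = consume(log, 0, crash_at=3)
# Recovery replays from the checkpoint: offset 2 is processed again.
seen_2, _ = consume(log, ckpt)
assert seen_1 + seen_2 == ["r0", "r1", "r2", "r2", "r3", "r4"]
```

The duplicate "r2" is exactly why at-least-once alone is not enough: without extra machinery, replayed records cause duplicate state updates, which is the exactly-once problem the next slide addresses.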
Exactly-Once Approaches
• Discretized streams (Spark Streaming)
  • Treat streaming as a series of small atomic computations
  • "Fast track" to fault tolerance, but restricts the computational and programming model (e.g., cannot mutate state across "mini-batches", window functions correlated with mini-batch size)
• MillWheel (Google Cloud Dataflow)
  • State update and derived events committed as an atomic transaction to a high-throughput transactional store
  • Requires a very high-throughput transactional store
• Chandy-Lamport distributed snapshots (Flink)
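Flink's variation of Chandy-Lamport injects checkpoint barriers into the stream at the sources; when an operator receives a barrier it snapshots its state, so the snapshot cleanly separates records before the barrier from records after it, without pausing the stream. A single-operator toy model of that idea (real Flink additionally aligns barriers across multiple inputs, which this sketch omits):

```python
# Sketch of barrier-based snapshotting: a checkpoint barrier flows with
# the data, and the operator snapshots its state when the barrier passes.

BARRIER = object()

def run_with_snapshot(stream):
    """Count records; snapshot the count when the barrier flows past."""
    count, snapshot = 0, None
    for item in stream:
        if item is BARRIER:
            snapshot = count       # state as of the barrier's position
        else:
            count += 1
    return count, snapshot

count, snapshot = run_with_snapshot(["a", "b", BARRIER, "c", "d", "e"])
assert count == 5       # the job keeps processing past the barrier
assert snapshot == 2    # ... but the checkpoint reflects only the prefix
```

On failure, operators restore the snapshot and sources replay from the barrier's position, so each record's effect on state is counted exactly once.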
Best of all Worlds for Streaming
• Low latency
  • Thanks to the pipelined engine
• Exactly-once guarantees
  • Variation of Chandy-Lamport
• High throughput
  • Controllable checkpointing overhead
• Separates app logic from recovery
  • Checkpointing interval is just a config parameter
Demo time
flink-forward.org
I ♥ Flink, do you?
If you find this exciting,
get involved and start a discussion on Flink's mailing list,
or stay tuned by
subscribing to news@flink.apache.org,
following flink.apache.org/blog, and
@ApacheFlink on Twitter