Community Update &
Roadmap 2016
Robert Metzger
@rmetzger_
rmetzger@apache.org
Berlin Apache Flink Meetup,
January 26, 2016
January Community Update
What happened in the last month
2
What happened?
3
 Google proposed the Dataflow API to the Apache Incubator
 Proposal discussions on the mailing list:
• SQL / Stream SQL support
• CEP (Complex Event Processing) library
 Flink Kinesis Connector
 Chengxiang Li added as committer
 Discussions for releasing 1.0.0
Now merged into master (1.0-SNAPSHOT)
4
 Savepoints: Manual checkpoints for restarting jobs with state
 Kafka 0.9.0.0 integration (see the consumer sketch below)
 Job submission through the JobManager web interface
 Checkpoint statistics in the JobManager web interface
 Streaming examples are now in the binary distribution
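To make the Kafka 0.9.0.0 integration concrete, here is a minimal sketch of consuming a topic with the 0.9 connector; the topic name, broker address, and group id are placeholders, and package names are given as they were around the 1.0 release.

// Minimal sketch: reading a Kafka 0.9 topic with the new connector.
// "my-topic", the broker address, and the group id are placeholders.
import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object Kafka09Example {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "flink-meetup-demo")

    // FlinkKafkaConsumer09 targets the new Kafka 0.9 consumer API
    val stream: DataStream[String] = env.addSource(
      new FlinkKafkaConsumer09[String]("my-topic", new SimpleStringSchema(), props))

    stream.print()
    env.execute("Kafka 0.9 example")
  }
}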
Reading List
 Benchmarking Streaming Computation Engines at Yahoo! [1]
 Receiving metrics from Apache Flink applications [2]
 Running Apache Flink on Amazon Elastic MapReduce [3]
5
1. http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
2. http://mnxfst.tumblr.com/post/136539620407/receiving-metrics-from-apache-flink-applications
3. http://themodernlife.github.io/scala/hadoop/hdfs/sclading/flink/streaming/realtime/emr/aws/2016/01/06/running-apache-flink-on-amazon-elastic-mapreduce/
Upcoming talks
 FOSDEM Brussels (4 talks) (Jan 30-31)
 Big Data Technology Summit Warsaw (Feb. 25-26)
 QCon London (March 7-9)
 Hadoop Summit Dublin (2 talks) (April 13-14)
 Strata San Jose
 Strata London
6
Global Meetup Community
 Brazil-Sao Paulo Apache Flink Meetup
 Apache Flink Taiwan User Group
 Also new groups in Delhi, Phoenix, and Dallas
7
GitHub stats
8
 900 Stars
Roadmap 2016
What's next?
9
Overview
10
 SQL / StreamSQL
 CEP Library
 Managed Operator State
 Dynamic Scaling
 Miscellaneous
SQL and StreamSQL
11
SQL / StreamSQL
12
 Structured queries over data sets and streams
 Add support for SQL
• Standard SQL queries over (batch) data sets
• Continuous StreamSQL queries over data streams
 Keep and extend the Table API as a structured query API on data sets and streams
Proposed Architecture
13
[Architecture diagram: Table API programs, (batch) SQL queries, and StreamSQL queries are handed to Apache Calcite, which provides a standard SQL parser plus a customized StreamSQL parser and an optimizer; the resulting logical plan is translated into either a DataSet program or a DataStream program, keeping the APIs separate from the internals.]
SQL integration into APIs
14
val stream: DataStream[(String, Double, Int)] =
  env.addSource(new FlinkKafkaConsumer(...))

val tabEnv = new TableEnvironment(env)
tabEnv.registerStream(stream, "myStream",
  ("ID", "MEASURE", "COUNT"))

val sqlQuery = tabEnv.sql(
  "SELECT ID, MEASURE FROM myStream WHERE COUNT > 17")
 Define Kafka input stream
 Define table environment
 SQL Query
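For the batch side of the proposal, a query over a DataSet might look like the sketch below; note that registerDataSet and the TableEnvironment signatures are assumptions modeled on the stream example above, not a released API, and the file path is a placeholder.

// Hypothetical batch-side counterpart of the proposed API; registerDataSet and
// the TableEnvironment signatures are assumptions, not a released API.
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val data: DataSet[(String, Double, Int)] =
  env.readCsvFile[(String, Double, Int)]("hdfs:///path/to/measurements.csv")

val tabEnv = new TableEnvironment(env)
tabEnv.registerDataSet(data, "myTable", ("ID", "MEASURE", "COUNT"))

// the same standard SQL, evaluated once over the finite data set
val result = tabEnv.sql("SELECT ID, MEASURE FROM myTable WHERE COUNT > 17")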
Complex Event Processing
15
CEP Library
 Complex Event Processing: the analysis of complex patterns, such as correlations and sequence detection, across events from multiple sources
 Most current systems are not distributed (beyond multi-threading)
 Goal: provide an easy-to-use API for CEP, running on a distributed, high-throughput, low-latency engine
16
CEP Example
17
[Diagram: real-time stock prices (15.1, 15.3, 15.2, 15.5) feed a state machine that emits alerts; from the Start state, a price drop of at least $0.5 leads to the Alert state, otherwise the event is ignored.]
Programming API for CEP
CEPStream<Event> cepStream = CEP.from(inputDataStream);

// grouping
GroupedCEPStream<Event> grouped = cepStream.groupBy("id");

// windows
WindowedCEPStream<Event> windowed = grouped.timeWindow(Time.minutes(10), Time.minutes(1));
WindowedCEPStream<Event> counted  = grouped.countWindow(10L, 1L);

// pattern matching
CEPStream<Result> resultStream = CEP.from(input).groupBy(0).pattern(
    Pattern.<Event>next("e1").where( (evt) -> evt.id == 42 )
        .followedBy("e2").where( (evt) -> evt.id == 1337 )
        .within(Time.minutes(10))
).select( (Map<String, Event> patternElements) ->
    new Result(patternElements.get("e2").timestamp -
               patternElements.get("e1").timestamp) );
 Convert the stream into a CEPStream of Events
 Window the events
 Define a pattern to match
DSL for CEP
select e1.id, e1.price
from every e1 = Event(price > 10) → e2 = Event(date == 42) → e3 = Event(price == 10)
within 10 seconds
where e1.id == e2.id
19
 No programming required
 Potentially integrated with SQL
Managed Operator State
20
State in Flink
21
[Diagram: an operator "count tweet impressions" whose user function retrieves and sets the count for each tweet id in local operator state (the impression counts).]
State in Flink
22
[Same diagram as above.]
What happens if the job crashes? Loss of data.
Solution: Checkpoints
23
[Diagram: the same operator now takes periodic checkpoints of its state to HDFS and restores the state from HDFS in case of failure.]
Solution: Checkpoints
24
[Same diagram as above: periodic checkpoints of state to HDFS, restore from HDFS in case of failure.]
This is the current state in Flink!
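To make the current mechanism concrete, below is a minimal sketch of the impression-count example using Flink's keyed ValueState with periodic checkpointing enabled; the class names, checkpoint interval, and placeholder source are illustrative, not taken from the slides.

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// Keeps the impression count per tweet id in Flink-managed keyed state, so it is
// included in every checkpoint and restored automatically after a failure.
class CountImpressions extends RichFlatMapFunction[(String, Long), (String, Long)] {
  private var count: ValueState[Long] = _

  override def open(parameters: Configuration): Unit = {
    count = getRuntimeContext.getState(
      new ValueStateDescriptor[Long]("impression-count", createTypeInformation[Long]))
  }

  override def flatMap(impression: (String, Long), out: Collector[(String, Long)]): Unit = {
    val newCount = count.value() + 1   // value() yields 0 the first time a key is seen
    count.update(newCount)             // set the updated count for this tweet id
    out.collect((impression._1, newCount))
  }
}

object CheckpointedCounts {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(5000)      // checkpoint all operator state every 5 seconds

    // placeholder source of (tweetId, timestamp) impression events
    val impressions: DataStream[(String, Long)] =
      env.fromElements(("tweet-1", 1L), ("tweet-2", 2L), ("tweet-1", 3L))

    impressions.keyBy(_._1).flatMap(new CountImpressions).print()
    env.execute("checkpointed impression counts")
  }
}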
State on Steroids
25
[Diagram: the same "count tweet impressions" operator with its impression counts in local state.]
State on Steroids
26
What if the state grows too big?
[Diagram: the operator's state spills to disk; snapshots are taken asynchronously/incrementally, and state is restored from HDFS in case of failure.]
State on Steroids
27
[Same diagram: the operator's state spills to disk.]
State on Steroids
28
What if the state grows too big? Checkpointing stalls processing!
[Same diagram: spill to disk, async/incremental snapshots, restore from HDFS in case of failure.]
State on Steroids
29
[Same diagram: spill to disk, async/incremental snapshots, restore from HDFS in case of failure.]
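One concrete way to hold state larger than memory, assuming the RocksDB state backend that was being added around this time, is sketched below; the checkpoint URI is a placeholder, and asynchronous/incremental snapshots arrived only in later releases.

// Sketch: keep keyed state in an embedded RocksDB instance on local disk
// ("spill to disk") and checkpoint it to a durable filesystem.
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.scala._

object LargeStateJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // the HDFS URI is a placeholder for any durable checkpoint location
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"))
    env.enableCheckpointing(10000)

    // tiny placeholder pipeline; real jobs define their stateful operators here
    env.fromElements("a", "b", "a").map(_.toUpperCase).print()
    env.execute("large-state job")
  }
}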
Dealing with Dynamic Resources
30
Streams with varying data rate
31
[Plot: events/second over time; with static resources you provision for the maximum rate, leaving idle capacity the rest of the time.]
(1) Adjust Parallelism
32
[Diagram: an initial configuration scales out under load and scales back in to save resources.]
(1) Adjust Parallelism
 Adjusting parallelism without (significantly) interrupting the program
 Initial version:
• Checkpoint -> stop -> restart-with-different-parallelism
 Stateless operators: Trivial
 Stateful operators: Repartition state
• Transparent for key/value state and windows
• Consistent hashing simplifies state reorganization
33
(2) Dynamic Worker Pool
34
[Diagram: the JobManager requests TaskManagers from a ResourceManager that draws on a pool of cluster resources (YARN, Mesos, …), so workers can be allocated dynamically.]
Miscellaneous
 Support for Apache Mesos
 Security
• Over-the-wire encryption of RPC (Akka) and data transfers (Netty)
 More connectors
• Apache Cassandra
• Amazon Kinesis
 Enhance metrics
• Throughput / Latencies
• Backpressure monitoring
• Spilling / Out of Core
35

