Apache Flink® Training
System Overview
2
A stream processor with many applications
Streaming dataflow runtime
Apache Flink
1 year of Flink - code
[Figure: the codebase in April 2014 vs. April 2015]
Flink Community
[Chart: number of unique contributor IDs by git commits, Aug 2010 to Jul 2015]
In the top 5 of Apache's big data projects after one year in the Apache Software Foundation
The Apache Way
 Flink is an Apache top-level project
5
• Independent, non-profit organization
• Community-driven open source software development approach
• Public communication and open to new contributors
What is Apache Flink?
6
[Diagram: the Flink stack. Libraries: Gelly, Table, ML, SAMOA. APIs: DataSet (Java/Scala/Python) and DataStream (Java/Scala). Compatibility layers: Hadoop M/R, Dataflow, Dataflow (WiP), MRQL, Table, Cascading (WiP). Deployment: Local, Remote, Yarn, Tez, Embedded. Core: streaming dataflow runtime]
Native workload support
7
[Diagram: Flink supporting streaming topologies, heavy batch jobs, and machine learning at scale]
How can an engine natively support all these workloads?
And what does native mean?
E.g.: Non-native iterations
8
[Diagram: the client drives a sequence of steps, submitting one job per iteration]
for (int i = 0; i < maxIterations; i++) {
    // Execute MapReduce job
}
E.g.: Non-native streaming
9
[Diagram: a stream discretizer cuts the stream into a series of small batch jobs]
while (true) {
    // get next few records
    // issue batch job
}
Native workload support
10
[Diagram: Flink supporting stream processing, batch processing, machine learning at scale, and graph analysis]
How can an engine natively support all these workloads?
And what does "native" mean?
Flink Engine
1. Execute everything as streams
2. Iterative (cyclic) dataflows (see the sketch below)
3. Mutable state
4. Operate on managed memory
5. Special code paths for batch
11
State + Computation
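In contrast to the non-native, client-driven loop shown earlier, an iteration in Flink is part of the dataflow itself. A minimal sketch in the Scala DataSet API, assuming a toy step function (incrementing a number) chosen only to show the shape of the construct:

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val initial = env.fromElements(0)

// The step function becomes a cyclic dataflow inside the engine;
// no separate job is submitted per iteration.
val result = initial.iterate(10) { current =>
  current.map(_ + 1)
}
result.print()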
What is a Flink Program?
12
Flink stack
[Diagram: the full Flink stack as on slide 6: libraries (Gelly, Table, ML, SAMOA), the DataSet (Java/Scala/Python) and DataStream (Java/Scala) APIs, compatibility layers (Hadoop M/R, Dataflow, Dataflow (WiP), MRQL, Table, Cascading (WiP)), deployment modes (Local, Remote, Yarn, Tez, Embedded), and the streaming dataflow runtime]
Basic API Concept
How do I write a Flink program?
1. Bootstrap sources
2. Apply operations
3. Output to sinks (see the sketch below)
14
[Diagram: Source → DataStream → Operation → DataStream → Sink, and Source → DataSet → Operation → DataSet → Sink]
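A minimal sketch of the three steps in the Scala DataSet API; the input strings and the word-count logic are placeholders chosen only to illustrate the pattern:

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// 1. Bootstrap a source
val text = env.fromElements("to be or not to be")

// 2. Apply operations
val counts = text
  .flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
  .map { (_, 1) }
  .groupBy(0)
  .sum(1)

// 3. Output to a sink
counts.print()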
Batch & Stream Processing
 DataStream API example: Live Stock Feed
[Diagram: a stock feed of (name, price) events such as Microsoft 124, Google 516, Apple 235 flows through operators that alert if Microsoft > 120, write each event to a database, sum prices every 10 seconds, and alert if the sum > 10000]
 DataSet API example: Map/Reduce paradigm
[Diagram: a small table of (b, h) values is transformed by Map and Reduce into a result column a]
15
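The live stock feed example could start out like the following sketch in the Scala DataStream API. Only the first operator ("alert if Microsoft > 120") is shown, and the hard-coded elements stand in for a real feed connector (socket, Kafka, ...):

import org.apache.flink.streaming.api.scala._

case class StockPrice(name: String, price: Double)

val env = StreamExecutionEnvironment.getExecutionEnvironment

// Placeholder source; a real job would attach a feed connector here.
val quotes: DataStream[StockPrice] = env.fromElements(
  StockPrice("Microsoft", 124), StockPrice("Google", 516), StockPrice("Apple", 235))

// "Alert if Microsoft > 120"
quotes
  .filter(q => q.name == "Microsoft" && q.price > 120)
  .print()

env.execute("Stock alerts")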
Streaming & Batch
16
               Batch                   Streaming
Input          finite                  infinite
Data transfer  blocking or pipelined   pipelined
Latency        high                    low
Scaling out
17
[Diagram: the Source → DataSet → Operation → DataSet → Sink pipeline replicated many times to run as parallel instances]
Scaling up
18
Sources (selection)
Collection-based
 fromCollection
 fromElements
File-based
 TextInputFormat
 CsvInputFormat
Other
 SocketInputFormat
 KafkaInputFormat
 Databases
19
Sinks (selection)
File-based
 TextOutputFormat
 CsvOutputFormat
 PrintOutput
Others
 SocketOutputFormat
 KafkaOutputFormat
 Databases
20
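A minimal sketch wiring one of the sources above to one of the sinks in the Scala DataSet API; the output path is a placeholder:

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Collection-based source
val numbers = env.fromElements(1, 2, 3)

// File-based sink: write (n, n*n) pairs as CSV
numbers
  .map(n => (n, n * n))
  .writeAsCsv("file:///tmp/squares.csv")

env.execute("Sources and sinks example")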
Hadoop Integration
Out of the box
 Access HDFS
 Yarn Execution (covered later)
 Reuse data types (Writables)
With a thin wrapper
 Reuse Hadoop input and output formats
 Reuse functions like Map and Reduce
21
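Out-of-the-box HDFS access is just a matter of using an hdfs:// path; the namenode address and paths below are placeholders:

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Read a text file straight from HDFS ...
val lines = env.readTextFile("hdfs://namenode:9000/data/input.txt")

// ... and write results back to HDFS.
lines.filter(_.nonEmpty).writeAsText("hdfs://namenode:9000/data/output")

env.execute("HDFS example")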
What’s the Lifecycle of a Program?
22
Architecture Overview
 Client
 Master (Job Manager)
 Worker (Task Manager)
24
[Diagram: a Client talks to the Job Manager, which coordinates several Task Managers]
Client
 Optimize
 Construct job graph
 Pass job graph to job manager
 Retrieve job results
25
[Diagram: the Client runs type extraction and the Optimizer on the program and hands the resulting job graph to the Job Manager]

case class Path(from: Long, to: Long)

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) =>
      Path(path.from, edge.to)
    }
    .union(paths)
    .distinct()
  next
}

[Diagram: an example optimized plan: DataSource (orders.tbl) feeding Filter and Map, DataSource (lineitem.tbl), both hash-partitioned on field [0] into a Hybrid Hash Join (build HT / probe), followed by a sort-based GroupReduce and a forward ship strategy]
Job Manager
 Parallelization: Create Execution Graph
 Scheduling: Assign tasks to task managers
 State: Supervise the execution
26
[Diagram: the Job Manager expands the optimized plan into a parallel execution graph and assigns the task instances (DataSource orders.tbl / lineitem.tbl, Filter, Map, hash-partitioned Hybrid Hash Join, sort-based GroupReduce) to four Task Managers]
Task Manager
 Operations are split up into tasks depending on the specified parallelism
 Each parallel instance of an operation runs in a separate task slot
 The scheduler may run several tasks from different operators in one task slot
27
[Diagram: Task Managers, each providing one or more task slots]
Execution Setups
28
Ways to Run a Flink Program
29
[Diagram: the Flink stack again, highlighting the deployment modes: Local, Remote, YARN, Tez, Embedded]
Local Execution
 Starts a local Flink cluster
 All processes run in the same JVM
 Behaves just like a regular cluster
 Very useful for developing and debugging
30
[Diagram: Job Manager and four Task Managers running inside a single JVM]
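A sketch of requesting a local environment explicitly in the Scala API (when a program is started from the IDE, getExecutionEnvironment already returns a local one):

import org.apache.flink.api.scala._

// All Flink components run inside this single JVM.
val env = ExecutionEnvironment.createLocalEnvironment()

env.fromElements(1, 2, 3).map(_ * 2).print()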
Embedded Execution
 Runs operators on simple Java collections
 Lower overhead
 Does not use memory management
 Useful for testing and debugging
31
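A sketch of the collection-based environment, assuming the createCollectionsEnvironment factory of the Scala API, which backs operators with plain Java collections instead of the managed runtime:

import org.apache.flink.api.scala._

// Operators run directly on collections: no cluster, no managed memory.
val env = ExecutionEnvironment.createCollectionsEnvironment

env.fromElements("a", "b", "c").map(_.toUpperCase).print()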
Remote Execution
 Submit a job remotely
 Monitor the status of a job
32
[Diagram: the Client submits the job to the Job Manager of a remote cluster with several Task Managers]
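A sketch of submitting to a remote cluster from the program itself; the host, jar path, and input path are placeholders, and 6123 is assumed to be the Job Manager's default RPC port:

import org.apache.flink.api.scala._

// The jar containing the program's user code is shipped to the cluster.
val env = ExecutionEnvironment.createRemoteEnvironment(
  "jobmanager-host", 6123, "/path/to/program.jar")

env.readTextFile("hdfs://namenode:9000/data/input.txt")
  .first(10)
  .print()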
YARN Execution
 Multi-user scenario
 Resource sharing
 Uses YARN containers to run a Flink cluster
 Easy to set up
33
[Diagram: the Client talks to the YARN Resource Manager; Node Managers run the Job Manager and Task Managers in containers, alongside other applications]
Tez Execution
 Leverages Apache Tez’s runtime
 Built on top of YARN
 Good YARN citizen
 Fast path to elastic deployments
 Slower than native Flink
34
Flink compared to other projects
35
Batch & Streaming projects
[Diagram: projects grouped into batch-only, streaming-only, and hybrid engines]
36
Batch comparison
37
(three systems compared per column; the rightmost column describes Flink)
API: low-level | high-level | high-level
Data transfer: batch | batch | pipelined & batch
Memory management: disk-based | JVM-managed | actively managed
Iterations: cached in file system | cached in memory | streamed
Fault tolerance: task level | task level | job level
Good at: massive scale-out | data exploration | heavy backend & iterative jobs
Libraries: many external | built-in & external | evolving built-in & external
Streaming comparison
38
(three systems compared per column; the rightmost column describes Flink)
Streaming model: “true” streaming | mini batches | “true” streaming
API: low-level | high-level | high-level
Fault tolerance: tuple-level ACKs | RDD-based (lineage) | coarse checkpointing
State: not built-in | external | internal
Exactly once: at least once | exactly once | exactly once
Windowing: not built-in | restricted | flexible
Latency: low | medium | low
Throughput: medium | high | high
Thank you for listening!
39
