Apache Flink
In action
About me
Name: Artsem Semianenka
Position: Big Data engineer at adform.com
LinkedIn: https://www.linkedin.com/in/artsemsemianenka/
GitHub: https://github.com/art4ul
Facebook: https://www.facebook.com/art4ul
Agenda
• Basic concepts
• Stateful processing
• Practice
• Fault tolerance
• Window functions
What is Apache Flink?
Apache Flink is an open source stream processing framework.
Stream
What is Apache Flink?
• True streaming: low latency, high throughput, backpressure
• Stateful computation: savepoints, exactly-once semantics, user-defined state
• Window functions: event time, flexible window functions
• API & libraries: CEP, Flink ML, Gelly
Apache Flink stack
• Libraries: CEP, Table API, Flink ML, Gelly
• API: DataStream API (Java / Scala), DataSet API (Java / Scala)
• Core: streaming dataflow runtime
• Deploy: standalone cluster, Mesos / YARN / Kubernetes, cloud (AWS / Google)
Stream processing approaches
• Micro batch
• True streaming
Dataflow Programming Model
A job is a DAG: Source -> Transformations -> Sink
Transformations:
• map(..)
• flatMap(..)
• filter(..)
• reduce(..)
• fold(..)
• join(..)
• union(..)
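The source -> transformations -> sink shape can be illustrated with a small plain-JDK analogy. This is not the Flink API: the class DataflowSketch, its methods and the sample records are hypothetical, and a bounded list stands in for an unbounded stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-JDK analogy of a source -> transformations -> sink pipeline.
// NOT the Flink API; all names and sample values are hypothetical.
final class DataflowSketch {

    // "Source": a bounded list stands in for an unbounded stream.
    static List<String> source() {
        return Arrays.asList("10", "oops", "25", "-3", "40");
    }

    // Applies the "transformations" and writes the results to a list "sink".
    static List<Integer> run(List<String> input) {
        List<Integer> sink = new ArrayList<>();
        input.stream()
             .filter(s -> s.matches("-?\\d+"))  // filter(..): drop malformed records
             .map(Integer::parseInt)            // map(..): parse each record
             .filter(v -> v >= 0)               // filter(..): keep non-negative values
             .forEach(sink::add);               // sink: store the results
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(source())); // prints [10, 25, 40]
    }
}
```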
Operators
Source -> map(..) / keyBy(..) -> window(..) / apply(..) -> Sink
Parallelism: 3 for Source, map(..) / keyBy(..) and window(..) / apply(..); 1 for Sink
Operator subtasks
Each operator with parallelism 3 runs as three parallel subtasks (Source -> map(..) / keyBy(..) -> window(..) / apply(..)), all feeding the single Sink subtask.
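How keyBy(..) spreads records over parallel subtasks can be sketched as hash partitioning. This is a simplified assumption: Flink itself maps keys to key groups with its own hash function, while the hypothetical KeyBySketch below just takes hashCode modulo the parallelism. The property it shows is the one that matters for keyed state: the same key always lands on the same subtask.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of key-based routing; NOT Flink's actual key-group logic.
final class KeyBySketch {

    // Picks a subtask index in [0, parallelism) for a key.
    static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Distributes keys over 'parallelism' subtask buckets.
    static List<List<String>> partition(List<String> keys, int parallelism) {
        List<List<String>> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new ArrayList<>());
        }
        for (String key : keys) {
            subtasks.get(subtaskFor(key, parallelism)).add(key);
        }
        return subtasks;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("Device1");
        keys.add("Device2");
        keys.add("Device1");
        keys.add("Device3");
        System.out.println(partition(keys, 3));
    }
}
```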
Distributed Runtime Environment
• Job Manager (JVM)
• Task Manager (JVM)
• Task Manager (JVM)
The operator subtasks (Source, map(..), window, Sink) run inside the Task Managers.
Example
A datacenter reports device metrics, for example:
DeviceId: 1234
Value: 26.0
Example: Data Flow
Datacenter -> Source -> KeyBy (DeviceId) -> Map (Metric) -> Sink
After KeyBy (DeviceId), the Map (Metric) and Sink steps run as parallel instances.
Let’s code
Example
The datacenter reports device metrics (DeviceId: 1234, Value: 26.0), and each device belongs to a user:
Device1 -> User1
Device2 -> User2
Example: Data Flow
Source -> KeyBy (DeviceId) -> FlatMap (UserMapping) -> Map (Metric) -> Sink
Example: Data Flow
Control Source -> KeyBy (DeviceId)
Source -> KeyBy (DeviceId) -> Map (Metric) -> Sink
Operator State
DeviceId   UserId
Device1    User1
Device2    User2
Events
• Control: DeviceId: Device1, UserId: User1
• Metric: DeviceId: Device1, Value: 23.0
• Output: DeviceId: Device1, UserId: User1, Value: 23.0
• Control: DeviceId: Device2, UserId: User2
• Metric: DeviceId: Device3, Value: 30.0
• Metric: DeviceId: Device2, Value: 10.0
• Output: DeviceId: Device2, UserId: User2, Value: 10.0
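The enrichment step can be sketched as a small stateful operator in plain Java. This is an illustration, not Flink code: UserMappingSketch and its methods are hypothetical, a HashMap stands in for managed state, and dropping metrics of unmapped devices is an assumption based on the Device3 event having no enriched output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of joining a control stream with a metric stream via state.
// NOT the Flink API; names are hypothetical.
final class UserMappingSketch {
    // State: DeviceId -> UserId, filled from the control stream.
    private final Map<String, String> deviceToUser = new HashMap<>();
    // Stand-in for the sink.
    private final List<String> output = new ArrayList<>();

    // Called for every control event.
    void onMapping(String deviceId, String userId) {
        deviceToUser.put(deviceId, userId);
    }

    // Called for every metric event; emits an enriched record if the device is known.
    void onMetric(String deviceId, double value) {
        String userId = deviceToUser.get(deviceId);
        if (userId != null) { // assumption: metrics of unknown devices are dropped
            output.add(deviceId + "," + userId + "," + value);
        }
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        UserMappingSketch op = new UserMappingSketch();
        op.onMapping("Device1", "User1");
        op.onMetric("Device1", 23.0);
        op.onMapping("Device2", "User2");
        op.onMetric("Device3", 30.0); // no mapping known, nothing emitted
        op.onMetric("Device2", 10.0);
        System.out.println(op.output()); // prints [Device1,User1,23.0, Device2,User2,10.0]
    }
}
```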
<code/>
Checkpointing: Barriers
[Figure: an operator with state "sum" consumes a stream of numbers in which a checkpoint barrier N is embedded; the state is written to Storage.]
[Figure: Source -> Operator 1 -> Operator 2 -> Sink, with barriers N=1 and N=2 in the stream. Src State, State1, State2 and Sink State are written to Storage and acknowledged (Ack).]
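The idea of snapshotting state when a barrier arrives can be sketched with a running-sum operator. This is a minimal plain-Java illustration under stated assumptions: CheckpointSketch is hypothetical, a HashMap stands in for durable storage, and barrier forwarding, acknowledgements and source replay are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a stateful operator that snapshots its state on a checkpoint barrier.
// NOT the Flink API; names are hypothetical.
final class CheckpointSketch {
    private int sum = 0;                                         // operator state
    private final Map<Long, Integer> storage = new HashMap<>();  // stand-in for durable storage

    // A regular record updates the state.
    void onRecord(int value) {
        sum += value;
    }

    // A barrier marks the cut: everything before it is part of checkpoint N.
    void onBarrier(long checkpointId) {
        storage.put(checkpointId, sum);
    }

    // After a failure, state is reset to the snapshot of a completed checkpoint.
    void restore(long checkpointId) {
        sum = storage.get(checkpointId);
    }

    int sum() {
        return sum;
    }

    public static void main(String[] args) {
        CheckpointSketch op = new CheckpointSketch();
        op.onRecord(1);
        op.onRecord(2);
        op.onRecord(3);
        op.onBarrier(1L);  // snapshot: sum = 6
        op.onRecord(4);
        op.onRecord(5);    // sum = 15
        op.restore(1L);    // simulated failure: back to sum = 6
        System.out.println(op.sum()); // prints 6
    }
}
```

In a real job the source would also rewind to the position stored with the checkpoint, so that records 4 and 5 are replayed after the restore.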
Barrier alignment (exactly once)
[Figure: step-by-step animation of an operator aligning the numbered barriers that arrive on its inputs.]
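Aligning barriers across several inputs can be sketched as follows. This is a simplified plain-Java model, not Flink's implementation: BarrierAlignmentSketch is hypothetical, the state is a running sum, and only one checkpoint is in flight at a time. Records that arrive on a channel after its barrier are buffered until the barrier has arrived on every channel, so the snapshot contains only pre-barrier records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Sketch of barrier alignment for an operator with several input channels.
// NOT Flink's implementation; names are hypothetical.
final class BarrierAlignmentSketch {
    private final boolean[] barrierSeen;                       // per channel: barrier already arrived?
    private final Deque<Integer> buffered = new ArrayDeque<>(); // post-barrier records held back
    private final List<Integer> snapshots = new ArrayList<>();  // state captured per checkpoint
    private int sum = 0;                                        // operator state

    BarrierAlignmentSketch(int channels) {
        barrierSeen = new boolean[channels];
    }

    void onRecord(int channel, int value) {
        if (barrierSeen[channel]) {
            buffered.add(value); // belongs to the next checkpoint, hold it back
        } else {
            sum += value;
        }
    }

    void onBarrier(int channel) {
        barrierSeen[channel] = true;
        for (boolean seen : barrierSeen) {
            if (!seen) {
                return; // still waiting for the barrier on another channel
            }
        }
        snapshots.add(sum);              // all barriers aligned: snapshot the state
        Arrays.fill(barrierSeen, false); // unblock the channels
        while (!buffered.isEmpty()) {
            sum += buffered.poll();      // now process the held-back records
        }
    }

    int sum() {
        return sum;
    }

    List<Integer> snapshots() {
        return snapshots;
    }

    public static void main(String[] args) {
        BarrierAlignmentSketch op = new BarrierAlignmentSketch(2);
        op.onRecord(0, 1);
        op.onBarrier(0);
        op.onRecord(0, 10); // buffered: channel 0 is already past the barrier
        op.onRecord(1, 2);
        op.onBarrier(1);    // aligned: snapshot 3, then process the buffered 10
        System.out.println(op.snapshots() + " " + op.sum()); // prints [3] 13
    }
}
```

Without the buffering, the record 10 would leak into the snapshot and be counted twice after a restore and replay.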
Window functions
Windows split an unbounded event stream into finite groups (Window 1, Window 2, Window 3, ...):
• Time driven (example: every 30 seconds)
• Data driven (example: every 2 elements)
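A data-driven window of "every 2 elements" can be sketched in a few lines of plain Java. CountWindowSketch is a hypothetical name, and the sketch only groups elements; it does not apply a window function to them.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a data-driven (count) window; NOT the Flink API.
final class CountWindowSketch {
    private final int size;
    private List<Integer> current = new ArrayList<>();
    private final List<List<Integer>> emitted = new ArrayList<>();

    CountWindowSketch(int size) {
        this.size = size;
    }

    // Adds an element; the window is emitted as soon as it holds 'size' elements.
    void onElement(int value) {
        current.add(value);
        if (current.size() == size) {
            emitted.add(current);
            current = new ArrayList<>();
        }
    }

    List<List<Integer>> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        CountWindowSketch window = new CountWindowSketch(2);
        for (int i = 1; i <= 5; i++) {
            window.onElement(i);
        }
        System.out.println(window.emitted()); // prints [[1, 2], [3, 4]]; 5 is still pending
    }
}
```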
Tumbling Windows
[Figure: events of User1, User2 and User3 on a time axis, cut into consecutive 30 sec windows.]
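For tumbling windows, the window an event belongs to follows directly from its timestamp. A minimal sketch, assuming millisecond timestamps, no window offset, and the hypothetical helper TumblingWindowSketch:

```java
// Sketch of tumbling-window assignment; NOT the Flink API.
final class TumblingWindowSketch {

    // Start (inclusive) of the window containing the timestamp.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    // End (exclusive) of the window containing the timestamp.
    static long windowEnd(long timestampMs, long sizeMs) {
        return windowStart(timestampMs, sizeMs) + sizeMs;
    }

    public static void main(String[] args) {
        long size = 30_000L; // 30 sec windows, as on the slide
        System.out.println(windowStart(45_000L, size)); // prints 30000
        System.out.println(windowEnd(45_000L, size));   // prints 60000
    }
}
```

Each event falls into exactly one window, because consecutive windows do not overlap.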
Sliding Windows
[Figure: events of User1, User2 and User3 on a time axis, covered by overlapping windows; labels: Window, Slide Interval.]
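With sliding windows one event can belong to several windows. The sketch below lists the start of every window that contains a timestamp, assuming millisecond timestamps, no offset, and a window size that is a multiple of the slide interval. SlidingWindowSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of sliding-window assignment; NOT the Flink API.
final class SlidingWindowSketch {

    // Starts of all windows [start, start + size) that contain the timestamp,
    // from the latest window to the earliest.
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Window size 10 s, slide interval 5 s: an event at 7 s is in [5 s, 15 s) and [0 s, 10 s).
        System.out.println(windowStarts(7_000L, 10_000L, 5_000L)); // prints [5000, 0]
    }
}
```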
Session Windows
[Figure: events of User1, User2 and User3 on a time axis, grouped into windows separated by a session gap; labels: Window, Session Gap.]
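Session windows have no fixed length; they close after a period of inactivity. A minimal sketch for the events of one key, assuming the timestamps arrive sorted and that a new session starts when the distance to the previous event exceeds the gap. SessionWindowSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of session-window grouping for a single key; NOT the Flink API.
final class SessionWindowSketch {

    // Groups sorted timestamps into sessions separated by more than 'gap'.
    static List<List<Long>> sessions(List<Long> sortedTimestamps, long gap) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = null;
        long last = 0L;
        for (long ts : sortedTimestamps) {
            if (current == null || ts - last > gap) {
                current = new ArrayList<>(); // inactivity exceeded the gap: new session
                result.add(current);
            }
            current.add(ts);
            last = ts;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> events = Arrays.asList(1L, 2L, 3L, 20L, 21L, 50L);
        System.out.println(sessions(events, 10L)); // prints [[1, 2, 3], [20, 21], [50]]
    }
}
```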
Time
• Event time
• Ingestion time
• Processing time
[Figure: Source and Window Function operators.]
Watermarks
Window Operator
Window (00:00 - 00:10)
Window (00:05 - 00:15)
Stream: 00:00, 00:02, 00:03, 00:05, 00:05, Watermark t = 00:05, 00:08, 00:11, Watermark t = 00:11
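The watermark example can be replayed with a small plain-Java model. It is a sketch under stated assumptions: WatermarkSketch is hypothetical, timestamps are whole seconds, the two windows are given up front, and a window fires once the watermark reaches its last timestamp (end minus one), with events for an already fired window being ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of event-time windows that fire when the watermark passes them.
// NOT the Flink API; names are hypothetical.
final class WatermarkSketch {
    private final long[][] windows;  // each entry: {start, endExclusive}, in seconds
    private final int[] counts;      // number of events collected per window
    private final boolean[] done;    // window already fired?
    private final List<String> fired = new ArrayList<>();

    WatermarkSketch(long[][] windows) {
        this.windows = windows;
        this.counts = new int[windows.length];
        this.done = new boolean[windows.length];
    }

    // Adds the event to every open window that contains its timestamp.
    void onEvent(long ts) {
        for (int i = 0; i < windows.length; i++) {
            if (!done[i] && ts >= windows[i][0] && ts < windows[i][1]) {
                counts[i]++;
            }
        }
    }

    // A watermark t asserts that no event with a timestamp <= t is expected anymore.
    void onWatermark(long watermark) {
        for (int i = 0; i < windows.length; i++) {
            if (!done[i] && windows[i][1] - 1 <= watermark) {
                done[i] = true;
                fired.add(windows[i][0] + "-" + windows[i][1] + ":" + counts[i]);
            }
        }
    }

    List<String> fired() {
        return fired;
    }

    public static void main(String[] args) {
        WatermarkSketch op = new WatermarkSketch(new long[][] {{0, 10}, {5, 15}});
        for (long ts : new long[] {0, 2, 3, 5, 5}) {
            op.onEvent(ts);
        }
        op.onWatermark(5);   // nothing fires: both windows may still receive events
        op.onEvent(8);
        op.onEvent(11);
        op.onWatermark(11);  // window 0-10 fires with 6 events; 5-15 stays open
        System.out.println(op.fired()); // prints [0-10:6]
    }
}
```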
Thank you for your attention
Questions?

Flink in action