Flink. Pure Streaming

Paco Guerrero
Big Data & Solutions Architect 9/21/16

Batch
“Abstraction of reality used to facilitate information processing”

Batch
All input -> Batch job -> All output
Nothing about time
Timestamps are used as a trick to keep a real-time fingerprint

Streaming
“Continuous processing of data that is continuously produced”

Streaming
“Streaming is the next programming paradigm for data applications, and you need to start thinking in terms of streams”
Data stream: an infinite sequence of data arriving in a continuous fashion.
Stream processing is the backbone of the new data infrastructure.
“The world beyond batch”
A high-level tour of modern data-processing concepts. By Tyler Akidau
August 5, 2015 https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

Streaming
Streaming Job
Real Life Time !!

Streaming
Streaming is the biggest change in data infrastructure since Hadoop

Streaming
The biggest change in moving from batch to streaming is handling time explicitly

Micro Batch
[Diagram: the input is processed by Batch Job 1, Batch Job 2 and Batch Job 3, each producing its own output.]
Batch frequency?
Timestamps keep the real-time fingerprint

Streaming Technologies
Batch | Micro Batch | Streaming
Stateless: record acknowledgements
CPU-bound performance
No expressive, declarative functional API (low-level API)
No auto scaling
Low-level programmatic topology
Poor streaming window functionality
Not compatible with Hadoop APIs
Streams

Flink
Open Source Stream Processing Framework.
Last available Release 1.1.1
Top Level Apache Project since Dec '14

Flink
Main features:
- Native stream
- Low latency
- High throughput
- Stateful
- Exactly-once guarantees
- Distributed
- Expressive APIs
- And more …

Flink Integration
YARN upcoming..

Flink Runtime Engine
Distributed pipelined processing
Execute everything as a stream
Iterative (cyclic) dataflows
Mutable state in operations
Operate on managed memory (*)
Also works on batch!
[Diagram: Client (Optimizer, Dataflow Graph) -> Job Manager (Execution Graph) -> Workers (Task Managers)]

Stream Job, Batch Job, ML Job, Graph Job: each one goes through the optimizer
Tasks are scheduled and executed in workers (slots)
Tasks are chains of operators
Operator logic runs in a pipelined fashion
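
A minimal sketch of what "a task as a chain of operators, run in a pipelined fashion" can look like, using plain Python generators. This is not Flink code: the names `source`, `map_op`, `filter_op` and `trace` are made up for illustration, and generators pull records while a real runtime pushes them. The point is only that each record flows through the whole chain before the next one is read, with no intermediate collection materialized.

```python
from typing import Callable, Iterable, Iterator, List

trace: List[str] = []  # records the order in which operators touch each record


def source(values: Iterable[int]) -> Iterator[int]:
    """Emit records one at a time (stands in for a stream source)."""
    for value in values:
        trace.append(f"source:{value}")
        yield value


def map_op(fn: Callable[[int], int], upstream: Iterator[int]) -> Iterator[int]:
    """Apply fn to every record as soon as it arrives."""
    for record in upstream:
        out = fn(record)
        trace.append(f"map:{out}")
        yield out


def filter_op(pred: Callable[[int], bool], upstream: Iterator[int]) -> Iterator[int]:
    """Forward only the records that satisfy pred."""
    for record in upstream:
        if pred(record):
            trace.append(f"filter:{record}")
            yield record


# Chain source -> map -> filter; nothing runs until the sink pulls records.
chained = filter_op(lambda x: x > 10, map_op(lambda x: x * 10, source([1, 2, 3])))
result = list(chained)
```

Inspecting `trace` afterwards shows the operators interleaved per record (source, map, filter, then the next record) rather than one full pass per operator.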

If there is one thing you need to know about Flink,
it is that you don't need to know
the internals of Flink

Building Blocks
Event Time & Windows
Fault Tolerance & Correctness
State Handling
Low Latency & High Throughput
API
Libraries
SQL

Building Blocks: Event Time & Windows
- Time references
- Out-of-order events
- Powerful windowing

Event Time & Windowing
Explicit handling of time. 3 choices:
Event time: when the data is generated
Ingestion time: when the data is loaded from the source (Flink data source)
Processing time: when the data is processed (Flink window operator)
Event time helps to process out-of-order events and to replay elements as they occurred (deterministic results)

Event time. Out of order
[Figure: two sources emit events 1 2 3 5 7 and 4 6 8 9 10; the merged stream arrives out of order. Panels: Ingestion Time Windows, Event Time Windows.]
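
To make the difference between ingestion-time and event-time windows concrete, here is a small sketch in plain Python (not Flink code). It assumes the events arrive in the order 1 2 3 5 7 4 6 8 9 10, that the ingestion time is simply the arrival position, and that windows are 5 time units wide. All three are assumptions made for the example, as is the helper name `tumbling_windows`.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, int]


def tumbling_windows(events: List[Event],
                     timestamp_of: Callable[[Event], int],
                     size: int) -> Dict[int, List[int]]:
    """Group events into fixed windows [start, start + size) by the chosen timestamp."""
    windows: Dict[int, List[int]] = defaultdict(list)
    for event in events:
        start = (timestamp_of(event) // size) * size
        windows[start].append(event["event_time"])
    return dict(windows)


# Assumed arrival order of the event times; ingestion time = arrival position.
arrival_order = [1, 2, 3, 5, 7, 4, 6, 8, 9, 10]
events = [{"event_time": t, "ingestion_time": i}
          for i, t in enumerate(arrival_order, start=1)]

by_ingestion = tumbling_windows(events, lambda e: e["ingestion_time"], size=5)
by_event = tumbling_windows(events, lambda e: e["event_time"], size=5)
```

With ingestion time, event 5 lands in the first window and event 4 in the second, because the windows reflect when the data arrived. With event time, each event lands in the window it belongs to regardless of arrival order, which is what makes replays deterministic.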

Event time. Watermarks
[Figure: the same out-of-order stream grouped into event-time windows as the watermark advances. Late time of 2.]
Watermark 5: no event with an event time before 5 will come
Watermark 10: no event with an event time before 10 will come
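
A rough sketch of how a watermark can drive event-time windows, in plain Python rather than Flink code. It assumes one common strategy, where the watermark is the highest event time seen so far minus a fixed late time, and it reads a watermark W as "no event with an event time before W will come". The function name, the window size of 5 and the arrival orders are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def run_event_time_windows(event_times: List[int], window_size: int,
                           max_lateness: int) -> Tuple[List[Tuple[int, List[int]]], List[int]]:
    """Fire tumbling event-time windows once the watermark passes their end."""
    open_windows: Dict[int, List[int]] = defaultdict(list)  # window start -> events
    fired: List[Tuple[int, List[int]]] = []                  # windows in firing order
    dropped: List[int] = []                                  # events that came too late
    watermark = float("-inf")

    for t in event_times:
        start = (t // window_size) * window_size
        if start + window_size <= watermark:
            dropped.append(t)              # its window has already fired
        else:
            open_windows[start].append(t)
        # Assumed strategy: watermark trails the highest event time seen.
        watermark = max(watermark, t - max_lateness)
        for s in sorted(open_windows):     # sorted() copies the keys, so pop is safe
            if s + window_size <= watermark:
                fired.append((s, sorted(open_windows.pop(s))))

    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        fired.append((s, sorted(open_windows.pop(s))))
    return fired, dropped


# Event 4 arrives right after 5: within the late time of 2, so it is still counted.
fired_ok, dropped_ok = run_event_time_windows(
    [1, 2, 3, 5, 4, 6, 8, 7, 9, 10], window_size=5, max_lateness=2)

# Event 4 arrives after 7: the watermark is already 5, so window [0, 5) has fired.
fired_late, dropped_late = run_event_time_windows(
    [1, 2, 3, 5, 7, 4], window_size=5, max_lateness=2)
```

The late time is the trade-off knob: a larger value tolerates more disorder but delays every window result by the same amount.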

Windowing
Windows: grouping of events according to time, session*, count
Powerful built-in windows:
Count: number of events that triggers the window. Process the last X events every Y events.
Time:
- Tumbling: trigger every X time with the events received
- Sliding: trigger every X time with the events received in the last Y time
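
The window types above can be sketched as small assignment functions in plain Python (not the Flink API; all function names are made up for illustration). A tumbling window maps a timestamp to exactly one window, a sliding window maps it to every overlapping window, and a sliding count window emits the last `size` events every `slide` events.

```python
from collections import deque
from typing import Iterable, List, Tuple


def tumbling_window(timestamp: int, size: int) -> Tuple[int, int]:
    """The single [start, end) window of length `size` that contains timestamp."""
    start = (timestamp // size) * size
    return (start, start + size)


def sliding_windows(timestamp: int, size: int, slide: int) -> List[Tuple[int, int]]:
    """Every [start, end) window of length `size`, starting each `slide`, that contains timestamp."""
    windows = []
    start = (timestamp // slide) * slide
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def sliding_count_windows(events: Iterable[int], size: int, slide: int) -> List[List[int]]:
    """Every `slide` events, emit the last `size` events seen so far."""
    buffer: deque = deque(maxlen=size)
    emitted = []
    for position, event in enumerate(events, start=1):
        buffer.append(event)
        if position % slide == 0:
            emitted.append(list(buffer))
    return emitted


tumbling = tumbling_window(7, size=5)
sliding = sliding_windows(7, size=10, slide=5)
counted = sliding_count_windows([1, 2, 3, 4, 5, 6], size=4, slide=2)
```

In the slide's wording, "every X time" corresponds to `slide` and "in the last Y time" corresponds to `size`.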

Windowing
Time:
Session: all events from session/user X until the session time expires (gap)
High-level API for user-defined windows: Window Assigner, Trigger, Evictor
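
A session window can be sketched as "keep adding events for a user until the gap is exceeded". The snippet below is plain Python, not Flink code. The function name is hypothetical, and treating a distance exactly equal to the gap as the same session is an assumption of this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def session_windows(events: Iterable[Tuple[str, int]], gap: int) -> Dict[str, List[List[int]]]:
    """Group (user, timestamp) events into per-user sessions separated by more than `gap`."""
    per_user: Dict[str, List[List[int]]] = defaultdict(list)
    for user, ts in sorted(events, key=lambda e: e[1]):
        sessions = per_user[user]
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)    # still inside the current session
        else:
            sessions.append([ts])      # gap expired: start a new session
    return dict(per_user)


sessions = session_windows(
    [("a", 1), ("a", 3), ("b", 4), ("a", 10), ("b", 5), ("a", 12)], gap=3)
```

Unlike tumbling or sliding windows, the window boundaries here depend on the data itself, so each user ends up with sessions of different lengths.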

Building Blocks: State Handling
- Managed operator state for backup/recovery
- Savepoints

Stateful Streaming
Stateless stream processing: Op
Stateful stream processing: Op + State
- Built-in internal state in each operator for exactly-once semantics
- User state can be declared in each operator and is saved locally in memory (API, key/value pairs)
- Snapshots: the local in-memory states are periodically persisted in lightweight distributed snapshots. No global pause!
- Checkpoint: a globally consistent point-in-time snapshot built from a set of distributed snapshots.
- Pluggable state backend for snapshots: JobManager, HDFS, RocksDB
- Savepoints: user-triggered retained checkpoints
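
A minimal sketch of a stateful operator with key/value state that can be snapshotted and restored, in plain Python. It is not Flink's state API: the class `CountingOperator` and its methods are invented for illustration, and the "backend" is just a deep copy held in memory.

```python
import copy
from typing import Dict, Tuple


class CountingOperator:
    """Counts events per key; the counts are the operator's local state."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}  # key -> count (user-declared key/value state)

    def process(self, key: str) -> Tuple[str, int]:
        self.state[key] = self.state.get(key, 0) + 1
        return key, self.state[key]

    def snapshot(self) -> Dict[str, int]:
        """Return an independent copy of the state, as a backend would persist it."""
        return copy.deepcopy(self.state)

    def restore(self, snapshot: Dict[str, int]) -> None:
        """Replace the local state with a previously taken snapshot."""
        self.state = copy.deepcopy(snapshot)


op = CountingOperator()
for key in ["a", "b", "a"]:
    op.process(key)
saved = op.snapshot()        # point-in-time copy: {"a": 2, "b": 1}

op.process("a")              # state moves on after the snapshot
state_before_restore = dict(op.state)

op.restore(saved)            # recovery: roll back to the snapshot
state_after_restore = dict(op.state)
```

A savepoint fits the same picture: it is a snapshot that the user asks for and keeps, instead of one taken periodically by the system.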

Building Blocks: Fault Tolerance & Correctness
- Exactly-once semantics with managed operator state
- Distributed snapshotting algorithm

Handling Checkpoints
Chandy-Lamport snapshots
“The global-state-detection algorithm is to be superimposed on the underlying computation: it must run concurrently with, but not alter, this underlying computation”
- Triggers snapshots asynchronously
- The snapshot algorithm is embedded in the stream of data (barriers)
- No global pause, lightweight impact on performance

[Diagram: the Job Manager periodically pushes barriers for the new state X+1 into the stream. Each task N snapshots its state and acks the snapshot of state X to the Job Manager.]
[Diagram: once all acks are received, the Job Manager registers the checkpoint for restore in case of failure.]
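
The barrier and ack flow described above can be sketched as a toy simulation in plain Python. This is a simplification and not Flink's implementation: `Task`, `Coordinator` and the two-task chain are hypothetical, there is a single input channel per task, and everything runs sequentially in one process.

```python
from typing import Dict, List, Set, Tuple

BARRIER = "barrier"
RECORD = "record"
Item = Tuple[str, object]


class Coordinator:
    """Stands in for the Job Manager: collects acks and registers complete checkpoints."""

    def __init__(self, task_names: List[str]) -> None:
        self.task_names: Set[str] = set(task_names)
        self.pending: Dict[int, Dict[str, int]] = {}    # checkpoint id -> acks so far
        self.completed: Dict[int, Dict[str, int]] = {}  # checkpoints usable for restore

    def ack(self, checkpoint_id: int, task: str, state: int) -> None:
        acks = self.pending.setdefault(checkpoint_id, {})
        acks[task] = state
        if set(acks) == self.task_names:                # all acks received
            self.completed[checkpoint_id] = self.pending.pop(checkpoint_id)


class Task:
    """Counts the records it has seen; the count is its state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.count = 0

    def handle(self, item: Item, coordinator: Coordinator) -> Item:
        kind, value = item
        if kind == BARRIER:
            # Snapshot the state as of this barrier and ack it.
            coordinator.ack(int(value), self.name, self.count)
        else:
            self.count += 1
        return item  # records and barriers are both forwarded downstream


tasks = [Task("map"), Task("sink")]
coordinator = Coordinator([t.name for t in tasks])

# Barriers travel inside the stream, between ordinary records.
stream: List[Item] = [(RECORD, "a"), (RECORD, "b"), (BARRIER, 1),
                      (RECORD, "c"), (BARRIER, 2)]
for item in stream:
    for task in tasks:
        item = task.handle(item, coordinator)
```

Records after a barrier keep flowing while earlier snapshots are being acknowledged, which is the sense in which there is no global pause.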

Streaming Fault Tolerance
In case of failure, the last global checkpoint is recovered
(recovery from partial checkpoints / individual snapshots is coming)
A stateful (replayable) source such as Kafka is needed to ensure end-to-end exactly-once
semantics in case of failure.
The Kafka sink doesn't guarantee end-to-end exactly-once (a record may be written to the
topic more than once): it is at-least-once
Semantics in Flink:
At least once: never loses events, but events might be reprocessed
Exactly once: no event is lost and none is counted twice.
Exactly once by default, with low impact on performance
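
The difference between the two guarantees can be illustrated with a toy recovery loop in plain Python. It is a deliberately simplified model and not how Flink implements either mode: the function `run` and its parameters are invented for the example. On failure the source rewinds to the checkpointed offset; restoring the checkpointed state as well gives exactly-once counts, while keeping the current state and replaying gives at-least-once counts.

```python
from typing import Dict, List, Tuple


def run(records: List[str], checkpoint_at: int, fail_at: int,
        restore_state: bool) -> Dict[str, int]:
    """Count records per key, with one checkpoint and one simulated failure."""
    counts: Dict[str, int] = {}
    checkpoint: Tuple[int, Dict[str, int]] = (0, {})  # (source offset, state copy)
    offset = 0
    failed = False
    while offset < len(records):
        if offset == fail_at and not failed:
            failed = True
            offset = checkpoint[0]               # replayable source rewinds
            if restore_state:
                counts = dict(checkpoint[1])     # state rolls back with the offset
            continue
        key = records[offset]
        counts[key] = counts.get(key, 0) + 1
        offset += 1
        if offset == checkpoint_at:
            checkpoint = (offset, dict(counts))  # consistent offset + state
    return counts


records = ["a", "b", "a", "c", "a"]
exactly_once = run(records, checkpoint_at=2, fail_at=4, restore_state=True)
at_least_once = run(records, checkpoint_at=2, fail_at=4, restore_state=False)
```

In the second run the two records between the checkpoint and the failure are counted twice: nothing is lost, but the totals are too high.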

Building Blocks: Low Latency & High Throughput
- Pipelined runtime
- Latency vs throughput tuning

Performance Tuning
Exactly-once semantics with low impact on performance
Controllable checkpointing overhead
Higher throughput using processing time
Performance improvements thanks to:
- operator chaining during the optimization phase
- Flink's own optimized serialization stack with code generation

Streaming Benchmark
Benchmark for “Streaming Computation” published by Yahoo. Dec 18, 2015
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
Production use case:
- counting ad impressions grouped by campaign
- aggregations over a 10-second window
- save the current aggregate value to Redis every second
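
The core of this use case can be sketched in a few lines of plain Python. This is neither the benchmark code nor Flink code: the function name and the sample events are made up, and the write to Redis is left out. It only shows the keyed aggregation, counting impressions per campaign and per 10-second window.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_impressions(events: Iterable[Tuple[str, float]],
                      window_seconds: int = 10) -> Dict[Tuple[str, int], int]:
    """Count (campaign, event time in seconds) events per campaign and window start."""
    counts: Counter = Counter()
    for campaign, ts in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[(campaign, window_start)] += 1
    return dict(counts)


impressions = count_impressions([
    ("c1", 0.5), ("c1", 3.0), ("c2", 9.9),
    ("c1", 10.0), ("c2", 12.3), ("c2", 19.0),
])
```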

[Figure: Throughput vs Latency graph. Axes: throughput (1000 events/sec) and 99th percentile latency (ms).]
No operator combining in Storm: a more complicated topology, more steps per event and more overhead

Performance Tuning
Apache Storm without Trident
- At least once / double counting after failure / state lost after failures
- CPU bound
Apache Spark
- Latency increases with throughput
Apache Flink
- Exactly once / no double counting / no state loss
- Limited by the bandwidth between the Kafka and Flink clusters (1 GigE)
- Kafka brokers within Kafka cluster (10 GigE)
- Achieved 15 million messages/sec (3 million messages/sec before) with exactly-once semantics
[Figure: throughput bars for 1 GigE and 10 GigE; axis marks at 10,000,000 and 20,000,000.]

Building Blocks: API
- High-level API
- Wide range of basic and advanced operators
- Java, Scala. Python soon!

API
Working on data streams ( bounded ? )

API
Stream Processing: Explicit Handling of Time

API
Java & Scala. Python coming.
Java: bean-type classes vs tuples addressed by position.
Scala: case classes.

API
Operators:
Sources: Kafka, FileSystem, Cassandra …
Sinks: Kafka, HDFS, Cassandra …
Transformations:
Basic: map, flatMap, filter, grouping, iterate, project, join, cross, …
Streaming: windowing + aggregations, temporal binary and iterative stream operators
Core API: DataStream<?> and DataSet<?>
1 implementation*, 2 interfaces

Operators
[Diagram: a dataflow of operators: Source, Map, Filter, Reduce, Join, Sum, Sink. Successive slides highlight Map, Filter, Reduce and Join.]

Building Blocks: SQL
- Easy to use. SQL!
- Based on Apache Calcite

Table API
API extension for DataSets and DataStreams
Based on a relational Table abstraction
Table <=> Source / DataSet / DataStream
Operators such as: where, select, as, groupBy, join, union, minus, distinct, orderBy, ...

SQL Support
Execute SQL-like statements on DataSets and DataStreams
Results are returned as a Table (Table API), convertible to a DataStream or DataSet
SQL and Table API can be seamlessly mixed over DataStreams/DataSets
Flink’s SQL support is not feature complete yet.
Queries that include unsupported SQL will fail!

Apache Calcite
Parsing and the logical plan for Table operators and SQL are optimized using Apache Calcite
Only a subset of the comprehensive SQL standard is supported
Apache Calcite provides:
- SQL parsing
- API for building expressions in relational algebra
- Query planning engine
Provides SQL for streaming queries with window aggregations:
SELECT STREAM TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime, productId, COUNT(*) AS c, SUM(units) AS units
FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId
SQL statement -> Apache Calcite: SQL to logical plan as relational algebra -> Flink optimizer: logical plan to execution plan

If there is one thing you need to know about Flink,
it is that you don't need
to know the internals of Flink
So … Batch

Batch on Stream
Stream: unbounded data stream
Batch: bounded stream (dataset) on a stream processor
Global window over the entire dataset
Optimization in operators for joins and grouping, with blocking data exchange if needed
Batch-specific optimizations:
- Cost-based optimizer: dataset size known beforehand
- Managed memory on/off-heap for join, sort, …
- Optimized serialization stack for user types
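
The idea of "batch as a bounded stream with a global window" can be sketched in plain Python (not Flink code; both function names are hypothetical). The same kind of operator logic either emits one result when a bounded stream ends, or emits running results forever on an unbounded one.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def global_window_sum(stream: Iterable[int]) -> int:
    """Batch view: one window over the whole input, result emitted when the stream ends."""
    total = 0
    for value in stream:
        total += value
    return total  # only reachable because the stream is bounded


def running_sum(stream: Iterable[int]) -> Iterator[int]:
    """Streaming view: emit an updated result after every record."""
    total = 0
    for value in stream:
        total += value
        yield total


bounded = [3, 1, 4]
batch_result = global_window_sum(iter(bounded))
stream_results = list(running_sum(iter(bounded)))

# count(1) never ends, so only the streaming view can be used; take 3 results.
unbounded_prefix = list(islice(running_sum(count(1)), 3))
```

Knowing that the input is bounded is also what lets a batch optimizer choose blocking strategies such as sorting or building a full hash table, since "wait for all the input" is then a meaningful step.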

Conclusions
Flink is a pure streaming engine that matches real life. No abstraction
Batch on streaming
Flexible windowing semantics with explicit time handling
Competitive performance: low latency and high throughput
Apache Beam, open sourced by Google, uses Flink as its first order runner for
batch and streaming processing, in partnership with Data Artisans.
100% compliance with the data processing model: “what, where, when, how”
