Learn how the combination of Apache Kafka and Apache Flink is making stateful stream processing even more expressive and flexible, supporting streaming applications that were previously not considered streamable.
The new world of applications and fast data architectures has broken up the database: Raw data persistence comes in the form of event logs, and the state of the world is computed by a stream processor. Apache Kafka provides a strong solution for the event log, while Apache Flink forms a powerful foundation for the computation over the event streams.
In this talk we discuss how Flink’s abstraction and management of application state have evolved over time and how Flink’s snapshot persistence model and Kafka’s log work together to form a base to build ‘versioned applications’. We will also show how end-to-end exactly-once processing works through a smart integration of Kafka’s transactions and Flink’s checkpointing mechanism.
3. What is Apache Flink?
Stateful computations over data streams, covering three classes of use cases:
• Batch processing: process static and historic data
• Data stream processing: real-time results from data streams
• Event-driven applications: data-driven actions and services
4. Apache Flink in a Nutshell
[Architecture diagram: event streams from queries, applications, devices, etc., plus historic data from databases and file/object storage, feed into Flink applications.]
Stateful computations over streams, real-time and historic: fast, scalable, fault-tolerant, in-memory, event time, large state, exactly-once.
5. Everything Streams
Apache Flink handles everything as streams internally.
Continuous streaming applications use "unbounded streams".
Batch processing and finite applications use "bounded streams".
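Outside Flink, the bounded/unbounded distinction can be illustrated with plain Scala (no Flink API involved; all names here are hypothetical): the same per-event logic runs over a finite collection and over an endless iterator, and only the termination differs.

```scala
// Bounded vs. unbounded: the same running-sum logic, two stream shapes.
// Plain-Scala illustration, not Flink API.
object Streams {
  // One step of the computation: fold the next event into the state.
  def step(state: Long, event: Long): Long = state + event

  def main(args: Array[String]): Unit = {
    // Bounded stream: a finite batch, processed to completion.
    val batch = List(1L, 2L, 3L, 4L)
    println(s"bounded result: ${batch.foldLeft(0L)(step)}")

    // Unbounded stream: an endless source; we can only observe
    // intermediate results (here: after the first 4 events).
    val endless = Iterator.from(1).map(_.toLong)
    println(s"unbounded (partial) result: ${endless.take(4).foldLeft(0L)(step)}")
  }
}
```

The point is that the computation itself is identical; a bounded stream is just a stream that happens to end.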
6. Layered abstractions
Layered APIs, from low-level to high-level:
• Process Function (events, state, time): stateful event-driven applications
• DataStream API (streams, windows): stream & batch data processing
• Stream SQL / Tables (dynamic tables): high-level analytics API
val stats = stream
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .reduce((a, b) => a.add(b))
def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
  // work with event and state
  (event, state.value) match { … }
  out.collect(…)  // emit events
  state.update(…) // modify state
  // schedule a timer callback
  ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
Navigate from simple to complex use cases
9. High Level: SQL (ANSI)
SELECT
  campaign,
  TUMBLE_START(clickTime, INTERVAL '1' HOUR),
  COUNT(ip) AS clickCnt
FROM adClicks
WHERE clickTime > '2017-01-01'
GROUP BY campaign, TUMBLE(clickTime, INTERVAL '1' HOUR)
[Timeline diagram: the query runs continuously over the whole stream, from the start of the stream through the past and present into the future.]
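What the TUMBLE clause above computes can be sketched in plain Scala (a hypothetical, Flink-free illustration; names are made up): assign each click to a fixed one-hour bucket and count clicks per (campaign, bucket).

```scala
// Plain-Scala sketch of what TUMBLE(clickTime, INTERVAL '1' HOUR) computes:
// assign each click to a fixed 1-hour window, then count per (campaign, window).
object TumbleSketch {
  case class Click(campaign: String, ip: String, clickTime: Long) // epoch millis

  val hourMillis: Long = 3600 * 1000L

  // TUMBLE_START: the inclusive start of the window a timestamp falls into.
  def windowStart(ts: Long): Long = ts - (ts % hourMillis)

  // COUNT(ip) ... GROUP BY campaign, TUMBLE(clickTime, ...)
  def clickCounts(clicks: Seq[Click]): Map[(String, Long), Int] =
    clicks
      .groupBy(c => (c.campaign, windowStart(c.clickTime)))
      .map { case (key, group) => key -> group.size }
}
```

In Flink's Stream SQL this grouping runs incrementally over an unbounded stream; the sketch only shows the window-assignment arithmetic.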
19. Why Checkpoints?
No barriers / boundaries ==> low latency
No intermediate stream/state replication necessary
• High throughput
• Shuffles are very cheap! No load on brokers.
Handles very large state well (TBs)
Supports fast batch processing
Supports flexible types of state and timers
21. Localized State Recovery (Flink 1.5)
Piggybacks on internal multi-version data structures:
• LSM Tree (RocksDB)
• MV Hashtable (Fs / Mem State Backend)
Setup:
• 500 MB state per node
• Checkpoints to S3
• Soft failure (Flink fails, machine survives)
22. Checkpoints for Program Evolution
Restore to different programs:
bugfixes, upgrades, A/B testing, etc.
24. Replay from Savepoints to Drill Down
[Diagram: replay from a savepoint taken before an incident of interest, using a "Debug Job" (a modified version of the original job) with a filter that keeps only the events of interest and an extra sink for trace output.]
25. Pause / Resume style execution
[Diagram: bursty event stream; events arrive only at end-of-day.]
26. Pause / Resume style execution
[Diagram: bursty event stream; events arrive only at end-of-day. The job persists its state to a checkpoint/savepoint store while paused between bursts.]
28. Flink Kafka Reader
Supports version 0.8 – 0.11/1.0
Exactly-once semantics
• Flink checkpoints manage offsets
• Can optionally commit offsets back to Kafka consumer groups
Topic and partition discovery
Multiple topics at the same time
Per-partition watermarking
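Per-partition watermarking means the reader tracks an event-time watermark for each Kafka partition and only advances the source's overall watermark to the minimum across partitions, so a single lagging partition holds event time back. A minimal, Flink-free sketch (the names are hypothetical, not the Flink connector API):

```scala
// Sketch of per-partition watermarking: each partition tracks its own
// watermark; the source's overall watermark is the minimum across
// partitions, so a lagging partition holds event time back.
object PerPartitionWatermarks {
  final class Tracker(partitions: Int, maxOutOfOrderness: Long) {
    private val perPartition = Array.fill(partitions)(Long.MinValue)

    // Called for each record read from a given partition.
    def onEvent(partition: Int, timestamp: Long): Unit =
      perPartition(partition) =
        math.max(perPartition(partition), timestamp - maxOutOfOrderness)

    // The watermark the source may emit downstream.
    def currentWatermark: Long = perPartition.min
  }
}
```

Tracking watermarks per partition rather than per source matters because Kafka only orders records within a partition; mixing partitions first would produce out-of-order timestamps and a meaningless single watermark.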
29. Flink Kafka Writer
Supports version 0.8 – 0.11/1.0
Exactly-once via Kafka Transactions (0.11+)
• Details in a later section
Supports partitioners, timestamps, and the usual features
30. Transaction Coordination
Similar to a distributed 2-phase/3-phase commit
Coordinated by asynchronous checkpoints
==> non-blocking, no voting delays
Basic algorithm:
• Between checkpoints: Produce into transaction or Write Ahead Log
• On operator snapshot: Flush local transaction (vote-to-commit)
• On checkpoint complete: Commit transactions (commit)
• On recovery: check and commit any pending transactions
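The basic algorithm above can be sketched as a small state machine, assuming a hypothetical in-memory sink where records only become visible on commit (an illustration of the protocol, not Flink's actual two-phase-commit sink implementation):

```scala
// Minimal sketch of checkpoint-coordinated two-phase commit.
// Records written into a transaction become visible only on commit.
object TwoPhaseCommitSketch {
  import scala.collection.mutable

  final class TxnSink {
    private val open = mutable.Buffer.empty[String]                 // current transaction
    private val preCommitted = mutable.Map.empty[Long, Seq[String]] // flushed, awaiting commit
    val committed = mutable.Buffer.empty[String]                    // visible to consumers

    // Between checkpoints: produce into the open transaction.
    def write(record: String): Unit = open += record

    // On operator snapshot: flush the transaction (vote-to-commit).
    def preCommit(checkpointId: Long): Unit = {
      preCommitted(checkpointId) = open.toSeq
      open.clear()
    }

    // On checkpoint complete: commit the pre-committed transaction.
    def commit(checkpointId: Long): Unit =
      preCommitted.remove(checkpointId).foreach(committed ++= _)

    // On recovery: commit any transaction that was pre-committed before
    // the failure but whose commit notification was lost.
    def recover(): Unit =
      preCommitted.keys.toSeq.sorted.foreach(commit)
  }
}
```

Because the vote-to-commit happens as part of the asynchronous checkpoint snapshot, no extra voting round is needed; the checkpoint completion notification doubles as the commit decision.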
43. Sophisticated Time Semantics
Erik de Nooij / ING
"StreamING models, how ING adds models at runtime to catch fraudsters"
Low-latency event-time joins/aggregations
47. Flink 1.5
Big change to process model
• Better support for Framework and Library modes
• Dynamic resource acquisition
• All communication with clients is REST
Special network protocol to speed up checkpoint alignments
Lower shuffle latency with same throughput
Faster recovery of large state
Managed broadcast state
Interactive SQL Client (beta)
… much more …
Apache Flink is the only system that handles the full breadth of stream processing: from exploration of bounded data, through streaming analytics, to streaming data applications.