Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2FcTsIA.
Stephan Ewen talks about how Apache Flink handles stateful stream processing and how to manage distributed stream processing and data driven applications efficiently with Flink's checkpoints and savepoints. Filmed at qconsf.com.
Stephan Ewen is a PMC member and one of the original creators of Apache Flink, and co-founder and CTO of data Artisans.
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
The Power of Distributed Snapshots in Apache Flink
1. The Power of Snapshots
Stateful Stream Processing
with Apache Flink
Stephan Ewen
QCon San Francisco, 2017
1
2. InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
distributed-stream-processing-flink
3. Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
6. What changes faster? Data or Query?
4
Data changes slowly
compared to fast
changing queries
ad-hoc queries, data exploration,
ML training and
(hyper) parameter tuning
Batch Processing
Use Case
Data changes fast
application logic
is long-lived
continuous applications,
data pipelines, standing queries,
anomaly detection, ML evaluation, …
Stream Processing
Use Case
12. Apache Flink in a Nutshell
10
Queries
Applications
Devices
etc.
Database
Stream
File / Object
Storage
Stateful computations over streams
real-time and historic
fast, scalable, fault tolerant, in-memory,
event time, large state, exactly-once
Historic
Data
Streams
Application
13. 11
Event Streams State (Event) Time Snapshots
The Core Building Blocks
real-time and
hindsight
complex
business logic
consistency with
out-of-order data
and late data
forking /
versioning /
time-travel
15. Stateful Event & Stream Processing
13
Scalable embedded state
Access at memory speed &
scales with parallel operators
16. Event time and Processing Time
14
Event Producer Message Queue
Flink
Data Source
Flink
Window Operator
partition 1
partition 2
Event
Time
Ingestion
Time
Processing
Time
Broker
Time
Event time, Watermarks, as in the Dataflow model
17. Powerful Abstractions
15
Process Function (events, state, time)
DataStream API (streams, windows)
Stream SQL / Tables (dynamic tables)
Stream- & Batch
Data Processing
High-level
Analytics API
Stateful Event-
Driven Applications
val stats = stream
.keyBy("sensor")
.timeWindow(Time.seconds(5))
.sum((a, b) -> a.add(b))
def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
// work with event and state
(event, state.value) match { … }
out.collect(…) // emit events
state.update(…) // modify state
// schedule a timer callback
ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
Layered abstractions to
navigate simple to complex use cases
30. Savepoints: Opt. for Operability
Self contained: No references to other checkpoints
Canonical format: Switch between state structures
Efficiently re-scalable: Indexed by key group
Future: More self-describing serialization format for to
archiving / versioning (like Avro, Thrift, etc.)
29
31. Checkpoints: Opt. for Efficiency
Often incremental:
• Snapshot only diff from last snapshot
• Reference older snapshots, compaction over time
Format specific to state backend:
• No extra copied or re-encoding
• Not possible to switch to another state backend between checkpoints
Compact serialization: Optimized for speed/space, not long term
archival and evolution
Key goups not indexed: Re-distribution may be more expensive
30
33. What users built on checkpoints
Upgrades and Rollbacks
Cross Datacenter Failover
State Archiving
State Bootstrapping
Application Migration
Spot Instance Region Arbitrage
A/B testing
…
32
35. Transaction coordination for side fx
34
One snapshot can transactionally move
data between different systems
Snapshots may include side effects
36. Transaction coordination for side fx
Similar to a distributed 2-phase commit
Coordinated by asynchronous checkpoints, no voting delays
Basic algorithm:
• Between checkpoints: Produce into transaction or Write Ahead Log
• On operator snapshot: Flush local transaction (vote-to-commit)
• On checkpoint complete: Commit transactions (commit)
• On recovery: check and commit any pending transactions
35
42. Stateful Stream Proc. to the rescue
41
Application
Sensor
APIs
Application
Application
Application
very simple: state is just part
of the application
43. Compute, State, and Storage
42
Classic tiered architecture Streaming architecture
database
layer
compute
layer
application state
+ backup
compute
+
stream storage
and
snapshot storage
(backup)
application state
46. Scaling a Service
45
separately provision additional
database capacity
provision compute
and state together
Classic tiered architecture Streaming architecture
provision
compute
47. Rolling out a new Service
46
provision a new database
(or add capacity to an existing one)
simply occupies some
additional backup space
Classic tiered architecture Streaming architecture
provision compute
and state together
48. Time, Completeness, Out-of-order
47
?
event time clocks
define data completeness
event time timers
handle actions for
out-of-order data
Classic tiered architecture Streaming architecture
50. The Challenges with that:
Upgrades are stateful, need consistency
• application evolution and bug fixes
Migration of application state
• cluster migration, A/B testing
Re-processing and reinstatement
• fix corrupt results, bootstrap new applications
State evolution (schema evolution)
49