Kostas Tzoumas
@kostas_tzoumas
Flink London Meetup
November 3, 2016
Apache Flink®: State of the Union and
What's Next
Debunking Six Common Myths in
Stream Processing
3
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Outline
 What is data streaming
 Myth 1: The Lambda architecture
 Myth 2: The throughput/latency tradeoff
 Myth 3: Exactly once not possible
 Myth 4: Streaming is for (near) real-time
 Myth 5: Batching and buffering
 Myth 6: Streaming is hard
4
The streaming architecture
5
6
Reconsideration of data architecture
 Better app isolation
 More real-time reaction to events
 Robust continuous applications
 Process both real-time and historical data
7
[Figure labels: app state (three times), event log, query service]
What is (distributed) streaming
 Computations on never-ending “streams” of data
records (“events”)
 A stream processor
distributes the
computation in a cluster
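One common way to distribute a keyed computation can be sketched as follows: each record is routed to one of several parallel workers by hashing its key, so that all events for a key land on the same worker. This is a minimal illustration, not Flink's API; `partition_for`, `route`, and the worker count of 4 are assumptions made for the example.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_workers: int) -> int:
    """Pick a worker for a key with a stable hash (CRC32), so routing is deterministic."""
    return zlib.crc32(key.encode("utf-8")) % num_workers

def route(events, num_workers):
    """Group (key, value) events by the worker that would process them."""
    workers = defaultdict(list)
    for key, value in events:
        workers[partition_for(key, num_workers)].append((key, value))
    return dict(workers)

# Hypothetical click events keyed by user id.
events = [("user-a", 1), ("user-b", 1), ("user-a", 1), ("user-c", 1)]
assignment = route(events, num_workers=4)
```

Because the hash is stable, both `user-a` events end up on the same worker, which is what lets that worker keep per-key state locally.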
8
[Figure: four parallel instances of “Your code”]
What is stateful streaming
 Computation and state
• E.g., counters, windows of past
events, state machines, trained ML
models
 Result depends on history of
stream
 A stateful stream processor gives
the tools to manage state
• Recover, roll back, version,
upgrade, etc
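The idea of computation plus managed state can be sketched with a toy operator whose output depends on the history of the stream, and whose state can be snapshotted and rolled back. The class `CountPerKey` and its methods are hypothetical names for illustration, not part of any Flink API.

```python
import copy
from collections import defaultdict

class CountPerKey:
    """Toy stateful operator: the output for an event depends on all earlier events."""

    def __init__(self):
        self.counts = defaultdict(int)  # the operator's state

    def process(self, key):
        self.counts[key] += 1
        return key, self.counts[key]

    def snapshot(self):
        """Return a copy of the state that can be stored and restored later."""
        return copy.deepcopy(dict(self.counts))

    def restore(self, snap):
        """Roll the operator back to a previously taken snapshot."""
        self.counts = defaultdict(int, copy.deepcopy(snap))

op = CountPerKey()
outputs = [op.process(k) for k in ["a", "b", "a"]]
saved = op.snapshot()            # state after three events
op.process("a")                  # state moves on
op.restore(saved)                # roll back to the snapshot
after_restore = op.process("a")  # same answer as the first time this event was seen
```

A real stateful stream processor would take care of storing such snapshots durably and restoring them on failure or upgrade; here that is reduced to two method calls.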
9
[Figure: “Your code” with attached state]
What is event-time streaming
 Data records associated with
timestamps (time series data)
 Processing depends on timestamps
 An event-time stream processor gives
you the tools to reason about time
• E.g., handle streams that are out of
order
• Core feature is watermarks – a clock
to measure event time
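A small sketch of how watermarks let a processor handle out-of-order events: events are counted in event-time windows, and a window fires only once the watermark (here assumed to be the largest timestamp seen minus a fixed out-of-orderness bound) has passed the window's end. The function name, the fixed bound, and the policy of dropping late events are assumptions for this example, not Flink's implementation.

```python
from collections import defaultdict

def tumbling_event_time_counts(events, window_size, max_out_of_orderness):
    """Count events per event-time window, firing a window once the watermark passes its end.

    events: iterable of (timestamp, payload) in arrival order, possibly out of order.
    Returns a list of (window_start, count) in the order the windows fire.
    """
    open_windows = defaultdict(int)   # window_start -> count so far
    fired = []
    watermark = float("-inf")
    for ts, _payload in events:
        start = (ts // window_size) * window_size
        if start + window_size <= watermark:
            continue  # too late: this window has already fired, so the event is dropped
        open_windows[start] += 1
        # Watermark: "no event older than this is expected any more".
        watermark = max(watermark, ts - max_out_of_orderness)
        for w in sorted(w for w in open_windows if w + window_size <= watermark):
            fired.append((w, open_windows.pop(w)))
    # End of input: flush whatever is still open.
    for w in sorted(open_windows):
        fired.append((w, open_windows.pop(w)))
    return fired

# Arrival order differs from timestamp order: t=3 arrives after t=11.
arrivals = [(1, "x"), (4, "x"), (11, "x"), (3, "x"), (12, "x"), (25, "x")]
result = tumbling_event_time_counts(arrivals, window_size=10, max_out_of_orderness=5)
```

The event with timestamp 3 is still counted in the window starting at 0, even though it arrives after an event from the next window, because the watermark had not yet passed 10 when it arrived.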
10
[Figure: “Your code” with state, receiving events in the order t3, t1, t2, t4 and grouping them into windows t1-t2 and t3-t4]
What is streaming
 Continuous processing on data that is
continuously generated
 I.e., pretty much all “big” data
 It’s all about state and time
11
12
Myth 1: The Lambda architecture
13
Myth variations
 Stream processing is approximate
 Stream processing is for transient data
 Stream processing cannot handle high data
volume
 Hence, stream processing needs to be
coupled with batch processing
14
Lambda architecture
15
[Figure: Lambda architecture. Labels: file 1, file 2, Job 1, Job 2, Scheduler, Streaming job, Serve & store]
Lambda no longer needed
 Lambda was useful in the first days of stream
processing (beginning of Apache Storm)
 Not any more
• Stream processors can handle very large volumes
• Stream processors can compute accurate results
 Good news is I don’t hear Lambda so often
anymore
16
Myth 2: Throughput/latency
tradeoff
17
Myth flavors
 Low latency systems cannot support high
throughput
 In general, you need to trade off one for the
other
 There is a “high throughput” category and a
“low-latency” category (naming varies)
18
Physical limits
 Most stream processing pipelines are
network bottlenecked
 The network dictates both (1) what is the
latency and (2) what is the throughput
 A well-engineered system achieves the
physical limits allowed by the network
19
Buffering
 It is natural to handle many records together
• All software and hardware systems do that
• E.g., network bundles bytes into frames
 Every streaming system buffers records for
performance (Flink certainly does)
• You don’t want to send single records over the
network
• "Record-at-a-time" does not exist at the physical level
20
Buffering (2)
 Buffering is a performance optimization
• Should be opaque to the user
• Should not dictate system behavior in any other
way
• Should not impose artificial boundaries
• Should not limit what you can do with the system
• Etc...
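To illustrate what "opaque to the user" means, the sketch below ships records between two stages in buffers of different sizes and checks that the result does not depend on the buffer size. `run_pipeline` and the size-only flush policy are assumptions for the example; real systems typically also flush on a timeout.

```python
def run_pipeline(records, buffer_size):
    """Sum values per key while shipping records between two stages in buffers."""
    totals = {}
    buffer = []

    def flush():
        # The downstream stage receives a whole buffer but still processes record by record.
        for key, value in buffer:
            totals[key] = totals.get(key, 0) + value
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= buffer_size:
            flush()
    flush()  # ship the last, partially filled buffer
    return totals

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
results = [run_pipeline(records, size) for size in (1, 2, 100)]
```

All three runs produce the same totals; the buffer size only changes how the records travel, not what is computed.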
21
Some numbers
22
Some more
23
[Figure labels: TeraSort, Relational Join, Classic Batch Jobs, Graph Processing, Linear Algebra]
Myth 3: Exactly once not possible
24
What is “exactly once”
 Under failures, system computes result as if there
was no failure
 In contrast to:
• At most once: no guarantees
• At least once: duplicates possible
 Exactly once state versus exactly once delivery
25
Myth variations
 Exactly once is not possible in nature
 Exactly once is not possible end-to-end
 Exactly once is not needed
 You need to trade off performance for exactly once
(Usually perpetuated by folks until they implement
exactly once)
26
Transactions
 “Exactly once” is transactions: either all
actions succeed or none succeed
 Transactions are possible
 Transactions are useful
 Let’s not start eventual consistency all over
again…
27
Flink checkpoints
 Periodic asynchronous consistent snapshots of
application state
 Provide exactly-once state guarantees under failures
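Why a consistent snapshot gives exactly-once state can be shown with a simplified simulation: a checkpoint stores the source offset together with a copy of the state, and recovery restores both before replaying. The sketch is synchronous and single-threaded, unlike the asynchronous checkpoints described above, and `run_with_failure` and its parameters are hypothetical names.

```python
import copy

def run_with_failure(records, checkpoint_every, fail_at, restore_state):
    """Count records per key; crash once before processing index `fail_at`, then recover.

    restore_state=True  : roll state back to the checkpoint and replay from its offset.
    restore_state=False : replay from the checkpoint offset but keep the dirty state.
    """
    state = {}
    checkpoint = (0, {})          # (source offset, snapshot of state)
    offset = 0
    failed = False
    while offset < len(records):
        if offset == fail_at and not failed:
            failed = True
            ckpt_offset, ckpt_state = checkpoint
            if restore_state:
                state = copy.deepcopy(ckpt_state)
            offset = ckpt_offset  # the source rewinds to the checkpointed position
            continue
        key = records[offset]
        state[key] = state.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, copy.deepcopy(state))
    return state

records = ["a", "b", "a", "c", "a", "b", "a"]
exactly_once = run_with_failure(records, checkpoint_every=3, fail_at=5, restore_state=True)
at_least_once = run_with_failure(records, checkpoint_every=3, fail_at=5, restore_state=False)
```

With the state restored, the counts match a failure-free run; without it, the two replayed records are counted twice, which is the at-least-once behavior.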
28
[Figure: checkpoint barriers in a data stream. Checkpoint barriers n-1 and n travel with the stream records (events) and split the stream, from older to newer records, into the records that are part of checkpoint n-1, part of checkpoint n, and part of checkpoint n+1]
End-to-end exactly once
 Checkpoints double as transaction coordination mechanism
 Source and sink operators can take part in checkpoints
 Exactly once internally, "effectively once" end to end: e.g.,
Flink + Cassandra with idempotent updates
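The "effectively once" idea with idempotent updates can be sketched as follows: results are written as upserts under a deterministic primary key, so replaying the same results after a recovery leaves the external store unchanged. `KeyValueSink` is a stand-in written for this example and does not model Cassandra or any connector API.

```python
class KeyValueSink:
    """Stand-in for an external store that supports upserts by primary key."""

    def __init__(self):
        self.rows = {}
        self.writes = 0

    def upsert(self, key, value):
        self.rows[key] = value   # writing the same (key, value) twice leaves one row
        self.writes += 1

def emit_window_counts(window_counts, sink):
    """Write one row per (key, window_start); the primary key makes replays harmless."""
    for (key, window_start), count in window_counts.items():
        sink.upsert((key, window_start), count)

window_counts = {("a", 0): 3, ("b", 0): 1, ("a", 10): 2}
sink = KeyValueSink()
emit_window_counts(window_counts, sink)
emit_window_counts(window_counts, sink)  # replay after a simulated recovery
```

The sink sees six writes but ends up with three rows, so a reader of the store cannot tell that a replay happened.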
29
transactional sinks
State management
 Checkpoints triple as state
versioning mechanism
(savepoints)
 Go back and forth in time while
maintaining state consistency
 Ease code upgrades (Flink or
app), maintenance, migration,
and debugging, what-if
simulations, A/B tests
30
Myth 4: Streaming = real time
31
Myth variations
 I don’t have low latency applications hence I
don’t need stream processing
 Stream processing is only relevant for data
before storing them
 We need a batch processor to do heavy
offline computations
32
Low latency and high latency streams
33
[Figure: timeline of a partitioned event log with hourly marks from 2016-3-1 12:00 am to 2016-3-12 3:00 am, annotated with three ways of reading it: stream (low latency), batch (bounded stream), and stream (high latency)]
Robust continuous applications
34
Accurate computation
 Batch processing is not an accurate
computation model for continuous data
• Misses the right concepts and primitives
• Time handling, state across batch boundaries
 Stateful stream processing a better model
• Real-time/low-latency is the icing on the cake
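A small example of how state across batch boundaries affects accuracy: the same sessionization logic is run once over the continuous data and once per hourly slice, the way independent periodic batch jobs would see it. The click timestamps, the 10-minute gap, and the function names are assumptions for illustration.

```python
def sessions(timestamps, gap):
    """Split sorted timestamps into sessions: a new session starts after a pause longer than `gap`."""
    result = []
    for ts in timestamps:
        if result and ts - result[-1][-1] <= gap:
            result[-1].append(ts)
        else:
            result.append([ts])
    return result

def sessions_per_batch(timestamps, gap, batch_size):
    """Run the same logic independently on fixed time slices, as periodic batch jobs would."""
    batches = {}
    for ts in timestamps:
        batches.setdefault(ts // batch_size, []).append(ts)
    out = []
    for batch_id in sorted(batches):
        out.extend(sessions(batches[batch_id], gap))
    return out

clicks = [50, 55, 58, 61, 64, 120]   # minutes since start, one user
continuous = sessions(clicks, gap=10)
hourly = sessions_per_batch(clicks, gap=10, batch_size=60)
```

The continuous run finds two sessions, while the hourly run reports three, because the session that spans minute 60 is cut in two at the batch boundary.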
35
Myth 5: Batching and buffering
36
Myth variations
 There is a "mini-batch" category between
batch and streaming
 “Record-at-a-time” versus “mini-batching” or
similar "choices"
 Mini-batch systems can get better throughput
37
Myth variations (2)
 The difference between mini-batching and
streaming is latency
 I don’t need low latency hence I need mini-batching
 I have a mini-batching use case
38
We have answered this already
 Can get throughput and latency (myth #2)
• Every system buffers data, from the network to
the OS to Flink
 Streaming is a model, not just fast (myth #4)
• Time and state
• Low latency is the icing on the cake
39
Continuous operation
 Data is continuously produced
 Computation should track data production
• With dynamic scaling, pause-and-resume
 Restarting our pipelines every second is not a
great idea, and not just for latency reasons
40
Myth 6: Streaming is hard
41
Myth variations
 Streaming is hard to learn
 Streaming is hard to reason about
 Windows? Event time? Triggers? Oh, my!!
 Streaming needs to be coupled with batch
 I know batch already
42
It's about your data and code
 What's the form of your data?
• Unbounded (e.g., clicks, sensors, logs), or
• Bounded (e.g., ???*)
 What changes more often?
• My code changes faster than my data
• My data changes faster than my code
43
* Please help me find a great example of naturally static data
It's about your data and code
 If your data changes faster than your code
you have a streaming problem
• You may be solving it with hourly batch jobs
depending on someone else to create the
hourly batches
• You are probably living with inaccurate results
without knowing it
44
It's about your data and code
 If your code changes faster than your data
you have an exploration problem
• Using notebooks or other tools for quick data
exploration is a good idea
• Once your code stabilizes you will have a
streaming problem, so you might as well think
of it as such from the beginning
45
Flink in the real world
46
Flink community
 > 240 contributors, 95 contributors in Flink 1.1
 42 meetups around the world with > 15,000 members
 2x-3x growth in 2015, similar in 2016
47
Powered by Flink
48
Zalando, one of the largest ecommerce
companies in Europe, uses Flink for real-time
business process monitoring.
King, the creator of Candy Crush Saga,
uses Flink to provide data science teams
with real-time analytics.
Bouygues Telecom uses Flink for real-time
event processing over billions of Kafka
messages per day.
Alibaba, the world's largest retailer, built a
Flink-based system (Blink) to optimize
search rankings in real time.
See more at flink.apache.org/poweredby.html
30 Flink applications in production for more than one
year. 10 billion events (2TB) processed daily
Complex jobs of > 30 operators running 24/7,
processing 30 billion events daily, maintaining state
of 100s of GB with exactly-once guarantees
Largest job has > 20 operators, runs on > 5000
vCores in 1000-node cluster, processes millions of
events per second
49
50
Flink Forward 2016
Current work in Flink
52
Flink's unique combination of features
53
[Figure: feature overview grouped under the labels Performance, Consistency, Event Time, APIs, Libraries, and Stateful Streaming]
 Low latency
 High throughput
 Well-behaved flow control (back pressure)
 Works on real-time and historic data
 Savepoints (replays, A/B testing, upgrades, versioning)
 Exactly-once semantics for fault tolerance
 Windows & user-defined state
 Flexible windows (time, count, session, roll-your-own)
 Complex Event Processing
 Fluent API
 Out-of-order events
 Fast and large out-of-core state
Flink 1.1
54
[Figure labels: Connectors, Metric System, (Stream) SQL, Session Windows, Library enhancements]
Flink 1.1 + ongoing development
55
[Figure, shown in two builds. Flink 1.1: Connectors, Session Windows, (Stream) SQL, Library enhancements, Metric System. Ongoing development: Metrics & Visualization, Dynamic Scaling, Savepoint compatibility, Checkpoints to savepoints, More connectors, Stream SQL, Windows, Large state, Maintenance, Fine grained recovery, Side in-/outputs, Window DSL, Security, Mesos & others, Dynamic Resource Management, Authentication, Queryable State. The second build (slide 56) adds the group labels Operations, Ecosystem, Application Features, and Broader Audience]
A longer-term vision for Flink
57
Streaming use cases
Application                    Technology
(Near) real-time apps          Low-latency streaming
Continuous apps                High-latency streaming
Analytics on historical data   Batch as special case of streaming
Request/response apps          Large queryable state
58
Request/response applications
 Queryable state: query Flink state directly instead
of pushing results in a database
 Large state support and query API coming in Flink
59
queries
In summary
 The need for streaming comes from a rethinking of data infra
architecture
• Stream processing then just becomes natural
 Debunking 6 myths
• Myth 1: The Lambda architecture
• Myth 2: The throughput/latency tradeoff
• Myth 3: Exactly once not possible
• Myth 4: Streaming is for (near) real-time
• Myth 5: Batching and buffering
• Myth 6: Streaming is hard
60
6
Thank you!
@kostas_tzoumas
@ApacheFlink
@dataArtisans
We are hiring!
data-artisans.com/careers

JS-Experts - Cybersecurity for Generative AIIvo Andreev
 

Recently uploaded (20)

Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기
 
About .NET 8 and a first glimpse into .NET9
About .NET 8 and a first glimpse into .NET9About .NET 8 and a first glimpse into .NET9
About .NET 8 and a first glimpse into .NET9
 
Mastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example ProjectMastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example Project
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdf
 
Program with GUTs
Program with GUTsProgram with GUTs
Program with GUTs
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS Calculator
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Sustainable Web Design - Claire Thornewill
Sustainable Web Design - Claire ThornewillSustainable Web Design - Claire Thornewill
Sustainable Web Design - Claire Thornewill
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in Trivandrum
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security Challenges
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
 
Deep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampDeep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - Datacamp
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 

Debunking Six Common Myths in Stream Processing

  • 1. Kostas Tzoumas, @kostas_tzoumas, Flink London Meetup, November 3, 2016. Apache Flink®: State of the Union and What's Next
  • 2. Kostas Tzoumas, @kostas_tzoumas, Flink London Meetup, November 3, 2016. Debunking Six Common Myths in Stream Processing
  • 3. Original creators of Apache Flink®. Providers of the dA Platform, a supported Flink distribution
  • 4. Outline
      • What is data streaming
      • Myth 1: The Lambda architecture
      • Myth 2: The throughput/latency tradeoff
      • Myth 3: Exactly once not possible
      • Myth 4: Streaming is for (near) real-time
      • Myth 5: Batching and buffering
      • Myth 6: Streaming is hard
  • 6. Reconsideration of data architecture
      • Better app isolation
      • More real-time reaction to events
      • Robust continuous applications
      • Process both real-time and historical data
  • 7. [Diagram labels: app state (three times), event log, query service]
  • 8. What is (distributed) streaming
      • Computations on never-ending “streams” of data records (“events”)
      • A stream processor distributes the computation in a cluster
      • [Diagram labels: "Your code" (four times)]
  • 9. What is stateful streaming
      • Computation and state
          • E.g., counters, windows of past events, state machines, trained ML models
      • Result depends on history of stream
      • A stateful stream processor gives the tools to manage state
          • Recover, roll back, version, upgrade, etc.
      • [Diagram labels: "Your code", state]
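To make "computation and state" concrete, here is a minimal plain-Python sketch of a stateful operator. It does not use the Flink API; the class `KeyedCounter` and its `snapshot`/`restore` methods are hypothetical names chosen for illustration. The point is only that each output depends on the history of the stream, and that the state can be captured and rolled back.

```python
from collections import defaultdict
from typing import Dict, Tuple


class KeyedCounter:
    """Hypothetical stateful operator: counts the events seen so far per key."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)

    def process(self, key: str) -> Tuple[str, int]:
        # The result depends on every earlier event with the same key.
        self.counts[key] += 1
        return key, self.counts[key]

    def snapshot(self) -> Dict[str, int]:
        # Copy the state so later updates do not change the snapshot.
        return dict(self.counts)

    def restore(self, snap: Dict[str, int]) -> None:
        # Roll the operator back to a previously captured state.
        self.counts = defaultdict(int, snap)
```

A stream processor would call something like `process` once per event and take snapshots on its own schedule; here both are driven by hand.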
  • 10. What is event-time streaming
      • Data records associated with timestamps (time series data)
      • Processing depends on timestamps
      • An event-time stream processor gives you the tools to reason about time
          • E.g., handle streams that are out of order
          • Core feature is watermarks – a clock to measure event time
      • [Diagram labels: "Your code", state; input events t3, t1, t2, t4; output windows t1-t2 and t3-t4]
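The watermark idea can be sketched in a few lines of plain Python. This is a simplified model, not Flink's implementation: the function name, the fixed "maximum out-of-orderness" bound, and the policy of dropping late events are all assumptions made for the example. The watermark trails the largest timestamp seen so far, and a tumbling window fires once the watermark passes its end.

```python
from typing import Dict, List, Tuple


def tumbling_event_time_counts(
    events: List[Tuple[int, str]],
    window_size: int,
    max_out_of_orderness: int,
) -> List[Tuple[int, int]]:
    """Count events per tumbling event-time window.

    events are (timestamp, value) pairs in arrival order, possibly out of order.
    Returns (window_start, count) pairs in the order the windows fire.
    """
    counts: Dict[int, int] = {}
    fired: List[Tuple[int, int]] = []
    watermark = float("-inf")

    def fire_ready(up_to: float) -> None:
        # Fire every open window whose end is at or before the watermark.
        for start in sorted(counts):
            if start + window_size <= up_to:
                fired.append((start, counts.pop(start)))

    for ts, _value in events:
        start = (ts // window_size) * window_size
        if start + window_size <= watermark:
            continue  # window already fired: the event is late and is dropped
        counts[start] = counts.get(start, 0) + 1
        # Assumed watermark policy: largest timestamp seen minus a fixed bound.
        watermark = max(watermark, ts - max_out_of_orderness)
        fire_ready(watermark)

    fire_ready(float("inf"))  # end of a bounded input: flush what is left
    return fired
```

With windows of size 10 and a bound of 3, the events with timestamps 1, 4, 2, 11, 9, 25 (in that arrival order) all land in the right window even though 2 and 9 arrive after later timestamps.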
  • 11. What is streaming
      • Continuous processing on data that is continuously generated
      • I.e., pretty much all “big” data
      • It’s all about state and time
  • 13. Myth 1: The Lambda architecture
  • 14. Myth variations
      • Stream processing is approximate
      • Stream processing is for transient data
      • Stream processing cannot handle high data volume
      • Hence, stream processing needs to be coupled with batch processing
  • 15. Lambda architecture
      • [Diagram labels: file 1, file 2, Job 1, Job 2, Scheduler, Streaming job, Serve & store]
  • 16. Lambda no longer needed
      • Lambda was useful in the first days of stream processing (beginning of Apache Storm)
      • Not any more
          • Stream processors can handle very large volumes
          • Stream processors can compute accurate results
      • Good news is I don’t hear Lambda so often anymore
  • 18. Myth flavors
      • Low latency systems cannot support high throughput
      • In general, you need to trade off one for the other
      • There is a “high throughput” category and a “low-latency” category (naming varies)
  • 19. Physical limits
      • Most stream processing pipelines are network bottlenecked
      • The network dictates both (1) what is the latency and (2) what is the throughput
      • A well-engineered system achieves the physical limits allowed by the network
  • 20. Buffering
      • It is natural to handle many records together
          • All software and hardware systems do that
          • E.g., network bundles bytes into frames
      • Every streaming system buffers records for performance (Flink certainly does)
          • You don’t want to send single records over the network
          • "Record-at-a-time" does not exist at the physical level
  • 21. Buffering (2)
      • Buffering is a performance optimization
          • Should be opaque to the user
          • Should not dictate system behavior in any other way
          • Should not impose artificial boundaries
          • Should not limit what you can do with the system
          • Etc.
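One way to picture "buffering as an opaque optimization" is a buffer that flushes when it is full or when its oldest record has waited too long. The sketch below is plain Python with logical arrival times rather than a real network; `buffer_records`, `max_records`, and `timeout` are hypothetical names for this example, not Flink settings. Changing the buffer size changes how records are grouped on the wire, but not which records arrive or in what order.

```python
from typing import List, Tuple


def buffer_records(
    arrivals: List[Tuple[int, str]],
    max_records: int,
    timeout: int,
) -> List[Tuple[int, List[str]]]:
    """Group records into network sends.

    arrivals are (arrival_time, record) pairs sorted by time.
    Returns (flush_time, records) pairs, one per send.
    """
    flushes: List[Tuple[int, List[str]]] = []
    buf: List[str] = []
    first_time = 0
    for t, rec in arrivals:
        # The oldest buffered record has waited long enough: send on timeout.
        if buf and t - first_time >= timeout:
            flushes.append((first_time + timeout, buf))
            buf = []
        if not buf:
            first_time = t
        buf.append(rec)
        # The buffer is full: send immediately.
        if len(buf) == max_records:
            flushes.append((t, buf))
            buf = []
    if buf:
        flushes.append((first_time + timeout, buf))
    return flushes
```

In this model the timeout bounds the extra latency a record can pick up while waiting in the buffer, and the downstream operator sees the same sequence of records for any buffer size.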
  • 23. Some more
      • [Chart labels: TeraSort, Relational Join, Classic Batch Jobs, Graph Processing, Linear Algebra]
  • 24. Myth 3: Exactly once not possible
  • 25. What is “exactly once”
      • Under failures, system computes result as if there was no failure
      • In contrast to:
          • At most once: no guarantees
          • At least once: duplicates possible
      • Exactly once state versus exactly once delivery
  • 26. Myth variations
      • Exactly once is not possible in nature
      • Exactly once is not possible end-to-end
      • Exactly once is not needed
      • You need to trade off performance for exactly once
      • (Usually perpetuated by folks until they implement exactly once)
  • 27. Transactions
      • “Exactly once” is transactions: either all actions succeed or none succeed
      • Transactions are possible
      • Transactions are useful
      • Let’s not start eventual consistency all over again…
  • 28. Flink checkpoints
      • Periodic asynchronous consistent snapshots of application state
      • Provide exactly-once state guarantees under failures
      • [Figure (stream_barriers.svg): a data stream of records (events) divided by checkpoint barriers n-1 and n; older records are part of checkpoint n-1, records between the barriers are part of checkpoint n, and newer records are part of checkpoint n+1]
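The difference between exactly-once and at-least-once state can be shown with a deliberately simplified model: a single operator that sums its input, a replayable source addressed by offset, and one simulated crash. This is not how Flink implements checkpoints (the slide describes asynchronous, barrier-based snapshots across a distributed job); the function names and the synchronous checkpoint are assumptions for the sketch. The key idea it does capture is that a checkpoint records the input position and the operator state together, so recovery restores both.

```python
from typing import List, Optional


def run_exactly_once(
    records: List[int], checkpoint_every: int, fail_at: Optional[int] = None
) -> int:
    """Sum records; crash once before offset fail_at and recover from a checkpoint."""
    checkpoint = (0, 0)  # (next input offset, running total), saved together
    offset, total = checkpoint
    failed = False
    while offset < len(records):
        if offset == fail_at and not failed:
            failed = True
            offset, total = checkpoint  # restore state AND input position
            continue
        total += records[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, total)
    return total


def run_at_least_once(
    records: List[int], checkpoint_every: int, fail_at: Optional[int] = None
) -> int:
    """Same job, but recovery only rewinds the input; state is not rolled back."""
    committed_offset = 0
    offset, total = 0, 0
    failed = False
    while offset < len(records):
        if offset == fail_at and not failed:
            failed = True
            offset = committed_offset  # replayed records are counted twice
            continue
        total += records[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            committed_offset = offset
    return total
```

With the first variant the result after a crash equals the result of a failure-free run; with the second, the records between the last committed offset and the crash are applied twice.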
  • 29. End-to-end exactly once
      • Checkpoints double as transaction coordination mechanism
      • Source and sink operators can take part in checkpoints
      • Exactly once internally, "effectively once" end to end: e.g., Flink + Cassandra with idempotent updates
      • [Diagram label: transactional sinks]
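"Effectively once" with idempotent updates can be illustrated without any database. In the sketch below, `IdempotentSink` stands in for a key-value store written with upserts and `AppendSink` for a store that appends every write; both classes and `emit_with_replay` are hypothetical and exist only for this example. After a recovery, some results are emitted a second time; the upsert sink ends up in the same final state, while the append sink accumulates duplicates.

```python
from typing import Callable, Dict, List, Tuple


class IdempotentSink:
    """Hypothetical key-value sink: writing the same (key, value) again changes nothing."""

    def __init__(self) -> None:
        self.table: Dict[str, int] = {}

    def upsert(self, key: str, value: int) -> None:
        self.table[key] = value


class AppendSink:
    """Hypothetical append-only sink: every write adds a row, including replays."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, int]] = []

    def append(self, key: str, value: int) -> None:
        self.rows.append((key, value))


def emit_with_replay(
    results: List[Tuple[str, int]],
    write: Callable[[str, int], None],
    replay_from: int,
) -> None:
    """Write all results, then simulate a recovery that re-emits results[replay_from:]."""
    for key, value in results:
        write(key, value)
    for key, value in results[replay_from:]:
        write(key, value)
```

The design choice is that deterministic keys (for example, a window identifier) make replays harmless, so the sink does not need to take part in a transaction.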
  • 30. State management
      • Checkpoints triple as state versioning mechanism (savepoints)
      • Go back and forth in time while maintaining state consistency
      • Ease code upgrades (Flink or app), maintenance, migration, debugging, what-if simulations, and A/B tests
  • 31. Myth 4: Streaming = real time
  • 32. Myth variations
      • I don’t have low latency applications hence I don’t need stream processing
      • Stream processing is only relevant for data before storing them
      • We need a batch processor to do heavy offline computations
  • 33. Low latency and high latency streams
      • [Diagram: a timeline of hourly partitions running from 2016-3-1 12:00 am to 2016-3-12 3:00 am, annotated "Stream (low latency)", "Batch (bounded stream)", and "Stream (high latency)"]
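The "batch is a bounded stream" view can be sketched with a Python generator: the same operator code runs unchanged over a finite list and over a never-ending source. The function `running_sum` is a hypothetical example operator, and `itertools.count` stands in for an unbounded input.

```python
import itertools
from typing import Iterable, Iterator


def running_sum(events: Iterable[int]) -> Iterator[int]:
    """Emit the total so far after every event; works on bounded and unbounded input."""
    total = 0
    for value in events:
        total += value
        yield total


# Bounded input ("batch"): the operator simply reaches the end of the stream.
bounded_result = list(running_sum([1, 2, 3]))

# Unbounded input ("stream"): the operator never finishes, so only a prefix is taken.
unbounded_prefix = list(itertools.islice(running_sum(itertools.count(1)), 4))
```

Only the source differs between the two runs; the operator has no notion of whether its input will ever end.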
  • 35. Accurate computation
      • Batch processing is not an accurate computation model for continuous data
          • Misses the right concepts and primitives
          • Time handling, state across batch boundaries
      • Stateful stream processing a better model
          • Real-time/low-latency is the icing on the cake
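A small worked example of "state across batch boundaries": counting user sessions, where a session ends after a gap of inactivity. The sketch is plain Python with made-up timestamps in seconds; the function names and the 60-second gap are assumptions. Processing the data as one continuous stream finds two sessions, while cutting it into hourly batches splits the session that straddles the hour mark and reports three.

```python
from typing import Dict, List


def count_sessions(timestamps: List[int], gap: int) -> int:
    """Count sessions in sorted timestamps; a new session starts after a pause longer than gap."""
    sessions = 0
    previous = None
    for ts in timestamps:
        if previous is None or ts - previous > gap:
            sessions += 1
        previous = ts
    return sessions


def count_sessions_in_batches(timestamps: List[int], gap: int, batch_size: int) -> int:
    """Same count, but each batch is processed in isolation with no carried-over state."""
    batches: Dict[int, List[int]] = {}
    for ts in timestamps:
        batches.setdefault(ts // batch_size, []).append(ts)
    return sum(count_sessions(batch, gap) for batch in batches.values())


# One session runs from 3550 to 3640 and crosses the hour boundary at 3600.
clicks = [3550, 3580, 3610, 3640, 7300]
continuous = count_sessions(clicks, gap=60)
hourly = count_sessions_in_batches(clicks, gap=60, batch_size=3600)
```

Here `continuous` is the intended answer; the hourly variant is off by one for every session that happens to cross a batch boundary.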
  • 36. Myth 5: Batching and buffering
  • 37. Myth variations
      • There is a "mini-batch" category between batch and streaming
      • “Record-at-a-time” versus “mini-batching” or similar "choices"
      • Mini-batch systems can get better throughput
  • 38. Myth variations (2)
      • The difference between mini-batching and streaming is latency
      • I don’t need low latency hence I need mini-batching
      • I have a mini-batching use case
  • 39. We have answered this already
      • Can get throughput and latency (myth #2)
          • Every system buffers data, from the network to the OS to Flink
      • Streaming is a model, not just fast (myth #4)
          • Time and state
          • Low latency is the icing on the cake
  • 40. Continuous operation
      • Data is continuously produced
      • Computation should track data production
          • With dynamic scaling, pause-and-resume
      • Restarting our pipelines every second is not a great idea, and not just for latency reasons
  • 41. Myth 6: Streaming is hard
  • 42. Myth variations
      • Streaming is hard to learn
      • Streaming is hard to reason about
      • Windows? Event time? Triggers? Oh, my!!
      • Streaming needs to be coupled with batch
      • I know batch already
  • 43. It's about your data and code
      • What's the form of your data?
          • Unbounded (e.g., clicks, sensors, logs), or
          • Bounded (e.g., ???*)
      • What changes more often?
          • My code changes faster than my data
          • My data changes faster than my code
      • * Please help me find a great example of naturally static data
  • 44. It's about your data and code
      • If your data changes faster than your code you have a streaming problem
          • You may be solving it with hourly batch jobs depending on someone else to create the hourly batches
          • You are probably living with inaccurate results without knowing it
  • 45. It's about your data and code
      • If your code changes faster than your data you have an exploration problem
          • Using notebooks or other tools for quick data exploration is a good idea
          • Once your code stabilizes you will have a streaming problem, so you might as well think of it as such from the beginning
  • 46. Flink in the real world
  • 47. Flink community
      • > 240 contributors, 95 contributors in Flink 1.1
      • 42 meetups around the world with > 15,000 members
      • 2x-3x growth in 2015, similar in 2016
  • 48. Powered by Flink
      • Zalando, one of the largest ecommerce companies in Europe, uses Flink for real-time business process monitoring.
      • King, the creators of Candy Crush Saga, uses Flink to provide data science teams with real-time analytics.
      • Bouygues Telecom uses Flink for real-time event processing over billions of Kafka messages per day.
      • Alibaba, the world's largest retailer, built a Flink-based system (Blink) to optimize search rankings in real time.
      • See more at flink.apache.org/poweredby.html
  • 49. [Production figures; the company names shown on the slide are not in the transcript]
      • 30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily
      • Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
      • Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second
  • 52. Current work in Flink
  • 53. Flink's unique combination of features
      • [Diagram labels: Low latency; High Throughput; Well-behaved flow control (back pressure); Consistency; Works on real-time and historic data; Performance; Event Time; APIs; Libraries; Stateful Streaming; Savepoints (replays, A/B testing, upgrades, versioning); Exactly-once semantics for fault tolerance; Windows & user-defined state; Flexible windows (time, count, session, roll-your own); Complex Event Processing; Fluent API; Out-of-order events; Fast and large out-of-core state]
  • 54. Flink 1.1
      • [Diagram labels: Connectors; Metric System; (Stream) SQL; Session Windows; Library enhancements]
  • 55. Flink 1.1 + ongoing development
      • [Diagram labels: Connectors; Session Windows; (Stream) SQL; Library enhancements; Metric System; Metrics & Visualization; Dynamic Scaling; Savepoint compatibility; Checkpoints to savepoints; More connectors; Stream SQL; Windows; Large state; Maintenance; Fine grained recovery; Side in-/outputs; Window DSL; Security; Mesos & others; Dynamic Resource Management; Authentication; Queryable State]
  • 56. Flink 1.1 + ongoing development
      • [Same diagram as slide 55, with added group headings: Operations; Ecosystem; Application Features; Broader Audience]
  • 57. A longer-term vision for Flink
  • 58. Streaming use cases (application → technology)
      • (Near) real-time apps → Low-latency streaming
      • Continuous apps → High-latency streaming
      • Analytics on historical data → Batch as special case of streaming
      • Request/response apps → Large queryable state
  • 59. Request/response applications
      • Queryable state: query Flink state directly instead of pushing results in a database
      • Large state support and query API coming in Flink
      • [Diagram label: queries]
  • 60. In summary
      • The need for streaming comes from a rethinking of data infra architecture
          • Stream processing then just becomes natural
      • Debunking six myths
          • Myth 1: The Lambda architecture
          • Myth 2: The throughput/latency tradeoff
          • Myth 3: Exactly once not possible
          • Myth 4: Streaming is for (near) real-time
          • Myth 5: Batching and buffering
          • Myth 6: Streaming is hard