SlideShare a Scribd company logo
Aljoscha Krettek, engineering manager & co-founder
Till Rohrmann, engineering manager & co-founder
The Past, Present, and
Future of Apache Flink®
© 2018 data Artisans2
Past
© 2018 data Artisans3
It all started in 2014
2009 - 2014 since 2014
● Batch processor on top of streaming runtime
● First Apache Flink 0.6.0 release August 2014
© 2018 data Artisans4
August 2014
Batch processing
© 2018 data Artisans5
Flink learns to stream in real time
DataStream API
Stream Processing
DataSet API
Batch Processing
Runtime
Distributed Streaming Data Flow
© 2018 data Artisans6
● Continuous & real-time
November 2014
Batch processing Stream processing
© 2018 data Artisans7
Flink learns to remember
© 2018 data Artisans8
Flink learns to remember
© 2018 data Artisans9
Flink learns to remember
© 2018 data Artisans10
Flink learns to remember
Remember where we left off
© 2018 data Artisans11
● Continuous & real-time
● Stateful & exactly once
June 2015
Batch processing Stream processing
© 2018 data Artisans12
Latency vs. Throughput?
high latency
low high throughput
Prevailing
belief
● 10s of millions of events/s
● Latency down to 1 ms
≠
© 2018 data Artisans13
Flink becomes event-time aware
Episode
IV
Episode
V
Episode
VI
Episode
I
Episode
II
Episode
III
Episode
VII
Episode
VIII
Processing
time
1977 1980 1983 1999 2002 2005 2015 2017
Flink becomes event-time aware
© 2018 data Artisans14
Flink becomes event-time aware
Episode
I
Episode
II
Episode
III
Episode
IV
Episode
V
Episode
VI
Episode
VII
Episode
VIII
Processing time
1999 2002 2005 1977 1980 1983 2015 2017
Processing time
Event time
© 2018 data Artisans15
● Continuous & real-time
● Stateful & exactly once
● High throughput & low
latency
● Event time
November 2015
Batch processing Stream processing
© 2018 data Artisans16
More than just analytics: ProcessFunction
class MyFunction extends ProcessFunction[MyEvent, Result] {
// declare state to use in the program
lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)
def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
// work with event and state and schedule timers
}
def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
// handle callback when event-/processing- time instant is reached
}
}
● ProcessFunction gives access to state, time
and events
● Low level API
● Enables data-driven applications
© 2018 data Artisans17
● Continuous & real-time
● Stateful & exactly once
● High throughput & low
latency
● Event time
February 2017
Batch processing Stream processing
Data-driven
applications
© 2018 data Artisans18
Present & Future
© 2018 data Artisans19
Hardening
Faster network stack
Application level flow control
Resolving dependency hell
Present in a nutshell
Scaling
Incremental snapshots
Local recovery
Scalable timers
Interoperability
Resource elasticity
REST client-server interface
Container entrypoint
Stream SQL
SQL client
User-defined functions
More powerful joins
Misc
State TTL
Broadcast state
Kafka exactly-once producer
© 2018 data Artisans20
Large, larger, Flink
Time
State
Incremental
snapshots
● Snapshot only state diff
● Incremental snapshots allow to handle very large state
© 2018 data Artisans21
Faster failover is always better
© 2018 data Artisans22
Varying workloads
• Violating SLAs vs. wasting money
• Varying workloads require to adapt resources
© 2018 data Artisans23
Revamped distributed architecture
ResourceManager ClusterManager
TaskManagerJobManager
Dispatcher
Client
1. Submit job 2. Start job
3. Request slots
4. Allocate resources
5. Start TaskManager
6. Execute job
● Support for full resource elasticity
● Application parallelism can be dynamically changed
© 2018 data Artisans24
● Continuous & real-time
● Stateful & exactly once
● High throughput & low
latency
● Event time
● Applications as first
class citizens
Present & Future
Batch processing Stream processing
Data-driven
applications
© 2018 data Artisans25
Flink as a library (and still as a framework)
• Deploying Flink applications should be as easy as starting a process
• Bundle application code and Flink into a single image
• Process connects to other application processes and figures out its role
• Removing the cluster out of the equation
P2 P3
P1
P4
New process
© 2018 data Artisans26
How much control do I need?
Batch
processing
Continuous
processing
Real-time &
data-driven
applications
● Multiple short lived stages
● Different resource requirements
per stage
● Efficient execution requires
control over resources
● Flink allocates actively resources
● Continuously processing operators
● Constrained by external systems,
SLAs and application logic
● External system can assign
resources
● Flink reacts to available resources
© 2018 data Artisans27
Active vs. reactive mode
• Active mode
‒ Flink is aware of underlying cluster framework
‒ Flink allocate resources
‒ E.g. existing YARN and Mesos integration
• Reactive mode
‒ Flink is oblivious to its runtime environment
‒ External system allocates and releases resources
‒ Flink scales with respect to available resources
‒ Relevant for environments: Kubernetes, Docker, as a
library
© 2018 data Artisans28
Scaling automatically
• Latency
• Throughput
• Resource utilization
• Connector signals
© 2018 data Artisans29
How we create Flink Jobs
Flink APIs
Stream/Batch Processing
Runtime
Distributed Streaming Data Flow
Java/Scala
© 2018 data Artisans30
Flink SQL
Flink APIs
Stream/Batch Processing
Runtime
Distributed Streaming Data Flow
Java/Scala SQL
“NO CODING REQUIRED”
Source/Sink
definition in YAML
Configuration in
YAML
SQL commandline
User-defined
functions
Streaming and
Batch
Event time and
processing time
*since Flink 0.9.0 (June 2015)
© 2018 data Artisans31
© 2018 data Artisans32
“Join” me for some trading
buy buy sell buy
Join
$ 17
£ 42
12.5
buy sell
₪
© 2018 data Artisans33
Introducing Time-versioned Table Joins
buy buy sell buy
Join
buy sell
curr rate time
£ 42 3
£ 12 17
1453
31753
14
event time
© 2018 data Artisans34
SQL for pattern analysis?
SELECT * from ?
© 2018 data Artisans35
Introducing MATCH_RECOGNIZE
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
PARTITION BY driverId
ORDER BY rideTime
MEASURES
S.rideId as sRideId
AFTER MATCH SKIP PAST LAST ROW
PATTERN (S M{2,} E)
DEFINE
S AS S.isStart = true,
M AS M.rideId <> S.rideId,
E AS E.isStart = false
AND E.rideId = S.rideId
)
© 2018 data Artisans36
Todays processing landscape
Streaming Batch
© 2018 data Artisans37
Batch/streaming unification
© 2018 data Artisans38
Into the Future
Big state
SQL
© 2018 data Artisans39
Thank
s!

More Related Content

What's hot

Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache Flink
Till Rohrmann
 
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward
 
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
Flink Forward
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward
 
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward
 
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
Flink Forward
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
confluent
 
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
Flink Forward
 
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
Robert Metzger
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
HostedbyConfluent
 
Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytic...
Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytic...Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytic...
Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytic...
Flink Forward
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
Fabian Hueske
 
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
confluent
 
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
confluent
 
Real-Time Dynamic Data Export Using the Kafka Ecosystem
Real-Time Dynamic Data Export Using the Kafka EcosystemReal-Time Dynamic Data Export Using the Kafka Ecosystem
Real-Time Dynamic Data Export Using the Kafka Ecosystem
confluent
 
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
confluent
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
confluent
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
HostedbyConfluent
 
Flink Forward San Francisco 2018: Xu Yang - "Alibaba’s common algorithm platf...
Flink Forward San Francisco 2018: Xu Yang - "Alibaba’s common algorithm platf...Flink Forward San Francisco 2018: Xu Yang - "Alibaba’s common algorithm platf...
Flink Forward San Francisco 2018: Xu Yang - "Alibaba’s common algorithm platf...
Flink Forward
 

What's hot (20)

Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache Flink
 
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
 
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
 
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
 
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
 
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
 
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
 
Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytic...
Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytic...Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytic...
Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytic...
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
 
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
 
Real-Time Dynamic Data Export Using the Kafka Ecosystem
Real-Time Dynamic Data Export Using the Kafka EcosystemReal-Time Dynamic Data Export Using the Kafka Ecosystem
Real-Time Dynamic Data Export Using the Kafka Ecosystem
 
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Flink Forward San Francisco 2018: Xu Yang - "Alibaba’s common algorithm platf...
Flink Forward San Francisco 2018: Xu Yang - "Alibaba’s common algorithm platf...Flink Forward San Francisco 2018: Xu Yang - "Alibaba’s common algorithm platf...
Flink Forward San Francisco 2018: Xu Yang - "Alibaba’s common algorithm platf...
 

Similar to The Past, Present, and Future of Apache Flink

The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
Aljoscha Krettek
 
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHINGBig Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Matt Stubbs
 
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward
 
(Past), Present, and Future of Apache Flink
(Past), Present, and Future of Apache Flink(Past), Present, and Future of Apache Flink
(Past), Present, and Future of Apache Flink
Aljoscha Krettek
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
Aljoscha Krettek
 
Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...
Aljoscha Krettek
 
Stream Processing with Apache Apex
Stream Processing with Apache ApexStream Processing with Apache Apex
Stream Processing with Apache Apex
Pramod Immaneni
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
Apache Apex
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
WSO2
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Dataconomy Media
 
data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product Announcement
Flink Forward
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
datamantra
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
Yahoo Developer Network
 
Apache Apex - Hadoop Users Group
Apache Apex - Hadoop Users GroupApache Apex - Hadoop Users Group
Apache Apex - Hadoop Users Group
Pramod Immaneni
 
Stream Processing @ Lyft
Stream Processing @ LyftStream Processing @ Lyft
Stream Processing @ Lyft
Jamie Grier
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Mingmin Chen
 

Similar to The Past, Present, and Future of Apache Flink (20)

The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
 
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHINGBig Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
 
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
 
(Past), Present, and Future of Apache Flink
(Past), Present, and Future of Apache Flink(Past), Present, and Future of Apache Flink
(Past), Present, and Future of Apache Flink
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...
 
Stream Processing with Apache Apex
Stream Processing with Apache ApexStream Processing with Apache Apex
Stream Processing with Apache Apex
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product Announcement
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
 
Apache Apex - Hadoop Users Group
Apache Apex - Hadoop Users GroupApache Apex - Hadoop Users Group
Apache Apex - Hadoop Users Group
 
Stream Processing @ Lyft
Stream Processing @ LyftStream Processing @ Lyft
Stream Processing @ Lyft
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
 

More from Aljoscha Krettek

Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
Aljoscha Krettek
 
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Aljoscha Krettek
 
The Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data ProcessingThe Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data Processing
Aljoscha Krettek
 
Python Streaming Pipelines with Beam on Flink
Python Streaming Pipelines with Beam on FlinkPython Streaming Pipelines with Beam on Flink
Python Streaming Pipelines with Beam on Flink
Aljoscha Krettek
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache Flink
Aljoscha Krettek
 
Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)
Aljoscha Krettek
 
Advanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applicationsAdvanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applications
Aljoscha Krettek
 
Apache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineApache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing Engine
Aljoscha Krettek
 
Adventures in Timespace - How Apache Flink Handles Time and Windows
Adventures in Timespace - How Apache Flink Handles Time and WindowsAdventures in Timespace - How Apache Flink Handles Time and Windows
Adventures in Timespace - How Apache Flink Handles Time and Windows
Aljoscha Krettek
 
Flink 0.10 - Upcoming Features
Flink 0.10 - Upcoming FeaturesFlink 0.10 - Upcoming Features
Flink 0.10 - Upcoming Features
Aljoscha Krettek
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Aljoscha Krettek
 
Apache Flink Hands-On
Apache Flink Hands-OnApache Flink Hands-On
Apache Flink Hands-On
Aljoscha Krettek
 

More from Aljoscha Krettek (12)

Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
 
The Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data ProcessingThe Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data Processing
 
Python Streaming Pipelines with Beam on Flink
Python Streaming Pipelines with Beam on FlinkPython Streaming Pipelines with Beam on Flink
Python Streaming Pipelines with Beam on Flink
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache Flink
 
Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)
 
Advanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applicationsAdvanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applications
 
Apache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineApache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing Engine
 
Adventures in Timespace - How Apache Flink Handles Time and Windows
Adventures in Timespace - How Apache Flink Handles Time and WindowsAdventures in Timespace - How Apache Flink Handles Time and Windows
Adventures in Timespace - How Apache Flink Handles Time and Windows
 
Flink 0.10 - Upcoming Features
Flink 0.10 - Upcoming FeaturesFlink 0.10 - Upcoming Features
Flink 0.10 - Upcoming Features
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
 
Apache Flink Hands-On
Apache Flink Hands-OnApache Flink Hands-On
Apache Flink Hands-On
 

Recently uploaded

Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 

Recently uploaded (20)

Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 

The Past, Present, and Future of Apache Flink

  • 1. Aljoscha Krettek, engineering manager & co-founder Till Rohrmann, engineering manager & co-founder The Past, Present, and Future of Apache Flink®
  • 2. © 2018 data Artisans2 Past
  • 3. © 2018 data Artisans3 It all started in 2014 2009 - 2014 since 2014 ● Batch processor on top of streaming runtime ● First Apache Flink 0.6.0 release August 2014
  • 4. © 2018 data Artisans4 August 2014 Batch processing
  • 5. © 2018 data Artisans5 Flink learns to stream in real time DataStream API Stream Processing DataSet API Batch Processing Runtime Distributed Streaming Data Flow
  • 6. © 2018 data Artisans6 ● Continuous & real-time November 2014 Batch processing Stream processing
  • 7. © 2018 data Artisans7 Flink learns to remember
  • 8. © 2018 data Artisans8 Flink learns to remember
  • 9. © 2018 data Artisans9 Flink learns to remember
  • 10. © 2018 data Artisans10 Flink learns to remember Remember where we left off
  • 11. © 2018 data Artisans11 ● Continuous & real-time ● Stateful & exactly once June 2015 Batch processing Stream processing
  • 12. © 2018 data Artisans12 Latency vs. Throughput? high latency low high throughput Prevailing belief ● 10s of millions of events/s ● Latency down to 1 ms ≠
  • 13. © 2018 data Artisans13 Flink becomes event-time aware Episode IV Episode V Episode VI Episode I Episode II Episode III Episode VII Episode VIII Processing time 1977 1980 1983 1999 2002 2005 2015 2017 Flink becomes event-time aware
  • 14. © 2018 data Artisans14 Flink becomes event-time aware Episode I Episode II Episode III Episode IV Episode V Episode VI Episode VII Episode VIII Processing time 1999 2002 2005 1977 1980 1983 2015 2017 Processing time Event time
  • 15. © 2018 data Artisans15 ● Continuous & real-time ● Stateful & exactly once ● High throughput & low latency ● Event time November 2015 Batch processing Stream processing
  • 16. © 2018 data Artisans16 More than just analytics: ProcessFunction class MyFunction extends ProcessFunction[MyEvent, Result] { // declare state to use in the program lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…) def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = { // work with event and state and schedule timers } def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = { // handle callback when event-/processing- time instant is reached } } ● ProcessFunction gives access to state, time and events ● Low level API ● Enables data-driven applications
  • 17. © 2018 data Artisans17 ● Continuous & real-time ● Stateful & exactly once ● High throughput & low latency ● Event time February 2017 Batch processing Stream processing Data-driven applications
  • 18. © 2018 data Artisans18 Present & Future
  • 19. © 2018 data Artisans19 Hardening Faster network stack Application level flow control Resolving dependency hell Present in a nutshell Scaling Incremental snapshots Local recovery Scalable timers Interoperability Resource elasticity REST client-server interface Container entrypoint Stream SQL SQL client User-defined functions More powerful joins Misc State TTL Broadcast state Kafka exactly-once producer
  • 20. © 2018 data Artisans20 Large, larger, Flink Time State Incremental snapshots ● Snapshot only state diff ● Incremental snapshots allow to handle very large state
  • 21. © 2018 data Artisans21 Faster failover is always better
  • 22. © 2018 data Artisans22 Varying workloads • Violating SLAs vs. wasting money • Varying workloads require to adapt resources
  • 23. © 2018 data Artisans23 Revamped distributed architecture ResourceManager ClusterManager TaskManagerJobManager Dispatcher Client 1. Submit job 2. Start job 3. Request slots 4. Allocate resources 5. Start TaskManager 6. Execute job ● Support for full resource elasticity ● Application parallelism can be dynamically changed
  • 24. © 2018 data Artisans24 ● Continuous & real-time ● Stateful & exactly once ● High throughput & low latency ● Event time ● Applications as first class citizens Present & Future Batch processing Stream processing Data-driven applications
  • 25. © 2018 data Artisans25 Flink as a library (and still as a framework) • Deploying Flink applications should be as easy as starting a process • Bundle application code and Flink into a single image • Process connects to other application processes and figures out its role • Removing the cluster out of the equation P2 P3 P1 P4 New process
  • 26. © 2018 data Artisans26 How much control do I need? Batch processing Continuous processing Real-time & data-driven applications ● Multiple short lived stages ● Different resource requirements per stage ● Efficient execution requires control over resources ● Flink allocates actively resources ● Continuously processing operators ● Constrained by external systems, SLAs and application logic ● External system can assign resources ● Flink reacts to available resources
  • 27. © 2018 data Artisans27 Active vs. reactive mode • Active mode ‒ Flink is aware of underlying cluster framework ‒ Flink allocate resources ‒ E.g. existing YARN and Mesos integration • Reactive mode ‒ Flink is oblivious to its runtime environment ‒ External system allocates and releases resources ‒ Flink scales with respect to available resources ‒ Relevant for environments: Kubernetes, Docker, as a library
  • 28. © 2018 data Artisans28 Scaling automatically • Latency • Throughput • Resource utilization • Connector signals
  • 29. © 2018 data Artisans29 How we create Flink Jobs Flink APIs Stream/Batch Processing Runtime Distributed Streaming Data Flow Java/Scala
  • 30. © 2018 data Artisans30 Flink SQL Flink APIs Stream/Batch Processing Runtime Distributed Streaming Data Flow Java/Scala SQL “NO CODING REQUIRED” Source/Sink definition in YAML Configuration in YAML SQL commandline User-defined functions Streaming and Batch Event time and processing time *since Flink 0.9.0 (June 2015)
  • 31. © 2018 data Artisans31
  • 32. © 2018 data Artisans32 “Join” me for some trading buy buy sell buy Join $ 17 £ 42 12.5 buy sell ₪
  • 33. © 2018 data Artisans33 Introducing Time-versioned Table Joins buy buy sell buy Join buy sell curr rate time £ 42 3 £ 12 17 1453 31753 14 event time
  • 34. © 2018 data Artisans34 SQL for pattern analysis? SELECT * from ?
  • 35. © 2018 data Artisans35 Introducing MATCH_RECOGNIZE SELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rideTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S M{2,} E) DEFINE S AS S.isStart = true, M AS M.rideId <> S.rideId, E AS E.isStart = false AND E.rideId = S.rideId )
  • 36. © 2018 data Artisans36 Todays processing landscape Streaming Batch
  • 37. © 2018 data Artisans37 Batch/streaming unification
  • 38. © 2018 data Artisans38 Into the Future Big state SQL
  • 39. © 2018 data Artisans39 Thank s!