SlideShare a Scribd company logo
Apache Flink® 1.7 and Beyond
公司:data Artisans
职位:Engineering Lead
演讲者:Till Rohrmann
@stsffap
1
2
Original creators
of
Apache Flink®
dA Platform
Stream Processing
for the Enterprise
3
What is Apache Flink?
Batch
Processing
process static and
historic data
Data Stream
Processing
realtime results
from data streams
Event-driven
Applications
data-driven actions
and services
Stateful Computations Over
Data Streams
Flink 1.7: What happened so
far?
4
• Contributors: 112
• Resolved issues: 430
• Commits: 970
• Changes LOC: +103824/-63124
5
Flink 1.7.0 in Numbers
• E.g. changing requirements, new algorithms, better serializers,
bug fixes, etc.
• Expensive to restart application from scratch (maintain state)
6
Flink Applications Need to Evolve
• Support for changing state schema
• Adding/Removing fields
• Changing type of fields
• Currently fully supported when using Avro
types
7
State Schema Evolution
“Upgrading Stateful Flink
Streaming Applications:
State of the Union” by
Tzu-Li Tai Today @
5:20 pm Room 2
8
Converting Currencies
7:12pm 9:37am 8:45am
€ 1
$ 1.13
CN¥ 7.8
9
Temporal Tables and Joins
13 11 7
Currency Rate Time
CN¥ 7.8 3
CN¥ 7.89 5
CN¥ 7.75 915 14 12
7 4
10
SQL for Pattern Analysis
SELECT * from ?
11
MATCH_RECOGNIZESELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
PARTITION BY driverId
ORDER BY rideTime
MEASURES
S.rideId as sRideId
AFTER MATCH SKIP PAST LAST ROW
PATTERN (S M{2,} E)
DEFINE
S AS S.isStart = true,
M AS M.rideId <> S.rideId,
E AS E.isStart = false
AND E.rideId = S.rideId
)
• ElasticSearch 6 Table Sink
• Support for views in SQL Client
• More built-in functions: TO_BASE64, LOG2, REPLACE, COSH,…
12
More SQL Improvements
“Flink Streaming SQL 2018”
by Piotr Nowojski Today @
4:00 pm Room 2
• Scala 2.12 Support
• Exactly-once S3 StreamingFileSink
• Kafka 2.0 connector
• Versioned REST API
• Removal of legacy mode
13
Other Notable Features
Flink 1.8+: What is happening
next?
14
15
Capability Spectrum
offline real time
Batch
Event-driven
applications
Streaming
analytics
Strict SLA
applications
Flink
• Deploying Flink applications should be as easy as starting a process
• Bundle application code and Flink into a single image
• Process connects to other application processes and figures out its role
• Removing the cluster out of the equation
16
Flink as a Library
P1
P2 P3 P4
New process
• Active mode
• Flink is aware of underlying cluster framework
• Flink allocate resources
• E.g. existing YARN and Mesos integration
• Reactive mode
• Flink is oblivious to its runtime environment
• External system allocates and releases resources
• Flink scales with respect to available resources
• Relevant for environments: Kubernetes, Docker, as a library
17
Reactive vs. Active
18
Dynamic Scaling
• Latency
• Throughput
• Resource utilization
• Connector signals
• No fundamental difference between batch and stream processing
• Batch allows optimizations because data is bounded and
”complete”
• Batch and streaming still separately treated from task level upwards
• Working toward a single runtime for batch and streaming workloads
19
Batch-Streaming Unification
• Lazy scheduling (batch case)
• Deploy tasks starting from the
sources
• Whenever data is produced
start consumers
• Scheduling of idling tasks 
resource under-utilization
20
Flink Scheduler
src
src
join join
src
build side
build side
prob
e
side
probe
side
• More efficient scheduling by
taking dependencies into account
• E.g. probe side is only scheduled
after build side has been
processed
21
Batch Scheduler
src
src
join join
src
build side
build side
prob
e
side
probe
side
(1)
(2)
(2)
(3)
• Make Flink’s scheduler extendable &
pluggable
• Scheduler considers dependencies and
reacts to signals from ExecutionGraph
• Specialized scheduler for different use
cases
22
Extendable Scheduler
Scheduler
Streaming
Scheduler
Batch
Scheduler
Speculative
Scheduler
• Tasks own produced result
partitions
• Containers cannot be freed
until result is consumed
• One implementation for
streaming and batch loads
23
Flink’s Shuffle Service
Result partitionContainer
• Result partitions are written to
an external shuffle service
• Containers can be freed early
• Different implementations
based on use case
24
External & Persistent Shuffle Service
External shuffle
service (e.g. Yarn,
DFS)
• Support for external catalogs
(Confluent Schema Registry, Hive
Meta Store)
• Data definition language (DDL)
25
End-to-end SQL Only Pipelines
Hive Meta Store
Table
Source
Table
Sink
Output schema
information
Input schema
information
SQL
Query
• Flink 1.7.0 added many new features around SQL, connectors and state evolution
• A lot of new features in the pipeline
• Join the community!
• Subscribe to mailing lists
• Participate in Flink development
• Become active
26
TL;DL
谢谢
THANKS
27

More Related Content

What's hot

Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Till Rohrmann
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
Stephan Ewen
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
Robert Metzger
 
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
Ververica
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
Aljoscha Krettek
 
Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017
Till Rohrmann
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4
Till Rohrmann
 
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Till Rohrmann
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Till Rohrmann
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
Fabian Hueske
 
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
Flink Forward
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
Stephan Ewen
 
Flink Community Update December 2015: Year in Review
Flink Community Update December 2015: Year in ReviewFlink Community Update December 2015: Year in Review
Flink Community Update December 2015: Year in Review
Robert Metzger
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
Robert Metzger
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward
 
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...
Flink Forward
 
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Flink Forward
 
Flink 1.0-slides
Flink 1.0-slidesFlink 1.0-slides
Flink 1.0-slides
Jamie Grier
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
Stefan Richter
 

What's hot (20)

Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
 
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
 
Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4
 
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
Flink Community Update December 2015: Year in Review
Flink Community Update December 2015: Year in ReviewFlink Community Update December 2015: Year in Review
Flink Community Update December 2015: Year in Review
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
 
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...
 
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
 
Flink 1.0-slides
Flink 1.0-slidesFlink 1.0-slides
Flink 1.0-slides
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
 

Similar to Apache flink 1.7 and Beyond

data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product Announcement
Flink Forward
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
OpenDaylight Openflow & OVSDB use cases ODL summit 2016
OpenDaylight Openflow & OVSDB use cases ODL summit 2016OpenDaylight Openflow & OVSDB use cases ODL summit 2016
OpenDaylight Openflow & OVSDB use cases ODL summit 2016
abhijit2511
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Flink Forward
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
Apache Apex
 
Flink September 2015 Community Update
Flink September 2015 Community UpdateFlink September 2015 Community Update
Flink September 2015 Community Update
Robert Metzger
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
Gyula Fóra
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
ZalandoHayley
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
Zalando Technology
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
Apache Kafka TLV
 
Berlin Apache Flink Meetup May 2015, Community Update
Berlin Apache Flink Meetup May 2015, Community UpdateBerlin Apache Flink Meetup May 2015, Community Update
Berlin Apache Flink Meetup May 2015, Community Update
Robert Metzger
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Apache Flink Online Training
Apache Flink Online TrainingApache Flink Online Training
Apache Flink Online Training
Learntek1
 
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System OverviewApache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Taiwan User Group
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer Introduction
John Garbutt
 
Apache flink
Apache flinkApache flink
Apache flink
Janu Jahnavi
 
Apache flink
Apache flinkApache flink
Apache flink
Janu Jahnavi
 
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid
arupmalakar
 

Similar to Apache flink 1.7 and Beyond (20)

data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product Announcement
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
OpenDaylight Openflow & OVSDB use cases ODL summit 2016
OpenDaylight Openflow & OVSDB use cases ODL summit 2016OpenDaylight Openflow & OVSDB use cases ODL summit 2016
OpenDaylight Openflow & OVSDB use cases ODL summit 2016
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
Flink September 2015 Community Update
Flink September 2015 Community UpdateFlink September 2015 Community Update
Flink September 2015 Community Update
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
Berlin Apache Flink Meetup May 2015, Community Update
Berlin Apache Flink Meetup May 2015, Community UpdateBerlin Apache Flink Meetup May 2015, Community Update
Berlin Apache Flink Meetup May 2015, Community Update
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Apache Flink Online Training
Apache Flink Online TrainingApache Flink Online Training
Apache Flink Online Training
 
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System OverviewApache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer Introduction
 
Apache flink
Apache flinkApache flink
Apache flink
 
Apache flink
Apache flinkApache flink
Apache flink
 
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid
 

More from Till Rohrmann

Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup BerlinApache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Till Rohrmann
 
Apache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OSApache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OS
Till Rohrmann
 
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Till Rohrmann
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?
Till Rohrmann
 
Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016
Till Rohrmann
 
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Till Rohrmann
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Till Rohrmann
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Till Rohrmann
 
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Till Rohrmann
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Till Rohrmann
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 

More from Till Rohrmann (11)

Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup BerlinApache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
 
Apache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OSApache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OS
 
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?
 
Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016
 
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
 
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 

Recently uploaded

By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 

Recently uploaded (20)

By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 

Apache flink 1.7 and Beyond

  • 1. Apache Flink® 1.7 and Beyond 公司:data Artisans 职位:Engineering Lead 演讲者:Till Rohrmann @stsffap 1
  • 2. 2 Original creators of Apache Flink® dA Platform Stream Processing for the Enterprise
  • 3. 3 What is Apache Flink? Batch Processing process static and historic data Data Stream Processing realtime results from data streams Event-driven Applications data-driven actions and services Stateful Computations Over Data Streams
  • 4. Flink 1.7: What happened so far? 4
  • 5. • Contributors: 112 • Resolved issues: 430 • Commits: 970 • Changes LOC: +103824/-63124 5 Flink 1.7.0 in Numbers
  • 6. • E.g. changing requirements, new algorithms, better serializers, bug fixes, etc. • Expensive to restart application from scratch (maintain state) 6 Flink Applications Need to Evolve
  • 7. • Support for changing state schema • Adding/Removing fields • Changing type of fields • Currently fully supported when using Avro types 7 State Schema Evolution “Upgrading Stateful Flink Streaming Applications: State of the Union” by Tzu-Li Tai Today @ 5:20 pm Room 2
  • 8. 8 Converting Currencies 7:12pm 9:37am 8:45am € 1 $ 1.13 CN¥ 7.8
  • 9. 9 Temporal Tables and Joins 13 11 7 Currency Rate Time CN¥ 7.8 3 CN¥ 7.89 5 CN¥ 7.75 915 14 12 7 4
  • 10. 10 SQL for Pattern Analysis SELECT * from ?
  • 11. 11 MATCH_RECOGNIZESELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rideTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S M{2,} E) DEFINE S AS S.isStart = true, M AS M.rideId <> S.rideId, E AS E.isStart = false AND E.rideId = S.rideId )
  • 12. • ElasticSearch 6 Table Sink • Support for views in SQL Client • More built-in functions: TO_BASE64, LOG2, REPLACE, COSH,… 12 More SQL Improvements “Flink Streaming SQL 2018” by Piotr Nowojski Today @ 4:00 pm Room 2
  • 13. • Scala 2.12 Support • Exactly-once S3 StreamingFileSink • Kafka 2.0 connector • Versioned REST API • Removal of legacy mode 13 Other Notable Features
  • 14. Flink 1.8+: What is happening next? 14
  • 15. 15 Capability Spectrum offline real time Batch Event-driven applications Streaming analytics Strict SLA applications Flink
  • 16. • Deploying Flink applications should be as easy as starting a process • Bundle application code and Flink into a single image • Process connects to other application processes and figures out its role • Removing the cluster out of the equation 16 Flink as a Library P1 P2 P3 P4 New process
  • 17. • Active mode • Flink is aware of underlying cluster framework • Flink allocate resources • E.g. existing YARN and Mesos integration • Reactive mode • Flink is oblivious to its runtime environment • External system allocates and releases resources • Flink scales with respect to available resources • Relevant for environments: Kubernetes, Docker, as a library 17 Reactive vs. Active
  • 18. 18 Dynamic Scaling • Latency • Throughput • Resource utilization • Connector signals
  • 19. • No fundamental difference between batch and stream processing • Batch allows optimizations because data is bounded and ”complete” • Batch and streaming still separately treated from task level upwards • Working toward a single runtime for batch and streaming workloads 19 Batch-Streaming Unification
  • 20. • Lazy scheduling (batch case) • Deploy tasks starting from the sources • Whenever data is produced start consumers • Scheduling of idling tasks  resource under-utilization 20 Flink Scheduler src src join join src build side build side prob e side probe side
  • 21. • More efficient scheduling by taking dependencies into account • E.g. probe side is only scheduled after build side has been processed 21 Batch Scheduler src src join join src build side build side prob e side probe side (1) (2) (2) (3)
  • 22. • Make Flink’s scheduler extendable & pluggable • Scheduler considers dependencies and reacts to signals from ExecutionGraph • Specialized scheduler for different use cases 22 Extendable Scheduler Scheduler Streaming Scheduler Batch Scheduler Speculative Scheduler
  • 23. • Tasks own produced result partitions • Containers cannot be freed until result is consumed • One implementation for streaming and batch loads 23 Flink’s Shuffle Service Result partitionContainer
  • 24. • Result partitions are written to an external shuffle service • Containers can be freed early • Different implementations based on use case 24 External & Persistent Shuffle Service External shuffle service (e.g. Yarn, DFS)
  • 25. • Support for external catalogs (Confluent Schema Registry, Hive Meta Store) • Data definition language (DDL) 25 End-to-end SQL Only Pipelines Hive Meta Store Table Source Table Sink Output schema information Input schema information SQL Query
  • 26. • Flink 1.7.0 added many new features around SQL, connectors and state evolution • A lot of new features in the pipeline • Join the community! • Subscribe to mailing lists • Participate in Flink development • Become active 26 TL;DL

Editor's Notes

  1. State evolution was missing piece to properly evolve Flink applications (topology changes were already possible) State evolution is particularly important for users who have amassed a lot of state which they cannot discard/recompute because it would be too expensive Evolving the state is a cheap solution by adding/removing fields to capture new features
  2. State schema evolution is currently supported when using Avro types Limited by what Avro supports wrt schema changes In the future, state transformations are conceivable  Mapping state to a completely different type A => B Dedicated talk available
  3. Example to motivate temporal tables and temporal joins: Converting currencies of buys and sales which are executed in different currencies Stream A are the incoming buys/sales, when they happened, in which currency they were executed and the amount of transferred money Stream B are the currency rates Goal join buys/sales with the latest currency rate to calculate exact costs/income in some given currency
  4. Temporal tables are tables which have an additional time/version column The table can contain multiple entries for the same key originating from different times Temporal table allows to join with latest row for a given key wrt some timestamp Perfect for joining buys/sales with their respective currency rate for conversion
  5. How do you use SQL if you want to extract temporal patterns (e.g. you want to find out how many of your warehouse orders have not been processed correctly) Easy to do with complex event processing which has been developed for this purpose CEP allows you to define patterns via a regular expression like language which can then be extracted from your data For example, let’s assume we have a stream of taxi ride events (passenger X starts ride, passenger Y stops ride) We want to find rides where the taxi picks up two other passengers after the first passenger starts its ride How do we express this with SQL?
  6. MATCH_RECOGNIZE comes to our rescue MATCH_RECOGNIZE allows to define temporal patterns in your query Support for this key word was added with Flink 1.7 which is also compliant to SQL:2016 Purple defines the pattern: First a start event where a passenger X enters the taxi, then two events where another ride starts (new passengers enter the taxi) and last the event where passenger X leaves the taxi The events of the purple pattern are defined in the DEFINE clause Passengers are identified by the rideId
  7. Many more SQL improvements where added with Flink 1.7.0 SQL views define virtual tables from a SQL query Refer to dedicated talk by Piotr
  8. Scala 2.12 is now supported  Users can now use newer Scala version and more powerful Scala 2.12 ecosystem The StreamingFileSink which has been added in 1.6.0 now also supports writing to S3 with exactly once processing guarantees Flink has now a Kafka 2.0 connector being able to read from Kafka 2.0 Flink’s REST API is versioned  No more breaking changes for third party integrations which use the REST API The legacy mode has been removed  Flip-6 only
  9. If we look at the capability spectrum, Flink shines most in the domain of streaming analytics, continuous processing, event driven applications and with some limitations in batch However, in order to process batch workloads as good as streaming only workloads Flink needs still to improve a bit (e.g. scheduling, dynamic memory management, persistent intermediate results, …) The same applies for the near real-time applications which have low latency SLAs (Flink needs to recover faster in order to fulfill the SLAs, auto-scaling, …) The goal is for Flink to become the primary choice when trying to solve a problem from any of these domains Flink 1.8 will take some incremental steps to extend Flink’s capability spectrum
  10. More and more people use Flink to develop event-driven applications These applications should be as easy to deploy as starting a local application Atm users need to manage a Flink cluster to deploy an application to The job mode has eased the situation a bit but you still need to operate a cluster Idea: Completely remove the cluster out of the equation by by bundling Flink and application code into one image A process started from this image will automatically connect to the other Flink processes and form a cluster on its own User only needs to start and stop processes that’s all
  11. In order to make deployment as easy as just described, the community is working on a new execution mode The existing execution mode is the active mode In the active mode, Flink is aware of its underlying resource manager and can talk to it in order to allocate/free resources (e.g. Yarn, Mesos) The community is currently working on adding support for running Flink in active mode on Kubernetes (needs to implement a KubernetesResourceManager) The reactive mode differs in the sense that Flink does not know about its runtime environment Thus, Flink cannot allocate/free resources. Allocation/Freeing needs to be done by an external process New resources (TaskManagers) once started will join the cluster and Flink will automatically make use of these new resources Depending on the job requirements Flink will automatically scale up/down the job wrt to the available resources
  12. Enable auto-scaling Based on some metrics and wrt to the available resources Flink will decide when and how to rescale a Flink job The community will add a RescalingPolicy interface which allows the operators to define when to signal a rescaling request Flink will make sure that all resources are used efficiently The reactive mode will enable dynamic scaling also in container environments where an external system controls the resources
  13. A long sought-after goal is the unification of batch and stream processing There are no fundamental differences between batch and stream processing which would prevent the unification Atm Flink handles batch and streaming on the API level separately (exception is the Table API) Big parts of the runtime are shared by batch and streaming workloads (network stack, distributed components) However from the Task level upwards (StreamTask, StreamOperator, BatchTask) there are still separate code paths for batch and streaming Flink 1.8 will help paving the way for a truly unified runtime for batch and streaming workloads
  14. Scheduling of batch jobs is currently sub-optimal Flink uses lazy scheduling starting from the sources Depending on the intermediate result type it schedules the consumers either when the first data has been produced or when the intermediate result has been completed Often different branches have dependencies and should be scheduled accordingly hash-join has a build and probe side, the build side needs to be processed before the probe side is consumed  No need to schedule the probe side operators Atm, these dependencies are not respected which leads to resource under-utilization
  15. Better scheduling would be by taking dependencies into account First schedule build side Second schedule probe side and build side of second join operator Last schedule probe side of second join operator
  16. In order to support different schedulers, the community is currently working on making the scheduler extendable and pluggable Schedulers will receive topology (including operators dependencies) and signals from the ExecutionGraph (e.g. task finished, data available, etc) Based on these signals the scheduler can make scheduling decisions Specialized schedulers for different use cases conceivable (e.g. streaming scheduler which schedules all operators vs. batch scheduler which schedules the topology stage by stage)
  17. Another way to improve Flink’s batch capabilities is to make Flink’s shuffle service pluggable Atm Flink’s shuffle service is coupled to the lifetime of the TaskManager The shuffle service either keeps results in memory or spills to disk (depending on the exchange mode) The problem is that results are bound to the lifetime of the TaskManager and, thus, it is not possible to release containers after tasks have finished Moreover, there is only one shuffle service implementation which has to work for streaming and batch workloads
  18. Idea is to make the shuffle service pluggable so that an external and persistent shuffle service can be supplied Such a shuffle service allows to decouple the result partitions from the underlying containers and, thus, containers can be released earlier Possible shuffle service implementations could be based on persisting results to DFS or implemeting a Yarn based shuffle service Making the shuffle service pluggable will make Flink more flexible wrt supported use cases
  19. Two important features to allow users to define end-to-end SQL only pipelines (without writing any (Java/Scala) code) are in the making Support for external catalogs to easily read from and write to SQL tables Examples: Hive meta store for batch SQL jobs or the confluent schema registry for streaming SQL Prerequisite is that the SQL sources/sinks have been registered at the catalog If this has not happened, then support for a data definition language (DDL) could be helpful With the DDL it will be possible to define SQL tables using a SQL query (schema + meta information) These two features will simplify the usage of SQL with Flink tremendously