Build a Flink AI Ecosystem
Jiangjie (Becket) Qin
Flink Forward Berlin 2019
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
2
Lambda - what’s everyone doing
[Diagram: Lambda architecture. Offline path (Batch Layer): HDFS -> Batch Processing. Online path (Speed Layer): Message Queue -> Stream Processing. Serving Layer: combine the results -> Query Result.]
3
Lambda - what’s everyone doing
• Two code bases for online and offline processing logic
• High maintenance cost
• Difficult to ensure consistent processing logic
[Diagram: Offline path: HDFS -> Batch Processing (Spark/M-R). Online path: Message Queue -> Stream Processing (Flink/Storm). Combine the results -> Query Result.]
4
Batch-Stream Processing Unification
• Use the same engine for online and offline processing
• Spark
• Flink
[Diagram: Offline path: HDFS -> Batch Processing (Flink/Spark). Online path: Message Queue -> Stream Processing (Flink/Spark Streaming). Combine the results -> Query Result.]
5
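The benefit of a single engine can be illustrated without any engine at all. The sketch below is plain Python, not Flink API; `enrich`, `run_batch`, and `run_stream` are hypothetical names chosen for illustration. It shows one piece of processing logic reused on a bounded (offline) input and on a record-at-a-time (online) input, so the two paths cannot drift apart.

```python
from typing import Dict, Iterable, Iterator, List

def enrich(event: Dict) -> Dict:
    """Shared processing logic: written once, used on both paths."""
    return {**event, "amount_cents": round(event["amount"] * 100)}

def run_batch(events: List[Dict]) -> List[Dict]:
    # Offline path: a bounded data set, processed all at once.
    return [enrich(e) for e in events]

def run_stream(events: Iterable[Dict]) -> Iterator[Dict]:
    # Online path: an unbounded source, processed one record at a time.
    for e in events:
        yield enrich(e)

history = [{"id": 1, "amount": 1.5}, {"id": 2, "amount": 0.25}]
batch_result = run_batch(history)
stream_result = list(run_stream(iter(history)))
```

Because both paths call the same `enrich` function, the batch and streaming results agree by construction.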
So what about ML?
• A typical ML scenario
• Offline training (TF, PyTorch, etc)
• Static models
• Online inference (Flink)
• The data preprocessing logic for training and inference often lives in two separate code bases
[Diagram: Offline path: HDFS, Preprocessing, Offline Training, Static model. Online path: Preprocessing, Inference with the static model.]
6
So what about ML?
• Online training is gaining popularity
• More timely model updates
• Dynamic model and continuous training
• Progressive validation
• More sophisticated monitoring and model deployment / rollback
[Diagram: Online path: Message Queue, Preprocessing, Online Training, Dynamic model, Inference.]
7
“Lambda” architecture for ML
• Offline training: a static base model
• Online training: incremental updates to the base model
• Users have to deal with different systems / code bases
[Diagram: Offline path: HDFS, Preprocessing, Offline Training, Static model. Online path: Message Queue, Preprocessing, Online Training, Dynamic model, Inference. The static model is the base for Online Training.]
8
Value of Flink
• Inference is latency-sensitive online / nearline processing
• Flink is a good option in this case
[Diagram: Offline path: HDFS, Preprocessing, Offline Training, Static model. Online path: Message Queue, Preprocessing, Online Training, Dynamic model, Inference. The static model is the base for Online Training.]
9
Batch-Stream Unification in ML
• Online inference is latency-sensitive online / nearline processing
• Flink is a good option in this case
• Use Flink everywhere to avoid maintaining different code bases.
[Diagram: Offline path: HDFS, Preprocessing, Offline Training, Static model. Online path: Message Queue, Preprocessing, Online Training, Dynamic model, Inference. The static model is the base for Online Training.]
10
Additional Values
• One-stop data processing solution
• Shared dataset management
• Switch processing APIs freely
[Diagram: Dataset Management shared by the DataStream, SQL, ML, and CEP APIs]
11
Flink AI Ecosystem By ML Stages
ML stages: Data Acquisition, Preprocessing, Model Training, Model Validation & Serving, Inference
Efforts & requirements:
• Rich connector support & dataset management
• Stream-Batch unification
• Strong SQL support
• Enhanced Iteration
• Flink ML Lib
• DL on Flink (TF, PyTorch)
• Dynamic model serving
• Model Management
• Rollout / Rollback
• Online monitoring
• Online evaluation
• Flink ML Pipeline, Python support
[Diagram (AI flow): Offline path: HDFS, Preprocessing, Offline Training, Static model. Online path: Message Queue, Preprocessing, Online Training, Dynamic model, Model Validation, Inference.]
12
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
14
Flink ML Pipeline - Overview
• PipelineStage: Transformer, Estimator, Model
• Transformer: Data -> Data transition (Preprocessing, Inference)
• Estimator: Data -> Model transition (Model Training)
• ML Lib Developers: K-Means, NaiveBayes, Linear regression, DecisionTree, RandomForest, GBDT, ……
• ML Lib Users: Table based ML Pipeline
[Diagram: Input Table -> Transformer -> Estimator -> Output Table, with table2 = Transformer.transform(table1) followed by Estimator.fit(table2)]
15
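The stage roles on this slide can be sketched as a small class hierarchy. This is a plain-Python illustration, not the Flink ML Pipeline API: `Table` is a stand-in type, `MeanEstimator` and `MeanModel` are hypothetical, and treating `Model` as a kind of `Transformer` is an assumption based on the slides calling `Model.transform`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Table = List[Dict[str, float]]  # stand-in for a Flink Table (assumption)

class Transformer(ABC):
    """Data -> Data transition (preprocessing, inference)."""
    @abstractmethod
    def transform(self, table: Table) -> Table: ...

class Model(Transformer):
    """A Transformer that is produced by training (assumed relationship)."""

class Estimator(ABC):
    """Data -> Model transition (model training)."""
    @abstractmethod
    def fit(self, table: Table) -> Model: ...

class MeanModel(Model):
    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, table: Table) -> Table:
        # Inference: append the learned mean as the prediction.
        return [{**row, "prediction": self.mean} for row in table]

class MeanEstimator(Estimator):
    def fit(self, table: Table) -> MeanModel:
        # Training: the "model" is just the mean of column y.
        return MeanModel(sum(r["y"] for r in table) / len(table))

model = MeanEstimator().fit([{"y": 1.0}, {"y": 3.0}])
result = model.transform([{"y": 10.0}])
```

An algorithm author would implement the `Estimator` and `Model` pair, while a user only calls `fit` and `transform` on tables.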
Flink ML Libs
Rewrite Flink ML Libs
• ML pipeline based
• Table API based
• Battle tested algorithms
ML libs: K-Means, NaiveBayes, Linear regression, GBDT, DecisionTree, PCA, Random Forest, Correlation, ……
16
ML Pipeline - Simple Case
• Training (Input1: Table): pipeline.fit(input1) calls Estimator.fit(input1) and produces a Model
• Inference (Input2: Table): pipeline.transform(input2) calls Model.transform(input2) and produces the Result Table
17
ML Pipeline
• Training with the Estimator Pipeline (Transformer + Estimator), Input1: Table
  pipeline.fit(input1)
  output1 = Transformer.transform(input1)
  Estimator.fit(output1)
  Result: Model Pipeline (Transformer + Model)
• Inference with the Model Pipeline (Transformer + Model), Input2: Table
  pipeline.transform(input2)
  output2 = Transformer.transform(input2)
  Model.transform(output2)
  Result: Result Table
18
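The training and inference flow of the pipeline slide can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the Flink ML Pipeline implementation: `Table` is a list of floats, and `Scale`, `CenterEstimator`, `CenterModel`, and this `Pipeline` class are hypothetical.

```python
from typing import List

Table = List[float]  # stand-in for a Flink Table (assumption)

class Scale:
    """Transformer: multiplies every value by a fixed factor."""
    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, table: Table) -> Table:
        return [v * self.factor for v in table]

class CenterModel:
    """Model: subtracts the mean learned during training."""
    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, table: Table) -> Table:
        return [v - self.mean for v in table]

class CenterEstimator:
    """Estimator: learns the mean of its input."""
    def fit(self, table: Table) -> CenterModel:
        return CenterModel(sum(table) / len(table))

class Pipeline:
    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, table: Table) -> "Pipeline":
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                # Estimator: train on the data as transformed so far.
                stage = stage.fit(table)
            fitted.append(stage)
            table = stage.transform(table)
        # Same topology, with every Estimator replaced by its Model.
        return Pipeline(fitted)

    def transform(self, table: Table) -> Table:
        for stage in self.stages:
            table = stage.transform(table)
        return table

estimator_pipeline = Pipeline([Scale(10.0), CenterEstimator()])
model_pipeline = estimator_pipeline.fit([1.0, 3.0])  # training on input1
result = model_pipeline.transform([2.0, 4.0])        # inference on input2
```

The fitted pipeline keeps the `Scale` step in front of the model, so inference applies exactly the preprocessing that training saw.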
Value of Flink ML Pipeline
• Unify APIs of Model Training and Inference for the end users
• End users only need to deal with either Estimators or Transformers
• Ensure consistent logic between training and inference
• The same pipeline topology in training will be persisted and used for inference
19
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
20
Deep Learning Pipeline
[Diagram: Data Acquisition -> Data Process and Transformation -> Model Training -> Test and Validation -> Model Serving, plus a Model or Params Tuning step]
21
Deep Learning Pipeline
[Diagram: Data Acquisition and Data Process and Transformation run as one Flink job in a Cluster/Environment (SOURCE, SOURCE, JOIN, UDTF) that writes to External Storage / Queue. Model Training runs as a distributed TF framework in a Cluster/Environment (3 WORKER, 2 PS) and produces the Resulting Model.]
22
TensorFlow-Flink Integration
[Diagram, separate setup: one Flink job in a Cluster/Environment (SOURCE, SOURCE, JOIN, UDTF) writes to External Storage / Queue, which feeds a distributed TF framework in a Cluster/Environment (3 WORKER, 2 PS) that produces the Resulting Model.]
[Diagram, integrated setup: one single Flink job in a Cluster/Environment contains SOURCE, SOURCE, JOIN, UDTF together with the WORKER and PS nodes and produces the Resulting Model.]
24
DL on Flink and ML Pipeline integration
[Diagram: one single Flink job in a Cluster/Environment with SOURCE, SOURCE, JOIN, UDTF, WORKER, and PS nodes producing the Resulting Model; the stages are labelled Transformer and Estimator]
The ML Pipeline API could be used for both traditional ML and deep learning.
25
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
26
Enhance Iteration in Flink
• Native iteration implemented by the processing engine
• Feedback edge on the processing DAG
• Address the caveats of DataSet / DataStream iterations
[Diagram: Flink Cluster with Partition 1, Partition 2, Partition 3, …, Partition N, each processed by a map operator]
27
Multi-variable iteration
{
  val a: Table = ...
  val b: Table = ...
  // Iteration variables: a, b
  val resultSeq = Table.iterate(a, b) {
    // Step function
    val next_a = b.select('v_b + 1 as 'v_a)
    val next_b = next_a.select('v_a * 2 as 'v_b)
    Seq(next_a, next_b)
  }.times(10) // Termination condition
}
28
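The semantics of the multi-variable iteration above can be traced with a small plain-Python sketch. It does not use Flink; `iterate`, `step`, and the single-column `Column` type are hypothetical stand-ins, and the starting values `[0]` and `[1]` are chosen only for illustration.

```python
from typing import Callable, List, Sequence

Column = List[int]  # stand-in for a single-column Table (assumption)

def iterate(step: Callable[..., Sequence[Column]], times: int,
            *variables: Column) -> Sequence[Column]:
    """Apply `step` to the iteration variables a fixed number of times."""
    state: Sequence[Column] = variables
    for _ in range(times):      # termination condition, like .times(n)
        state = step(*state)    # feedback edge: outputs become the next inputs
    return state

def step(a: Column, b: Column) -> Sequence[Column]:
    next_a = [v_b + 1 for v_b in b]        # b.select('v_b + 1 as 'v_a)
    next_b = [v_a * 2 for v_a in next_a]   # next_a.select('v_a * 2 as 'v_b)
    return next_a, next_b

a, b = iterate(step, 10, [0], [1])
```

Each step maps `b` to `2 * (b + 1)`, so starting from 1 the value after 10 steps is 3070, and `a` is 1535.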
Nested Iteration
{
  val a: Table = ...
  val b: Table = ...
  val resultSeq = iterate(a, b) {
    // Inner iteration: runs 100 times inside every outer step
    val next_a = iterate(a) {
      Seq(a.select('v_a + 1 as 'v_a))
    }.times(100).head
    val next_b = next_a.select('v_a * 2 as 'v_b)
    Seq(next_a, next_b)
  }.times(10)
}
29
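A nested iteration behaves like a loop inside a loop: the inner iteration runs to completion within every outer step. The plain-Python sketch below mirrors the slide's structure; `inner`, `outer`, and the starting values are hypothetical and no Flink API is involved.

```python
from typing import List, Tuple

Column = List[int]  # stand-in for a single-column Table (assumption)

def inner(a: Column, times: int) -> Column:
    # Inner iteration: a.select('v_a + 1 as 'v_a), repeated `times` times.
    for _ in range(times):
        a = [v + 1 for v in a]
    return a

def outer(a: Column, b: Column, times: int) -> Tuple[Column, Column]:
    for _ in range(times):
        next_a = inner(a, 100)             # nested loop runs to completion
        next_b = [v * 2 for v in next_a]   # next_a.select('v_a * 2 as 'v_b)
        a, b = next_a, next_b
    return a, b

a, b = outer([0], [0], 10)
```

Every outer step adds 100 to `a`, so after 10 outer steps `a` is 1000 and `b` is 2000.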
Mini-batch iteration
• A stream is chunked into multiple mini-batches
• Each mini-batch iterates independently in the iteration loop
• The results are emitted in the mini-batch order
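[Diagram: mini-batches MB1, MB2, MB3 flow through the Flink Cluster (Partition 1, Partition 2, Partition 3, …, Partition N, each with a map operator); the results for MB1 and MB2 are emitted in mini-batch order]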
30
Mini-batch iteration
• Native support for Stochastic Gradient Descent (SGD)
• Native support for online learning
31
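How mini-batches line up with SGD and online learning can be sketched in plain Python. This is an illustration under simplifying assumptions, not Flink code: the model is a single weight `w` for `y ~ w * x`, each mini-batch triggers one gradient step, and `mini_batches`, `sgd_step`, and `train` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[float, float]  # (x, y)

def mini_batches(stream: Iterable[Sample], size: int) -> Iterator[List[Sample]]:
    """Chunk a (possibly unbounded) stream into mini-batches, in order."""
    batch: List[Sample] = []
    for sample in stream:
        batch.append(sample)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def sgd_step(w: float, batch: List[Sample], lr: float) -> float:
    # Gradient of the mean squared error of y ~ w * x over one mini-batch.
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad

def train(stream: Iterable[Sample], size: int, lr: float) -> List[float]:
    w, versions = 0.0, []
    for batch in mini_batches(stream, size):
        w = sgd_step(w, batch, lr)
        versions.append(w)  # one model version per mini-batch
    return versions

stream = [(x, 3.0 * x) for x in (1.0, 2.0) * 50]  # samples of y = 3x
versions = train(stream, size=2, lr=0.1)
```

The list `versions` plays the role of the successive model versions: each mini-batch yields an updated weight, and the sequence approaches the true value 3.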
Iteration and Dynamic Model Update
[Diagram: an iteration loop. The Initial model and Samples feed Gradient Computing, Gradient Reduce updates the Model, and versions Model_V1, Model_V2, Model_V3, … are produced until the Final Model.]
32
Dynamic Model Serving
[Diagram: Offline path: HDFS, Preprocessing, Offline Training, Static model. Online path: Message Queue, Preprocessing, Online Training, Dynamic model, Model Validation. Inference receives Samples together with the model versions Model_V1, Model_V2, Model_V3, ….]
The exact same mechanism of native iteration could be used for dynamic model serving.
34
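The idea of dynamic model serving can be sketched as an operator that consumes samples and model updates from the same ordered input and always scores with the latest version. This is a plain-Python illustration, not the mechanism Flink uses; `serve`, the event tuples, and the lambda models are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Tuple, Union

Model = Callable[[float], float]
Event = Tuple[str, Union[Model, float]]  # ("model", fn) or ("sample", x)

def serve(events: Iterable[Event], initial: Model) -> Iterator[float]:
    """Score samples with whichever model version arrived most recently."""
    model = initial
    for kind, payload in events:
        if kind == "model":
            model = payload        # swap in the new version, no restart
        else:
            yield model(payload)   # inference with the current version

events = [
    ("sample", 1.0),
    ("model", lambda x: 2.0 * x),  # Model_V1 arrives
    ("sample", 1.0),
    ("model", lambda x: 3.0 * x),  # Model_V2 arrives
    ("sample", 1.0),
]
predictions = list(serve(events, initial=lambda x: x))
```

The same sample is scored differently as new model versions arrive, which is the behaviour the slide describes.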
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
35
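Flink Python Table API
[Diagram, Python Table API: in the Python process, the Python Table API uses the Python gateway to call the Java gateway server in the Java process over RPC (Py4j); the Java process holds the DAG graph.]
[Diagram, Python UDF: the Java process passes upstream input as input to the Python VM and forwards the Python VM's output as downstream output.]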
36
Working with Apache Beam Community
More Python API Support
• Flink ML Pipeline
• Flink-AI-Extended
• DataStream
37
Summary
• Flink offers unique value for AI use cases
• Flink fits well into the “lambda” ML architecture
• Multiple ongoing efforts to make Flink more AI friendly
• Flink ML Pipeline
• Flink ML Libs
• Deep learning on Flink
• Iteration enhancement
• Python API
• …
38
Q & A
We are hiring!!
becket.qin@gmail.com

 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

[FFE19] Build a Flink AI Ecosystem

  • 1. Build a Flink AI Ecosystem Jiangjie (Becket) Qin Flink Forward Berlin 2019
  • 2. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 2
  • 3. Lambda - what’s everyone doing HDFS Message Queue Batch Processing Stream Processing Combine the results Query Result Offline path Online path 3 Batch Layer Speed Layer Serving Layer
  • 4. Lambda - what’s everyone doing HDFS Message Queue Batch Processing (Spark/M-R) Stream Processing (Flink/Storm) Combine the results Query Result • Two code bases for online and offline processing logic • High maintenance cost • Difficult to ensure consistent processing logic Offline path Online path 4
  • 5. Batch-Stream Processing Unification • Use the same engine for online and offline processing • Spark • Flink HDFS Message Queue Batch Processing (Flink/Spark) Stream Processing (Flink/Spark Streaming) Combine the results Query Result Offline path Online path 5
  • 6. So what about ML? • A typical ML scenario • Offline training (TF, PyTorch, etc.) • Static models • Online inference (Flink) • The data preprocessing logic for training and inference often lives in two separate code bases HDFS Offline Training Inference Static model Preprocessing Preprocessing Offline path Online path 6
  • 7. So what about ML? • Online training is gaining popularity • More prompt model updates • Dynamic model and continuous training • Progressive validation • More sophisticated monitoring and model deployment / rollback Message Queue Online Training Inference Dynamic model Preprocessing Offline path Online path 7
  • 8. “Lambda” architecture for ML • Offline training: a static base model • Online training: incremental updates to the base model • Users have to deal with different systems / code bases Message Queue Offline Training Online Training Inference Dynamic model Static model Preprocessing HDFS Preprocessing Offline path Online path Static model 8
  • 9. Value of Flink • Inference is latency-sensitive online / nearline processing • Flink is a good option in this case Message Queue Offline Training Online Training Inference Dynamic model Static model Preprocessing HDFS Preprocessing Offline path Online path Static model 9
  • 10. Batch-Stream Unification in ML • Online inference is latency-sensitive online / nearline processing • Flink is a good option in this case • Use Flink everywhere to avoid maintaining different code bases. Message Queue Offline Training Online Training Inference Dynamic model Static model Preprocessing HDFS Preprocessing Offline path Online path Static model 10
  • 11. Additional Values • One-stop data processing solution • Shared dataset management • Switch processing APIs freely Dataset Management DataStream SQL ML CEP
  • 12. Flink AI Ecosystem By ML Stages Rich connector support & Dataset management Stream-Batch unification Strong SQL support Enhanced Iteration Flink ML Lib DL on Flink (TF, PyTorch) Dynamic model serving Model Management Rollout / Rollback Online monitoring Online evaluation Message Queue Offline Training Online Training Inference Dynamic model Static model Preprocessing HDFS Preprocessing Offline path Online path Static model Model Validation Flink ML Pipeline, Python support 12 Data Acquisition Model Training Model Validation & Serving Inference Preprocessing Efforts & Requirements AI Flow ML Stage
  • 13. Flink AI Ecosystem By ML Stages Rich connector support & Dataset management Stream-Batch unification Strong SQL support Enhanced Iteration Flink ML Lib DL on Flink (TF, PyTorch) Dynamic model serving Model Management Rollout / Rollback Online monitoring Online evaluation Message Queue Offline Training Online Training Inference Dynamic model Static model Preprocessing HDFS Preprocessing Offline path Online path Static model Model Validation Flink ML Pipeline, Python support 13 Data Acquisition Model Training Model Validation & Serving Inference Preprocessing Efforts & Requirements AI Flow ML Stage
  • 14. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 14
  • 15. Flink ML Pipeline - Overview PipelineStage Estimator Transformer Model K-Means NaiveBayes Linear regression DecisionTree RandomForest GBDT Table based ML Pipeline Estimator Transformer table2 = Transformer.transform(table1) Estimator.fit(table2) ML Lib Developers ML Lib Users …… Input Table Output Table 15 Data -> Data transition (Preprocessing, Inference) Data -> Model transition (Model Training)
  • 16. K-Means NaiveBayes Linear regression GBDT DecisionTree PCA Random Forest Correlation ML libs …… Rewrite Flink ML Libs • ML pipeline based • Table API based • Battle tested algorithms Flink ML Libs 16
  • 17. Training Inference Estimator Model Estimator.fit(input1) Input1: Table Model Result Table Model.transform(input2) Input2: Table pipeline.fit(input1) pipeline.transform(input2) ML Pipeline - Simple Case 17
  • 18. Estimator Transformer output1 = Transformer.transform(input1) Estimator Pipeline pipeline.fit(input1) Estimator.fit(output1) pipeline.transform(input2) Model.transform(output2) Result Table Model Transformer Input1: Table Training Model Transformer output2 = Transformer.transform(input2) Model Pipeline Input2: Table Model Pipeline Inference ML Pipeline 18
  • 19. Value of Flink ML Pipeline • Unify the APIs of model training and inference for end users • End users only need to deal with Estimators and Transformers • Ensure consistent logic between training and inference • The same pipeline topology used in training is persisted and reused for inference 19
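The Estimator / Transformer / Pipeline relationship described on slides 15 to 19 can be sketched in plain Python. This is only an illustration of the idea, not the Flink ML Pipeline API: a "table" is assumed to be a list of dict rows, and `Scale`, `SlopeEstimator`, and `SlopeModel` are hypothetical stages invented for the example.

```python
from abc import ABC, abstractmethod


class Transformer(ABC):
    """Data -> Data transition (preprocessing, inference)."""

    @abstractmethod
    def transform(self, table):
        ...


class Estimator(ABC):
    """Data -> Model transition (model training)."""

    @abstractmethod
    def fit(self, table):
        ...


class Scale(Transformer):
    """Hypothetical preprocessing stage: multiplies column x by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, table):
        return [{**row, "x": row["x"] * self.factor} for row in table]


class SlopeModel(Transformer):
    """Hypothetical model: predicts y = slope * x."""

    def __init__(self, slope):
        self.slope = slope

    def transform(self, table):
        return [{**row, "prediction": self.slope * row["x"]} for row in table]


class SlopeEstimator(Estimator):
    """Hypothetical estimator: least-squares slope through the origin."""

    def fit(self, table):
        xy = sum(row["x"] * row["y"] for row in table)
        xx = sum(row["x"] * row["x"] for row in table)
        return SlopeModel(xy / xx)


class Pipeline:
    def __init__(self, stages):
        self.stages = list(stages)

    def fit(self, table):
        """Train every Estimator on the output of the stages before it."""
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(table)  # Estimator -> Model
            table = stage.transform(table)
            fitted.append(stage)
        return Pipeline(fitted)  # the "model pipeline": same topology, fitted

    def transform(self, table):
        for stage in self.stages:
            table = stage.transform(table)
        return table


training = [{"x": 2.0, "y": 1.0}, {"x": 4.0, "y": 2.0}, {"x": 6.0, "y": 3.0}]
model_pipeline = Pipeline([Scale(0.5), SlopeEstimator()]).fit(training)
result = model_pipeline.transform([{"x": 10.0}])
```

Because the fitted pipeline keeps the `Scale` stage, the inference input goes through exactly the same preprocessing as the training input, which is the consistency argument made on slide 19.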
  • 20. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 20
  • 21. Data Acquisition Data Process and Transformation Model Training Test and Validation Model Serving Model or Params Tuning Deep Learning Pipeline 21
  • 22. Distributed TF framework in a Cluster/Environment WORKER WORKER WORKER PS PS Resulting Model One Flink job in Cluster/Environment SOURCE SOURCE JOIN UDTF External Storage Queue >>> >>> Data Acquisition Data Process and Transformation Model Training Deep Learning Pipeline 22
  • 23. Data Acquisition Data Process and Transformation Model Training Test and Validation Model Serving Model or Params Tuning Deep Learning Pipeline 23
  • 24. One single Flink job in a Cluster/Environment Distributed TF framework in a Cluster/Environment WORKER WORKER WORKER PS PS Resulting Model SOURCE SOURCE JOIN UDTF WORKER PS PS WORKER WORKER One Flink job in Cluster/Environment SOURCE SOURCE JOIN UDTF External Storage Queue >>> >>> Resulting Model TensorFlow-Flink Integration 24
  • 25. DL on Flink and ML Pipeline integration One single Flink job in a Cluster/Environment SOURCE SOURCE JOIN UDTF WORKER PS PS WORKER WORKER Resulting Model Transformer Estimator The ML Pipeline API could be used for both traditional ML and deep learning. 25
  • 26. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 26
  • 27. • Native iteration implemented by the processing engine • Feedback edge on the processing DAG • Address the caveats of DataSet / DataStream iterations Flink Cluster Partition 1 Partition 2 Partition 3 Partition N … map map map map … Enhance Iteration in Flink 27
  • 28. { val a: Table = ... val b: Table = ... val resultSeq = Table.iterate(a, b) { val next_a = b.select('v_b + 1 as 'v_a) val next_b = next_a.select('v_a * 2 as 'v_b) Seq(next_a, next_b) }.times(10) } Iteration variables Step function Termination condition Multi-variable iteration 28
  • 29. { val a: Table = ... val b: Table = ... val resultSeq = iterate(a, b) { val next_a = iterate(a) { Seq(a.select('v_a + 1 as 'v_a)) }.times(100).head val next_b = next_a.select('v_a * 2 as 'v_b) Seq(next_a, next_b) }.times(10) } Nested Iteration 29
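The semantics of the two iteration snippets on slides 28 and 29 (iteration variables, a step function, and a fixed-count termination condition) can be traced with a small plain-Python stand-in. This is a sketch under assumptions: `iterate` below is a hypothetical helper, not the proposed Table API, and single numbers stand in for the columns `v_a` and `v_b`.

```python
def iterate(step, *variables, times):
    """Apply `step` to the iteration variables `times` times.

    `step` receives the current values and returns the next values,
    mirroring the Seq(next_a, next_b) returned by the step function.
    """
    state = tuple(variables)
    for _ in range(times):
        state = tuple(step(*state))
    return state


# Multi-variable iteration (slide 28): next_a = v_b + 1, next_b = next_a * 2.
def step(a, b):
    next_a = b + 1
    next_b = next_a * 2
    return next_a, next_b


a, b = iterate(step, 0, 0, times=10)


# Nested iteration (slide 29): the inner loop adds 1 to v_a 100 times
# before the outer step derives next_b from it.
def nested_step(a, b):
    (next_a,) = iterate(lambda v: (v + 1,), a, times=100)
    next_b = next_a * 2
    return next_a, next_b


nested_a, nested_b = iterate(nested_step, 0, 0, times=10)
```

Starting from zero, the flat loop ends with `a == 1023` and `b == 2046`, and the nested loop ends with `nested_a == 1000` and `nested_b == 2000`.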
  • 30. Mini-batch iteration • A stream is chunked into multiple mini-batches • Each mini-batch iterates independently in the iteration loop • The results are emitted in mini-batch order MB3 MB2 MB1 Flink Cluster Partition 1 Partition 2 Partition 3 Partition N … map map map map … MB2 MB1 30
  • 31. Mini-batch iteration • Native support for Stochastic Gradient Descent (SGD) • Native support for online learning 31
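As a rough single-process illustration of mini-batch iteration for SGD, the sketch below chunks a stream of samples into mini-batches, runs a few gradient steps per mini-batch, and emits one model version per mini-batch in arrival order. Everything here is assumed for the example: the one-parameter model y = w * x, the batch size, the learning rate, and the helper names. It does not model Flink's partitions or feedback edges.

```python
from itertools import islice


def mini_batches(stream, batch_size):
    """Chunk an iterable into consecutive lists of at most `batch_size` items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def train_online(stream, batch_size=4, steps_per_batch=5, learning_rate=0.1):
    """Fit y = w * x with mini-batch gradient descent on squared error.

    Returns the model version produced after each mini-batch, in order.
    """
    weight = 0.0
    versions = []
    for batch in mini_batches(stream, batch_size):
        for _ in range(steps_per_batch):
            # Gradient of the mean squared error with respect to the weight.
            gradient = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
            weight -= learning_rate * gradient
        versions.append(weight)
    return versions


# Noise-free samples generated from y = 3 * x (assumed data for the demo).
samples = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0] * 6]
model_versions = train_online(samples)
```

Each entry in `model_versions` corresponds to the Model_V1, Model_V2, ... sequence on the following slides; with this data the weight moves steadily toward 3.0.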
  • 32. Iteration and Dynamic Model Update Model Initial model Samples Gradient Computing Gradient Reduce Model_V1 Model_V2 Model_V3 … Final Model 32
  • 33. Iteration and Dynamic Model Update Model Initial model Samples Gradient Computing Gradient Reduce Model_V1 Model_V2 Model_V3 … Final Model 33
  • 34. Dynamic Model Serving Message Queue Offline Training Online Training Dynamic model Static model Preprocessing HDFS Preprocessing Static model Model Validation Samples Inference Model_V1 Model_V2 Model_V3 … The exact same mechanism of native iteration could be used for dynamic model serving. 34
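The idea of dynamic model serving, where inference always uses the most recent model version, can be sketched as a loop over one interleaved stream of model updates and samples. This is a simplified, single-threaded illustration with an assumed event format (`("model", weight)` and `("sample", x)` tuples) and a one-parameter model; it says nothing about how Flink would actually deliver model updates to the inference operator.

```python
def serve(events, initial_weight):
    """Score samples with whichever model version is current on arrival.

    Returns a list of (model_version, prediction) pairs in arrival order.
    """
    weight = initial_weight
    version = 0
    results = []
    for kind, value in events:
        if kind == "model":
            # A new model version replaces the one used for later samples.
            weight = value
            version += 1
        elif kind == "sample":
            results.append((version, weight * value))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return results


events = [
    ("sample", 2.0),
    ("model", 2.5),
    ("sample", 2.0),
    ("model", 3.0),
    ("sample", 2.0),
]
predictions = serve(events, initial_weight=1.0)
```

The same input sample is scored differently as the model changes: `predictions` is `[(0, 2.0), (1, 5.0), (2, 6.0)]`.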
  • 35. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 35
  • 36. Python process Java process input Python Table API Python UDF Python Table API Java gateway Server RPC (Py4j) Python gateway Python VM DAG Graph upstream input downstream output output Flink Python Table API 36 Working with Apache Beam Community
  • 37. More Python API Support • Flink ML Pipeline • Flink-AI-Extended • DataStream 37
  • 38. Summary • Flink offers unique value for AI use cases • Flink fits well into the “lambda” ML architecture • Multiple ongoing efforts to make Flink more AI friendly • Flink ML Pipeline • Flink ML Libs • Deep learning on Flink • Iteration enhancement • Python API • … 38
  • 39. Q & A We are hiring!! becket.qin@gmail.com