This document discusses building an AI ecosystem on Apache Flink. It proposes unifying batch and stream processing with Flink to avoid maintaining separate code bases. It also proposes using Flink throughout the machine learning pipeline for data acquisition, model training, validation, and serving. This includes enhancing Flink's support for deep learning, iteration, dynamic model serving, and Python APIs. The goal is to provide a one-stop solution for all data and machine learning processing needs within a single system and code base.
[FFE19] Build a Flink AI Ecosystem
1. Build a Flink AI Ecosystem
Jiangjie (Becket) Qin
Flink Forward Berlin 2019
2. Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
4. Lambda - what's everyone doing?
[Diagram: offline path - HDFS → Batch Processing (Spark/M-R); online path - Message Queue → Stream Processing (Flink/Storm); the two results are combined to answer queries]
• Two code bases for online and offline processing logic
• High maintenance cost
• Difficult to ensure consistent processing logic
5. Batch-Stream Processing Unification
• Use the same engine for online and offline processing (see the sketch below)
• Spark
• Flink
[Diagram: the same lambda layout with one engine - HDFS → Batch Processing (Flink/Spark) on the offline path, Message Queue → Stream Processing (Flink/Spark Streaming) on the online path, results combined to answer queries]
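A minimal sketch of what the unification buys, assuming the Flink 1.9-era Table API; the table names (hdfs_events, kafka_events) and the query are illustrative, not from the talk:

// The processing logic is written once and applied to both a bounded
// (HDFS-backed) table and an unbounded (message-queue-backed) table.
import org.apache.flink.table.api.{EnvironmentSettings, Table, TableEnvironment}

def countByUser(tEnv: TableEnvironment, tableName: String): Table =
  tEnv.sqlQuery(s"SELECT user_id, COUNT(*) AS cnt FROM $tableName GROUP BY user_id")

val tEnv = TableEnvironment.create(
  EnvironmentSettings.newInstance().inStreamingMode().build())
// Offline path: "hdfs_events" would be registered over files in HDFS.
val offlineCounts = countByUser(tEnv, "hdfs_events")
// Online path: "kafka_events" would be registered over a message queue.
val onlineCounts = countByUser(tEnv, "kafka_events")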
6. So what about ML?
• A typical ML scenario
• Offline training (TF, PyTorch, etc)
• Static models
• Online inference (Flink)
• The data preprocessing logic for training and inference is often maintained as two code bases
[Diagram: offline path - HDFS → Preprocessing → Offline Training → static model; online path - Preprocessing → Inference, loading the static model]
7. So what about ML?
• Online training is gaining popularity
• More timely model updates
• Dynamic model and continuous training
• Progressive validation
• More sophisticated monitoring and model deployment / rollback
[Diagram: online path - Message Queue → Preprocessing → Online Training → dynamic model → Inference]
8. “Lambda” architecture for ML
• Offline training: a static base model
• Online training: incremental updates to the base model
• Users have to deal with different systems / code bases
[Diagram: offline path - HDFS → Preprocessing → Offline Training → static model; online path - Message Queue → Preprocessing → Online Training → incremental updates → dynamic model → Inference; the static base model seeds both online training and inference]
9. Value of Flink
• Inference is latency-sensitive online / nearline processing
• Flink is a good option in this case
[Diagram: the same lambda ML architecture as above, with the Flink-served Inference stage highlighted as the latency-sensitive part]
10. Batch-Stream Unification in ML
• Online inference is latency-sensitive online / nearline processing
• Flink is a good option in this case
• Use Flink everywhere to avoid maintaining different code bases.
[Diagram: the same architecture with Flink running preprocessing, offline training, online training, and inference on both paths]
12. Flink AI Ecosystem By ML Stages
[Diagram: the lambda ML architecture (HDFS / Message Queue → Preprocessing → Offline / Online Training → static / dynamic model → Inference) annotated with the efforts and requirements per AI Flow ML stage:]
• Data Acquisition - rich connector support & dataset management
• Preprocessing - stream-batch unification, strong SQL support
• Model Training - enhanced iteration, Flink ML Lib, DL on Flink (TF, PyTorch)
• Model Validation & Serving / Inference - model validation, dynamic model serving, model management, rollout / rollback, online monitoring, online evaluation
• Across stages - Flink ML Pipeline, Python support
14. Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
15. Flink ML Pipeline - Overview
[Diagram: class hierarchy - PipelineStage is the base interface; Transformer and Estimator extend it, and a Model is the Transformer produced by an Estimator]
• Table-based ML Pipeline: a Transformer maps an input Table to an output Table (table2 = transformer.transform(table1)), a data -> data transition used for preprocessing and inference; an Estimator fits on a Table to produce a Model (estimator.fit(table2)), a data -> model transition used for model training.
• ML Lib developers implement the stages (K-Means, NaiveBayes, Linear Regression, DecisionTree, RandomForest, GBDT, ...); ML Lib users only compose them.
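To make the contracts concrete, here is a minimal sketch of the stage interfaces and their use; the interfaces are simplified from the FLIP-39 proposal (the real ones also thread a TableEnvironment and parameters through), and MyScaler / MyKMeans are hypothetical stages:

import org.apache.flink.table.api.Table

// Simplified FLIP-39-style contracts.
trait PipelineStage
trait Transformer extends PipelineStage {
  def transform(input: Table): Table        // data -> data (preprocessing, inference)
}
trait Model extends Transformer             // a fitted Transformer
trait Estimator extends PipelineStage {
  def fit(input: Table): Model              // data -> model (training)
}

// What an ML Lib developer provides (trivial bodies for the sketch).
class MyScaler extends Transformer {
  override def transform(input: Table): Table = input
}
class MyKMeans extends Estimator {
  override def fit(input: Table): Model = new Model {
    override def transform(input: Table): Table = input
  }
}

// What an ML Lib user writes.
val inputTable: Table = ???                 // some given source Table
val scaled: Table = new MyScaler().transform(inputTable)
val model: Model = new MyKMeans().fit(scaled)
val clusters: Table = model.transform(scaled)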
19. Value of Flink ML Pipeline
• Unify the APIs of model training and inference for the end users
• End users only need to deal with Estimators and Transformers
• Ensure consistent logic between training and inference
• The same pipeline topology used in training is persisted and reused for inference (sketched below)
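A sketch of that persistence, assuming the FLIP-39-style Pipeline API (appendStage / fit / toJson / loadJson; treat the exact names as an assumption) and the hypothetical stages from the earlier sketch:

// tEnv: a TableEnvironment; trainTable / inferenceTable: given Tables.
// Training side: compose, fit, and persist the pipeline.
val pipeline = new Pipeline()               // Pipeline as proposed in FLIP-39
pipeline.appendStage(new MyScaler())        // Transformer: kept as-is
pipeline.appendStage(new MyKMeans())        // Estimator: replaced by its Model when fitted
val fitted = pipeline.fit(tEnv, trainTable)
val json = fitted.toJson()                  // persist the fitted topology

// Inference side: reload the identical topology and serve.
val serving = new Pipeline()
serving.loadJson(json)
val scored = serving.transform(tEnv, inferenceTable)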
20. Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
21. Deep Learning Pipeline
[Diagram: Data Acquisition → Data Process and Transformation → Model Training → Test and Validation → Model Serving, with Model or Params Tuning feeding back into training]
22. Distributed TF framework in a Cluster/Environment
[Diagram: status quo - a distributed TF framework (WORKER x3, PS x2) produces the resulting model in its own cluster, while one Flink job (SOURCE, SOURCE → JOIN → UDTF) reads from external storage and a queue; together they cover the Data Acquisition, Data Process and Transformation, and Model Training stages of the deep learning pipeline]
24. TensorFlow-Flink Integration
[Diagram: before - a distributed TF framework (WORKERs, PSs) beside a separate Flink job; after - one single Flink job in which SOURCE, SOURCE → JOIN → UDTF feed embedded WORKER and PS operators that produce the resulting model]
25. DL on Flink and ML Pipeline integration
[Diagram: one single Flink job - the SOURCE → JOIN → UDTF preprocessing stages form a Transformer, and the embedded WORKER / PS operators form an Estimator that produces the resulting model]
The ML Pipeline API could be used for both traditional ML and deep learning.
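From the user's side, the integration could look like the following sketch; TFEstimator and its parameters are hypothetical stand-ins (the actual integration is the Flink-AI-Extended project), the point being that distributed TF training hides behind the same Estimator contract:

// Hypothetical Estimator that embeds distributed TF training in the Flink job.
class TFEstimator(numWorkers: Int, numPs: Int, script: String) extends Estimator {
  override def fit(input: Table): Model = {
    // 1. add numWorkers WORKER and numPs PS operators to the job graph
    // 2. stream the preprocessed `input` Table into the workers
    // 3. wrap the resulting model checkpoint as a Model for inference
    ???  // omitted in this sketch
  }
}

// Preprocessing (Transformer) and training (Estimator) in one Flink job.
val rawTable: Table = ???  // some given source Table
val features = new MyScaler().transform(rawTable)
val model = new TFEstimator(numWorkers = 3, numPs = 2, script = "train.py").fit(features)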
26. Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
27. Enhance Iteration in Flink
• Native iteration implemented by the processing engine
• A feedback edge on the processing DAG
• Address the caveats of the DataSet / DataStream iterations
[Diagram: records flow through map operators on partitions 1..N of the Flink cluster and loop back along a feedback edge]
28. Multi-variable Iteration

{
  val a: Table = ...                       // iteration variables
  val b: Table = ...
  val resultSeq = Table.iterate(a, b) {
    // step function
    val next_a = b.select('v_b + 1 as 'v_a)
    val next_b = next_a.select('v_a * 2 as 'v_b)
    Seq(next_a, next_b)
  }.times(10)                              // termination condition
}
29. Nested Iteration

{
  val a: Table = ...
  val b: Table = ...
  val resultSeq = iterate(a, b) {
    // inner iteration: 100 rounds per outer step
    val next_a = iterate(a) {
      Seq(a.select('v_a + 1 as 'v_a))
    }.times(100).head
    val next_b = next_a.select('v_a * 2 as 'v_b)
    Seq(next_a, next_b)
  }.times(10)
}
30. Mini-batch iteration
• A stream is chunked into multiple mini-batches
• Each mini-batch iterates independently in the iteration loop
• The results are emitted in the mini-batch order
[Diagram: mini-batches MB1, MB2, MB3 enter the iteration loop over partitions 1..N of the Flink cluster; each iterates independently and results are emitted in mini-batch order (MB1 before MB2)]
31. Mini-batch iteration
• Native support for Stochastic Gradient Descent (SGD)
• Native support for online learning
32. Iteration and Dynamic Model Update
[Diagram: SGD as a native iteration - the initial model and incoming samples feed Gradient Computing; gradients are aggregated in Gradient Reduce and applied to the model, emitting successive versions Model_V1, Model_V2, Model_V3, ... until the Final Model]
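Written against the proposed iteration API from the "Multi-variable iteration" slide, the loop could be sketched as follows; computeGradients and applyGradient are hypothetical helpers, and the iterate / times API is the proposal above, not shipped Flink:

{
  val model: Table = ...    // initial model
  val samples: Table = ...  // training samples
  val finalModel = Table.iterate(model) {
    // step function: one SGD round, emitting Model_V1, Model_V2, ... per iteration
    val grads = computeGradients(samples, model)       // hypothetical gradient UDF
    val reduced = grads.select('grad.sum as 'grad)     // gradient reduce
    Seq(applyGradient(model, reduced))                 // hypothetical model update
  }.times(100).head         // Final Model after 100 rounds
}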
34. Dynamic Model Serving
[Diagram: the lambda ML architecture (HDFS / Message Queue → Preprocessing → Offline / Online Training → Model Validation → static / dynamic model) streaming successive versions Model_V1, Model_V2, Model_V3, ... into the Inference job alongside the samples]
The exact same mechanism of native iteration could be used for dynamic model serving.
35. Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
36. Flink Python Table API
[Diagram: the Python Table API runs in a Python VM and talks to a Java gateway server over RPC (Py4j); the DAG graph is built and executed in the Java process between the upstream input and downstream output, while Python UDFs run in a separate Python process]
Working with the Apache Beam community
37. More Python API Support
• Flink ML Pipeline
• Flink-AI-Extended
• DataStream
38. Summary
• Flink has unique value in AI use cases
• Flink fits very well in the “lambda” ML architecture
• Multiple ongoing efforts to make Flink more AI friendly
• Flink ML Pipeline
• Flink ML Libs
• Deep learning on Flink
• Iteration enhancement
• Python API
• …