SlideShare a Scribd company logo
1 of 13
Powering Machine Learning
EXPERIENCES WITH STREAMING & MICRO-BATCH FOR ONLINE LEARNING
Swaminathan Sundararaman
FlinkForward 2017
2
The Challenge of Today’s Analytics Trajectory
Edges benefit from real-time online learning and/or inference
IoT is Driving Explosive Growth in Data Volume
“Things” Edge and network Datacenter/Cloud
Data lake
3
• Real-world data is unpredictable and bursty
o Data behavior changes (different time of day, special events, flash crowds, etc.)
• Data behavior changes require retraining & model updates
o Updating models offline can be expensive (compute, retraining)
• Online algorithms retrain on the fly with real-time data
o Lightweight, low compute and memory requirements
o Better accuracy through continuous learning
• Online algorithms are more accurate, especially with data
behavior changes
Real-Time Intelligence: Online Algorithm Advantages
4
Experience Building ML Algorithms on Flink 1.0
• Built both Offline(Batch) and Online algorithms
o Batch Algorithms (Examples: KMeans, PCA, and Random Forest)
o Online Algorithms (Examples: Online KMeans, Online SVM)
• Uses many of the Flink DataStream primitives:
o DataStream APIs are sufficient and primitives are generic for ML algorithms.
o CoFlatMaps, Windows, Collect, Iterations, etc.
• We have also added Python Streaming API support in Flink and
are working with dataArtisans to contribute it to upstream Flink.
5
Example: Online SVM Algorithm
/* Co-map to update local model(s) when new data arrives and also
create the shared model when a pre-defined threshold is met */
private case class SVMModelCoMap(...) {
/* flatMap1 processes new elements and updates local model*/
def flatMap1(data: LabeledVector[Double],
out: Collector[Model]) {
. . .
}
/* flatMap2 accumulates local models and creates a new model
(with decay) once all local models are received */
def flatMap2(currentModel: Model, out: Collector[Model]) {
. . .
}
}
object OnlineSVM {
. . .
def main(args: Array[String]): Unit = {
// initialize input arguments and connectors
. . .
}
}
DataStream
FM1 FM1
FM2 FM2
M M
Task Slot Task Slot
M MM M
M M
Aggregated and local models
combined with decay factor
6
• A server providing VoD services to VLC (i.e., media player) clients
o Clients request videos of different sizes at different times
o Server statistics used to predict violations
• SLA violation: service level drops below predetermined threshold
Telco Example: Measuring SLA Violations
Dataset
Labels for
training
7
• Load patterns – Flashcrowd, Periodic
• Delivered to Flink and Spark as live stream in experiments
Dataset
(https://arxiv.org/pdf/1509.01386.pdf)
CPU
Utilization
Memory/Swap I/O
Transactions
Block I/O
operations
Process
Statistics
Network
Statistics
CPU Idle Mem Used Read
transactions/s
Block
Reads/s
New
Processes/s
Received
packets/s
CPU User Mem
Committed
Write
transactions/s
Block
Writes/s
Context
Switches/s
Transmitted
Packets/s
CPU System Swap Used Bytes Read/s Received Data
(KB)/s
CPU IO_Wait Swap Cached Bytes Written/s Transmitted Data
(KB)/s
Interface
Utilization %
8
When load pattern remains static (unchanged),
Online algorithms can be as accurate as Offline algorithms
Fixed workloads – Online vs Offline (Batch)
Load Scenario Offline (LibSVM)
Accuracy
Offline (Pegasos)
Accuracy
Online SVM
Accuracy
flashcrowd_load 0.843 0.915 0.943
periodic_load 0.788 0.867 0.927
constant_load 0.999 0.999 0.999
poisson_load 0.963 0.963 0.971
9
Online SVM vs Batch (Offline) SVM – both in FlinkAccumulatedErrorRate
Time
Load Change
Until retraining occurs, changing data
results in lower accuracy model
Training Workload
Real-World
Workload
0%
5%
10%
15%
20%
25%
Network SLA Violation Prediction
Offline SVM
Online SVM
Online Algorithm
retrains on the fly
reduces error rate
Online algorithms quickly adapt to workload changes
10
Throughput: Online SVM in Streams and Micro-Batch
23.32 26.58
46.29 39.4444.69
85.11
173.91
333.33
0.00
50.00
100.00
150.00
200.00
250.00
300.00
350.00
1 2 4 8
ThousandsofOperationsperSecond
Number of Nodes
Throughput for processing samples with 256 attributes from Kafka
Spark 2.0 Flink 1.0.3
1.9x
3.2x
3.8x
8.5x
Notable performance improvement over micro-batch based solution
11
Latency: Online SVM in Streams & Micro-batch
1.9x
3.2x
3.8x
8.5x
0.01
0.1
1
10
100
Latency(secs)
Time
Spark - 1s ubatch Spark - 0.1s ubatch Spark 0.01s ubatch Spark 10s ubatch Flink 1.0.3
Low and predictable latency as needed in Edge
12
Edge computing & Online learning are needed for real-time analytics
• Edge Computing: minimizes the excessive latencies, reaction time
• Online learning: can dynamically adapt to changing data / behavior
Online machine learning with streaming on Flink
• Supports low latency processing with scaling across multiple nodes
• Using real world data, demonstrate improved accuracy over offline
algorithms
Conclusions
13
Parallel Machines
The Machine Learning Management Solution
info@parallelmachines.com

More Related Content

More from Flink Forward

More from Flink Forward (20)

Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 

Recently uploaded

Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
ptikerjasaptiker
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 

Recently uploaded (20)

Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 

Flink Forward SF 2017: Swaminathan Sundararaman - Experiences with Streaming vs Micro-Batch for Online Learning

  • 1. Powering Machine Learning EXPERIENCES WITH STREAMING & MICRO-BATCH FOR ONLINE LEARNING Swaminathan Sundararaman FlinkForward 2017
  • 2. 2 The Challenge of Today’s Analytics Trajectory Edges benefit from real-time online learning and/or inference IoT is Driving Explosive Growth in Data Volume “Things” Edge and network Datacenter/Cloud Data lake
  • 3. 3 • Real-world data is unpredictable and bursty o Data behavior changes (different time of day, special events, flash crowds, etc.) • Data behavior changes require retraining & model updates o Updating models offline can be expensive (compute, retraining) • Online algorithms retrain on the fly with real-time data o Lightweight, low compute and memory requirements o Better accuracy through continuous learning • Online algorithms are more accurate, especially with data behavior changes Real-Time Intelligence: Online Algorithm Advantages
  • 4. 4 Experience Building ML Algorithms on Flink 1.0 • Built both Offline(Batch) and Online algorithms o Batch Algorithms (Examples: KMeans, PCA, and Random Forest) o Online Algorithms (Examples: Online KMeans, Online SVM) • Uses many of the Flink DataStream primitives: o DataStream APIs are sufficient and primitives are generic for ML algorithms. o CoFlatMaps, Windows, Collect, Iterations, etc. • We have also added Python Streaming API support in Flink and are working with dataArtisans to contribute it to upstream Flink.
  • 5. 5 Example: Online SVM Algorithm /* Co-map to update local model(s) when new data arrives and also create the shared model when a pre-defined threshold is met */ private case class SVMModelCoMap(...) { /* flatMap1 processes new elements and updates local model*/ def flatMap1(data: LabeledVector[Double], out: Collector[Model]) { . . . } /* flatMap2 accumulates local models and creates a new model (with decay) once all local models are received */ def flatMap2(currentModel: Model, out: Collector[Model]) { . . . } } object OnlineSVM { . . . def main(args: Array[String]): Unit = { // initialize input arguments and connectors . . . } } DataStream FM1 FM1 FM2 FM2 M M Task Slot Task Slot M MM M M M Aggregated and local models combined with decay factor
  • 6. 6 • A server providing VoD services to VLC (i.e., media player) clients o Clients request videos of different sizes at different times o Server statistics used to predict violations • SLA violation: service level drops below predetermined threshold Telco Example: Measuring SLA Violations Dataset Labels for training
  • 7. 7 • Load patterns – Flashcrowd, Periodic • Delivered to Flink and Spark as live stream in experiments Dataset (https://arxiv.org/pdf/1509.01386.pdf) CPU Utilization Memory/Swap I/O Transactions Block I/O operations Process Statistics Network Statistics CPU Idle Mem Used Read transactions/s Block Reads/s New Processes/s Received packets/s CPU User Mem Committed Write transactions/s Block Writes/s Context Switches/s Transmitted Packets/s CPU System Swap Used Bytes Read/s Received Data (KB)/s CPU IO_Wait Swap Cached Bytes Written/s Transmitted Data (KB)/s Interface Utilization %
  • 8. 8 When load pattern remains static (unchanged), Online algorithms can be as accurate as Offline algorithms Fixed workloads – Online vs Offline (Batch) Load Scenario Offline (LibSVM) Accuracy Offline (Pegasos) Accuracy Online SVM Accuracy flashcrowd_load 0.843 0.915 0.943 periodic_load 0.788 0.867 0.927 constant_load 0.999 0.999 0.999 poisson_load 0.963 0.963 0.971
  • 9. 9 Online SVM vs Batch (Offline) SVM – both in FlinkAccumulatedErrorRate Time Load Change Until retraining occurs, changing data results in lower accuracy model Training Workload Real-World Workload 0% 5% 10% 15% 20% 25% Network SLA Violation Prediction Offline SVM Online SVM Online Algorithm retrains on the fly reduces error rate Online algorithms quickly adapt to workload changes
  • 10. 10 Throughput: Online SVM in Streams and Micro-Batch 23.32 26.58 46.29 39.4444.69 85.11 173.91 333.33 0.00 50.00 100.00 150.00 200.00 250.00 300.00 350.00 1 2 4 8 ThousandsofOperationsperSecond Number of Nodes Throughput for processing samples with 256 attributes from Kafka Spark 2.0 Flink 1.0.3 1.9x 3.2x 3.8x 8.5x Notable performance improvement over micro-batch based solution
  • 11. 11 Latency: Online SVM in Streams & Micro-batch 1.9x 3.2x 3.8x 8.5x 0.01 0.1 1 10 100 Latency(secs) Time Spark - 1s ubatch Spark - 0.1s ubatch Spark 0.01s ubatch Spark 10s ubatch Flink 1.0.3 Low and predictable latency as needed in Edge
  • 12. 12 Edge computing & Online learning are needed for real-time analytics • Edge Computing: minimizes the excessive latencies, reaction time • Online learning: can dynamically adapt to changing data / behavior Online machine learning with streaming on Flink • Supports low latency processing with scaling across multiple nodes • Using real world data, demonstrate improved accuracy over offline algorithms Conclusions
  • 13. 13 Parallel Machines The Machine Learning Management Solution info@parallelmachines.com