Efficient Migration of Very Large Distributed State for Scalable Stream Processing
PhD Candidate: Bonaventura Del Monte
Advisors: Prof. Dr. Volker Markl, Prof. Dr. Tilmann Rabl
PhD Workshop, VLDB 2017
This work has been partially funded by the European Union’s Horizon 2020 research and innovation program under grant agreement n° 687691
Outline
• Research Goal
• Problem Statement
• Proposed Solution
• Research Issues
• Evaluation Plan
• Conclusion and future directions
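Distributed Stateful Stream Processing
[Figure: sources S1 and S2 feeding operators OP1, OP2, and OP3; each parallel instance runs a stream processor that executes a UDF against its local state storage]
• State is co-partitioned with the input stream by key
• State is internally stored and managed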
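Co-partitioning state with the input stream by key can be illustrated with a minimal Python sketch. It assumes a hypothetical fixed number of key ranges (`NUM_KEY_RANGES`), CRC-32 as the hash function, and contiguous assignment of key ranges to parallel instances; none of these choices come from the talk. Each instance only ever touches its own `state` dictionary, mirroring the "internally stored and managed" bullet.

```python
import zlib

# Hypothetical: a fixed number of key ranges, the smallest unit that can be
# moved when the operator parallelism changes.
NUM_KEY_RANGES = 128


def key_range_of(key: str, num_key_ranges: int = NUM_KEY_RANGES) -> int:
    """Map a key to a key range using a deterministic hash (CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % num_key_ranges


def instance_of(key_range: int, parallelism: int,
                num_key_ranges: int = NUM_KEY_RANGES) -> int:
    """Assign contiguous blocks of key ranges to the parallel instances."""
    return key_range * parallelism // num_key_ranges


class ParallelInstance:
    """A parallel operator instance that can access only its local state."""

    def __init__(self, index: int):
        self.index = index
        self.state: dict[str, int] = {}  # toy UDF state: per-key counter

    def process(self, key: str) -> None:
        self.state[key] = self.state.get(key, 0) + 1


def run(keys: list[str], parallelism: int) -> list[ParallelInstance]:
    """Hash-partition the input so each key always reaches the same instance."""
    instances = [ParallelInstance(i) for i in range(parallelism)]
    for key in keys:
        instances[instance_of(key_range_of(key), parallelism)].process(key)
    return instances
```

Because instances own contiguous blocks of key ranges, changing the parallelism reassigns whole key ranges rather than individual keys, which is the property the state-migration protocols later in the talk rely on.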
State Management In Current Systems
• Fault-tolerance
• Resource elasticity
• Queries maintenance
• Load balancing
• Partitioned State
• Partially Distributed State
• Hundreds of gigabytes
Motivational Example: a Real-World Deployment
• Many analytics executed at the same time:
• Machine learning models, e.g., collaborative filtering, fraud detection, NLP: 100s of GB per model
• Different types of temporal aggregations/joins: 100s of GB
Motivational Example: a Real-World Deployment
[Figure: topology deployed on Cluster A with sources S1 and S2, operators OP1 to OP3, and sinks SINK 1 and SINK 2; every source, operator, and sink holds state]
Motivational Example: a Real-World Deployment
[Figure: the same stateful topology alongside Cluster B and a pool of computing resources]
• Add/Remove computing resources
• Handle failures
• Balance load
• Migrate state (e.g., from Cluster A to Cluster B)
Research Goal
• Fault-tolerance
• Resource Elasticity
• Queries Maintenance
• Load Balancing
• Distributed State a.k.a. Shared Mutable State
• Terabyte Sizes
Problem Statement
• Fault-tolerance
• Resource Elasticity
• Queries Maintenance
• Load Balancing
• State Transfer
• Consistent state for exactly-once processing
• Robust Query Performance
Proposed Solution
[Figure: topology with sources S1 and S2, operators OP1 to OP3, and sinks SINK 1 and SINK 2]
• Replication protocol through incremental checkpoints
Proposed Solution
• Replication protocol through incremental checkpoints
• Optimal placement of replica groups to minimize migration cost
[Figure: topology with sources S1 and S2, operators OP1 to OP3, and sinks SINK 1 and SINK 2]
Proposed Solution
• Replication protocol through incremental checkpoints
• Optimal placement of replica groups to minimize migration cost
• Hand-over protocol
[Figure: topology with sources S1 and S2, operators OP1 to OP3, and sinks SINK 1 and SINK 2]
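The replication protocol keeps each replica group in sync through incremental checkpoints. The Python sketch below shows one way this could work, assuming a key-value state per key range, a replication factor Q given by the size of `group`, and in-order delivery enforced by a per-checkpoint sequence number. `Delta`, `PrimaryState`, `Replica`, and `replicate` are hypothetical names, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """An incremental checkpoint: only what changed since the previous one."""
    checkpoint_id: int
    updates: dict
    deletions: frozenset


class PrimaryState:
    """State of one key range on its primary instance, tracking dirty keys."""

    def __init__(self):
        self.data: dict = {}
        self._dirty: set = set()
        self._deleted: set = set()
        self._next_id = 1

    def put(self, key, value) -> None:
        self.data[key] = value
        self._dirty.add(key)
        self._deleted.discard(key)

    def delete(self, key) -> None:
        self.data.pop(key, None)
        self._dirty.discard(key)
        self._deleted.add(key)

    def incremental_checkpoint(self) -> Delta:
        """Emit the changes since the last checkpoint and reset tracking."""
        delta = Delta(self._next_id,
                      {k: self.data[k] for k in self._dirty},
                      frozenset(self._deleted))
        self._next_id += 1
        self._dirty.clear()
        self._deleted.clear()
        return delta


class Replica:
    """One member of the replica group; applies deltas strictly in order."""

    def __init__(self):
        self.data: dict = {}
        self.last_id = 0

    def apply(self, delta: Delta) -> None:
        if delta.checkpoint_id != self.last_id + 1:
            raise ValueError("incremental checkpoint missing or out of order")
        self.data.update(delta.updates)
        for key in delta.deletions:
            self.data.pop(key, None)
        self.last_id = delta.checkpoint_id


def replicate(primary: PrimaryState, group: list) -> Delta:
    """Ship one incremental checkpoint to all Q replicas of the key range."""
    delta = primary.incremental_checkpoint()
    for replica in group:
        replica.apply(delta)
    return delta
```

Shipping only dirty keys and deletions makes each synchronization proportional to the amount of change since the previous checkpoint rather than to the full (potentially terabyte-sized) state.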
Hand-Over Protocol
[Figure: sources S1 and S2 feeding parallel instances OP1 to OP4; legend: primary state, incremental checkpoint, replica group]
Hand-Over Protocol
[Figure: sources S1 and S2 inject a Keys-Move marker into their channels to parallel instances OP1 to OP4]
Hand-Over Protocol
[Figure: sources S1 and S2; parallel instances OP1 to OP4]
Hand-Over Protocol
[Figure: sources S1 and S2; parallel instances OP1 to OP4]
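The hand-over shown on these slides can be simulated in a few dozen lines of Python. This is a minimal sketch under stated assumptions: FIFO channels from every source to every instance, an order-sensitive state (an append-only log per key range) standing in for a non-associative update, and a hypothetical `KeysMove` message type. The old owner waits until the marker has arrived on all of its input channels, ships a final incremental checkpoint, and drops the key range; the new owner buffers tuples of that key range until the checkpoint arrives, then merges old replica, checkpoint, and buffered tuples. Discarding the old state immediately is a simplification (the talk defers it until a new replica exists), and the associative fast path is not shown.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key_range: int
    value: str


@dataclass(frozen=True)
class KeysMove:
    """Hypothetical Keys-Move marker: move `key_range` from `src` to `dst`."""
    key_range: int
    src: int
    dst: int


class Instance:
    def __init__(self, idx: int, num_sources: int):
        self.idx, self.num_sources = idx, num_sources
        self.primary = {}   # key_range -> order-sensitive log (primary state)
        self.synced = {}    # key_range -> log length already checkpointed
        self.replicas = {}  # key_range -> replica log held for another instance
        self.buffers = {}   # key_range -> tuples buffered during a hand-over
        self.markers = {}   # key_range -> sources whose marker has arrived
        self.outbox = []    # (dst, key_range, delta): final checkpoints to ship

    def incremental_checkpoint(self, k: int) -> list:
        """Delta = values appended to k since its previous checkpoint."""
        delta = self.primary[k][self.synced.get(k, 0):]
        self.synced[k] = len(self.primary[k])
        return delta

    def apply_replica_delta(self, k: int, delta: list) -> None:
        self.replicas.setdefault(k, []).extend(delta)

    def on_record(self, rec: Record) -> None:
        if rec.key_range in self.buffers:      # hand-over in progress
            self.buffers[rec.key_range].append(rec.value)
        else:
            self.primary.setdefault(rec.key_range, []).append(rec.value)

    def on_marker(self, source: int, m: KeysMove) -> None:
        k = m.key_range
        if self.idx == m.dst:
            if k not in self.primary:          # not taken over yet
                self.buffers.setdefault(k, [])
            return
        seen = self.markers.setdefault(k, set())
        seen.add(source)
        if len(seen) == self.num_sources:      # marker seen on every channel
            self.outbox.append((m.dst, k, self.incremental_checkpoint(k)))
            del self.primary[k], self.synced[k], self.markers[k]  # simplification

    def on_checkpoint(self, k: int, delta: list) -> None:
        """Merge old replica + final delta + buffered tuples, then own k."""
        self.primary[k] = self.replicas.pop(k, []) + delta + self.buffers.pop(k, [])


class Source:
    def __init__(self, sid: int, routing: dict):
        self.sid, self.routing = sid, dict(routing)  # key_range -> instance

    def emit(self, rec: Record, channels: dict) -> None:
        channels[(self.sid, self.routing[rec.key_range])].append(rec)

    def inject_move(self, m: KeysMove, channels: dict) -> None:
        channels[(self.sid, m.src)].append(m)  # last message about k to old owner
        channels[(self.sid, m.dst)].append(m)  # precedes any k tuple to new owner
        self.routing[m.key_range] = m.dst


def make_channels(num_sources: int, num_instances: int) -> dict:
    return {(s, i): deque() for s in range(num_sources) for i in range(num_instances)}


def drain(channels: dict, instances: list) -> None:
    """Deliver all queued messages (FIFO per channel), then ship checkpoints."""
    for (sid, iid), queue in channels.items():
        while queue:
            msg = queue.popleft()
            if isinstance(msg, KeysMove):
                instances[iid].on_marker(sid, msg)
            else:
                instances[iid].on_record(msg)
    for inst in instances:
        for dst, k, delta in inst.outbox:
            instances[dst].on_checkpoint(k, delta)
        inst.outbox.clear()
```

For example, with two sources and three instances, after instance 2's replica of key range 1 has been synced, `inject_move(KeysMove(1, src=1, dst=2), channels)` on both sources followed by `drain` leaves instance 2 holding the full log of key range 1, in an order consistent with each source's emission order.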
Evaluation Plan: KPIs
• Protocols with a negligible effect on query processing
• Improved resource utilization and prevention of bottlenecks
• Consistent exactly-once processing
Future directions
• “True” continuous stream processing
• Scaling shared mutable state on HTAP workloads
• New storage and network hardware (e.g., NVRAM, RDMA)
• Data compression and approximation
Thank You!
Q&A
• Overall feedback
• Tradeoff: deal with large state by replicating it
• Need for shared mutable state
Back-up Slides
The protocols in action 1/3
The protocols in action 2/3
The protocols in action 3/3
Optimal Placement of Key Ranges
• Dynamic Hungarian Method
• Why dynamic? To handle resource elasticity
• Rescalable key range as the smallest unit
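The placement problem can be framed as an assignment problem: rows are key ranges, columns are parallel instances, and `cost[i][j]` is a hypothetical migration cost of placing key range i on instance j (for example, the bytes to transfer, which would be small where j already holds a replica). The sketch below is the classic, static Hungarian algorithm for minimum-cost assignment, equivalent to maximum-weight bipartite matching after negating weights. The dynamic variant named on the slide, which repairs an existing solution when costs or instances change, is not shown, and this sketch assumes at most as many key ranges as instances.

```python
def hungarian(cost: list[list[float]]) -> list[int]:
    """Minimum-cost assignment of rows (key ranges) to columns (instances).

    Shortest-augmenting-path Hungarian algorithm with dual potentials u, v.
    Requires len(cost) <= len(cost[0]). Returns assignment[i] = column of row i.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "this sketch needs at least as many columns as rows"
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 = free
    way = [0] * (m + 1)   # predecessor column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:       # grow the alternating tree until a free column is hit
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):   # update potentials, keep reduced costs >= 0
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:       # augment the matching along the path found
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment
```

For instance, `hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` returns `[1, 0, 2]`: key range 0 goes to instance 1, key range 1 to instance 0, and key range 2 to instance 2, for a total cost of 5.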
State of the Art
| System | State distribution pattern | Fault tolerance | Job rescaling | Load balancing |
|---|---|---|---|---|
| SEEP | Distributed | Async local checkpoints w/ log recovery | Threshold | Hash |
| Apache Spark | Partitioned | RDD lineage & RDD intermediate checkpoints | Manual | Hash |
| AIM | Distributed | Log-based | Manual | Hash |
| Ding et al. | Partitioned | Periodic checkpoints | N/A | Hash |
| Apache Flink | Partitioned | Upstream backup & global async checkpoints | Manual | Hash |
| Naiad | Partitioned | Sync global checkpoints | Manual | Hash |
| ChronoStream | Partitioned | Slice reconstruction w/ async delta checkpoints | Horizontal & vertical | Hash |
| System X | Distributed | Upstream backup & async incremental checkpoints & hand-over | Dynamic (horizontal & vertical) | Hybrid: hash w/ dynamic repartitioning |
Editor's Notes

  1. This talk is structured as follows: first I will give you an overview of the core aspects of my proposal, then I will walk you through the research issues and explain how I intend to evaluate my work.
  2. Before we dig into the details, I need to explain how distributed stateful stream processing is done today. A streaming job is defined as a weakly connected DAG. Streams are ingested into a stream processing system through source vertices. Each input tuple is optionally keyed. To exploit parallelism, we hash-partition the input tuples across downstream operators. Each of these operators may be stateful, i.e., it holds some state according to its logic. Each parallel operator instance processes a range of keys, depending on how the input is partitioned. State is internally stored and managed: each parallel instance runs a stream processor that executes a UDF, and both read from and write to the state storage. Each parallel instance can access only its own internal state. The global state of the topology is checkpointed periodically.
  3. State introduces new challenges: we need state management techniques that support fault tolerance, resource elasticity, query maintenance, and load balancing while continuing to process the input streams. Current systems and research papers address a subset of these techniques, yet they restrict their focus to partitioned or partially distributed state. Moreover, the state they handle is on the order of gigabytes.
  4. In my opinion, these assumptions limit the capabilities and the supported use cases. Consider the following real-world deployment scenario. We run an online marketplace and need always up-to-date analytics about our platform: we want to perform on-the-fly recommendations (thus, we use collaborative filtering), fraud detection, and natural language processing to improve the user experience on our platform. The sizes of these models grow with the number of items and users. Furthermore, we want to compute non-ML analytics, e.g., heavy hitters and temporal aggregations/joins, which adds more data to our global state.
  5. We end up with a fairly complex topology, where parallel operators hold internal state. ML algorithms require shared mutable state: while processing its substream, one parallel instance might trigger an update to a partition of the state held by another parallel instance. Moreover, since we want exactly-once processing guarantees, we need stateful sources and stateful sinks.
  6. Therefore, we need to handle spikes in the ingestion rate, meaning we need to add or remove computing resources. We need to perform load balancing, because the key distribution could be skewed, so some parallel instances could end up with larger state shards. We need to address fault tolerance, as we perform all these computations online. Last but not least, we may need to migrate state among different operational environments: we might have several development environments, staging environments, and production, and at some point we might need to migrate state from one cluster to another to hand over the computation between them.
  7. To support these use cases, my research goal is to improve the aforementioned state management techniques when shared mutable state is involved and when the state reaches terabyte sizes.
  8. Providing these state management techniques in the presence of very large distributed state raises three problems. The first is state transfer: to scale up or balance load we need to copy state from one node to another, which is hardly feasible when the state is large. Second, the shared mutable nature of the state must not undermine its consistency under exactly-once processing. Third, a streaming system has to provide robust query performance, where the main KPIs are high throughput and low latency.
  9. To address these problems, I propose a replication protocol (à la Hadoop) that creates a replica group for each key range and replicates it Q times. The replica groups are kept in sync through incremental checkpoints. [Explain using the figure.]
  10. We need an optimal placement scheme for these replica groups to minimize the migration cost. Here we plan to use the dynamic Hungarian method, which also supports rescaling the operator parallelism at runtime. For those of you who are not familiar with it, the Hungarian method solves maximum-weight matching on bipartite graphs. Here, our bipartite graph models how we place replica groups onto the parallel instances.
  11. Last but not least, we need a hand-over protocol that smoothly moves the computation of a key range from the primary operator instance to one of its replica groups. I will describe how this protocol works in the next slides; in short, it leverages the optimally placed replica groups to move the processing of a key group away from an overloaded instance or a failing node, or onto a newly provisioned instance.
  12. I am going to show how the hand-over protocol works. To make the explanation easier, we will consider a load-imbalance scenario. Let's also assume that each colour marks a different key range, so tuples with the same colour affect the state of the same key range. For instance, yellow tuples flow from S1 and S2 to OP1. The replication factor is set to 1, and the primary state is incrementally checkpointed to its replica group. The same holds for green, brown, and blue.
  13. Suppose the system detects, according to some load-balancing policy, that instance number two is overloaded. Overloaded here means that either a parallel instance cannot keep up with its ingestion rate (leading to backpressure) or the state size of its key ranges is hitting the instance's physical storage limit. When the system detects such a scenario, it decides, according to some policy, to migrate the processing of the green keys from the second to the third instance. This decision could be taken either by a centralized entity or through consensus. The system then tells the sources to inject a Keys-Move marker, which informs the instances to migrate the processing. Note that after the markers, no green tuples flow on the channels to the second instance. Conversely, on the channels to the third instance, green tuples start flowing from the sources once the markers are injected.
  14. Upon receiving the markers, instance number two generates a new incremental checkpoint and sends it to the third instance. Depending on a user-defined state-merging policy, there are two possible scenarios on the third instance. If the state update is associative, incoming tuples update the replica group directly; if not, incoming tuples are buffered, as in this case, and the instance then merges the old replica with the last incremental checkpoint and the buffered tuples. This guarantees eventual consistency of the state once the hand-over is complete.
  15. Finally, the processing of the green keys has completely migrated from the second instance to the third. A new replica will be created on the fourth instance, and the previous state will be discarded from instance number two. Since there is no experimental evaluation yet, when and how to perform this last step may require further investigation to achieve robust and consistent processing.
  16. The next question is how to assess the proposed protocols and how to declare success. To this end, I plan to define a set of metrics that stress critical aspects of the system. The protocols should have a negligible effect on query processing, improve cluster resource utilization, and prevent bottlenecks such as backpressure. Furthermore, they should never undermine the consistency and the exactly-once processing guarantees of the system.
  17. Since this is just a proposal and I have no experimental results yet, it is too early to draw a conclusion; instead, I would like to point out some future directions my PhD could take once the above protocols are in place. First, we would finally have a system providing true continuous stream processing, because as of today no open system fully achieves this. Furthermore, I am assuming the system already supports shared mutable state; since no complete system provides this type of state yet, I will probably need to spend some research effort on it. Shared mutable state might also open new challenges, such as how to scale it in the presence of streaming HTAP workloads. Investigating new hardware trends might be an interesting research activity as well, as might applying data compression and approximation to reduce state size. Of course, I do not plan to do all of these in my PhD, only the most interesting ones research-wise.
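The notes leave the load-balancing policy open ("according to some policy"). The Python sketch below is one hypothetical policy built only from the two overload conditions named in note 13: backpressure (ingestion rate above processing rate) and state size close to the physical storage limit. It then prefers an instance that already holds a replica of the key range, as the hand-over protocol assumes. `InstanceStats`, the 0.9 `headroom` threshold, and the spare-capacity ranking are all assumptions, and the decision is modeled as made by a centralized entity rather than through consensus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceStats:
    """Hypothetical per-instance metrics collected by a monitoring component."""
    idx: int
    ingestion_rate: float   # tuples/s arriving at the instance
    processing_rate: float  # tuples/s the instance can sustain
    state_bytes: int        # state held for the instance's key ranges
    storage_limit: int      # physical storage available to the instance


def is_overloaded(s: InstanceStats, headroom: float = 0.9) -> bool:
    """Overloaded = cannot keep up with its input (backpressure) or its state
    is close to the physical storage limit (threshold `headroom` assumed)."""
    backpressure = s.ingestion_rate > s.processing_rate
    storage_pressure = s.state_bytes >= headroom * s.storage_limit
    return backpressure or storage_pressure


def pick_target(key_range_rate: float, replica_holders: list,
                stats: dict) -> Optional[int]:
    """Choose, among instances already holding a replica of the key range,
    the one with the most spare capacity after absorbing its input rate."""
    candidates = [
        stats[i] for i in replica_holders
        if not is_overloaded(stats[i])
        and stats[i].ingestion_rate + key_range_rate <= stats[i].processing_rate
    ]
    if not candidates:
        return None  # e.g., provision a new instance instead
    best = max(candidates,
               key=lambda s: s.processing_rate - s.ingestion_rate - key_range_rate)
    return best.idx
```

Restricting candidates to replica holders keeps the hand-over cheap, since only the last incremental checkpoint has to travel; returning `None` leaves room for the elasticity path, where a new instance is provisioned instead.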