SlideShare a Scribd company logo
Concurrency Control for 
Parallel Machine Learning 
Dimitris Papailiopoulos 
Xinghao Pan, Joseph Gonzalez, Stefanie Jegelka, Tamara Broderick, Dimitris Papailiopoulos, Joseph 
Bradley, Michael I. Jordan
Model 
State 
Data 
Serial Inference
Model 
State 
Parallel Inference 
Processor 1 
Processor 2 
Data
Model 
State 
Data 
Parallel Inference 
Processor 1 
Processor 2 
Concurrency: 
more machines = less time 
Correctness: 
serial equivalence 
?
Model 
State 
Data 
Coordination Free Parallel 
Inference 
Processor 1 
Processor 2 
? 
Ignore collisions 
Concurrency: 
(almost) free 
+ 
Speedup = #CPU 
Correctness? 
Not always...
Correctness 
Serial 
Low High
Correctness 
Concurrency 
Coordination-free 
Serial 
High 
Low High 
Low
Correctness 
Concurrency 
Coordination-free 
Serial 
High 
Low High 
Low 
Concurrency 
Control 
Database mechanisms 
o Guarantee correctness 
o Maximize concurrency 
 Mutual exclusion 
 Optimistic CC
Model 
State 
Data 
Mutual Exclusion Through 
Locking 
Processor 1 
Processor 2 
Introduce locking (scheduling) protocols to prevent 
conflicts.
Mutual Exclusion Through 
Model 
State 
Data 
Processor 1 
Processor 2 
Locking 
✗ 
Enforce local serialization to avoid conflicts.
Optimistic Concurrency Control 
Model 
State 
Data 
Processor 1 
Processor 2 
Allow computation to proceed without blocking. 
Kung & Robinson. On optimistic methods for concurrency 
control.
Optimistic Concurrency Control 
Model 
State 
Data 
Invalid Outcome 
✗ ✗ 
Processor 1 
Processor 2 
Validate potential conflicts. 
Kung & Robinson. On optimistic methods for concurrency 
control.
Optimistic Concurrency Control 
Model 
State 
Data 
✗ ✗ 
Processor 1 
Processor 2 
Rollback and Redo 
Take a compensating action. 
Kung & Robinson. On optimistic methods for concurrency 
control.
Concurrency Control 
14 
Coordination Free: 
Provably fast and correct under key assumptions. 
Concurrency Control: 
Provably correct and fast under key assumptions. 
Systems Ideas to 
Improve Efficiency
Machine Learning + Concurrency 
Clusteri 
ng 
Online 
Facility 
Location 
Control 
(Xinghao Pan et al.) 
Submodular 
Maximization 
Subset selection, diminishing 
marginal gains 
Max Graph 
Cut 
Set 
Cover 
Sensor Placement 
Social Network 
Influence 
Propagation 
Document 
Summarization 
Sports 
Football 
Word Series 
Giants 
Cardinals 
Politics 
Midterm 
Obama 
Democrat 
Tea 
Finance 
QE 
market 
interest 
Dow 
Topic Modelling 
Correlation 
Clustering 
Deduplication 
Community 
Detection
Machine Learning + Concurrency 
Clusteri 
ng 
Online 
Facility 
Location 
Control 
(Xinghao Pan et al.) 
Submodular 
Maximization 
Subset selection, diminishing 
marginal gains 
Max Graph 
Cut 
Set 
Cover 
Sensor Placement 
Social Network 
Influence 
Propagation 
Document 
Summarization 
Sports 
Football 
Word Series 
Giants 
Cardinals 
Politics 
Midterm 
Obama 
Democrat 
Tea 
Finance 
QE 
market 
interest 
Dow 
Topic Modelling 
Correlation 
Clustering 
Deduplication 
Community 
Detection 
Serial ML 
algorithm 
Sequence of 
transactions 
Identify potential 
conflicts 
Apply Concurrency 
Control 
mechanisms 
Parallel ML 
algorithm
Application: Deduplication 
Computer Science 
Division – University of 
California Berkeley CA 
University of California at Berkeley 
Department of 
Physics Stanford 
University California 
Lawrence Berkeley National 
Labs <ref>California</ref>
Application: Deduplication
Serial Correlation Clustering 
Nir Ailon, Moses Charikar, and Alantha Newman. 
Aggregating inconsistent information: ranking and clustering. 
Journal of the ACM (JACM), 55(5):23, 2008. 
Serially process vertices
Serial Correlation Clustering 
Nir Ailon, Moses Charikar, and Alantha Newman. 
Aggregating inconsistent information: ranking and clustering. 
Journal of the ACM (JACM), 55(5):23, 2008. 
Serially process vertices
Serial Correlation Clustering 
Nir Ailon, Moses Charikar, and Alantha Newman. 
Aggregating inconsistent information: ranking and clustering. 
Journal of the ACM (JACM), 55(5):23, 2008. 
Serially process vertices
Serial Correlation Clustering 
Nir Ailon, Moses Charikar, and Alantha Newman. 
Aggregating inconsistent information: ranking and clustering. 
Journal of the ACM (JACM), 55(5):23, 2008. 
Serially process vertices
Serial Correlation Clustering 
Nir Ailon, Moses Charikar, and Alantha Newman. 
Aggregating inconsistent information: ranking and clustering. 
Journal of the ACM (JACM), 55(5):23, 2008. 
Serially process vertices 
Approximation 3 OPT (in expectation)
Parallel Correlation Clustering
Concurrency Control Correlation Clustering 
(C4) Parallel Correlation Clustering 
Cannot Resolve introduce 
by 
Mutual adjacent Exclusion 
cluster 
centers
Concurrency Control Correlation Clustering 
(C4) 
Common Resolve neighbor by 
must be 
assigned Optimistic to Concurrency 
earliest center 
Control 
? 
Optimistic Assumption 
No other new cluster created 
Resolution 
Assign common neighbor to earliest cluster
Properties of C4 
(Concurrency Control Correlation Clustering) 
Theorem: C4 is correct. 
C4 preserves same guarantees as serial algorithm (3 
OPT). 
Concurren Correctness 
Theorem: C4 has provably small overheads. 
cy 
= almost linear speedup 
Expected #blocked transactions < 2τ |E| / |V|. 
τ ≡ diff in parallel cpu’s progress
Empirical Validation on Billion Edge 
Graphs 
Amazon EC2 r3.8xlarge instances 
Multicore up to 16 threads 
Real and synthetic graphs 
100 runs (10 random orderings x 10 runs) 
Graph Vertices Edges 
IT-2004 Italian web-graph 41 Million 1.14 Billion 
Webbase-2001 WebBase crawl 118 Million 1.02 Billion 
Erdos-Renyi Synthetic random 100 Million ≈ 1.0 Billion
C4: Cost of Coordination 
< 0.02% blocked
C4: Speed-up 
Ideal 
10x 
speedu 
p
Conclusion 
Concurrency Control 
for Parallel ML 
o Guarantee 
correctness 
o Maximize 
concurrency 
Code release in the works! 
https://amplab.cs.berkeley.edu/projects/cc 
ml/ 
xinghao@berkeley.edu 
Applications 
Correlation Clustering 
Submodular Maximization 
Clustering 
Online Facility Location 
Feature Modeling

More Related Content

Similar to Concurrency Control for Parallel Machine Learning

Black-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software SystemsBlack-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software Systems
Mohammad Jafar Mashhadi
 
October 26, Optimization
October 26, OptimizationOctober 26, Optimization
October 26, Optimization
University of Colorado at Boulder
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
Data Con LA
 
2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekinge2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekinge
Prof. Wim Van Criekinge
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformaticsAbhishek Vatsa
 
2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge
Prof. Wim Van Criekinge
 
Ashg2014 grc workshop_schneider
Ashg2014 grc workshop_schneiderAshg2014 grc workshop_schneider
Ashg2014 grc workshop_schneider
Genome Reference Consortium
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
University of Huddersfield
 
Computational Approaches to Systems Biology
Computational Approaches to Systems BiologyComputational Approaches to Systems Biology
Computational Approaches to Systems Biology
Mike Hucka
 
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
Salford Systems
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..butest
 
2005: A Matlab Tour on Artificial Immune Systems
2005: A Matlab Tour on Artificial Immune Systems2005: A Matlab Tour on Artificial Immune Systems
2005: A Matlab Tour on Artificial Immune Systems
Leandro de Castro
 
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular AutomataCost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
ijait
 
Josh Patterson MLconf slides
Josh Patterson MLconf slidesJosh Patterson MLconf slides
Josh Patterson MLconf slides
MLconf
 
20131019 生物物理若手 Journal Club
20131019 生物物理若手 Journal Club20131019 生物物理若手 Journal Club
20131019 生物物理若手 Journal Club
Med_KU
 
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYONDIMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
Rabi Das
 
Advances in Bayesian Learning
Advances in Bayesian LearningAdvances in Bayesian Learning
Advances in Bayesian Learningbutest
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural network
Nagarajan
 

Similar to Concurrency Control for Parallel Machine Learning (20)

Black-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software SystemsBlack-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software Systems
 
October 26, Optimization
October 26, OptimizationOctober 26, Optimization
October 26, Optimization
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
 
My
MyMy
My
 
2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekinge2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekinge
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge
 
Ashg2014 grc workshop_schneider
Ashg2014 grc workshop_schneiderAshg2014 grc workshop_schneider
Ashg2014 grc workshop_schneider
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
 
Computational Approaches to Systems Biology
Computational Approaches to Systems BiologyComputational Approaches to Systems Biology
Computational Approaches to Systems Biology
 
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
 
PPT
PPTPPT
PPT
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
2005: A Matlab Tour on Artificial Immune Systems
2005: A Matlab Tour on Artificial Immune Systems2005: A Matlab Tour on Artificial Immune Systems
2005: A Matlab Tour on Artificial Immune Systems
 
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular AutomataCost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
 
Josh Patterson MLconf slides
Josh Patterson MLconf slidesJosh Patterson MLconf slides
Josh Patterson MLconf slides
 
20131019 生物物理若手 Journal Club
20131019 生物物理若手 Journal Club20131019 生物物理若手 Journal Club
20131019 生物物理若手 Journal Club
 
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYONDIMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
 
Advances in Bayesian Learning
Advances in Bayesian LearningAdvances in Bayesian Learning
Advances in Bayesian Learning
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural network
 

More from jeykottalam

AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
jeykottalam
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
jeykottalam
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Library
jeykottalam
 
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at ScaleSparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scale
jeykottalam
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stack
jeykottalam
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
jeykottalam
 
COCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate AscentCOCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate Ascent
jeykottalam
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Community
jeykottalam
 

More from jeykottalam (8)

AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Library
 
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at ScaleSparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scale
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stack
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
COCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate AscentCOCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate Ascent
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Community
 

Recently uploaded

GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 

Recently uploaded (20)

GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 

Concurrency Control for Parallel Machine Learning

  • 1. Concurrency Control for Parallel Machine Learning Dimitris Papailiopoulos Xinghao Pan, Joseph Gonzalez, Stefanie Jegelka, Tamara Broderick, Dimitris Papailiopoulos, Joseph Bradley, Michael I. Jordan
  • 2. Model State Data Serial Inference
  • 3. Model State Parallel Inference Processor 1 Processor 2 Data
  • 4. Model State Data Parallel Inference Processor 1 Processor 2 Concurrency: more machines = less time Correctness: serial equivalence ?
  • 5. Model State Data Coordination Free Parallel Inference Processor 1 Processor 2 ? Ignore collisions Concurrency: (almost) free + Speedup = #CPU Correctness? Not always...
  • 7. Correctness Concurrency Coordination-free Serial High Low High Low
  • 8. Correctness Concurrency Coordination-free Serial High Low High Low Concurrency Control Database mechanisms o Guarantee correctness o Maximize concurrency  Mutual exclusion  Optimistic CC
  • 9. Model State Data Mutual Exclusion Through Locking Processor 1 Processor 2 Introduce locking (scheduling) protocols to prevent conflicts.
  • 10. Mutual Exclusion Through Model State Data Processor 1 Processor 2 Locking ✗ Enforce local serialization to avoid conflicts.
  • 11. Optimistic Concurrency Control Model State Data Processor 1 Processor 2 Allow computation to proceed without blocking. Kung & Robinson. On optimistic methods for concurrency control.
  • 12. Optimistic Concurrency Control Model State Data Invalid Outcome ✗ ✗ Processor 1 Processor 2 Validate potential conflicts. Kung & Robinson. On optimistic methods for concurrency control.
  • 13. Optimistic Concurrency Control Model State Data ✗ ✗ Processor 1 Processor 2 Rollback and Redo Take a compensating action. Kung & Robinson. On optimistic methods for concurrency control.
  • 14. Concurrency Control 14 Coordination Free: Provably fast and correct under key assumptions. Concurrency Control: Provably correct and fast under key assumptions. Systems Ideas to Improve Efficiency
  • 15. Machine Learning + Concurrency Clusteri ng Online Facility Location Control (Xinghao Pan et al.) Submodular Maximization Subset selection, diminishing marginal gains Max Graph Cut Set Cover Sensor Placement Social Network Influence Propagation Document Summarization Sports Football Word Series Giants Cardinals Politics Midterm Obama Democrat Tea Finance QE market interest Dow Topic Modelling Correlation Clustering Deduplication Community Detection
  • 16. Machine Learning + Concurrency Clusteri ng Online Facility Location Control (Xinghao Pan et al.) Submodular Maximization Subset selection, diminishing marginal gains Max Graph Cut Set Cover Sensor Placement Social Network Influence Propagation Document Summarization Sports Football Word Series Giants Cardinals Politics Midterm Obama Democrat Tea Finance QE market interest Dow Topic Modelling Correlation Clustering Deduplication Community Detection Serial ML algorithm Sequence of transactions Identify potential conflicts Apply Concurrency Control mechanisms Parallel ML algorithm
  • 17. Application: Deduplication Computer Science Division – University of California Berkeley CA University of California at Berkeley Department of Physics Stanford University California Lawrence Berkeley National Labs <ref>California</ref>
  • 19. Serial Correlation Clustering Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices
  • 20. Serial Correlation Clustering Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices
  • 21. Serial Correlation Clustering Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices
  • 22. Serial Correlation Clustering Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices
  • 23. Serial Correlation Clustering Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices Approximation 3 OPT (in expectation)
  • 25. Concurrency Control Correlation Clustering (C4) Parallel Correlation Clustering Cannot Resolve introduce by Mutual adjacent Exclusion cluster centers
  • 26. Concurrency Control Correlation Clustering (C4) Common Resolve neighbor by must be assigned Optimistic to Concurrency earliest center Control ? Optimistic Assumption No other new cluster created Resolution Assign common neighbor to earliest cluster
  • 27. Properties of C4 (Concurrency Control Correlation Clustering) Theorem: C4 is correct. C4 preserves same guarantees as serial algorithm (3 OPT). Concurren Correctness Theorem: C4 has provably small overheads. cy = almost linear speedup Expected #blocked transactions < 2τ |E| / |V|. τ ≡ diff in parallel cpu’s progress
  • 28. Empirical Validation on Billion Edge Graphs Amazon EC2 r3.8xlarge instances Multicore up to 16 threads Real and synthetic graphs 100 runs (10 random orderings x 10 runs) Graph Vertices Edges IT-2004 Italian web-graph 41 Million 1.14 Billion Webbase-2001 WebBase crawl 118 Million 1.02 Billion Erdos-Renyi Synthetic random 100 Million ≈ 1.0 Billion
  • 29. C4: Cost of Coordination < 0.02% blocked
  • 30. C4: Speed-up Ideal 10x speedu p
  • 31. Conclusion Concurrency Control for Parallel ML o Guarantee correctness o Maximize concurrency Code release in the works! https://amplab.cs.berkeley.edu/projects/cc ml/ xinghao@berkeley.edu Applications Correlation Clustering Submodular Maximization Clustering Online Facility Location Feature Modeling