Concurrency Control for Parallel Machine Learning

"Concurrency Control for Parallel Machine Learning" presentation at AMPCamp 5 by Dimitris Papailiopoulos

  1. Concurrency Control for Parallel Machine Learning. Dimitris Papailiopoulos. Xinghao Pan, Joseph Gonzalez, Stefanie Jegelka, Tamara Broderick, Dimitris Papailiopoulos, Joseph Bradley, Michael I. Jordan
  2. Serial Inference: Model, State, Data
  3. Parallel Inference: Model, State, Data; Processor 1, Processor 2
  4. Parallel Inference: Model, State, Data; Processor 1, Processor 2. Concurrency: more machines = less time. Correctness: serial equivalence?
  5. Coordination Free Parallel Inference: Model, State, Data; Processor 1, Processor 2. Ignore collisions. Concurrency: (almost) free; speedup = #CPU. Correctness? Not always...
  6. [Chart] Correctness axis (Low to High): Serial
  7. [Chart] Correctness vs. Concurrency (each Low to High): Coordination-free, Serial
  8. [Chart] Correctness vs. Concurrency (each Low to High): Coordination-free, Serial, Concurrency Control. Database mechanisms: guarantee correctness, maximize concurrency; mutual exclusion, optimistic CC
  9. Mutual Exclusion Through Locking: Model, State, Data; Processor 1, Processor 2. Introduce locking (scheduling) protocols to prevent conflicts.
  10. Mutual Exclusion Through Locking: Model, State, Data; Processor 1, Processor 2. ✗ Enforce local serialization to avoid conflicts.
  11. Optimistic Concurrency Control: Model, State, Data; Processor 1, Processor 2. Allow computation to proceed without blocking. Kung & Robinson. On optimistic methods for concurrency control.
  12. Optimistic Concurrency Control: Model, State, Data; Processor 1, Processor 2. Invalid Outcome ✗ ✗. Validate potential conflicts. Kung & Robinson. On optimistic methods for concurrency control.
  13. Optimistic Concurrency Control: Model, State, Data; Processor 1, Processor 2. ✗ ✗ Rollback and Redo. Take a compensating action. Kung & Robinson. On optimistic methods for concurrency control.
  14. Concurrency Control. Coordination Free: Provably fast and correct under key assumptions. Concurrency Control: Provably correct and fast under key assumptions. Systems Ideas to Improve Efficiency
  15. Machine Learning + Concurrency Control (Xinghao Pan et al.). Clustering; Online Facility Location; Submodular Maximization (subset selection, diminishing marginal gains): Max Graph Cut, Set Cover, Sensor Placement, Social Network Influence Propagation, Document Summarization; Topic Modelling (Sports: Football, World Series, Giants, Cardinals; Politics: Midterm, Obama, Democrat, Tea; Finance: QE, market, interest, Dow); Correlation Clustering: Deduplication, Community Detection
  16. Machine Learning + Concurrency Control (Xinghao Pan et al.). Clustering; Online Facility Location; Submodular Maximization (subset selection, diminishing marginal gains): Max Graph Cut, Set Cover, Sensor Placement, Social Network Influence Propagation, Document Summarization; Topic Modelling (Sports: Football, World Series, Giants, Cardinals; Politics: Midterm, Obama, Democrat, Tea; Finance: QE, market, interest, Dow); Correlation Clustering: Deduplication, Community Detection. Serial ML algorithm → Sequence of transactions → Identify potential conflicts → Apply Concurrency Control mechanisms → Parallel ML algorithm
  17. Application: Deduplication. Computer Science Division – University of California Berkeley CA University of California at Berkeley Department of Physics Stanford University California Lawrence Berkeley National Labs <ref>California</ref>
  18. Application: Deduplication
  19. Serial Correlation Clustering. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices
  20. Serial Correlation Clustering. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices
  21. Serial Correlation Clustering. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices
  22. Serial Correlation Clustering. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices
  23. Serial Correlation Clustering. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Serially process vertices. Approximation: 3 OPT (in expectation)
  24. Parallel Correlation Clustering
  25. Concurrency Control Correlation Clustering (C4). Parallel Correlation Clustering. Cannot introduce adjacent cluster centers: resolve by Mutual Exclusion
  26. Concurrency Control Correlation Clustering (C4). Common neighbor must be assigned to earliest center: resolve by Optimistic Concurrency Control. Optimistic Assumption: No other new cluster created. Resolution: Assign common neighbor to earliest cluster
  27. Properties of C4 (Concurrency Control Correlation Clustering). Correctness. Theorem: C4 is correct. C4 preserves the same guarantees as the serial algorithm (3 OPT). Concurrency. Theorem: C4 has provably small overheads = almost linear speedup. Expected #blocked transactions < 2τ |E| / |V|, where τ ≡ difference in the parallel CPUs' progress
  28. Empirical Validation on Billion Edge Graphs. Amazon EC2 r3.8xlarge instances; multicore, up to 16 threads; real and synthetic graphs; 100 runs (10 random orderings x 10 runs). Graphs (vertices, edges): IT-2004, Italian web-graph (41 million, 1.14 billion); Webbase-2001, WebBase crawl (118 million, 1.02 billion); Erdos-Renyi, synthetic random (100 million, ≈ 1.0 billion)
  29. C4: Cost of Coordination. < 0.02% blocked
  30. C4: Speed-up. Ideal; 10x speedup
  31. Conclusion. Concurrency Control for Parallel ML: guarantee correctness, maximize concurrency. Code release in the works! https://amplab.cs.berkeley.edu/projects/ccml/ xinghao@berkeley.edu Applications: Correlation Clustering, Submodular Maximization, Clustering, Online Facility Location, Feature Modeling
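
The optimistic concurrency control slides describe three steps: proceed without blocking, validate potential conflicts, then roll back and redo. A minimal Python sketch of that generic pattern follows. It is not the C4 implementation from the talk; the names VersionedCell, try_commit and optimistic_update are hypothetical, and a single version counter stands in for whatever conflict check a real system would use.

```python
import threading


class VersionedCell:
    """Shared state guarded by a version counter (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    def read(self):
        # Take a consistent snapshot of the value and its version.
        with self._lock:
            return self.value, self.version

    def try_commit(self, new_value, expected_version):
        # Validation step: commit only if nobody else committed since our read.
        with self._lock:
            if self.version != expected_version:
                return False  # conflict detected, caller must redo
            self.value = new_value
            self.version += 1
            return True


def optimistic_update(cell, fn):
    """Apply fn to the cell optimistically; return the number of redos."""
    retries = 0
    while True:
        value, version = cell.read()
        new_value = fn(value)  # computed without holding the lock
        if cell.try_commit(new_value, version):
            return retries
        retries += 1  # discard the stale result and redo


if __name__ == "__main__":
    cell = VersionedCell(0)

    def worker():
        for _ in range(100):
            optimistic_update(cell, lambda x: x + 1)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(cell.read())
```

The lock is held only for the short read and validate steps, so the work inside fn never blocks other threads; a conflicting update costs a redo instead.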

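
The serial correlation clustering slides ("Serially process vertices") cite the pivot algorithm of Ailon, Charikar, and Newman. A minimal sketch of that serial procedure is below, assuming the input is an undirected graph of "similar" edges; the function name kwik_cluster, its parameters, and the seed argument are illustrative choices, not taken from the talk.

```python
import random


def kwik_cluster(vertices, similar_edges, seed=0):
    """Serial pivot clustering sketch.

    Visit vertices in a random order. Each vertex that is still unclustered
    becomes a cluster center and absorbs its still-unclustered neighbors.
    Returns a dict mapping every vertex to its cluster center.
    """
    adjacency = {v: set() for v in vertices}
    for u, v in similar_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    order = list(vertices)
    random.Random(seed).shuffle(order)  # random processing order

    center_of = {}
    for v in order:
        if v in center_of:
            continue  # already absorbed by an earlier center
        center_of[v] = v
        for u in adjacency[v]:
            if u not in center_of:
                center_of[u] = v
    return center_of


if __name__ == "__main__":
    # Two disjoint triangles: each should end up as one cluster.
    vertices = ["a", "b", "c", "x", "y", "z"]
    edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("x", "y"), ("y", "z"), ("x", "z")]
    print(kwik_cluster(vertices, edges))
```

Each loop iteration reads and writes the shared center_of map, which is the state that parallel workers would collide on (adjacent centers, shared neighbors) and that the C4 slides resolve with mutual exclusion and optimistic concurrency control.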