COCOA: Communication-Efficient Coordinate Ascent

Virginia Smith

Martin Jaggi, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, & Michael I. Jordan
LARGE-SCALE OPTIMIZATION
COCOA
RESULTS
Machine Learning with Large Datasets
image/music/video tagging 
document categorization 
item recommendation 
click-through rate prediction 
sequence tagging 
protein structure prediction 
sensor data prediction 
spam classification 
fraud detection
Machine Learning Workflow

DATA & PROBLEM
classification, regression, collaborative filtering, …

MACHINE LEARNING MODEL
logistic regression, lasso, support vector machines, …

OPTIMIZATION ALGORITHM
gradient descent, coordinate descent, Newton’s method, …
Example: SVM Classification

[Figure: max-margin linear classifier; decision boundary wᵀx − b = 0 with margin boundaries wᵀx − b = ±1, separated by width 2/||w||]

$$\min_{w \in \mathbb{R}^d} \; \frac{\lambda}{2}\|w\|^2 + \frac{1}{n}\sum_{i=1}^{n} \ell_{\mathrm{hinge}}(y_i\, w^T x_i)$$
Descent algorithms and line search methods 
Acceleration, momentum, and conjugate gradients 
Newton and Quasi-Newton methods 
Coordinate descent 
Stochastic and incremental gradient methods 
SMO 
SVMlight 
LIBLINEAR
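To make the objective concrete, here is a minimal NumPy sketch (not from the talk; the toy data and the regularization value lam are hypothetical) that evaluates the hinge-loss primal objective P(w) for a given weight vector:

```python
import numpy as np

# A minimal sketch (illustrative) evaluating the SVM primal objective
# P(w) = (lambda/2)||w||^2 + (1/n) * sum_i hinge(y_i * w^T x_i).

def svm_primal(w, X, y, lam):
    margins = y * (X @ w)                    # y_i * w^T x_i for all i
    hinge = np.maximum(0.0, 1.0 - margins)   # hinge loss per example
    return 0.5 * lam * (w @ w) + hinge.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                # toy data: n=100, d=5
y = np.sign(rng.normal(size=100))
w = np.zeros(5)
print(svm_primal(w, X, y, lam=0.01))         # at w=0 the objective is exactly 1.0
```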
Machine Learning Workflow

DATA & PROBLEM
classification, regression, collaborative filtering, …

MACHINE LEARNING MODEL
logistic regression, lasso, support vector machines, …

OPTIMIZATION ALGORITHM
gradient descent, coordinate descent, Newton’s method, …

SYSTEMS SETTING
multi-core, cluster, cloud, supercomputer, …

Open Problem: efficiently solving the objective when data is distributed
Distributed Optimization

“always communicate”
reduce: $w \leftarrow w - \alpha \sum_k \Delta w_k$
✔ convergence guarantees
✗ high communication

“never communicate”
average: $w := \frac{1}{K} \sum_k w_k$ [ZDWJ, 2012]
✔ low communication
✗ convergence not guaranteed
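The trade-off between the two extremes shows up even in a tiny simulation. The sketch below is illustrative only (the ridge-regression setup, shard scalings, round count, and step size are assumptions, not from the talk): "always communicate" performs one gradient reduce per round, while "never communicate" solves each shard exactly and averages once, and both are scored on the pooled objective:

```python
import numpy as np

# Toy contrast of the two communication extremes on ridge regression
# with heterogeneous (non-IID) data shards across K machines.

rng = np.random.default_rng(0)
K, n_k, d, lam = 4, 200, 10, 1.0
Xs = [(k + 1) * rng.normal(size=(n_k, d)) for k in range(K)]  # shard k scaled by k+1
w_true = rng.normal(size=d)
ys = [X @ w_true + 0.1 * rng.normal(size=n_k) for X in Xs]

def pooled_objective(w):
    loss = sum(np.sum((X @ w - y) ** 2) for X, y in zip(Xs, ys)) / (2 * K * n_k)
    return loss + 0.5 * lam * (w @ w)

# "always communicate": one gradient reduce per round
w = np.zeros(d)
for _ in range(300):
    g = sum(X.T @ (X @ w - y) / n_k for X, y in zip(Xs, ys)) / K + lam * w
    w -= 0.01 * g                               # reduce: w = w - alpha * g

# "never communicate": exact local ridge solutions, averaged once at the end
locals_ = [np.linalg.solve(X.T @ X / n_k + lam * np.eye(d), X.T @ y / n_k)
           for X, y in zip(Xs, ys)]
w_avg = sum(locals_) / K

print("always-communicate objective:", pooled_objective(w))
print("one-shot average objective  :", pooled_objective(w_avg))
```

Because gradient averaging optimizes the pooled objective directly, it reaches the true minimizer, while the one-shot average of heterogeneous local solutions lands at a different (worse) point.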
Mini-batch: a natural middle ground

reduce: $w \leftarrow w - \frac{\alpha}{|b|} \sum_{i \in b} \Delta w_i$
✔ convergence guarantees
✔ tunable communication
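A minimal sketch of this reduce step (illustrative; hinge-loss SVM on toy data, with the batch size and step size chosen arbitrarily):

```python
import numpy as np

# Mini-batch SGD for the hinge-loss SVM: each round, |b| per-example
# subgradients are computed (in parallel on a cluster) and reduced with
# w = w - (alpha/|b|) * sum of the per-example updates.

rng = np.random.default_rng(0)
n, d, lam, alpha = 1000, 20, 0.01, 0.25
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))

w = np.zeros(d)
for _ in range(500):
    b = rng.integers(n, size=64)                  # mini-batch of size |b| = 64
    active = (y[b] * (X[b] @ w)) < 1              # examples with nonzero hinge subgrad
    subgrads = lam * w - (y[b] * active)[:, None] * X[b]   # per-example updates
    w -= (alpha / len(b)) * subgrads.sum(axis=0)           # the reduce step
print("train accuracy:", np.mean(np.sign(X @ w) == y))
```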
Are we done?
Mini-batch Limitations
1. METHODS BEYOND SGD
2. STALE UPDATES
3. AVERAGE OVER BATCH SIZE
LARGE-SCALE OPTIMIZATION
COCOA
RESULTS
Communication-Efficient Distributed Dual Coordinate Ascent (CoCoA)

1. METHODS BEYOND SGD
Use Primal-Dual Framework

2. STALE UPDATES
Immediately apply local updates

3. AVERAGE OVER BATCH SIZE
Average over K ≪ batch size
1. Primal-Dual Framework

PRIMAL ≥ DUAL

PRIMAL: $\min_{w \in \mathbb{R}^d} \; \Big[ P(w) := \frac{\lambda}{2}\|w\|^2 + \frac{1}{n}\sum_{i=1}^{n} \ell_i(w^T x_i) \Big]$

DUAL: $\max_{\alpha \in \mathbb{R}^n} \; \Big[ D(\alpha) := -\frac{\lambda}{2}\|A\alpha\|^2 - \frac{1}{n}\sum_{i=1}^{n} \ell_i^*(-\alpha_i) \Big]$, where $A_i = \frac{1}{\lambda n} x_i$

Stopping criteria given by duality gap
Good performance in practice
Default in software packages, e.g. LIBLINEAR
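As a concrete illustration of the duality-gap stopping criterion, here is a small sketch for the hinge loss (a standard SVM parameterization in which the label is folded into the dual variable, so each entry lies in [0, 1]; the toy data and λ are assumptions). For any feasible α and the corresponding w = Aα, weak duality gives P(w) − D(α) ≥ 0, and the gap certifies how suboptimal the current iterate is:

```python
import numpy as np

# Duality gap for the hinge-loss SVM: alpha_i in [0,1] (label folded in),
# w = A @ alpha with A_i = x_i / (lam * n). Weak duality: P(w) >= D(alpha).

rng = np.random.default_rng(0)
n, d, lam = 500, 10, 0.1
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))

def primal(w):
    return 0.5 * lam * (w @ w) + np.mean(np.maximum(0, 1 - y * (X @ w)))

def dual(alpha):
    w = X.T @ (alpha * y) / (lam * n)   # w = A @ alpha
    return -0.5 * lam * (w @ w) + alpha.mean(), w

alpha = rng.uniform(0, 1, size=n)       # any dual-feasible point
d_val, w = dual(alpha)
gap = primal(w) - d_val
print(f"P(w)={primal(w):.4f}  D(alpha)={d_val:.4f}  gap={gap:.4f} (>= 0)")
```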
2. Immediately Apply Updates

STALE:
for i ∈ b
  Δw ← Δw − α ∇_i P(w)
end
w ← w + Δw

FRESH:
for i ∈ b
  Δw ← −α ∇_i P(w)
  w ← w + Δw
end
Output: $\Delta\alpha_{[k]}$ and $\Delta w := A_{[k]}\Delta\alpha_{[k]}$
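The difference matters once coordinates interact. The toy sketch below (illustrative; a 2-D coupled quadratic, not the talk's SVM objective) runs exact coordinate minimization both ways: the stale variant recomputes every coordinate against the round's starting point (Jacobi-style), while the fresh variant applies each update immediately (Gauss-Seidel-style) and converges noticeably faster:

```python
import numpy as np

# STALE vs FRESH coordinate updates on f(w) = 0.5 w'Qw - b'w with
# strongly coupled coordinates.

Q = np.array([[2.0, 1.5], [1.5, 2.0]])      # SPD, off-diagonal coupling
b = np.array([1.0, 1.0])
w_star = np.linalg.solve(Q, b)

def step_i(w, i):                            # exact minimization in coordinate i
    return (b[i] - Q[i] @ w + Q[i, i] * w[i]) / Q[i, i]

w_stale, w_fresh = np.zeros(2), np.zeros(2)
for _ in range(20):
    # STALE: both coordinate steps computed from the same starting w
    w_stale = np.array([step_i(w_stale, i) for i in range(2)])
    # FRESH: each step sees the effect of the previous one
    for i in range(2):
        w_fresh[i] = step_i(w_fresh, i)

print("stale error:", np.linalg.norm(w_stale - w_star))
print("fresh error:", np.linalg.norm(w_fresh - w_star))
```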
3. Average over K

reduce: $w \leftarrow w + \frac{1}{K}\sum_k \Delta w_k$
CoCoA

Algorithm 1: CoCoA
Input: $T \geq 1$, scaling parameter $1 \leq \beta_K \leq K$ (default: $\beta_K := 1$)
Data: $\{(x_i, y_i)\}_{i=1}^{n}$ distributed over $K$ machines
Initialize: $\alpha^{(0)}_{[k]} \leftarrow 0$ for all machines $k$, and $w^{(0)} \leftarrow 0$
for $t = 1, 2, \ldots, T$
  for all machines $k = 1, 2, \ldots, K$ in parallel
    $(\Delta\alpha_{[k]}, \Delta w_k) \leftarrow \mathrm{LocalDualMethod}(\alpha^{(t-1)}_{[k]}, w^{(t-1)})$
    $\alpha^{(t)}_{[k]} \leftarrow \alpha^{(t-1)}_{[k]} + \frac{\beta_K}{K}\,\Delta\alpha_{[k]}$
  end
  reduce: $w^{(t)} \leftarrow w^{(t-1)} + \frac{\beta_K}{K}\sum_{k=1}^{K} \Delta w_k$
end

Procedure A: LocalDualMethod: Dual algorithm on machine k
Input: Local $\alpha_{[k]} \in \mathbb{R}^{n_k}$, and $w \in \mathbb{R}^d$ consistent with the other coordinate blocks of $\alpha$ s.t. $w = A\alpha$
Data: Local $\{(x_i, y_i)\}_{i=1}^{n_k}$
Output: $\Delta\alpha_{[k]}$ and $\Delta w := A_{[k]}\Delta\alpha_{[k]}$

✔ 10 lines of code in Spark
✔ primal-dual framework allows for any internal optimization method
✔ local updates applied immediately
✔ average over K
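A single-process simulation of Algorithm 1 fits in a page. The sketch below is an illustration under assumptions, not the authors' 10-line Spark implementation: it uses closed-form SDCA steps for the hinge loss as LocalDualMethod (label folded into α, so each α_i ∈ [0, 1]), partitions the coordinates across K simulated machines, and applies the β_K = 1 scaling; the toy data, T, and H are arbitrary:

```python
import numpy as np

# Minimal CoCoA simulation: H SDCA coordinate steps per machine per round,
# local updates applied immediately, deltas averaged over K (beta_K = 1).

def local_sdca(X, y, alpha, w, lam, n, H, rng):
    """H randomized dual coordinate steps on one machine's block.
    Reads the shared w; returns (delta_alpha, delta_w)."""
    n_k = len(y)
    d_alpha = np.zeros(n_k)
    d_w = np.zeros(X.shape[1])
    for _ in range(H):
        i = rng.integers(n_k)
        margin = y[i] * (X[i] @ (w + d_w))         # fresh local view of w
        step = (1.0 - margin) * lam * n / (X[i] @ X[i])
        a = alpha[i] + d_alpha[i]                  # current dual value in [0,1]
        delta = np.clip(a + step, 0.0, 1.0) - a    # projected coordinate step
        d_alpha[i] += delta
        d_w += delta * y[i] * X[i] / (lam * n)     # keep local w = A @ alpha
    return d_alpha, d_w

def cocoa(X, y, K=4, T=20, H=100, lam=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    blocks = np.array_split(np.arange(n), K)       # coordinate partition
    alpha, w = np.zeros(n), np.zeros(d)
    for _ in range(T):
        d_ws = []
        for k in range(K):                          # "in parallel" on a cluster
            bk = blocks[k]
            d_a, d_w = local_sdca(X[bk], y[bk], alpha[bk], w, lam, n, H, rng)
            alpha[bk] += d_a / K                    # beta_K = 1: scale by 1/K
            d_ws.append(d_w)
        w += sum(d_ws) / K                          # reduce: average over K
    return w, alpha

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 20))
    y = np.sign(X @ rng.normal(size=20))
    w, alpha = cocoa(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```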
Convergence

✔ it converges!
✔ convergence rate matches current state-of-the-art mini-batch methods

$$\mathbb{E}\big[D(\alpha^*) - D(\alpha^{(T)})\big] \;\leq\; \left(1 - (1-\Theta)\,\frac{1}{K}\,\frac{\lambda n}{\sigma + \lambda n}\right)^{T} \Big(D(\alpha^*) - D(\alpha^{(0)})\Big)$$

(the same rate applies also to the duality gap)

Assumptions:
- the losses $\ell_i$ are $(1/\gamma)$-smooth
- LocalDualMethod makes improvement $\Theta$ per step; e.g. for SDCA, $\Theta = \left(1 - \frac{\lambda n}{1 + \lambda n}\,\frac{1}{\tilde n}\right)^{H}$
- $\sigma$ measures the difficulty of the data partition across the $K$ machines, with $0 \leq \sigma \leq n/\lambda$
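To get a feel for the bound, here is a toy numeric illustration (all values are invented for the example, not taken from the paper's experiments): it plugs the SDCA expression for Θ into the per-round contraction factor and solves for the number of outer rounds T needed for a target dual suboptimality:

```python
import math

# Per-round contraction: r = 1 - (1 - Theta) * (1/K) * lam*n / (sigma + lam*n),
# with Theta from H local SDCA steps on the largest block of size n_tilde.

lam, n, K, sigma = 1e-4, 500_000, 8, 1e3        # hypothetical example values
H = 100_000                                     # local steps per outer round
n_tilde = n // K                                # largest local block size
theta = (1 - (lam * n) / (1 + lam * n) / n_tilde) ** H
r = 1 - (1 - theta) * (1 / K) * (lam * n) / (sigma + lam * n)
eps = 1e-6                                      # target relative suboptimality
T = math.ceil(math.log(eps) / math.log(r))
print(f"Theta={theta:.3f}, per-round factor r={r:.6f}, rounds T={T}")
```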
LARGE-SCALE OPTIMIZATION
COCOA
RESULTS
Empirical Results in Spark

Dataset    Training (n)   Features (d)   Sparsity   Workers (K)
Cov        522,911        54             22.22%     4
Rcv1       677,399        47,236         0.16%      8
Imagenet   32,751         160,000        100%       32
[Figure: log primal suboptimality vs. time (s), one panel per dataset. Imagenet: COCOA (H=1e3), mini-batch-CD (H=1), local-SGD (H=1e3), mini-batch-SGD (H=10). Cov: COCOA (H=1e5), mini-batch-CD (H=100), local-SGD (H=1e5), batch-SGD (H=1). RCV1: COCOA (H=1e5), mini-batch-CD (H=100), local-SGD (H=1e4), batch-SGD (H=100).]
COCOA Take-Aways

A framework for distributed optimization
Uses state-of-the-art primal-dual methods
Reduces communication:
- applies updates immediately
- averages over # of machines
Strong empirical results & convergence guarantees

To appear in NIPS ’14
www.berkeley.edu/~vsmith