SlideShare a Scribd company logo
Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
1
ADMM based Scalable Machine Learning on Apache Spark
Sauptik Dhar, Mohak Shah
Bosch AI Research
Data is transformational
Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
2
Image: https://www.slideshare.net/mongodb/internet-of-things-and-big-data-vision-and-concrete-use-cases
Big data, Spark and Status-quo
Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
3
 Challenges
 Learning  (Convex) Optimization
 Current solutions (MLLib/ML packages) adopt
‒ SGD: convergence dependent on step-size, conditionality
‒ LBFGS: Adapting to non-differentiable functions non-trivial
 ADMM (Alternating Direction Method of Multipliers)
 Large problem  (simpler) sub-problems
 Guaranteed convergence and robustness to step-size selection
 Robust to ill-conditioned problems
https://www.simplilearn.com/apache-spark-guide-
for-newbies-article
 ADMM for Spark
 Coverage (ML algorithms)
 Go beyond MLLib/ML for Python community
 Address sub-optimality leading from internal normalization (MLLib/ML)
ADMM advantages
Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
4
 Generic ADMM based formulation: Coverage (ML algorithms)
 Robust Guarantees on Convergence and Accuracy
 Python API’s give accessibility to users, developers and designer
ADMML Package
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
5
Developers
ADMM
engine
Loss +
Regularizer
ML
Algorithms
Designers
Users
PythonAPIs
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
6
 Given: training data where
 Solve: where,
1
( , )N
i i i
y 
x Dx 
, ( )
{ 1, 1} , ( )
y
regression
classification



 

,,
min ( ( ), ) ( )bb
L f y Rww
x w
,
( ) T
b
f b w
x w x
Methods Loss Function Regularizer
Classification Elastic-net
Group
Logistic Regression
LS-SVM
Squared-Hinge SVM
Regression
Linear Regression
Huber
Pseudo-Huber
{ 1, 1}y  
y
,( ( ), )bL f yw x ( )R w
( )
1
(1 ) log(1 )
T
i i
N y b
i
N e 

 w x
2
1
(1 2 ) (1 ( ))
N T
i ii
N y b
  w x
2
1
(1 2 ) (max(1 ( )))
N T
i ii
N y b
  w x
2
1
(1 2 ) ( ( ))
N T
i ii
N y b
  w x
2
1 2
1 2( ( )) , if ( )
(1 )
( ) (1 2) , else
T T
i i i iN
i T
i i
y b y b
N
y b

 

     

  

w x w x
w x
2 2
1
(1 ) ( ( ))
N T
i ii
N y b 
    w x
2
1
(1 )
2
D j
jj
j
w
w  
 
 
 
 
 
  
2g g
g G
w


0 (L2-reg) 
1 (L1-reg) 
Generic ML formulation
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
7
Solve smaller sub-problems which can be easily distributed.
min ( ( ), ) ( )L f y Rww
x w
,
min ( ( ), ) ( ) s.t.L f y R  ww z
x z w z
2: ( ( ), ) ( ) ( 2) ( )L f y R constw
x z w z u
      
1k 
21
argmin L( ( ), ) ( 2)k k k
f y 
   w
w
w x w z u
21 1
argmin ( ) ( 2)k k k
R  
   
z
z z w z u
1 1
( )k k k k 
  u u w z
ADMM algorithm
Augmented Lagrangian,
ADMM Steps at each iteration
 w - update:
 z - update:
 u – update:
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
8
 Given with and
 Solve
 ADMM updates,
1
( , )N
i i i
y 
x Dx  y
21 2
2
1argmin ( )
2 21
k T k k
i i
N
y
N i
     

w x w w z u
w
2
21 1
21
1
1 =
argmin
2
1
(1 )
2
( )
; and (1 )
(1 )
j
D jk k k
jj
j
k k
j jk
ji j
j
S
Sz t
t
w z u
z
z
z z
w u

   

   
  
 
    
 
 



  

 

  

1
1 1
;
1 1
and ( )
N NT k k
i i i iD
i i
P P I q y
N N
 
 
     q x x x z u
2
2
1 1
min
1
(1 ) + ( )
2 2
ND Tj
j i ij
j i
y
Nw
w
w x w   
 
 
 
  
 
   
Example (Linear Regression with Elastic-Net)
Example (Logistic Regression with Elastic-Net)
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
9
 Given with and
 Solve
 ADMM updates,
 Gradient
 Hessian
1
( , )N
i i i
y 
x Dx  { 1, 1}y  
2
1 1
min
1
(1 ) + log(1 )
2
T
i i
ND yj
jj
j i
e
Nw
w xw
w   
 
 
 
  
 
   
1
2
2
1argmin log(1 )
1
2
Tyk i i
k k
N
e
N i

  

  
x w
w
w
w z u
1 ( )(1 ) k k
i i iiN
y p     w z ux
1 (1 ) T
i i i ii I
N
p p  x x
Algorithm 1: Iterative Algorithm for
Input:
Output :
initialize
while not converged do
return
1k w
, ,k k k
w z u
1k 
w
(0)
, 0k
j wv
( )
( )
( )
( )
( 1) ( ) ( ) 1 ( )
1 (1 )
1
1
1 (1 )
(1 ) ( )
( )
T j
ij
i
Tj
i i i
j k k
i i i
j j j j
p e
P
N
j j
p p IiN
y pi
P



 
 


 
 
    
 
x v
q
xx
x w z u
v v q
1 ( )k j
 vw
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
10
• rho_initialize
• rho_adjustment
• w_update
• z_update
admml.py
• mapHv
• _mapHv_logistic_binary
• _mapHv_sq_hinge_binary
• _mapHv_pseudo_huber_regression
• _mapHv_huber_regression
mappers.py
• combineHv
• log1pexpx
• sigmoid
• append_one
• rdd_to_df
• df_to_rdd
• parse_vector
• scale_data
• uniform_scale
• normalize_scale
utils.py
• _ADMM_ML_Algorithm
• ADMMLogistic
• ADMML2SVM
• ADMMLSSVM
• ADMMLeastSquares
• ADMMPseudoHuber
• ADMMHuber
• functionVals
• predict
mlalgs.py
examples.py
users
designers
developers
optimization
Code Structure (UML diagram)
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
11
System Configuration:
– No. of nodes = 6 (Hortonworks 2.7)
– No. of cores (per node) = 12 (@ 3.20GHz)
– RAM size (per node) = 64 GB.
– Hard disk size (per node) = 2 TB.
Data : and
 Small Data: No. of samples = 10000000 ( ~5GB )
 Big Data: No. of samples = 100000000 ( ~50GB )
Spark Configuration:
– No. of executors = 15.
– Cores per executor = 2.
– Executor memory = 10 GB.
– Driver memory = 2GB.
20
[ 1,1]U x
1 2 3 4 5 6 7 8 9 10 205( ) 1( ) 5( ) 1( ) ... 2y x x x x x x x x x x x              
(0,1)  
Experiment (Regression)
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
12
MLLIB (SGD) ML (L-BFGS) ADMM
Small Data: No. of samples = 10000000 ( ~5GB )
1584.6 (9.6) 290.8 (4.4)* 653.5(0.75)
Big Data: No. of samples = 100000000 ( ~50GB )
2597 (8.5) 5811 (7.6)
Table. Time comparisons ADMM vs. MLLib (in sec)

 ADMM provides competitive computation speeds compared to L-BFGS.
 Initial - selection and advanced adaptive strategies can lead to faster convergence.
For example: following [4] convergence in 5 iterations ~ 485 sec.
 For un-normalized data ML / MLLib internal normalization may lead to sub-optimal
solutions. For example: ML / MLLib : 15.07 and ADMML: 12.34

optf  optf 
* convergence in 1 iteration
convergence in 7 iteration†
†
Experiment (Regression)
 Generic ADMM based formulation: Coverage (ML algorithms)
 Robust Guarantees on Convergence and Accuracy
 Python API’s give accessibility to users, developers and designer
ADMML Package
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
13
Developers
ADMM
engine
Loss +
Regularizer
ML
Algorithms
Designers
Users
PythonAPIs
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
14
Please cite: Dhar S, Yi C, Ramakrishnan N, Shah M. ADMM based scalable machine learning on Spark. IEEE Big Data 2015.
https://github.com/DL-Benchmarks/ADMML
Contributors: Sauptik Dhar, Naveen Ramakrishnan, Jeff Irion, Jiayi Liu, Unmesh Kurup, Mohak Shah
Current Algorithms
Classification (Elastic/Group - Regularized)
- Logistic Regression
- LS-SVM
- L2-SVM
Regression (Elastic/Group - Regularized)
- Least Squares (i.e. Ridge Regression, Lasso, Elastic-net, Group-Lasso etc.)
- Huber
- Pseudo-Huber
Future Roadmap
- Initial step-size selection.
- Consensus ADMM (coverage for various discontinuous loss functions, like SVMs, alpha- trimmed functions etc.)
- Multiclass algorithms (like, multinomial logistic regression, C&S-SVM etc.)

More Related Content

What's hot

Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit
 
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
Spark Summit
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Spark Summit
 
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Spark Summit
 
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Databricks
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and Dataset
Kazuaki Ishizaki
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
MLconf
 
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Databricks
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O
Sri Ambati
 
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s OptimizerDeep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Databricks
 
Designing Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDesigning Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache Spark
Databricks
 
Inside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissInside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick Reiss
Spark Summit
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Databricks
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Spark Summit
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Spark Summit
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
Cloudera, Inc.
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Spark Summit
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
Spark Summit
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
Kexin Xie
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
Databricks
 

What's hot (20)

Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
 
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
 
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
 
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and Dataset
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O
 
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s OptimizerDeep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
 
Designing Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDesigning Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache Spark
 
Inside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissInside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick Reiss
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 

Similar to ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mohak Shah

OPTEX MATHEMATICAL MODELING AND MANAGEMENT SYSTEM
OPTEX MATHEMATICAL MODELING AND MANAGEMENT SYSTEMOPTEX MATHEMATICAL MODELING AND MANAGEMENT SYSTEM
OPTEX MATHEMATICAL MODELING AND MANAGEMENT SYSTEM
Jesus Velasquez
 
Crude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimizationCrude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimization
Brenno Menezes
 
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Ruairi de Frein
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
Preferred Networks
 
OPTEX Mathematical Modeling and Management System
OPTEX Mathematical Modeling and Management SystemOPTEX Mathematical Modeling and Management System
OPTEX Mathematical Modeling and Management SystemJesus Velasquez
 
OPTEX - Mathematical Modeling and Management System
OPTEX - Mathematical Modeling and Management System OPTEX - Mathematical Modeling and Management System
OPTEX - Mathematical Modeling and Management System
Jesus Velasquez
 
The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity Computing
University of Washington
 
【論文紹介】Relay: A New IR for Machine Learning Frameworks
【論文紹介】Relay: A New IR for Machine Learning Frameworks【論文紹介】Relay: A New IR for Machine Learning Frameworks
【論文紹介】Relay: A New IR for Machine Learning Frameworks
Takeo Imai
 
UK ATC 2015: Automated Post Processing of Multimodel Optimisation Data
UK ATC 2015: Automated Post Processing of Multimodel Optimisation DataUK ATC 2015: Automated Post Processing of Multimodel Optimisation Data
UK ATC 2015: Automated Post Processing of Multimodel Optimisation Data
Altair
 
6. Implementation
6. Implementation6. Implementation
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to SchedulingSmooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
Alkis Vazacopoulos
 
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Daniel Lemire
 
A Future for R: Parallel and Distributed Processing in R for Everyone
A Future for R: Parallel and Distributed Processing in R for EveryoneA Future for R: Parallel and Distributed Processing in R for Everyone
A Future for R: Parallel and Distributed Processing in R for Everyone
inside-BigData.com
 
Introduction of Online Machine Learning Algorithms
Introduction of Online Machine Learning AlgorithmsIntroduction of Online Machine Learning Algorithms
Introduction of Online Machine Learning Algorithms
Shao-Yen Hung
 
autoTVM
autoTVMautoTVM
autoTVM
Yi-Wen Hung
 
Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
Jo-fai Chow
 
Achitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and ExascaleAchitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and Exascale
inside-BigData.com
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
Sri Ambati
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
NECST Lab @ Politecnico di Milano
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
Impetus Technologies
 

Similar to ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mohak Shah (20)

OPTEX MATHEMATICAL MODELING AND MANAGEMENT SYSTEM
OPTEX MATHEMATICAL MODELING AND MANAGEMENT SYSTEMOPTEX MATHEMATICAL MODELING AND MANAGEMENT SYSTEM
OPTEX MATHEMATICAL MODELING AND MANAGEMENT SYSTEM
 
Crude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimizationCrude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimization
 
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
OPTEX Mathematical Modeling and Management System
OPTEX Mathematical Modeling and Management SystemOPTEX Mathematical Modeling and Management System
OPTEX Mathematical Modeling and Management System
 
OPTEX - Mathematical Modeling and Management System
OPTEX - Mathematical Modeling and Management System OPTEX - Mathematical Modeling and Management System
OPTEX - Mathematical Modeling and Management System
 
The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity Computing
 
【論文紹介】Relay: A New IR for Machine Learning Frameworks
【論文紹介】Relay: A New IR for Machine Learning Frameworks【論文紹介】Relay: A New IR for Machine Learning Frameworks
【論文紹介】Relay: A New IR for Machine Learning Frameworks
 
UK ATC 2015: Automated Post Processing of Multimodel Optimisation Data
UK ATC 2015: Automated Post Processing of Multimodel Optimisation DataUK ATC 2015: Automated Post Processing of Multimodel Optimisation Data
UK ATC 2015: Automated Post Processing of Multimodel Optimisation Data
 
6. Implementation
6. Implementation6. Implementation
6. Implementation
 
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to SchedulingSmooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
 
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
 
A Future for R: Parallel and Distributed Processing in R for Everyone
A Future for R: Parallel and Distributed Processing in R for EveryoneA Future for R: Parallel and Distributed Processing in R for Everyone
A Future for R: Parallel and Distributed Processing in R for Everyone
 
Introduction of Online Machine Learning Algorithms
Introduction of Online Machine Learning AlgorithmsIntroduction of Online Machine Learning Algorithms
Introduction of Online Machine Learning Algorithms
 
autoTVM
autoTVMautoTVM
autoTVM
 
Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
 
Achitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and ExascaleAchitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and Exascale
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 

Recently uploaded (20)

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 

ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mohak Shah

  • 1. Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 1 ADMM based Scalable Machine Learning on Apache Spark Sauptik Dhar, Mohak Shah Bosch AI Research
  • 2. Data is transformational Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 2 Image: https://www.slideshare.net/mongodb/internet-of-things-and-big-data-vision-and-concrete-use-cases
  • 3. Big data, Spark and Status-quo Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 3  Challenges  Learning  (Convex) Optimization  Current solutions (MLLib/ML packages) adopt ‒ SGD: convergence dependent on step-size, conditionality ‒ LBFGS: Adapting to non-differentiable functions non-trivial  ADMM (Alternating Direction Method of Multipliers)  Large problem  (simpler) sub-problems  Guaranteed convergence and robustness to step-size selection  Robust to ill-conditioned problems https://www.simplilearn.com/apache-spark-guide- for-newbies-article
  • 4.  ADMM for Spark  Coverage (ML algorithms)  Go beyond MLLib/ML for Python community  Address sub-optimality leading from internal normalization (MLLib/ML) ADMM advantages Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 4
  • 5.  Generic ADMM based formulation: Coverage (ML algorithms)  Robust Guarantees on Convergence and Accuracy  Python API’s give accessibility to users, developers and designer ADMML Package Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 5 Developers ADMM engine Loss + Regularizer ML Algorithms Designers Users PythonAPIs
  • 6. Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 6  Given: training data where  Solve: where, 1 ( , )N i i i y  x Dx  , ( ) { 1, 1} , ( ) y regression classification       ,, min ( ( ), ) ( )bb L f y Rww x w , ( ) T b f b w x w x Methods Loss Function Regularizer Classification Elastic-net Group Logistic Regression LS-SVM Squared-Hinge SVM Regression Linear Regression Huber Pseudo-Huber { 1, 1}y   y ,( ( ), )bL f yw x ( )R w ( ) 1 (1 ) log(1 ) T i i N y b i N e    w x 2 1 (1 2 ) (1 ( )) N T i ii N y b   w x 2 1 (1 2 ) (max(1 ( ))) N T i ii N y b   w x 2 1 (1 2 ) ( ( )) N T i ii N y b   w x 2 1 2 1 2( ( )) , if ( ) (1 ) ( ) (1 2) , else T T i i i iN i T i i y b y b N y b                w x w x w x 2 2 1 (1 ) ( ( )) N T i ii N y b      w x 2 1 (1 ) 2 D j jj j w w                2g g g G w   0 (L2-reg)  1 (L1-reg)  Generic ML formulation
  • 7. Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 7 Solve smaller sub-problems which can be easily distributed. min ( ( ), ) ( )L f y Rww x w , min ( ( ), ) ( ) s.t.L f y R  ww z x z w z 2: ( ( ), ) ( ) ( 2) ( )L f y R constw x z w z u        1k  21 argmin L( ( ), ) ( 2)k k k f y     w w w x w z u 21 1 argmin ( ) ( 2)k k k R       z z z w z u 1 1 ( )k k k k    u u w z ADMM algorithm Augmented Lagrangian, ADMM Steps at each iteration  w - update:  z - update:  u – update:
  • 8. Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 8  Given with and  Solve  ADMM updates, 1 ( , )N i i i y  x Dx  y 21 2 2 1argmin ( ) 2 21 k T k k i i N y N i        w x w w z u w 2 21 1 21 1 1 = argmin 2 1 (1 ) 2 ( ) ; and (1 ) (1 ) j D jk k k jj j k k j jk ji j j S Sz t t w z u z z z z w u                                       1 1 1 ; 1 1 and ( ) N NT k k i i i iD i i P P I q y N N          q x x x z u 2 2 1 1 min 1 (1 ) + ( ) 2 2 ND Tj j i ij j i y Nw w w x w                   Example (Linear Regression with Elastic-Net)
  • 9. Example (Logistic Regression with Elastic-Net) Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 9  Given with and  Solve  ADMM updates,  Gradient  Hessian 1 ( , )N i i i y  x Dx  { 1, 1}y   2 1 1 min 1 (1 ) + log(1 ) 2 T i i ND yj jj j i e Nw w xw w                   1 2 2 1argmin log(1 ) 1 2 Tyk i i k k N e N i         x w w w w z u 1 ( )(1 ) k k i i iiN y p     w z ux 1 (1 ) T i i i ii I N p p  x x Algorithm 1: Iterative Algorithm for Input: Output : initialize while not converged do return 1k w , ,k k k w z u 1k  w (0) , 0k j wv ( ) ( ) ( ) ( ) ( 1) ( ) ( ) 1 ( ) 1 (1 ) 1 1 1 (1 ) (1 ) ( ) ( ) T j ij i Tj i i i j k k i i i j j j j p e P N j j p p IiN y pi P                     x v q xx x w z u v v q 1 ( )k j  vw
  • 10. Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 10 • rho_initialize • rho_adjustment • w_update • z_update admml.py • mapHv • _mapHv_logistic_binary • _mapHv_sq_hinge_binary • _mapHv_pseudo_huber_regression • _mapHv_huber_regression mappers.py • combineHv • log1pexpx • sigmoid • append_one • rdd_to_df • df_to_rdd • parse_vector • scale_data • uniform_scale • normalize_scale utils.py • _ADMM_ML_Algorithm • ADMMLogistic • ADMML2SVM • ADMMLSSVM • ADMMLeastSquares • ADMMPseudoHuber • ADMMHuber • functionVals • predict mlalgs.py examples.py users designers developers optimization Code Structure (UML diagram)
  • 11. Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 11 System Configuration: – No. of nodes = 6 (Hortonworks 2.7) – No. of cores (per node) = 12 (@ 3.20GHz) – RAM size (per node) = 64 GB. – Hard disk size (per node) = 2 TB. Data : and  Small Data: No. of samples = 10000000 ( ~5GB )  Big Data: No. of samples = 100000000 ( ~50GB ) Spark Configuration: – No. of executors = 15. – Cores per executor = 2. – Executor memory = 10 GB. – Driver memory = 2GB. 20 [ 1,1]U x 1 2 3 4 5 6 7 8 9 10 205( ) 1( ) 5( ) 1( ) ... 2y x x x x x x x x x x x               (0,1)   Experiment (Regression)
  • 12. Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 12 MLLIB (SGD) ML (L-BFGS) ADMM Small Data: No. of samples = 10000000 ( ~5GB ) 1584.6 (9.6) 290.8 (4.4)* 653.5(0.75) Big Data: No. of samples = 100000000 ( ~50GB ) 2597 (8.5) 5811 (7.6) Table. Time comparisons ADMM vs. MLLib (in sec)   ADMM provides competitive computation speeds compared to L-BFGS.  Initial - selection and advanced adaptive strategies can lead to faster convergence. For example: following [4] convergence in 5 iterations ~ 485 sec.  For un-normalized data ML / MLLib internal normalization may lead to sub-optimal solutions. For example: ML / MLLib : 15.07 and ADMML: 12.34  optf  optf  * convergence in 1 iteration convergence in 7 iteration† † Experiment (Regression)
  • 13.  Generic ADMM based formulation: Coverage (ML algorithms)  Robust Guarantees on Convergence and Accuracy  Python API’s give accessibility to users, developers and designer ADMML Package Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 13 Developers ADMM engine Loss + Regularizer ML Algorithms Designers Users PythonAPIs
  • 14. Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017 © 2017 Robert Bosch LLC and affiliates. All rights reserved. 14 Please cite: Dhar S, Yi C, Ramakrishnan N, Shah M. ADMM based scalable machine learning on Spark. IEEE Big Data 2015. https://github.com/DL-Benchmarks/ADMML Contributors: Sauptik Dhar, Naveen Ramakrishnan, Jeff Irion, Jiayi Liu, Unmesh Kurup, Mohak Shah Current Algorithms Classification (Elastic/Group - Regularized) - Logistic Regression - LS-SVM - L2-SVM Regression (Elastic/Group - Regularized) - Least Squares (i.e. Ridge Regression, Lasso, Elastic-net, Group-Lasso etc.) - Huber - Pseudo-Huber Future Roadmap - Initial step-size selection. - Consensus ADMM (coverage for various discontinuous loss functions, like SVMs, alpha- trimmed functions etc.) - Multiclass algorithms (like, multinomial logistic regression, C&S-SVM etc.)