ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mohak Shah

Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
1
ADMM based Scalable Machine Learning on Apache Spark
Sauptik Dhar, Mohak Shah
Bosch AI Research

Data is transformational
2
Image: https://www.slideshare.net/mongodb/internet-of-things-and-big-data-vision-and-concrete-use-cases

Big data, Spark and Status-quo
3
 Challenges
 Learning  (Convex) Optimization
 Current solutions (MLLib/ML packages) adopt
‒ SGD: convergence dependent on step-size, conditionality
‒ LBFGS: Adapting to non-differentiable functions non-trivial
 ADMM (Alternating Direction Method of Multipliers)
 Large problem  (simpler) sub-problems
 Guaranteed convergence and robustness to step-size selection
 Robust to ill-conditioned problems
https://www.simplilearn.com/apache-spark-guide-
for-newbies-article

 ADMM for Spark
 Coverage (ML algorithms)
 Go beyond MLLib/ML for Python community
 Address sub-optimality leading from internal normalization (MLLib/ML)
ADMM advantages
4

 Generic ADMM based formulation: Coverage (ML algorithms)
 Robust Guarantees on Convergence and Accuracy
 Python API’s give accessibility to users, developers and designer
ADMML Package
Research and Technology Center North America | CR/RTC1.3-NA | 5/21/2017
5
Developers
ADMM
engine
Loss +
Regularizer
ML
Algorithms
Designers
Users
PythonAPIs

6
 Given: training data where
 Solve: where,
1
( , )N
i i i
y 
x Dx 
, ( )
{ 1, 1} , ( )
y
regression
classification



 

,,
min ( ( ), ) ( )bb
L f y Rww
x w
,
( ) T
b
f b w
x w x
Methods Loss Function Regularizer
Classification Elastic-net
Group
Logistic Regression
LS-SVM
Squared-Hinge SVM
Regression
Linear Regression
Huber
Pseudo-Huber
{ 1, 1}y  
y
,( ( ), )bL f yw x ( )R w
( )
1
(1 ) log(1 )
T
i i
N y b
i
N e 

 w x
2
1
(1 2 ) (1 ( ))
N T
i ii
N y b
  w x
2
1
(1 2 ) (max(1 ( )))
N T
i ii
N y b
  w x
2
1
(1 2 ) ( ( ))
N T
i ii
N y b
  w x
2
1 2
1 2( ( )) , if ( )
(1 )
( ) (1 2) , else
T T
i i i iN
i T
i i
y b y b
N
y b

 

     

  

w x w x
w x
2 2
1
(1 ) ( ( ))
N T
i ii
N y b 
    w x
2
1
(1 )
2
D j
jj
j
w
w  
 
 
 
 
 
  
2g g
g G
w


0 (L2-reg) 
1 (L1-reg) 
Generic ML formulation

7
Solve smaller sub-problems which can be easily distributed.
min ( ( ), ) ( )L f y Rww
x w
,
min ( ( ), ) ( ) s.t.L f y R  ww z
x z w z
2: ( ( ), ) ( ) ( 2) ( )L f y R constw
x z w z u
      
1k 
21
argmin L( ( ), ) ( 2)k k k
f y 
   w
w
w x w z u
21 1
argmin ( ) ( 2)k k k
R  
   
z
z z w z u
1 1
( )k k k k 
  u u w z
ADMM algorithm
Augmented Lagrangian,
ADMM Steps at each iteration
 w - update:
 z - update:
 u – update:

8
 Given with and
 Solve
 ADMM updates,
1
( , )N
i i i
y 
x Dx  y
21 2
2
1argmin ( )
2 21
k T k k
i i
N
y
N i
     

w x w w z u
w
2
21 1
21
1
1 =
argmin
2
1
(1 )
2
( )
; and (1 )
(1 )
j
D jk k k
jj
j
k k
j jk
ji j
j
S
Sz t
t
w z u
z
z
z z
w u

   

   
  
 
    
 
 



  

 

  

1
1 1
;
1 1
and ( )
N NT k k
i i i iD
i i
P P I q y
N N
 
 
     q x x x z u
2
2
1 1
min
1
(1 ) + ( )
2 2
ND Tj
j i ij
j i
y
Nw
w
w x w   
 
 
 
  
 
   
Example (Linear Regression with Elastic-Net)

Example (Logistic Regression with Elastic-Net)
9
 Given with and
 Solve
 ADMM updates,
 Gradient
 Hessian
1
( , )N
i i i
y 
x Dx  { 1, 1}y  
2
1 1
min
1
(1 ) + log(1 )
2
T
i i
ND yj
jj
j i
e
Nw
w xw
w   
 
 
 
  
 
   
1
2
2
1argmin log(1 )
1
2
Tyk i i
k k
N
e
N i

  

  
x w
w
w
w z u
1 ( )(1 ) k k
i i iiN
y p     w z ux
1 (1 ) T
i i i ii I
N
p p  x x
Algorithm 1: Iterative Algorithm for
Input:
Output :
initialize
while not converged do
return
1k w
, ,k k k
w z u
1k 
w
(0)
, 0k
j wv
( )
( )
( )
( )
( 1) ( ) ( ) 1 ( )
1 (1 )
1
1
1 (1 )
(1 ) ( )
( )
T j
ij
i
Tj
i i i
j k k
i i i
j j j j
p e
P
N
j j
p p IiN
y pi
P



 
 


 
 
    
 
x v
q
xx
x w z u
v v q
1 ( )k j
 vw

10
• rho_initialize
• rho_adjustment
• w_update
• z_update
admml.py
• mapHv
• _mapHv_logistic_binary
• _mapHv_sq_hinge_binary
• _mapHv_pseudo_huber_regression
• _mapHv_huber_regression
mappers.py
• combineHv
• log1pexpx
• sigmoid
• append_one
• rdd_to_df
• df_to_rdd
• parse_vector
• scale_data
• uniform_scale
• normalize_scale
utils.py
• _ADMM_ML_Algorithm
• ADMMLogistic
• ADMML2SVM
• ADMMLSSVM
• ADMMLeastSquares
• ADMMPseudoHuber
• ADMMHuber
• functionVals
• predict
mlalgs.py
examples.py
users
designers
developers
optimization
Code Structure (UML diagram)

11
System Configuration:
– No. of nodes = 6 (Hortonworks 2.7)
– No. of cores (per node) = 12 (@ 3.20GHz)
– RAM size (per node) = 64 GB.
– Hard disk size (per node) = 2 TB.
Data : and
 Small Data: No. of samples = 10000000 ( ~5GB )
 Big Data: No. of samples = 100000000 ( ~50GB )
Spark Configuration:
– No. of executors = 15.
– Cores per executor = 2.
– Executor memory = 10 GB.
– Driver memory = 2GB.
20
[ 1,1]U x
1 2 3 4 5 6 7 8 9 10 205( ) 1( ) 5( ) 1( ) ... 2y x x x x x x x x x x x              
(0,1)  
Experiment (Regression)

12
MLLIB (SGD) ML (L-BFGS) ADMM
Small Data: No. of samples = 10000000 ( ~5GB )
1584.6 (9.6) 290.8 (4.4)* 653.5(0.75)
Big Data: No. of samples = 100000000 ( ~50GB )
2597 (8.5) 5811 (7.6)
Table. Time comparisons ADMM vs. MLLib (in sec)

 ADMM provides competitive computation speeds compared to L-BFGS.
 Initial - selection and advanced adaptive strategies can lead to faster convergence.
For example: following [4] convergence in 5 iterations ~ 485 sec.
 For un-normalized data ML / MLLib internal normalization may lead to sub-optimal
solutions. For example: ML / MLLib : 15.07 and ADMML: 12.34

optf  optf 
* convergence in 1 iteration
convergence in 7 iteration†
†
Experiment (Regression)

 Generic ADMM based formulation: Coverage (ML algorithms)
 Robust Guarantees on Convergence and Accuracy
 Python API’s give accessibility to users, developers and designer
ADMML Package
13
Developers
ADMM
engine
Loss +
Regularizer
ML
Algorithms
Designers
Users
PythonAPIs

14
Please cite: Dhar S, Yi C, Ramakrishnan N, Shah M. ADMM based scalable machine learning on Spark. IEEE Big Data 2015.
https://github.com/DL-Benchmarks/ADMML
Contributors: Sauptik Dhar, Naveen Ramakrishnan, Jeff Irion, Jiayi Liu, Unmesh Kurup, Mohak Shah
Current Algorithms
Classification (Elastic/Group - Regularized)
- Logistic Regression
- LS-SVM
- L2-SVM
Regression (Elastic/Group - Regularized)
- Least Squares (i.e. Ridge Regression, Lasso, Elastic-net, Group-Lasso etc.)
- Huber
- Pseudo-Huber
Future Roadmap
- Initial step-size selection.
- Consensus ADMM (coverage for various discontinuous loss functions, like SVMs, alpha- trimmed functions etc.)
- Multiclass algorithms (like, multinomial logistic regression, C&S-SVM etc.)

ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mohak Shah

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mohak Shah

Similar to ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mohak Shah (20)

More from Databricks

More from Databricks (20)

Recently uploaded

Recently uploaded (20)

ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mohak Shah