Distributed Optimization: An Overview and Some
Theoretical Results
Zhengyuan Zhu
Joint work with Xin Zhang and Jia Kevin Liu
Department of Statistics and
Center for Survey Statistics Methodology
Iowa State University
2/13/2018
Introduction
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 2 / 35
Introduction
Introduction
Connection to the remote sensing workshop:
My research interests: spatial statistics, spatial sampling design,
survey statistics.
National Resources Inventory survey: remote sensing data help improve
survey estimates of agricultural statistics and natural resources.
Massive spatial-temporal imputation (gap-filling): a computationally
efficient functional approach, with applications to Landsat and MODIS
data.
Massive imputation for hyper-spectral satellite data (OCO2), where
data are sparse in space-time.
Unmixing problem for the SMOS and OCO2 data.
Original title: Asynchronous Stochastic Gradient Descent with
unbounded delay on nonconvex problem
Actual title: Distributed Optimization: An Overview and Some
Theoretical Results
Zhang & Liu & Zhu Asyn-SGD 3 / 35
Introduction
Distributed Computation
Problem: Datasets are becoming extremely large (in features and sample
size), and they may be collected and/or stored in a distributed system.
With Moore's law coming to an end in a few years (2025?), we can
no longer rely on hardware improvements alone.
Distributed computation studies how to divide a 'big' problem into
several small parts, allocate these parts to many computers, and then
combine the 'local' results to obtain the final result;
Bertsekas and Tsitsiklis (1999) provided a general framework for
parallel and distributed computation.
Micro (chip) level: multi-core CPU/GPU
Macro (data center) level: networked cloud computing
Zhang & Liu & Zhu Asyn-SGD 4 / 35
Introduction
Workflow of Distributed Computing
Figure: Distributed Computing Workflow
Zhang & Liu & Zhu Asyn-SGD 5 / 35
Introduction
Some issues relevant to the theory of data systems
Centralized vs local computation: local computation reduces data
transfer costs and has fewer issues with data privacy and
confidentiality.
Synchronous vs asynchronous methods: synchronization can involve
significant communication overhead; server variability may lead to
inefficiency; asynchronous methods may have convergence issues
depending on the delay distribution and the algorithm.
Data homogeneity vs heterogeneity
Homogeneous: databases Ξ1, ..., Ξk are shared, i.i.d., or stationary. The
objective function computed at each local machine is unbiased;
Heterogeneous: databases Ξ1, ..., Ξk are not i.i.d. or stationary, i.e.,
they could come from different sources or be collected with different
methods. The objective function at each local machine may be biased;
Trade-off in computation, communication, and inference precision.
Zhang & Liu & Zhu Asyn-SGD 6 / 35
Introduction
Distributed Optimization Algorithms
Some of the well-studied algorithms for distributed optimization:
Stochastic Gradient Descent (SGD): Bottou (1998, 2011), theory and
applications to large-scale machine learning; Recht et al. (2011), the
asynchronous SGD algorithm HOGWILD!; Lian et al. (2015),
convergence rates for non-convex problems with bounded delay;
Alternating Direction Method of Multipliers (ADMM): Gabay and
Mercier (1976), Boyd et al. (2010), Chang et al. (2015), Hong
(2017)
Distributed quasi-Newton methods for faster rates of convergence:
Eisen et al. (2017) use gradients to estimate the curvature;
Mansoori (2017) uses a matrix-splitting technique to compute the
Newton direction in a distributed way.
Zhang & Liu & Zhu Asyn-SGD 7 / 35
Introduction
Applications in ML
Distributed optimization, and in particular distributed SGD, has become a
very popular way to speed up machine learning algorithms. Some successful
examples:
Parallel systems have been used to train SVMs, which saves
computation time and avoids running out of memory;
Yu et al. (2012) designed a parallelizable method, called CCD++, for
matrix factorization in large-scale recommender systems;
Distributed deep learning: Dean et al. (2012) proposed two distributed
algorithms, Downpour SGD and Sandblaster, to train DNNs; Abadi
et al. (2016) introduced TensorFlow for large-scale machine learning.
Zhang & Liu & Zhu Asyn-SGD 8 / 35
Asynchronous Stochastic Gradient Descent
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 9 / 35
Asynchronous Stochastic Gradient Descent
Overview of Stochastic Gradient Descent (SGD)
Our work focuses on the feasibility of a distributed asynchronous
optimization algorithm, Asynchronous Stochastic Gradient Descent, under
unbounded delays.
Also referred to as stochastic approximation in the literature;
First introduced by Robbins and Monro (1951) and Kiefer and
Wolfowitz (1952);
The idea: simply use a noisy unbiased gradient in place of the
unknown true gradient in the gradient descent algorithm;
Stochastic gradient descent works as follows. To solve the
optimization problem
min_{x ∈ R^d} f(x) = E[F(x; ξ)],  (1)
let x_{k+1} = x_k − γ_k G(x_k), where x_k denotes the parameter at the k-th
iteration and G(x_k) is a noisy unbiased gradient evaluated at x_k;
Zhang & Liu & Zhu Asyn-SGD 10 / 35
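As a concrete illustration (a minimal sketch of our own, not code from the talk), here is update (1) in Python on the toy problem f(x) = E[½∥x − ξ∥²], whose true gradient is x − E[ξ]; a single random sample ξ gives the noisy unbiased gradient G(x_k):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])            # E[xi]; the minimizer of f
x = np.zeros(2)                       # x_0

for k in range(1, 5001):
    xi = mu + rng.standard_normal(2)  # random sample xi
    G = x - xi                        # unbiased: E[G] = x - mu = grad f(x)
    x = x - (1.0 / k) * G             # x_{k+1} = x_k - gamma_k G(x_k)

print(x)                              # approximately equal to mu
```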
Asynchronous Stochastic Gradient Descent
Asynchronous Stochastic Gradient Descent (Asyn-SGD)
Asyn-SGD is an extension of SGD. It can be
implemented as follows:
For the workers,
compute a gradient G at the current parameter x with a random sample ξ;
report the gradient to the server;
For the server,
collect a certain number (M) of gradients from the workers;
update the current parameter with these gradients;
Zhang & Liu & Zhu Asyn-SGD 11 / 35
Asynchronous Stochastic Gradient Descent
Asynchronous Stochastic Gradient Descent (Asyn-SGD)
Algorithm 1 Asynchronous Stochastic Gradient Descent (Asyn-SGD)
Require: Database Ξ, step sizes {γ_k}, initial point x_0, batch size M;
Ensure: x_k;
At the parameter server:
1: for i = 1, 2, ..., k do
2: Collect M gradients G(x_{i−τ_{i,m}}; ξ_{i,m}) from the workers;
3: Update x_{i+1} = x_i − (γ_i/M) ∑_{m=1}^{M} G(x_{i−τ_{i,m}}; ξ_{i,m});
4: end for
At the workers:
5: Receive the current parameter x∗ from the parameter server;
6: Randomly select a sample ξ from the database;
7: Compute the stochastic gradient G(x∗; ξ) and report it to the server;
Here τ_{i,m} is the delay of the m-th gradient in the i-th iteration.
Zhang & Liu & Zhu Asyn-SGD 12 / 35
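A minimal single-process simulation of Algorithm 1 (our sketch; the Poisson delay model and the toy objective are assumptions, not the authors' setup). The server keeps past iterates, and each of the M collected gradients is evaluated at an iterate that is τ_{i,m} steps stale:

```python
import numpy as np

# Single-process simulation of Asyn-SGD: worker delays tau_{i,m} are drawn
# from Poisson(5), and each "worker" evaluates its gradient at a
# correspondingly stale iterate kept in the server's history.
rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])            # same toy objective as the sketch above
M = 4                                 # number of gradients per server update
history = [np.zeros(2)]               # past iterates, kept to model staleness

for i in range(1, 2001):
    update = np.zeros(2)
    for m in range(M):
        tau = min(rng.poisson(5), len(history) - 1)   # delay tau_{i,m}
        x_stale = history[-1 - tau]                   # x_{i - tau_{i,m}}
        xi = mu + rng.standard_normal(2)              # sample xi_{i,m}
        update += x_stale - xi                        # G(x_{i-tau_{i,m}}; xi_{i,m})
    history.append(history[-1] - (1.0 / i) / M * update)

print(history[-1])                    # still converges to mu despite staleness
```

Despite the stale gradients, the iterates still approach the optimum; this is the phenomenon the convergence analysis below makes precise.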
Asynchronous Stochastic Gradient Descent
Asynchronous Stochastic Gradient Descent with
Incremental batch size (Asyn-SGDI)
A modified version of Asyn-SGD increases the batch size when
determining the update direction. With a larger batch size, the variance of
the gradient noise decreases, which can lead to faster convergence.
Algorithm 2 Asyn-SGD with incremental batch size (Asyn-SGDI)
Require: Database Ξ, step sizes {γ_k}, initial point x_0, increasing batch
sizes {M_i = n_i M};
Ensure: x_k;
At the parameter server:
1: for i = 1, 2, ..., k do
2: Collect M_i gradients G(x_{i−τ_{i,m}}; ξ_{i,m}) from the workers;
3: Update x_{i+1} = x_i − (γ_i/n_i) ∑_{m=1}^{M_i} G(x_{i−τ_{i,m}}; ξ_{i,m});
4: end for
Zhang & Liu & Zhu Asyn-SGD 13 / 35
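The same simulation with incremental batch sizes (again our own sketch): only the batch-size and step-size lines change, with n_i = i² chosen so that ∑ 1/n_i < ∞, matching the condition in the later convergence theorem for Asyn-SGDI:

```python
import numpy as np

# Asyn-SGDI sketch: identical to the Asyn-SGD simulation above except that
# the batch size grows as M_i = n_i * M with n_i = i**2 (so sum 1/n_i is
# finite) and the step size is held constant.
rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
M, gamma = 4, 0.05
history = [np.zeros(2)]

for i in range(1, 61):
    n_i = i ** 2
    M_i = n_i * M                     # incremental batch size M_i = n_i * M
    update = np.zeros(2)
    for m in range(M_i):
        tau = min(rng.poisson(5), len(history) - 1)
        x_stale = history[-1 - tau]
        xi = mu + rng.standard_normal(2)
        update += x_stale - xi
    history.append(history[-1] - gamma / n_i * update)   # gamma_i / n_i scaling

print(history[-1])                    # close to mu
```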
Convergence Analysis
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 14 / 35
Convergence Analysis
General Assumption
Assumption
(Lower bounded objective function) For the objective function f, there exists
an optimal point x∗, s.t. ∀x, f(x) ≥ f(x∗).
Assumption
(Lipschitz continuous gradient) The objective function f satisfies
∥∇f(x) − ∇f(y)∥ ≤ L∥x − y∥, ∀x, y.
Assumption
(Unbiased gradients with bounded variance) The stochastic gradient
G(x; ξ) is unbiased with bounded variance, that is to say:
1 E[G(x; ξ)] = ∇f(x), ∀x;
2 E[∥G(x; ξ) − ∇f(x)∥²] ≤ σ², ∀x;
Zhang & Liu & Zhu Asyn-SGD 15 / 35
Convergence Analysis
Restriction on the probabilities of the delay variables
Assumption
There exists a sequence {c_i} such that
c_{j+1} + (γ_k M L²/2) ∑_{i=j}^{k} i P(τ_k = i) ≤ c_j, ∀ j, k,  (2)
where τ_k denotes the maximum delay in the k-th iteration, τ_k = max_m τ_{k,m},
and γ_k is the step size.
Here, {c_i} are the weights in the asynchronicity error.
Zhang & Liu & Zhu Asyn-SGD 16 / 35
Convergence Analysis
Convergence Analysis for Asyn-SGD
Now we can state the convergence result for Asyn-SGD:
Theorem
Assume the above assumptions hold and the step sizes {γ_k} satisfy
1 γ_k ≤ 1/(2M c_1 + M L), ∀k;
2 {γ_k} is unsummable but {γ_k²} is summable;
where M is the fixed batch size, L is the Lipschitz constant in Assumption 2, and
c_1 is from the sequence in Assumption 4. Then we have
E[∑_{k=1}^{∞} γ_k ∥∇f(x_k)∥²] < ∞, and E[∥∇f(x_k)∥²] → 0.
Corollary
If the step size γ_k = O(1/(k^{1/2} log k)), then the asymptotic
convergence rate for Asyn-SGD is
E[∥∇f(x_k)∥²] = o(1/√K).
Zhang & Liu & Zhu Asyn-SGD 17 / 35
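A quick numeric illustration (not a proof, and our own addition) that the step size in the corollary, γ_k = 1/(√k log(k + 1)), is unsummable while its square is summable, as condition 2 of the theorem requires:

```python
import numpy as np

# gamma_k = 1 / (sqrt(k) * log(k + 1)): the partial sums of gamma_k keep
# growing with K, while the partial sums of gamma_k**2 level off.
for K in (10**4, 10**5, 10**6):
    k = np.arange(1, K + 1, dtype=float)
    gamma = 1.0 / (np.sqrt(k) * np.log(k + 1.0))
    print(K, gamma.sum(), (gamma**2).sum())
```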
Convergence Analysis
Convergence Analysis for Asyn-SGD with incremental
batch size
Similarly, we can obtain the convergence result for Asyn-SGD with
incremental batch size:
Theorem
Assume the above assumptions hold and the size of the database is infinite.
Set the batch sizes {M_k := n_k M} such that ∑_{k=1}^{∞} 1/n_k < ∞ and the step
sizes {γ_k} such that γ_k ≤ 1/(2M_1 c_1 + M_1 L), ∀k. Then we have
E[∑_{k=1}^{∞} γ_k ∥∇f(x_k)∥²] < ∞, and E[∥∇f(x_k)∥²] → 0.
Corollary
For ε > 0 and 1/n_k = o(1/k^{1+ε}), with a fixed step size satisfying the requirement
in Theorem 3.2, we have E[∥∇f(x_k)∥²] = o(1/K).
Zhang & Liu & Zhu Asyn-SGD 18 / 35
Convergence Analysis
Bounded Delay Variable
First we consider a simple case, in which the delay variables are bounded.
Corollary
(Bounded delay variables) If the delay variables {τ_k} are bounded, then
{c_i} exists.
This is a very common case, as discussed in, e.g., Lian et al. (2015). The
scenario is reasonable as long as all the workers run at roughly even speeds.
Zhang & Liu & Zhu Asyn-SGD 19 / 35
Convergence Analysis
I.I.D. Delay Variable
The second case assumes that the sequence of delays {τ_k} is i.i.d. and
the common distribution has a finite second moment. This scenario is
reasonable when the iteration number is very large and the system has
reached stationarity.
Corollary
(I.I.D. delay variables) If the delay variables {τ_k} are i.i.d. as τ and τ has
a finite second moment, then {c_i} exists.
Zhang & Liu & Zhu Asyn-SGD 20 / 35
Convergence Analysis
Uniform Upper Bound
Third case: the delay variables can have different distributions, as long as
they are uniformly bounded by a sequence with a finite second moment.
This is a more general case.
Corollary
(Uniformly upper bounded delay probabilities) Consider the delay
variables {τ_k}_{k=1}^{∞}. If there exists a sequence {a_i}_{i=1}^{∞} s.t.
1 ∑_{i=1}^{∞} i² a_i < ∞;
2 P(τ_k = i) ≤ a_i, ∀k;
then {c_i} exists.
Zhang & Liu & Zhu Asyn-SGD 21 / 35
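For instance, the Poisson(30) delays used in the numerical study below satisfy this corollary: with a_i = P(τ = i), we get ∑ i² a_i = E[τ²] = λ + λ² < ∞. A short check (our own addition):

```python
import numpy as np

# With a_i = P(tau = i) for tau ~ Poisson(lam), sum_i i^2 a_i = E[tau^2]
# = lam + lam^2, which is finite, so the sequence {c_i} exists.
lam = 30.0
p, total = np.exp(-lam), 0.0          # p = P(tau = 0)
for i in range(1, 500):
    p = p * lam / i                   # P(tau = i) via the Poisson recursion
    total += i**2 * p
print(total, lam + lam**2)            # both approximately 930
```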
Numerical Study
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 22 / 35
Numerical Study
Example 1: MLE for MVN Covariance Matrix
First, we consider maximum likelihood estimation of the covariance matrix
of a multivariate normal distribution. The problem can be formulated as:
min_{Σ ∈ R^{d×d}} ln|Σ| + (1/n) ∑_{i=1}^{n} (x_i − μ)^T Σ^{−1} (x_i − μ)  (3)
subject to Σ ⪰ 0,
where Σ is the covariance matrix, μ is the mean vector, and the x_i are
samples. The gradient for this problem has a well-known closed form; see
the sketch below.
Zhang & Liu & Zhu Asyn-SGD 23 / 35
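A sketch of the stochastic gradient for (3) in Python (our illustration, assuming μ known). For f(Σ) = ln|Σ| + (1/n) ∑ (x_i − μ)^T Σ^{−1} (x_i − μ), the gradient is Σ^{−1} − Σ^{−1} S Σ^{−1} with S = (1/n) ∑ (x_i − μ)(x_i − μ)^T, and replacing S by a single sampled outer product gives an unbiased stochastic gradient:

```python
import numpy as np

# grad f(Sigma) = Sigma^{-1} - Sigma^{-1} S Sigma^{-1}; with S replaced by
# one sampled outer product, this is the unbiased stochastic gradient G
# that Asyn-SGD would use.
def grad(Sigma, S):
    P = np.linalg.inv(Sigma)
    return P - P @ S @ P

rng = np.random.default_rng(3)
mu = np.zeros(2)
Sigma_true = np.array([[10.0, 3.0], [3.0, 5.0]])   # setting from the slides
C = np.linalg.cholesky(Sigma_true)
Sigma = np.array([[6.0, 1.0], [1.0, 4.0]])         # an arbitrary test point

# Averaging many single-sample stochastic gradients recovers the population
# gradient grad(Sigma, Sigma_true), illustrating unbiasedness numerically.
G_avg = np.zeros((2, 2))
for _ in range(200000):
    d = C @ rng.standard_normal(2)                 # d = x - mu
    G_avg += grad(Sigma, np.outer(d, d))
print(G_avg / 200000)
print(grad(Sigma, Sigma_true))                     # population gradient
```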
Numerical Study
Example 1: MLE for MVN Covariance Matrix
We randomly generate data from a multivariate normal distribution with
mean (0, 0) and covariance matrix (10, 3; 3, 5).
(a) uses a bounded delay variable with upper bound 50; (b) uses
Poisson delays with parameter 30; in (c), we simulate a virtual system
in which the working time t of each worker follows the same model,
t ∼ Exp(λ) with λ ∼ Gamma(2, 1); see the samplers sketched after
this slide.
The green solid line is the convergence result for Asyn-SGD with
O(1/k) step size, the orange dotted line is the convergence result for
Asyn-SGD with O(1/(k^{1/2} log k)) step size, and the purple dashed
line is the convergence result for Asyn-SGDI.
Zhang & Liu & Zhu Asyn-SGD 24 / 35
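The three delay models can be sampled as follows (our reading of the slide; the authors' exact simulator may differ):

```python
import numpy as np

rng = np.random.default_rng(4)

def delay_bounded(n):          # (a) delays bounded by 50
    return rng.integers(0, 51, size=n)

def delay_poisson(n):          # (b) Poisson(30) delays
    return rng.poisson(30, size=n)

def delay_system(n_workers):   # (c) worker time t ~ Exp(lam), lam ~ Gamma(2, 1)
    lam = rng.gamma(shape=2.0, scale=1.0, size=n_workers)
    return rng.exponential(scale=1.0 / lam)        # numpy's scale = 1/rate

print(delay_bounded(5), delay_poisson(5), delay_system(5))
```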
Numerical Study
Example 1: MLE for MVN Covariance Matrix
(a) bounded by 50 (b) Poi(30) (c) System delay
Figure: Convergence for Asyn-SGD and Asyn-SGDI
In all three cases, the ℓ2 norm of the gradient goes to zero; Asyn-SGDI
is the fastest and Asyn-SGD with step size O(1/k) is the slowest.
Zhang & Liu & Zhu Asyn-SGD 25 / 35
Numerical Study
Example 1: MLE for MVN Covariance Matrix
We consider an extreme case where the delay variable follows a discrete
uniform distribution (equal probabilities).
Figure: A counterexample where Asyn-SGD fails
Zhang & Liu & Zhu Asyn-SGD 26 / 35
Numerical Study
Example 1: MLE for MVN Covariance Matrix
We also compare the computation times of Syn-SGD, Asyn-SGD, and
Asyn-SGDI on this problem. The step size for Syn-SGD and Asyn-SGD is
O(1/k), and the step size for Asyn-SGDI is constant.
Figure: Computation time for three algorithms: the red line is for Syn-SGD; blue dotted line is
for Asyn-SGD; black dotdash line is for Asyn-SGDI
Zhang & Liu & Zhu Asyn-SGD 27 / 35
Numerical Study
Example 2: Low Rank Matrix Completion
This problem is to find the lowest-rank matrix X that matches the
expectation of the observed symmetric matrices, E[A]. It can be
formulated mathematically as follows:
min E[∥A − YY^T∥²_F]  (4)
subject to Y ∈ R^{n×p},
where X = YY^T. Using SGD to solve this problem has been discussed in
a number of prior works.
Zhang & Liu & Zhu Asyn-SGD 28 / 35
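A sketch of plain SGD for (4) (our illustration, with an assumed observation model A_k = E[A] + symmetric noise): for a symmetric A_k, the gradient of ∥A_k − YY^T∥²_F in Y is 4(YY^T − A_k)Y, which is unbiased for the gradient of the objective:

```python
import numpy as np

# SGD for min_Y E||A - Y Y^T||_F^2 with noisy symmetric observations A_k.
# Initialization and step sizes are chosen conservatively; convergence of
# this nonconvex problem depends on both.
rng = np.random.default_rng(5)
n, p = 10, 2
Y_star = 0.5 * rng.standard_normal((n, p))
EA = Y_star @ Y_star.T                # E[A], exactly rank p

Y = 0.1 * rng.standard_normal((n, p))
for k in range(1, 50001):
    N = rng.standard_normal((n, n))
    A_k = EA + 0.05 * (N + N.T)       # noisy symmetric observation
    grad = 4.0 * (Y @ Y.T - A_k) @ Y  # unbiased stochastic gradient
    Y = Y - grad / (40.0 + k)         # diminishing step size

print(np.linalg.norm(Y @ Y.T - EA))   # small: X = Y Y^T matches E[A]
```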
Numerical Study
Example 2: Low Rank Matrix Completion
(a) bounded by 50 (b) Poi(30) (c) System delay
Figure: Convergence for Asyn-SGD and Asyn-SGDI
Zhang & Liu & Zhu Asyn-SGD 29 / 35
Conclusion
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 30 / 35
Conclusion
Conclusion
In our work, we analyze the convergence of Asyn-SGD on nonconvex
optimization problems with unbounded delay;
We propose a new Lyapunov function, which consists of the classical
error and an asynchronicity error;
A sufficient condition on the delay variables is given to guarantee the
convergence of Asyn-SGD;
With proper step sizes, the asymptotic convergence rate for Asyn-SGD
is o(1/√k) and that for Asyn-SGDI is o(1/k).
This algorithm requires the local gradients to be unbiased. For the
heterogeneous case, we are working on an ADMM-based asynchronous
solution.
Zhang & Liu & Zhu Asyn-SGD 31 / 35
Preliminary Work
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 32 / 35
Preliminary Work
Distributed Computing and ADMM
Consider the following problem:
Data are distributed across several machines, say Ξ1, Ξ2, ..., Ξk;
The objective function is
min_x L(x; Ξ_1, Ξ_2, ..., Ξ_k) = ∑_{i=1}^{k} L_i(x; Ξ_i);  (5)
Communication cost is too expensive, so each machine can only
"see" its local objective function L_i(x; Ξ_i);
The data are biased, which means x_i = arg min L_i(x; Ξ_i) is not consistent.
Zhang & Liu & Zhu Asyn-SGD 33 / 35
Preliminary Work
Problem Formulation
Reformulating the problem:
min_x L(x; Ξ_1, Ξ_2, ..., Ξ_k) = ∑_{i=1}^{k} L_i(x; Ξ_i)  (6)
⇒ min_x ∑_{i=1}^{k} L_i(x_i; Ξ_i), s.t. x_i = x, ∀ i  (7)
The corresponding augmented Lagrangian function:
L({x_i}, x; y) = ∑_{i=1}^{k} L_i(x_i; Ξ_i) + ∑_{i=1}^{k} ⟨y_i, x_i − x⟩ + ∑_{i=1}^{k} (ρ_i/2) ∥x_i − x∥²;  (8)
Thus, x and the y_i can be updated at the central server, and the x_i can be
updated at the local machines. Only x, the x_i, and the y_i are transmitted
between the central server and the local machines.
Zhang & Liu & Zhu Asyn-SGD 34 / 35
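Note that the x-subproblem here is an unconstrained quadratic, so it has a closed form (a one-line derivation from (8), added for clarity): setting the derivative of ∑_i ⟨y_i, x_i − x⟩ + ∑_i (ρ_i/2)∥x_i − x∥² with respect to x to zero gives

x = ( ∑_{i=1}^{k} (ρ_i x_i + y_i) ) / ( ∑_{i=1}^{k} ρ_i ),

so the server-side x-update in Algorithm 3 below is a cheap weighted average.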
Preliminary Work
ADMM based parallel computing framework
Algorithm 3 ADMM-based parallel computing framework
Require: Databases {Ξ_i}, {ρ_i}, initial point;
Ensure: x^T;
At the parameter server:
1: for t = 1, 2, ..., T do
2: Collect x_i^t from the local machines;
3: Update x^{t+1} = arg min_x ∑_{i=1}^{K} ⟨y_i^t, x_i^t − x⟩ + ∑_{i=1}^{K} (ρ_i/2) ∥x_i^t − x∥²;
4: Update y_i^{t+1} = y_i^t + ρ_i (x_i^t − x^{t+1});
5: end for
At local machine i:
6: Receive the current y_i^{t+1} and x^{t+1} from the parameter server;
7: Update
x_i^{t+1} = arg min_{x_i} L_i(x_i; Ξ_i) + ⟨y_i^{t+1}, x_i − x^{t+1}⟩ + (ρ_i/2) ∥x_i − x^{t+1}∥²;
Zhang & Liu & Zhu Asyn-SGD 35 / 35
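To make the message flow concrete, here is a compact serial sketch of Algorithm 3 (our illustration, not the authors' implementation), with quadratic local objectives L_i(x) = ½∥x − b_i∥² chosen so that every update has a closed form:

```python
import numpy as np

# Consensus ADMM on K quadratic local objectives L_i(x) = 0.5*||x - b_i||^2.
rng = np.random.default_rng(6)
K, d = 5, 3
b = rng.standard_normal((K, d))       # local data: minimizer of L_i is b_i
rho = np.ones(K)

x = np.zeros(d)                       # global variable on the server
xi = np.zeros((K, d))                 # local copies x_i
y = np.zeros((K, d))                  # dual variables y_i

for t in range(100):
    # server: closed-form x-update (weighted average), then dual update
    x = (rho[:, None] * xi + y).sum(axis=0) / rho.sum()
    y = y + rho[:, None] * (xi - x)
    # local machines: argmin_{x_i} L_i(x_i) + <y_i, x_i - x> + rho_i/2 ||x_i - x||^2,
    # which for the quadratic L_i is (b_i - y_i + rho_i * x) / (1 + rho_i)
    xi = (b - y + rho[:, None] * x) / (1.0 + rho[:, None])

print(x, b.mean(axis=0))              # both approximately the average of the b_i
```

Here the consensus point is the average of the local minimizers b_i, and only x, the x_i, and the y_i cross the server/machine boundary, as noted on the previous slide.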

More Related Content

What's hot

Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...zukun
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)neeraj7svp
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniques
talktoharry
 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradient
Fabian Pedregosa
 
Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear Regression
Mostafa G. M. Mostafa
 
Gradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation GraphsGradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation Graphs
Yoonho Lee
 
Analysing and combining partial problem solutions for properly informed heuri...
Analysing and combining partial problem solutions for properly informed heuri...Analysing and combining partial problem solutions for properly informed heuri...
Analysing and combining partial problem solutions for properly informed heuri...
Alexander Decker
 
Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based Clustering
SSA KPI
 
A General Framework for Enhancing Prediction Performance on Time Series Data
A General Framework for Enhancing Prediction Performance on Time Series DataA General Framework for Enhancing Prediction Performance on Time Series Data
A General Framework for Enhancing Prediction Performance on Time Series Data
HopeBay Technologies, Inc.
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machines
Mostafa G. M. Mostafa
 
Recursive algorithms
Recursive algorithmsRecursive algorithms
Recursive algorithms
subhashchandra197
 
Probabilistic PCA, EM, and more
Probabilistic PCA, EM, and moreProbabilistic PCA, EM, and more
Probabilistic PCA, EM, and more
hsharmasshare
 
010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process
Ha Phuong
 
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsMethods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Ryan B Harvey, CSDP, CSM
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ijfls
 
2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...
2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...
2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...
The Statistical and Applied Mathematical Sciences Institute
 
Principal component analysis
Principal component analysisPrincipal component analysis
Principal component analysis
Partha Sarathi Kar
 
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
The Statistical and Applied Mathematical Sciences Institute
 

What's hot (19)

Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniques
 
50120140503004
5012014050300450120140503004
50120140503004
 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradient
 
Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear Regression
 
Gradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation GraphsGradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation Graphs
 
Analysing and combining partial problem solutions for properly informed heuri...
Analysing and combining partial problem solutions for properly informed heuri...Analysing and combining partial problem solutions for properly informed heuri...
Analysing and combining partial problem solutions for properly informed heuri...
 
Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based Clustering
 
A General Framework for Enhancing Prediction Performance on Time Series Data
A General Framework for Enhancing Prediction Performance on Time Series DataA General Framework for Enhancing Prediction Performance on Time Series Data
A General Framework for Enhancing Prediction Performance on Time Series Data
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machines
 
Recursive algorithms
Recursive algorithmsRecursive algorithms
Recursive algorithms
 
Probabilistic PCA, EM, and more
Probabilistic PCA, EM, and moreProbabilistic PCA, EM, and more
Probabilistic PCA, EM, and more
 
010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process
 
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsMethods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...
2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...
2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...
 
Principal component analysis
Principal component analysisPrincipal component analysis
Principal component analysis
 
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
 

Similar to CLIM Program: Remote Sensing Workshop, Optimization for Distributed Data Systems: An Overview and Some Theoretical Results - Zhengyuan Zhu, Feb 13, 2018

MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
The Statistical and Applied Mathematical Sciences Institute
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...
SSA KPI
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
Ryo Iwaki
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...Adam Fausett
 
Design and Implementation of Variable Radius Sphere Decoding Algorithm
Design and Implementation of Variable Radius Sphere Decoding AlgorithmDesign and Implementation of Variable Radius Sphere Decoding Algorithm
Design and Implementation of Variable Radius Sphere Decoding Algorithm
csandit
 
Developing fast low-rank tensor methods for solving PDEs with uncertain coef...
Developing fast  low-rank tensor methods for solving PDEs with uncertain coef...Developing fast  low-rank tensor methods for solving PDEs with uncertain coef...
Developing fast low-rank tensor methods for solving PDEs with uncertain coef...
Alexander Litvinenko
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
csandit
 
Scalable Constrained Spectral Clustering
Scalable Constrained Spectral ClusteringScalable Constrained Spectral Clustering
Scalable Constrained Spectral Clustering
1crore projects
 
Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...
Willy Marroquin (WillyDevNET)
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
Christian Robert
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Fabian Pedregosa
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdf
Junghyun Lee
 
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Atsushi Nitanda
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
Alexander Litvinenko
 
15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdf15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdf
AllanKelvinSales
 
Icitam2019 2020 book_chapter
Icitam2019 2020 book_chapterIcitam2019 2020 book_chapter
Icitam2019 2020 book_chapter
Ban Bang
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
Work-Bench
 
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
IJCSEA Journal
 
How to Decide the Best Fuzzy Model in ANFIS
How to Decide the Best Fuzzy Model in ANFIS How to Decide the Best Fuzzy Model in ANFIS

Similar to CLIM Program: Remote Sensing Workshop, Optimization for Distributed Data Systems: An Overview and Some Theoretical Results - Zhengyuan Zhu, Feb 13, 2018 (20)

MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
 
Design and Implementation of Variable Radius Sphere Decoding Algorithm
Design and Implementation of Variable Radius Sphere Decoding AlgorithmDesign and Implementation of Variable Radius Sphere Decoding Algorithm
Design and Implementation of Variable Radius Sphere Decoding Algorithm
 
Developing fast low-rank tensor methods for solving PDEs with uncertain coef...
Developing fast  low-rank tensor methods for solving PDEs with uncertain coef...Developing fast  low-rank tensor methods for solving PDEs with uncertain coef...
Developing fast low-rank tensor methods for solving PDEs with uncertain coef...
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
Scalable Constrained Spectral Clustering
Scalable Constrained Spectral ClusteringScalable Constrained Spectral Clustering
Scalable Constrained Spectral Clustering
 
Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdf
 
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
 
AROPUB-IJPGE-14-30
AROPUB-IJPGE-14-30AROPUB-IJPGE-14-30
AROPUB-IJPGE-14-30
 
15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdf15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdf
 
Icitam2019 2020 book_chapter
Icitam2019 2020 book_chapterIcitam2019 2020 book_chapter
Icitam2019 2020 book_chapter
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
 
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
 
How to Decide the Best Fuzzy Model in ANFIS
How to Decide the Best Fuzzy Model in ANFIS How to Decide the Best Fuzzy Model in ANFIS
How to Decide the Best Fuzzy Model in ANFIS
 

More from The Statistical and Applied Mathematical Sciences Institute

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
The Statistical and Applied Mathematical Sciences Institute
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
The Statistical and Applied Mathematical Sciences Institute
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
The Statistical and Applied Mathematical Sciences Institute
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
The Statistical and Applied Mathematical Sciences Institute
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
The Statistical and Applied Mathematical Sciences Institute
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
The Statistical and Applied Mathematical Sciences Institute
 

More from The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 

Recently uploaded

Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Landownership in the Philippines under the Americans-2-pptx.pptx
Landownership in the Philippines under the Americans-2-pptx.pptxLandownership in the Philippines under the Americans-2-pptx.pptx
Landownership in the Philippines under the Americans-2-pptx.pptx
JezreelCabil2
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
NelTorrente
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
Celine George
 
Mansoori (2017) used a matrix splitting technique to compute the Newton
direction in a distributed way.
Zhang & Liu & Zhu Asyn-SGD 7 / 35
Introduction
Applications in ML
Distributed optimization, and in particular distributed SGD, has become a
very popular way to speed up machine learning algorithms. Some successful
examples:
In ?, a parallel system is used to train SVMs, saving computation time
and avoiding out-of-memory failures;
? designed a parallelizable method, called CCD++, for matrix
factorization in large-scale recommender systems;
Distributed deep learning: ? proposed two distributed algorithms,
Downpour SGD and Sandblaster, to train DNNs;
Abadi et. al. (2016) introduced TensorFlow for large scale machine
learning.
Zhang & Liu & Zhu Asyn-SGD 8 / 35
Asynchronous Stochastic Gradient Descent
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 9 / 35
Asynchronous Stochastic Gradient Descent
Overview of Stochastic Gradient Descent (SGD)
Our work focuses on the feasibility of a distributed asynchronous
optimization algorithm, Asynchronous Stochastic Gradient Descent, under
unbounded delays.
Also referred to as stochastic approximation in the literature;
First introduced in ? and ?;
The idea: simply use a noisy unbiased gradient in place of the unknown
true gradient in the gradient descent algorithm;
Stochastic gradient descent works as follows. To solve the optimization
problem
min_{x ∈ R^d} f(x) = E[F(x; ξ)],   (1)
let x_{k+1} = x_k − γ_k G(x_k), where x_k denotes the parameter at the
k-th iteration and G(x_k) is a noisy unbiased gradient evaluated at x_k.
Zhang & Liu & Zhu Asyn-SGD 10 / 35
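As a quick illustration of this update rule, here is a minimal sketch of
plain SGD on a toy problem; the objective f(x) = E‖x − ξ‖² with ξ ~ N(0, I),
the dimension, and the step-size schedule are our illustrative choices, not
from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                         # dimension (illustrative)
x = rng.normal(size=d)        # initial point x_0

def stochastic_grad(x):
    """Noisy unbiased gradient G(x) for f(x) = E||x - xi||^2, xi ~ N(0, I).
    The true gradient is 2x; one sample xi gives 2(x - xi), which is unbiased
    with bounded variance, matching the assumptions used later."""
    xi = rng.normal(size=x.shape)
    return 2.0 * (x - xi)

for k in range(1, 5001):
    gamma_k = 1.0 / k         # unsummable step sizes with summable squares
    x = x - gamma_k * stochastic_grad(x)

print("final iterate (the minimizer is 0):", x)
```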
Asynchronous Stochastic Gradient Descent
Asynchronous Stochastic Gradient Descent (Asyn-SGD)
Asyn-SGD is an extension of SGD. It can be implemented as follows:
Workers: compute gradients G at the current parameter x with a random
sample ξ; report the gradients to the server.
Server: collect a certain number (M) of gradients from the workers;
update the current parameter with these gradients.
Zhang & Liu & Zhu Asyn-SGD 11 / 35
Asynchronous Stochastic Gradient Descent
Asynchronous Stochastic Gradient Descent (Asyn-SGD)
Algorithm 1 Asynchronous Stochastic Gradient Descent (Asyn-SGD)
Require: Database Ξ, step sizes {γ_k}, initial point x_0, batch size M;
Ensure: x_k;
At parameter server:
1: for i = 1, 2, ..., k do
2: Collect M gradients G(x_{i−τ_{i,m}}; ξ_{i,m}) from workers;
3: Update x_{i+1} = x_i − γ_i Σ_{m=1}^{M} G(x_{i−τ_{i,m}}; ξ_{i,m});
4: end for
At workers:
5: Receive the current parameter x* from the parameter server;
6: Randomly select a sample ξ from the database;
7: Compute the stochastic gradient G(x*; ξ) and report it to the server;
Here τ_{i,m} is the delay of the m-th gradient in the i-th iteration.
Zhang & Liu & Zhu Asyn-SGD 12 / 35
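A serial simulation can make the role of the delays in Algorithm 1 concrete.
The sketch below is a minimal single-process rendering of the server loop;
the quadratic toy objective and the uniform bounded delay model are
illustrative assumptions, not the settings analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
d, M, max_delay = 5, 4, 10        # batch size M and delay bound (illustrative)

def G(x):
    """Noisy unbiased gradient of the toy objective f(x) = E||x - xi||^2."""
    return 2.0 * (x - rng.normal(size=x.shape))

history = [rng.normal(size=d)]    # past iterates, so stale reads can be replayed
for i in range(1, 2001):
    gamma_i = 1.0 / i
    update = np.zeros(d)
    for m in range(M):            # server collects M possibly stale gradients
        tau = rng.integers(0, min(max_delay, len(history)))   # delay tau_{i,m}
        update += G(history[-1 - tau])   # gradient read at a stale iterate
    history.append(history[-1] - gamma_i * update)   # server update, step 3
print("gradient norm at final iterate:", np.linalg.norm(2.0 * history[-1]))
```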
Asynchronous Stochastic Gradient Descent
Asynchronous Stochastic Gradient Descent with Incremental Batch Size (Asyn-SGDI)
A modified version of Asyn-SGD increases the batch size when computing the
update direction. With a larger batch size, the variance of the gradient
noise decreases, which can lead to faster convergence.
Algorithm 2 Asyn-SGD with incremental batch size (Asyn-SGDI)
Require: Database Ξ, step sizes {γ_k}, initial point x_0, increasing batch
sizes {M_i = n_i M};
Ensure: x_k;
At parameter server:
1: for i = 1, 2, ..., k do
2: Collect M_i gradients G(x_{i−τ_{i,m}}; ξ_{i,m}) from workers;
3: Update x_{i+1} = x_i − (γ_i / n_i) Σ_{m=1}^{M_i} G(x_{i−τ_{i,m}}; ξ_{i,m});
4: end for
Zhang & Liu & Zhu Asyn-SGD 13 / 35
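For instance, n_i = i² satisfies the summability requirement Σ_i 1/n_i < ∞
used in the Asyn-SGDI analysis that follows; a minimal sketch of such a
schedule (the quadratic growth is our illustrative choice):

```python
M = 4                      # base batch size

def batch_size(i):
    """Incremental batch size M_i = n_i * M with sum_i 1/n_i < infinity.
    n_i = i**2 is one admissible schedule (illustrative choice)."""
    return i ** 2 * M

# The server then averages over the larger batch:
#   x_{i+1} = x_i - (gamma_i / n_i) * sum_{m=1}^{M_i} G(x_{i - tau_{i,m}}; xi_{i,m})
print([batch_size(i) for i in range(1, 6)])   # [4, 16, 36, 64, 100]
```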
Convergence Analysis
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 14 / 35
Convergence Analysis
General Assumptions
Assumption (Lower bounded objective function)
For the objective function f, there exists an optimal point x*, s.t. ∀x,
f(x) ≥ f(x*).
Assumption (Lipschitz continuous gradient)
The objective function f satisfies ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖, ∀x, y.
Assumption (Unbiased gradients with bounded variance)
The stochastic gradient G(x; ξ) is unbiased with bounded variance, that is:
1 E[G(x; ξ)] = ∇f(x), ∀x;
2 E[‖G(x; ξ) − ∇f(x)‖²] ≤ σ², ∀x.
Zhang & Liu & Zhu Asyn-SGD 15 / 35
Convergence Analysis
Restriction on the probabilities of the delay variables
Assumption
There exists a sequence {c_i} such that
c_{j+1} + (γ_k M L² / 2) Σ_{i=j}^{k} i P(τ_k = i) ≤ c_j, ∀ j, k,   (2)
where τ_k denotes the maximum delay in the k-th iteration, τ_k = max_m τ_{k,m},
and γ_k is the step size. Here {c_i} are the weights in the asynchronicity
error.
Zhang & Liu & Zhu Asyn-SGD 16 / 35
Convergence Analysis
Convergence Analysis for Asyn-SGD
Now we can give the convergence result for Asyn-SGD:
Theorem
Assume the above assumptions hold and the step sizes {γ_k} satisfy
1 γ_k ≤ 1 / (2Mc_1 + ML), ∀k;
2 {γ_k} is unsummable but {γ_k²} is summable;
where M is the fixed batch size, L is the Lipschitz constant in Assumption 2,
and c_1 is from the sequence in Assumption 4. Then we have
E[Σ_{k=1}^{∞} γ_k ‖∇f(x_k)‖²] < ∞, and E[‖∇f(x_k)‖²] → 0.
Corollary
If the step size γ_k = O(1/(k^{1/2} log k)), then the asymptotic convergence
rate for Asyn-SGD is E(‖∇f(x_k)‖²) = o(1/√K).
Zhang & Liu & Zhu Asyn-SGD 17 / 35
Convergence Analysis
Convergence Analysis for Asyn-SGD with incremental batch size
Similarly, we can get the convergence result for Asyn-SGD with incremental
batch size:
Theorem
Assume the above assumptions hold and the size of the database is infinite.
Set batch sizes {M_k := n_k M} satisfying Σ_{k=1}^{∞} 1/n_k < ∞ and step
sizes {γ_k} satisfying γ_k ≤ 1 / (2M_1 c_1 + M_1 L), ∀k. Then we have
E[Σ_{k=1}^{∞} γ_k ‖∇f(x_k)‖²] < ∞, and E[‖∇f(x_k)‖²] → 0.
Corollary
For ε > 0 and 1/n_k = o(1/k^{1+ε}), with a fixed step size satisfying the
requirement in Theorem 3.2, we have E(‖∇f(x_k)‖²) = o(1/K).
Zhang & Liu & Zhu Asyn-SGD 18 / 35
Convergence Analysis
Bounded Delay Variables
First we consider a simple case in which the delay variables are bounded.
Corollary (Bounded Delay Variables)
If the delay variables {τ_k} are bounded, then {c_i} exists.
This is a very common case, as discussed in ?, ?, etc. This scenario is
reasonable as long as all the workers run at comparable speeds.
Zhang & Liu & Zhu Asyn-SGD 19 / 35
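To see why boundedness suffices, one concrete choice of {c_j} can be written
down directly; this is our own construction, assuming τ_k ≤ B almost surely
and γ_k ≤ γ̄ for all k, not necessarily the choice made in the paper.

```latex
% Our construction, not from the paper: assume P(tau_k = i) = 0 for i > B
% and gamma_k <= \bar{\gamma} for all k. Then for every j and k,
%   \sum_{i=j}^{k} i \, P(\tau_k = i) \le B,
% so the decreasing sequence
\[
  c_j = \frac{\bar{\gamma} M L^2 B}{2} \, \max(B + 1 - j, \; 0)
\]
% has c_j - c_{j+1} = \bar{\gamma} M L^2 B / 2 for j <= B and c_j = 0 for
% j > B, which is exactly what condition (2) requires.
```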
Convergence Analysis
I.I.D. Delay Variables
The second case assumes that the sequence of delays {τ_k} is i.i.d. and the
common distribution has a finite second moment. This scenario is reasonable
when the iteration number is very large and the system has reached
stationarity.
Corollary (I.I.D. Delay Variables)
If the delays {τ_k} are i.i.d. copies of τ and τ has a finite second moment,
then {c_i} exists.
Zhang & Liu & Zhu Asyn-SGD 20 / 35
Convergence Analysis
Uniform Upper Bound
Third case: the delay variables can have different distributions as long as
they are uniformly bounded by a sequence with a finite second moment. This
is a more general case.
Corollary (Uniformly Upper Bounded Delay Probabilities)
Consider the delay variables {τ_k}_{k=1}^{∞}. If there exists a sequence
{a_i}_{i=1}^{∞} s.t.
1 Σ_{i=1}^{∞} i² a_i < ∞;
2 P(τ_k = i) ≤ a_i, ∀k;
then {c_i} exists.
Zhang & Liu & Zhu Asyn-SGD 21 / 35
Numerical Study
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 22 / 35
Numerical Study
Example 1: MLE for MVN Covariance Matrix
First, we consider maximum likelihood estimation of the covariance matrix of
a multivariate normal distribution. This problem can be formulated as:
min_{Σ ∈ R^{d×d}} ln|Σ| + (1/n) Σ_{i=1}^{n} (x_i − μ)^T Σ^{−1} (x_i − μ)   (3)
subject to Σ ≻ 0,
where Σ is the covariance matrix, μ is the mean vector, and the x_i are
samples. The gradient for this problem has been derived in ?.
Zhang & Liu & Zhu Asyn-SGD 23 / 35
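The slides defer the gradient derivation to the cited reference; by standard
matrix calculus, a one-sample stochastic gradient of (3) is
Σ^{−1} − Σ^{−1}(x − μ)(x − μ)^T Σ^{−1}. A minimal SGD sketch using the 2×2
covariance from the simulation on the next slide; the eigenvalue floor that
keeps Σ ≻ 0 is our own safeguard, since the slides do not say how the
constraint is enforced.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.zeros(2)
Sigma_true = np.array([[10.0, 3.0], [3.0, 5.0]])
data = rng.multivariate_normal(mu, Sigma_true, size=10_000)

def project_pd(S, floor=0.1):
    """Symmetrize and floor the eigenvalues so the iterate stays positive definite."""
    S = (S + S.T) / 2
    w, V = np.linalg.eigh(S)
    return (V * np.maximum(w, floor)) @ V.T

def stochastic_grad(Sigma, x):
    """One-sample gradient of ln|Sigma| + (x - mu)' Sigma^{-1} (x - mu):
    Sigma^{-1} - Sigma^{-1} (x - mu)(x - mu)' Sigma^{-1}."""
    Sinv = np.linalg.inv(Sigma)
    r = (x - mu).reshape(-1, 1)
    return Sinv - Sinv @ r @ r.T @ Sinv

Sigma = np.eye(2)
for k in range(1, 20_001):
    x = data[rng.integers(len(data))]
    Sigma = project_pd(Sigma - (1.0 / (k + 10)) * stochastic_grad(Sigma, x))

print(np.round(Sigma, 2))   # should be close to Sigma_true
```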
Numerical Study
Example 1: MLE for MVN Covariance Matrix
We randomly generate data from a multivariate normal distribution with mean
(0, 0) and covariance matrix (10, 3; 3, 5). In (a) the delay variable is
bounded with upper bound 50; (b) uses a Poisson delay with parameter 30; in
(c), we simulate a virtual system in which the working time t of each worker
follows the same model, t ~ Exp(λ) with λ ~ Gamma(2, 1).
The green solid line is the convergence result for Asyn-SGD with O(1/k) step
size, the orange dotted line is the convergence result for Asyn-SGD with
O(1/(k^{1/2} log k)) step size, and the purple dashed line is the convergence
result for Asyn-SGDI.
Zhang & Liu & Zhu Asyn-SGD 24 / 35
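For reference, the three delay models can be sampled as follows; a minimal
numpy sketch, where the uniform shape within the bound in (a) is our
assumption, and in (c) we draw the per-job working times from which the
delays arise during the simulation.

```python
import numpy as np

rng = np.random.default_rng(3)

# (a) bounded delay: uniform on {0, ..., 50} (bound from the slide; the
#     uniform shape within the bound is our assumption)
delay_a = rng.integers(0, 51, size=1000)

# (b) Poisson delay with parameter 30
delay_b = rng.poisson(30, size=1000)

# (c) "virtual system": working time t ~ Exp(lambda), lambda ~ Gamma(2, 1)
lam = rng.gamma(shape=2.0, scale=1.0, size=1000)
time_c = rng.exponential(1.0 / lam)
```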
Numerical Study
Example 1: MLE for MVN Covariance Matrix
Figure: Convergence for Asyn-SGD and Asyn-SGDI. Panels: (a) bounded by 50;
(b) Poi(50); (c) system delay.
In all three cases the l2 norm of the gradient goes to zero; Asyn-SGDI is
fastest and Asyn-SGD with step size O(1/k) is slowest.
Zhang & Liu & Zhu Asyn-SGD 25 / 35
Numerical Study
Example 1: MLE for MVN Covariance Matrix
We consider an extreme case where the delay variable follows a discrete
uniform distribution (equal probability on each delay).
Figure: A counterexample where Asyn-SGD fails
Zhang & Liu & Zhu Asyn-SGD 26 / 35
Numerical Study
Example 1: MLE for MVN Covariance Matrix
We also compare the computation time of Syn-SGD, Asyn-SGD and Asyn-SGDI on
this problem. The step size for Syn-SGD and Asyn-SGD is O(1/k), and the step
size for Asyn-SGDI is constant.
Figure: Computation time for the three algorithms: the red line is Syn-SGD;
the blue dotted line is Asyn-SGD; the black dot-dash line is Asyn-SGDI.
Zhang & Liu & Zhu Asyn-SGD 27 / 35
Numerical Study
Example 2: Low Rank Matrix Completion
This problem is to find the lowest-rank matrix X that matches the expectation
of the observed symmetric matrices, E[A]. It can be mathematically formulated
as follows:
min_{Y ∈ R^{n×p}} E[‖A − YY^T‖_F²]   (4)
where X = YY^T. Using SGD to solve this problem has been discussed in many
works, including ? and ?, etc.
Zhang & Liu & Zhu Asyn-SGD 28 / 35
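For a symmetric observation A, standard matrix calculus gives
∇_Y ‖A − YY^T‖_F² = −4(A − YY^T)Y, so a one-sample stochastic gradient is
immediate. A minimal sketch, where the problem sizes, the noise model for A,
and the step size are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 3
Y_true = rng.normal(size=(n, p))
X_true = Y_true @ Y_true.T                  # low-rank target E[A]

def sample_A():
    """Noisy symmetric observation with E[A] = X_true (illustrative noise)."""
    E = rng.normal(scale=0.1, size=(n, n))
    return X_true + (E + E.T) / 2

def stochastic_grad(Y, A):
    """grad_Y ||A - Y Y'||_F^2 = -4 (A - Y Y') Y for symmetric A."""
    return -4.0 * (A - Y @ Y.T) @ Y

Y = rng.normal(size=(n, p))
for k in range(10_000):
    Y -= 1e-3 * stochastic_grad(Y, sample_A())   # small constant step, for illustration
print("residual:", np.linalg.norm(X_true - Y @ Y.T))
```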
Numerical Study
Example 2: Low Rank Matrix Completion
Figure: Convergence for Asyn-SGD and Asyn-SGDI. Panels: (a) bounded by 50;
(b) Poi(30); (c) system delay.
Zhang & Liu & Zhu Asyn-SGD 29 / 35
Conclusion
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 30 / 35
Conclusion
Conclusion
In our work, we analyze the convergence of Asyn-SGD on nonconvex
optimization problems with unbounded delay;
We propose a new Lyapunov function, which consists of a classical error
term and an asynchronicity error term;
A sufficient condition on the delay variables is given to guarantee the
convergence of Asyn-SGD;
With proper step sizes, the asymptotic convergence rate for Asyn-SGD is
o(1/√K) and that for Asyn-SGDI is o(1/K);
This algorithm requires the local gradients to be unbiased. For the
heterogeneous case, we are working on an ADMM-based asynchronous solution.
Zhang & Liu & Zhu Asyn-SGD 31 / 35
Preliminary Work
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 32 / 35
Preliminary Work
Distributed Computing and ADMM
Consider the following problem:
Data are distributed across several machines, say Ξ_1, Ξ_2, ..., Ξ_k;
The objective function is
min_x L(x; Ξ_1, Ξ_2, ..., Ξ_k) = Σ_{i=1}^{k} L_i(x; Ξ_i);   (5)
Communication is too expensive, so each machine can only "see" its local
objective function L_i(x; Ξ_i);
The data are biased, which means x_i = arg min L_i(x; Ξ_i) is not
consistent.
Zhang & Liu & Zhu Asyn-SGD 33 / 35
Preliminary Work
Problem Formulation
Reformulating the problem:
min_x L(x; Ξ_1, Ξ_2, ..., Ξ_k) = Σ_{i=1}^{k} L_i(x; Ξ_i)   (6)
⇒ min_x Σ_{i=1}^{k} L_i(x_i; Ξ_i), s.t. x_i = x, ∀ i   (7)
The corresponding augmented Lagrangian function:
L({x_i}, x; y) = Σ_{i=1}^{k} L_i(x_i; Ξ_i) + Σ_{i=1}^{k} ⟨y_i, x_i − x⟩
+ Σ_{i=1}^{k} (ρ_i/2) ‖x_i − x‖²;   (8)
Thus, x and the y_i can be updated at the central server, and each x_i can
be updated at its local machine. Only x, x_i and y_i are transferred between
the central server and the local machines.
Zhang & Liu & Zhu Asyn-SGD 34 / 35
Preliminary Work
ADMM based parallel computing framework
Algorithm 3 ADMM based parallel computing framework
Require: Databases {Ξ_i}, {ρ_i}, initial point;
Ensure: x^T;
At parameter server:
1: for t = 1, 2, ..., T do
2: Collect x_i^t from the local machines;
3: Update x^{t+1} = arg min_x Σ_{i=1}^{K} ⟨y_i^t, x_i^t − x⟩ + Σ_{i=1}^{K} (ρ_i/2) ‖x_i^t − x‖²;
4: Update y_i^{t+1} = y_i^t + ρ_i (x_i^t − x^{t+1});
5: end for
At local machine i:
6: Receive the current y_i^{t+1} and x^{t+1} from the parameter server;
7: Update x_i^{t+1} = arg min_{x_i} L_i(x_i; Ξ_i) + ⟨y_i^{t+1}, x_i − x^{t+1}⟩ + (ρ_i/2) ‖x_i − x^{t+1}‖²;
Zhang & Liu & Zhu Asyn-SGD 35 / 35
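A serial sketch of Algorithm 3 with quadratic local losses
L_i(x; Ξ_i) = ‖x − b_i‖², chosen so that both argmins have closed forms; the
losses, problem sizes, and ρ_i = 1 are illustrative assumptions, not part of
the framework itself.

```python
import numpy as np

rng = np.random.default_rng(5)
K, d = 4, 3
b = rng.normal(size=(K, d))       # local data: L_i(x) = ||x - b_i||^2
rho = np.full(K, 1.0)

x = np.zeros(d)                   # consensus variable at the server
x_loc = np.zeros((K, d))          # local copies x_i
y = np.zeros((K, d))              # dual variables y_i

for t in range(100):
    # server: x-update, closed form of step 3
    x = (rho[:, None] * x_loc + y).sum(axis=0) / rho.sum()
    # server: dual update, step 4
    y += rho[:, None] * (x_loc - x)
    # local machines: x_i-update, closed form of step 7 for quadratic L_i
    x_loc = (2 * b - y + rho[:, None] * x) / (2 + rho)[:, None]

print("consensus x:", x)          # approaches the average of the b_i,
print("global argmin:", b.mean(axis=0))   # the minimizer of sum_i L_i
```

For these losses the global minimizer is the average of the b_i, so the
printed consensus iterate can be checked directly against it.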