SlideShare a Scribd company logo
1 of 14
Download to read offline
Distributed optimization
mquartulli@vicomtech.org
Motivation
• Given a trained model, ML / prediction is easy to distribute
• Not a full-blown “Big Data” problem
• What about model training in the face of Big (training) Data?
• Distributed training needed!
• Under the hood: ML as optimisation
ML and optimisation
‘Big Data’ ML:
• high training sample volumes
• high-dimensional data
• distributed data: collection, storage
methods are based on optimisation
• write ML as a (typically convex) optimisation problem
• optimise.
Problem formalization
Problem:
• minimize J(𝜃), 𝜃 ∈ ℝd
• subject to Ji(𝜃) ≤ bi, i = 1,...,m
with
• 𝜃 = (𝜃1 ,…, 𝜃d) ∈ ℝd
the optimisation variable
• J : Rd
→ R the objective function
• Ji : Rd
→ R, i = 1,…, m the constraints
• constants b1 ,…, bm the bounds for the constraints.
Gradient descent
• Update the parameters in the opposite direction of the gradient of
the objective function ∇ 𝜃J(𝜃) w.r.t. the parameters.
• The learning rate 𝜂 determines the size of the steps we take to reach
a (local) minimum.
• We follow the direction of the slope of the surface created by the
objective function downhill until we reach a valley. 



[NOTE: heavily based on Sebastian Ruder’s “An overview of
gradient descent optimization algorithms”, 19 Jan 2016]
Batch gradient descent
• Idea: depending on the amount of data, trade-off between the
accuracy of the parameter update and the time it takes to perform
an update.
• Update: 𝜃 = 𝜃 - 𝜂 ∙ ∇ 𝜃J(𝜃)
Stochastic gradient descent
• Idea: perform a parameter update for each training example x(i) and
label y(i)
• Update: 𝜃 = 𝜃 - 𝜂 ∙ ∇ 𝜃J(𝜃; x(i), y(i))
• Performs redundant computations for large datasets
Momentum gradient descent
• Idea: overcome ravine oscillations by momentum
• Update:
• vt = 𝛾 vt-1 + 𝜂 ∙ ∇ 𝜃J(𝜃)
• 𝜃 = 𝜃 - vt
Nesterov accelerated gradient
• Idea: 1. big jump in the direction of the previous accumulated
gradient & measure the gradient and then 2. make a correction.
• Update:
• vt = 𝛾 vt-1 + 𝜂 ∙ ∇ 𝜃J(𝜃-𝛾 vt-1)
• 𝜃 = 𝜃 - vt
Adagrad
• Idea: larger updates for infrequent and smaller updates for frequent
parameters.
• Update: let gt,i = ∇ 𝜃J(𝜃i); 𝜃t+1,i = 𝜃t,i + 𝛥𝜃t. Then:
• SGD: 𝛥𝜃t = - 𝜂 ∙ gt
• Adagrad: 𝛥𝜃t = - 𝜂 / √(Gt+ϵ) ⊙ gt



with Gt ∈ℝd⨉d a diagonal matrix where each diagonal element i,i the sum
of square of gradients w.r.t. 𝜃i up to time step t, ⊙ element-wise matrix-
vector multiplication.
Adadelta
• Idea: Instead of accumulating all past squared gradients, restrict the
window of accumulated past gradients to some fixed size w.
• The sum of gradients is recursively defined as a decaying average
of all past squared gradients:

E[𝛥𝜃2]t = 𝛾 E[𝛥𝜃2]t-1 + (1-𝛾) 𝛥𝜃t2
• Update: we replace the diagonal matrix Gt with the decaying
average over past squared gradients E[g2]t 



𝛥𝜃t = - RMS[𝛥𝜃]t-1/RMS[g]t ⊙ gt
RMSprop
• Idea: use the first update vector of Adadelta
• Update:
• E[g2]t = 0.9 E[g2]t-1 + 0.1 gt2
• 𝛥𝜃t = - 𝜂 / √(E[g2]t + ϵ) ⊙ gt
Visualization and comparison
Adagrad, Adadelta, and RMSprop
almost immediately head off in the
right direction and converge
similarly fast, while Momentum and
NAG are led off-track, evoking the
image of a ball rolling down the hill.
NAG, however, is quickly able to
correct its course due to its
increased responsiveness by
looking ahead and heads to the
minimum.
Conclusions
• Big Data ML requires (scalable, distributed) algorithms to process
training points in small batches, performing effective incremental
updates to the model
• Final objective: a closed loop that trains models, compares them
recursively
• Key challenge: evaluation metrics in the face of available resources 

(including data)

More Related Content

What's hot

Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionKnoldus Inc.
 
Dimensionality reduction
Dimensionality reductionDimensionality reduction
Dimensionality reductionShatakirti Er
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and ldaSuresh Pokharel
 
A Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image SimilarityA Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image SimilarityFarah M. Altufaili
 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionJordan McBain
 
Matrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlpMatrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlpankit_ppt
 
Graph based approaches to Gene Expression Clustering
Graph based approaches to Gene Expression ClusteringGraph based approaches to Gene Expression Clustering
Graph based approaches to Gene Expression ClusteringGovind Maheswaran
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBenjamin Bengfort
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin Rshanelynn
 
KNN - Classification Model (Step by Step)
KNN - Classification Model (Step by Step)KNN - Classification Model (Step by Step)
KNN - Classification Model (Step by Step)Manish nath choudhary
 
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsMethods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsRyan B Harvey, CSDP, CSM
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means ClusteringAnna Fensel
 
Some Engg. Applications of Matrices and Partial Derivatives
Some Engg. Applications of Matrices and Partial DerivativesSome Engg. Applications of Matrices and Partial Derivatives
Some Engg. Applications of Matrices and Partial DerivativesSanjaySingh011996
 
Dimension reduction(jiten01)
Dimension reduction(jiten01)Dimension reduction(jiten01)
Dimension reduction(jiten01)Jiten Dhimmar
 

What's hot (20)

Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Dimensionality reduction
Dimensionality reductionDimensionality reduction
Dimensionality reduction
 
Dimensionality reduction
Dimensionality reductionDimensionality reduction
Dimensionality reduction
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and lda
 
Lda
LdaLda
Lda
 
A Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image SimilarityA Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image Similarity
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty Detection
 
Matrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlpMatrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlp
 
Graph based approaches to Gene Expression Clustering
Graph based approaches to Gene Expression ClusteringGraph based approaches to Gene Expression Clustering
Graph based approaches to Gene Expression Clustering
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix Factorization
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
KNN - Classification Model (Step by Step)
KNN - Classification Model (Step by Step)KNN - Classification Model (Step by Step)
KNN - Classification Model (Step by Step)
 
07 learning
07 learning07 learning
07 learning
 
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsMethods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
Pca
PcaPca
Pca
 
Some Engg. Applications of Matrices and Partial Derivatives
Some Engg. Applications of Matrices and Partial DerivativesSome Engg. Applications of Matrices and Partial Derivatives
Some Engg. Applications of Matrices and Partial Derivatives
 
Dimension reduction(jiten01)
Dimension reduction(jiten01)Dimension reduction(jiten01)
Dimension reduction(jiten01)
 

Viewers also liked

07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representationsMarco Quartulli
 
05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extractionMarco Quartulli
 
07 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_201607 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_2016Marco Quartulli
 
08 visualisation seminar ver0.2
08 visualisation seminar   ver0.208 visualisation seminar   ver0.2
08 visualisation seminar ver0.2Marco Quartulli
 
04 bigdata and_cloud_computing
04 bigdata and_cloud_computing04 bigdata and_cloud_computing
04 bigdata and_cloud_computingMarco Quartulli
 

Viewers also liked (10)

07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
05 astrostat feigelson
05 astrostat feigelson05 astrostat feigelson
05 astrostat feigelson
 
05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
06 ashish mahabal bse2
06 ashish mahabal bse206 ashish mahabal bse2
06 ashish mahabal bse2
 
06 ashish mahabal bse1
06 ashish mahabal bse106 ashish mahabal bse1
06 ashish mahabal bse1
 
07 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_201607 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_2016
 
08 visualisation seminar ver0.2
08 visualisation seminar   ver0.208 visualisation seminar   ver0.2
08 visualisation seminar ver0.2
 
06 ashish mahabal bse3
06 ashish mahabal bse306 ashish mahabal bse3
06 ashish mahabal bse3
 
04 bigdata and_cloud_computing
04 bigdata and_cloud_computing04 bigdata and_cloud_computing
04 bigdata and_cloud_computing
 

Similar to 08 distributed optimization

Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelineChenYiHuang5
 
DeepLearningLecture.pptx
DeepLearningLecture.pptxDeepLearningLecture.pptx
DeepLearningLecture.pptxssuserf07225
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Maninda Edirisooriya
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةFares Al-Qunaieer
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptxPrabhuSelvaraj15
 
Dynamic programming class 16
Dynamic programming class 16Dynamic programming class 16
Dynamic programming class 16Kumar
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsJason Riedy
 
Matlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.pptMatlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.pptIndra Hermawan
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learningYogendra Singh
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep LearningSebastian Ruder
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsJason Riedy
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersJen Aman
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Sparkdatamantra
 
Online advertising and large scale model fitting
Online advertising and large scale model fittingOnline advertising and large scale model fitting
Online advertising and large scale model fittingWush Wu
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models ananth
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learningmilad abbasi
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningMehrnaz Faraz
 

Similar to 08 distributed optimization (20)

Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
DeepLearningLecture.pptx
DeepLearningLecture.pptxDeepLearningLecture.pptx
DeepLearningLecture.pptx
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptx
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptx
 
Dynamic programming class 16
Dynamic programming class 16Dynamic programming class 16
Dynamic programming class 16
 
Regression ppt
Regression pptRegression ppt
Regression ppt
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
 
Matlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.pptMatlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.ppt
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep Learning
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Spark
 
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
 
Online advertising and large scale model fitting
Online advertising and large scale model fittingOnline advertising and large scale model fitting
Online advertising and large scale model fitting
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 

Recently uploaded

Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXDole Philippines School
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxnoordubaliya2003
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...navyadasi1992
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 

Recently uploaded (20)

Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 

08 distributed optimization

  • 2. Motivation • Given a trained model, ML / prediction is easy to distribute • Not a full-blown “Big Data” problem • What about model training in the face of Big (training) Data? • Distributed training needed! • Under the hood: ML as optimisation
  • 3. ML and optimisation ‘Big Data’ ML: • high training sample volumes • high-dimensional data • distributed data: collection, storage methods are based on optimisation • write ML as a (typically convex) optimisation problem • optimise.
  • 4. Problem formalization Problem: • minimize J(𝜃), 𝜃 ∈ ℝd • subject to Ji(𝜃) ≤ bi, i = 1,...,m with • 𝜃 = (𝜃1 ,…, 𝜃d) ∈ ℝd the optimisation variable • J : Rd → R the objective function • Ji : Rd → R, i = 1,…, m the constraints • constants b1 ,…, bm the bounds for the constraints.
  • 5. Gradient descent • Update the parameters in the opposite direction of the gradient of the objective function ∇ 𝜃J(𝜃) w.r.t. the parameters. • The learning rate 𝜂 determines the size of the steps we take to reach a (local) minimum. • We follow the direction of the slope of the surface created by the objective function downhill until we reach a valley. 
 
 [NOTE: heavily based on Sebastian Ruder’s “An overview of gradient descent optimization algorithms”, 19 Jan 2016]
  • 6. Batch gradient descent • Idea: depending on the amount of data, trade-off between the accuracy of the parameter update and the time it takes to perform an update. • Update: 𝜃 = 𝜃 - 𝜂 ∙ ∇ 𝜃J(𝜃)
  • 7. Stochastic gradient descent • Idea: perform a parameter update for each training example x(i) and label y(i) • Update: 𝜃 = 𝜃 - 𝜂 ∙ ∇ 𝜃J(𝜃; x(i), y(i)) • Performs redundant computations for large datasets
  • 8. Momentum gradient descent • Idea: overcome ravine oscillations by momentum • Update: • vt = 𝛾 vt-1 + 𝜂 ∙ ∇ 𝜃J(𝜃) • 𝜃 = 𝜃 - vt
  • 9. Nesterov accelerated gradient • Idea: 1. big jump in the direction of the previous accumulated gradient & measure the gradient and then 2. make a correction. • Update: • vt = 𝛾 vt-1 + 𝜂 ∙ ∇ 𝜃J(𝜃-𝛾 vt-1) • 𝜃 = 𝜃 - vt
  • 10. Adagrad • Idea: larger updates for infrequent and smaller updates for frequent parameters. • Update: let gt,i = ∇ 𝜃J(𝜃i); 𝜃t+1,i = 𝜃t,i + 𝛥𝜃t. Then: • SGD: 𝛥𝜃t = - 𝜂 ∙ gt • Adagrad: 𝛥𝜃t = - 𝜂 / √(Gt+ϵ) ⊙ gt
 
 with Gt ∈ℝd⨉d a diagonal matrix where each diagonal element i,i the sum of square of gradients w.r.t. 𝜃i up to time step t, ⊙ element-wise matrix- vector multiplication.
  • 11. Adadelta • Idea: Instead of accumulating all past squared gradients, restrict the window of accumulated past gradients to some fixed size w. • The sum of gradients is recursively defined as a decaying average of all past squared gradients:
 E[𝛥𝜃2]t = 𝛾 E[𝛥𝜃2]t-1 + (1-𝛾) 𝛥𝜃t2 • Update: we replace the diagonal matrix Gt with the decaying average over past squared gradients E[g2]t 
 
 𝛥𝜃t = - RMS[𝛥𝜃]t-1/RMS[g]t ⊙ gt
  • 12. RMSprop • Idea: use the first update vector of Adadelta • Update: • E[g2]t = 0.9 E[g2]t-1 + 0.1 gt2 • 𝛥𝜃t = - 𝜂 / √(E[g2]t + ϵ) ⊙ gt
  • 13. Visualization and comparison Adagrad, Adadelta, and RMSprop almost immediately head off in the right direction and converge similarly fast, while Momentum and NAG are led off-track, evoking the image of a ball rolling down the hill. NAG, however, is quickly able to correct its course due to its increased responsiveness by looking ahead and heads to the minimum.
  • 14. Conclusions • Big Data ML requires (scalable, distributed) algorithms to process training points in small batches, performing effective incremental updates to the model • Final objective: a closed loop that trains models, compares them recursively • Key challenge: evaluation metrics in the face of available resources 
 (including data)