Distributed optimization
mquartulli@vicomtech.org
Motivation
• Given a trained model, ML / prediction is easy to distribute
• Not a full-blown “Big Data” problem
• What about model training in the face of Big (training) Data?
• Distributed training needed!
• Under the hood: ML as optimisation
ML and optimisation
‘Big Data’ ML:
• high training sample volumes
• high-dimensional data
• distributed data: collection, storage
Methods are based on optimisation:
• write ML as a (typically convex) optimisation problem
• optimise.
Problem formalization
Problem:
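• minimize $J(\theta)$, $\theta \in \mathbb{R}^d$
• subject to $J_i(\theta) \le b_i$, $i = 1, \dots, m$
with
• $\theta = (\theta_1, \dots, \theta_d) \in \mathbb{R}^d$ the optimisation variable
• $J : \mathbb{R}^d \to \mathbb{R}$ the objective function
• $J_i : \mathbb{R}^d \to \mathbb{R}$, $i = 1, \dots, m$ the constraints
• constants $b_1, \dots, b_m$ the bounds for the constraints.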
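As an illustration of how an ML task can be written in this form, here is a sketch of norm-constrained least-squares regression. The training pairs $(x^{(k)}, y^{(k)})$, the sample count $n$ and the single bound $b_1$ are assumptions introduced for this example only; they do not come from the slides.

```latex
\begin{aligned}
\text{minimize}\quad & J(\theta) = \frac{1}{2n} \sum_{k=1}^{n} \left(\theta^\top x^{(k)} - y^{(k)}\right)^2, \qquad \theta \in \mathbb{R}^d \\
\text{subject to}\quad & J_1(\theta) = \lVert \theta \rVert_2^2 \le b_1 \qquad (m = 1)
\end{aligned}
```

Both $J$ and $J_1$ are convex here, so this is an instance of the (typically convex) case.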
Gradient descent
• Update the parameters in the opposite direction of the gradient
$\nabla_\theta J(\theta)$ of the objective function w.r.t. the parameters.
• The learning rate $\eta$ determines the size of the steps we take to reach
a (local) minimum.
• We follow the direction of the slope of the surface created by the
objective function downhill until we reach a valley.

[NOTE: heavily based on Sebastian Ruder’s “An overview of
gradient descent optimization algorithms”, 19 Jan 2016]
Batch gradient descent
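• Idea: compute the gradient of the objective over the entire training
set for every update. Depending on the amount of data, this trades off
the accuracy of the parameter update against the time it takes to
perform an update.
• Update: $\theta = \theta - \eta \cdot \nabla_\theta J(\theta)$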
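A minimal sketch of the batch update, assuming a least-squares objective $J(\theta) = \frac{1}{2n}\lVert X\theta - y\rVert^2$ on synthetic data. The objective, the data and the names `batch_gradient_descent`, `eta` and `n_steps` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.1, n_steps=500):
    """Minimise J(theta) = (1/2n) * ||X @ theta - y||^2 with full-batch updates."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_steps):
        grad = X.T @ (X @ theta - y) / n   # gradient over the whole training set
        theta = theta - eta * grad          # theta = theta - eta * grad J(theta)
    return theta

# Synthetic, noise-free data (an assumption), so the minimiser is theta_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true

theta_hat = batch_gradient_descent(X, y)
```

Every step touches all 200 rows of `X`, which is what makes this variant expensive when the training set is large.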
Stochastic gradient descent
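• Idea: perform a parameter update for each training example $x^{(i)}$ and
label $y^{(i)}$
• Update: $\theta = \theta - \eta \cdot \nabla_\theta J(\theta; x^{(i)}, y^{(i)})$
• Avoids the redundant gradient computations that batch gradient descent
performs on large datasets, at the price of noisier (higher-variance) updates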
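The per-example update can be sketched as follows, assuming the per-example loss $J(\theta; x^{(i)}, y^{(i)}) = \frac{1}{2}(\theta^\top x^{(i)} - y^{(i)})^2$ and synthetic noise-free data. The loss, the data and the names `sgd`, `eta`, `n_epochs` and `seed` are illustrative assumptions.

```python
import numpy as np

def sgd(X, y, eta=0.05, n_epochs=20, seed=0):
    """Stochastic gradient descent: one parameter update per training example."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(n):                 # visit examples in random order
            grad_i = (X[i] @ theta - y[i]) * X[i]    # gradient of 0.5 * (theta.x_i - y_i)^2
            theta = theta - eta * grad_i             # update from a single example
    return theta

# Synthetic, noise-free data (an assumption), so the minimiser is theta_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true

theta_hat = sgd(X, y)
```

Each update costs one row of `X` instead of all of them, so the model starts improving long before a full pass over the data is complete.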
Momentum gradient descent
• Idea: overcome ravine oscillations by momentum
• Update:
• $v_t = \gamma v_{t-1} + \eta \cdot \nabla_\theta J(\theta)$
• $\theta = \theta - v_t$
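A minimal sketch of the momentum update on an assumed toy "ravine": the quadratic $J(\theta) = \frac{1}{2}\theta^\top A\,\theta$ with curvatures 1 and 50. The objective, the starting point and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

# Assumed toy objective: J(theta) = 0.5 * theta^T A theta, steep along the
# second axis and shallow along the first (a "ravine").
A = np.diag([1.0, 50.0])

def grad_J(theta):
    return A @ theta

def momentum_descent(grad, theta0, eta=0.02, gamma=0.9, n_steps=500):
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(n_steps):
        v = gamma * v + eta * grad(theta)   # v_t = gamma * v_{t-1} + eta * grad J(theta)
        theta = theta - v                   # theta = theta - v_t
    return theta

theta0 = [10.0, 1.0]
theta_momentum = momentum_descent(grad_J, theta0)
theta_plain = momentum_descent(grad_J, theta0, gamma=0.0)  # gamma = 0 is plain gradient descent
```

With the same learning rate and step count, the momentum run ends closer to the minimum at the origin than the plain run, because the accumulated velocity speeds up progress along the shallow axis.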
Nesterov accelerated gradient
• Idea: (1) make a big jump in the direction of the previously accumulated
gradient and measure the gradient at that look-ahead point, then (2) make
a correction.
• Update:
• $v_t = \gamma v_{t-1} + \eta \cdot \nabla_\theta J(\theta - \gamma v_{t-1})$
• $\theta = \theta - v_t$
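The look-ahead step can be sketched as below, again on an assumed toy quadratic $J(\theta) = \frac{1}{2}\theta^\top A\,\theta$. The objective, the starting point and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

# Assumed toy objective: J(theta) = 0.5 * theta^T A theta.
A = np.diag([1.0, 50.0])

def grad_J(theta):
    return A @ theta

def nesterov_descent(grad, theta0, eta=0.02, gamma=0.9, n_steps=500):
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(n_steps):
        lookahead = theta - gamma * v            # where the accumulated velocity would take us
        v = gamma * v + eta * grad(lookahead)    # gradient measured at the look-ahead point
        theta = theta - v                        # theta = theta - v_t
    return theta

theta_nag = nesterov_descent(grad_J, [10.0, 1.0])
```

The only difference from plain momentum is where the gradient is evaluated: at `theta - gamma * v` rather than at `theta`.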
Adagrad
• Idea: larger updates for infrequent and smaller updates for frequent
parameters.
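• Update: let $g_{t,i} = \nabla_\theta J(\theta_{t,i})$ and $\theta_{t+1,i} = \theta_{t,i} + \Delta\theta_{t,i}$. Then:
• SGD: $\Delta\theta_t = -\eta \cdot g_t$
• Adagrad: $\Delta\theta_t = -\dfrac{\eta}{\sqrt{G_t + \epsilon}} \odot g_t$
with $G_t \in \mathbb{R}^{d \times d}$ a diagonal matrix whose diagonal element $i,i$ is the sum
of the squares of the gradients w.r.t. $\theta_i$ up to time step $t$, $\epsilon$ a small
constant that avoids division by zero, and $\odot$ the element-wise
matrix-vector multiplication.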
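A minimal sketch of the Adagrad update, storing the diagonal of $G_t$ as a vector. The toy quadratic objective, the starting point and the values of `eta` and `eps` are illustrative assumptions.

```python
import numpy as np

# Assumed toy objective: J(theta) = 0.5 * theta^T A theta.
A = np.diag([1.0, 50.0])

def J(theta):
    return 0.5 * theta @ A @ theta

def grad_J(theta):
    return A @ theta

def adagrad(grad, theta0, eta=0.5, eps=1e-8, n_steps=200):
    theta = np.array(theta0, dtype=float)
    G = np.zeros_like(theta)                          # diagonal of G_t, one entry per parameter
    for _ in range(n_steps):
        g = grad(theta)
        G += g ** 2                                   # accumulate squared gradients
        theta = theta - eta / np.sqrt(G + eps) * g    # per-parameter step size
    return theta

theta0 = np.array([10.0, 1.0])
first_step = adagrad(grad_J, theta0, n_steps=1) - theta0
theta_hat = adagrad(grad_J, theta0)
```

On the first step $G_1$ holds only the first squared gradient, so every coordinate moves by roughly `eta` regardless of how large its gradient is. Since $G_t$ only grows, later steps become progressively smaller.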
Adadelta
• Idea: Instead of accumulating all past squared gradients, restrict the
window of accumulated past gradients to some fixed size w.
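• Instead of storing w past squared gradients, the sum is defined
recursively as a decaying average of all past squared gradients:
$E[g^2]_t = \gamma E[g^2]_{t-1} + (1-\gamma) g_t^2$
• The same decaying average is kept for the squared parameter updates:
$E[\Delta\theta^2]_t = \gamma E[\Delta\theta^2]_{t-1} + (1-\gamma) \Delta\theta_t^2$
• Update: we replace the diagonal matrix $G_t$ with the decaying
average over past squared gradients $E[g^2]_t$, and the learning rate $\eta$
with $\mathrm{RMS}[\Delta\theta]_{t-1}$, where $\mathrm{RMS}[x]_t = \sqrt{E[x^2]_t + \epsilon}$:
$\Delta\theta_t = -\dfrac{\mathrm{RMS}[\Delta\theta]_{t-1}}{\mathrm{RMS}[g]_t} \odot g_t$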
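A minimal sketch of the Adadelta update with its two decaying averages. The toy quadratic objective, the starting point and the values of `gamma` and `eps` are illustrative assumptions.

```python
import numpy as np

# Assumed toy objective: J(theta) = 0.5 * theta^T A theta.
A = np.diag([1.0, 50.0])

def J(theta):
    return 0.5 * theta @ A @ theta

def grad_J(theta):
    return A @ theta

def adadelta(grad, theta0, gamma=0.9, eps=1e-6, n_steps=50):
    theta = np.array(theta0, dtype=float)
    Eg2 = np.zeros_like(theta)     # decaying average of squared gradients, E[g^2]
    Edx2 = np.zeros_like(theta)    # decaying average of squared updates, E[dtheta^2]
    for _ in range(n_steps):
        g = grad(theta)
        Eg2 = gamma * Eg2 + (1 - gamma) * g ** 2
        # RMS[dtheta]_{t-1} / RMS[g]_t takes the place of a learning rate.
        delta = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g
        Edx2 = gamma * Edx2 + (1 - gamma) * delta ** 2
        theta = theta + delta
    return theta

theta0 = np.array([10.0, 1.0])
first_step = adadelta(grad_J, theta0, n_steps=1) - theta0
theta_hat = adadelta(grad_J, theta0)
```

No learning rate appears anywhere. Because $E[\Delta\theta^2]$ starts at zero, the first steps are tiny (their size is set by `eps`) and grow as updates accumulate, so this short run only begins the descent.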
RMSprop
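• Idea: use the first update vector of Adadelta, i.e. divide the learning
rate by a decaying average of squared gradients
• Update:
• $E[g^2]_t = 0.9\, E[g^2]_{t-1} + 0.1\, g_t^2$
• $\Delta\theta_t = -\dfrac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \odot g_t$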
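A minimal sketch of the RMSprop update with the 0.9 / 0.1 decay. The toy quadratic objective, the starting point and the values of `eta` and `eps` are illustrative assumptions.

```python
import numpy as np

# Assumed toy objective: J(theta) = 0.5 * theta^T A theta.
A = np.diag([1.0, 50.0])

def J(theta):
    return 0.5 * theta @ A @ theta

def grad_J(theta):
    return A @ theta

def rmsprop(grad, theta0, eta=0.01, eps=1e-8, n_steps=200):
    theta = np.array(theta0, dtype=float)
    Eg2 = np.zeros_like(theta)                          # decaying average of squared gradients
    for _ in range(n_steps):
        g = grad(theta)
        Eg2 = 0.9 * Eg2 + 0.1 * g ** 2                  # E[g^2]_t
        theta = theta - eta / np.sqrt(Eg2 + eps) * g    # per-parameter step size
    return theta

theta0 = np.array([10.0, 1.0])
first_step = rmsprop(grad_J, theta0, n_steps=1) - theta0
theta_hat = rmsprop(grad_J, theta0)
```

Unlike Adagrad, the average forgets old gradients, so the effective step size does not shrink towards zero over time.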
Visualization and comparison
Adagrad, Adadelta, and RMSprop almost immediately head off in the right
direction and converge similarly fast, while Momentum and NAG are led
off-track, evoking the image of a ball rolling down the hill. NAG,
however, is quickly able to correct its course due to its increased
responsiveness by looking ahead, and heads to the minimum.
Conclusions
• Big Data ML requires (scalable, distributed) algorithms to process
training points in small batches, performing effective incremental
updates to the model
• Final objective: a closed loop that trains models and compares them
recursively
• Key challenge: evaluation metrics in the face of available resources
(including data)
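The first conclusion can be sketched as mini-batch SGD in which each batch is split across workers, every worker computes a gradient on its own shard, and a driver combines them. The workers are simulated in a single process here; the least-squares objective, the synthetic data and all function and parameter names are illustrative assumptions, not a method described in the slides.

```python
import numpy as np

def local_gradient(theta, X_shard, y_shard):
    """Least-squares gradient computed by one (simulated) worker on its shard."""
    return X_shard.T @ (X_shard @ theta - y_shard) / len(y_shard)

def distributed_minibatch_sgd(X, y, n_workers=4, batch_size=32,
                              eta=0.1, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            shards = np.array_split(batch, n_workers)   # one slice of the batch per worker
            # Weight each worker's gradient by its shard size so that the
            # combined result equals the gradient over the whole mini-batch.
            grads = [len(s) * local_gradient(theta, X[s], y[s])
                     for s in shards if len(s) > 0]
            theta = theta - eta * sum(grads) / len(batch)   # incremental model update
    return theta

# Synthetic, noise-free data (an assumption), so the minimiser is theta_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true

theta_hat = distributed_minibatch_sgd(X, y)
```

The split works because the gradient of a sum of per-example losses is the sum of the per-example gradients, so only the small gradient vectors need to be exchanged, never the training data itself.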
