SlideShare a Scribd company logo
1 of 27
Basic Concepts of Large Scale
Optimization for Machine Learning
Devdatt Dubhashi
AI and Data Science
Computer Science and Engineering
Chalmers
Machine Intelligence Sweden AB
Behind the Cat Pictures …
• Amazing successes of ML in
computer vision, natural
language processing …
• Underneath the hood is
optimization
• Large scale machine learning:
– large n (data points)
– large d (dimension)
Minimization of Finite Sums
• Assumptions on component functions: convex, smooth, …
Empirical Risk Minimization (ERM)
• Labelled training data:
• Parametrized class of prediction
functions:
• Empirical Loss:
min
Data Driven Clustering
• Given data points, cluster
• Classic K-means algorithm
• Needs to know k, number of
clusters
• Data driven clustering: find the
right number of clusters driven
by data. (Panahi, D: ICML 2017)
Minimization of Finite Sums
• Assumptions on component functions: convex, smooth, …
Mother of all First Order Methods:Gradient Descent
Gradient Descent Convergence
However, GD is not viable for large scale ML because each iteration has cost nd
Stochastic Gradient Descent (SGD)
Robbins and Munro 1950
• Index sampled uniformly at random with
replacement from [n]
• Cost per iteration is d
• Hugely successful in machine learning!
Stochastic, Batch and Full Gradient Descent
• Full GD:
• Minibatch GD:
• Stochastic GD:
The Unreasonable Effectiveness of SGD
• Very fast initial convergence
• Cheap O(d) per iteration as
opposed to O(nd) for full GD
• Very slow at the end ...
Convergence is only O(1/ 𝑘) for
smooth and O(1/k) for smooth
strongly convex functions.
• … but we do not need to run the
iterations to optimum, better to
stop early (Bottou and Bosquet)
SGD: Have the Cake and Eat it Too!
(Bottou and Bosquet 2008)
Variance/Noise Reduction
Can we improve convergence of SGD?
Variance/Noise Reduction
Variance Reduction: Three Takes
Nesterov Momentum for GD
Y. Nesterov, Doklady 1983.
Katyusha Momentum for SGD
Z. Allen-Zhu, STOC 2017, JMLR 2018
Non-smooth Objectives
• What if the objective is non-smooth?
• Need a proxy for gradients.
LASSO
SON Clustering
Proximal Operator
• Proximal operator:
• Special case: projection:
• Like a gradient step:
• Fixed points are minimizers:
Proximal Gradient Algorithm
• Objective split
• Iteration
• Special cases:
– g=0: usual gradient descent
– f=0: proximal algorithm
– g= indicator of convex set: projected gradient descent (constrained
optimization)
• Only works if the proximal operator can be evaluated efficiently!
PointSAGA: Stochastic Prox with Variance Reduction
Defazio 2016
MP-SAGA: Stochastic Prox with Variance Reduction
Panahi, Dubhashi, ICML 2017, (2019 under review: proximal operator in closed form!
SGD for Deep Learning
• SGD variants (Adagrad, RMSprop, Adam …) used to train
neural networks.
• Use aggressive adaptation with different learning rates for
different parameters.
• Theory says it shouldn’t work for highly nonconvex problems!
• But Adagrad greatly improved the robustness of SGD and
Google used it for training large-scale neural nets to recognize
cats
Variance Reduction for Deep Learning
Defazio, Bottou, 2019
References
Frances Bach Tutorials/Short Courses: https://www.di.ens.fr/~fbach/

More Related Content

Similar to Optforml

Chromatic Sparse Learning
Chromatic Sparse LearningChromatic Sparse Learning
Chromatic Sparse LearningDatabricks
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Manchor Ko
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Disentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative ModelsDisentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative ModelsRyohei Suzuki
 
30thSep2014
30thSep201430thSep2014
30thSep2014Mia liu
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxIvo Andreev
 
DeepLearningLecture.pptx
DeepLearningLecture.pptxDeepLearningLecture.pptx
DeepLearningLecture.pptxssuserf07225
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
Webpage Personalization and User Profiling
Webpage Personalization and User ProfilingWebpage Personalization and User Profiling
Webpage Personalization and User Profilingyingfeng
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksSangwoo Mo
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hakky St
 
Elegant Graphics for Data Analysis with ggplot2
Elegant Graphics for Data Analysis with ggplot2Elegant Graphics for Data Analysis with ggplot2
Elegant Graphics for Data Analysis with ggplot2yannabraham
 
Sparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptxSparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptxssuser2624f71
 
Final Presentation - Edan&Itzik
Final Presentation - Edan&ItzikFinal Presentation - Edan&Itzik
Final Presentation - Edan&Itzikitzik cohen
 
MLSEV. Logistic Regression, Deepnets, and Time Series
MLSEV. Logistic Regression, Deepnets, and Time Series MLSEV. Logistic Regression, Deepnets, and Time Series
MLSEV. Logistic Regression, Deepnets, and Time Series BigML, Inc
 
Learning visual representation without human label
Learning visual representation without human labelLearning visual representation without human label
Learning visual representation without human labelKai-Wen Zhao
 

Similar to Optforml (20)

Chromatic Sparse Learning
Chromatic Sparse LearningChromatic Sparse Learning
Chromatic Sparse Learning
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
 
Disentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative ModelsDisentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative Models
 
30thSep2014
30thSep201430thSep2014
30thSep2014
 
Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...
Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...
Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
DeepLearningLecture.pptx
DeepLearningLecture.pptxDeepLearningLecture.pptx
DeepLearningLecture.pptx
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Webpage Personalization and User Profiling
Webpage Personalization and User ProfilingWebpage Personalization and User Profiling
Webpage Personalization and User Profiling
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptx
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural Networks
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
 
Machine Learning - Supervised Learning
Machine Learning - Supervised LearningMachine Learning - Supervised Learning
Machine Learning - Supervised Learning
 
Elegant Graphics for Data Analysis with ggplot2
Elegant Graphics for Data Analysis with ggplot2Elegant Graphics for Data Analysis with ggplot2
Elegant Graphics for Data Analysis with ggplot2
 
Sparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptxSparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptx
 
Final Presentation - Edan&Itzik
Final Presentation - Edan&ItzikFinal Presentation - Edan&Itzik
Final Presentation - Edan&Itzik
 
MLSEV. Logistic Regression, Deepnets, and Time Series
MLSEV. Logistic Regression, Deepnets, and Time Series MLSEV. Logistic Regression, Deepnets, and Time Series
MLSEV. Logistic Regression, Deepnets, and Time Series
 
Learning visual representation without human label
Learning visual representation without human labelLearning visual representation without human label
Learning visual representation without human label
 

More from Devdatt Dubhashi

More from Devdatt Dubhashi (7)

Corr clust-kiel
Corr clust-kielCorr clust-kiel
Corr clust-kiel
 
Dig18
Dig18Dig18
Dig18
 
Ai energy
Ai energyAi energy
Ai energy
 
Ai finance
Ai financeAi finance
Ai finance
 
AI: The New Electricity to Harness Our Digital Future
AI: The New Electricity to Harness Our Digital FutureAI: The New Electricity to Harness Our Digital Future
AI: The New Electricity to Harness Our Digital Future
 
Sicsds
SicsdsSicsds
Sicsds
 
Sins2016
Sins2016Sins2016
Sins2016
 

Recently uploaded

定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 

Optforml

  • 1. Basic Concepts of Large Scale Optimization for Machine Learning Devdatt Dubhashi AI and Data Science Computer Science and Engineering Chalmers Machine Intelligence Sweden AB
  • 2. Behind the Cat Pictures … • Amazing successes of ML in computer vision, natural language processing … • Underneath the hood is optimization • Large scale machine learning: – large n (data points) – large d (dimension)
  • 3. Minimization of Finite Sums • Assumptions on component functions: convex, smooth, …
  • 4. Empirical Risk Minimization (ERM) • Labelled training data: • Parametrized class of prediction functions: • Empirical Loss: min
  • 5. Data Driven Clustering • Given data points, cluster • Classic K-means algorithm • Needs to know k, number of clusters • Data driven clustering: find the right number of clusters driven by data. (Panahi, D: ICML 2017)
  • 6. Minimization of Finite Sums • Assumptions on component functions: convex, smooth, …
  • 7. Mother of all First Order Methods:Gradient Descent
  • 8. Gradient Descent Convergence However, GD is not viable for large scale ML because each iteration has cost nd
  • 9. Stochastic Gradient Descent (SGD) Robbins and Munro 1950 • Index sampled uniformly at random with replacement from [n] • Cost per iteration is d • Hugely successful in machine learning!
  • 10. Stochastic, Batch and Full Gradient Descent • Full GD: • Minibatch GD: • Stochastic GD:
  • 11. The Unreasonable Effectiveness of SGD • Very fast initial convergence • Cheap O(d) per iteration as opposed to O(nd) for full GD • Very slow at the end ... Convergence is only O(1/ 𝑘) for smooth and O(1/k) for smooth strongly convex functions. • … but we do not need to run the iterations to optimum, better to stop early (Bottou and Bosquet)
  • 12. SGD: Have the Cake and Eat it Too! (Bottou and Bosquet 2008)
  • 13. Variance/Noise Reduction Can we improve convergence of SGD?
  • 15.
  • 17.
  • 18. Nesterov Momentum for GD Y. Nesterov, Doklady 1983.
  • 19. Katyusha Momentum for SGD Z. Allen-Zhu, STOC 2017, JMLR 2018
  • 20. Non-smooth Objectives • What if the objective is non-smooth? • Need a proxy for gradients. LASSO SON Clustering
  • 21. Proximal Operator • Proximal operator: • Special case: projection: • Like a gradient step: • Fixed points are minimizers:
  • 22. Proximal Gradient Algorithm • Objective split • Iteration • Special cases: – g=0: usual gradient descent – f=0: proximal algorithm – g= indicator of convex set: projected gradient descent (constrained optimization) • Only works if the proximal operator can be evaluated efficiently!
  • 23. PointSAGA: Stochastic Prox with Variance Reduction Defazio 2016
  • 24. MP-SAGA: Stochastic Prox with Variance Reduction Panahi, Dubhashi, ICML 2017, (2019 under review: proximal operator in closed form!
  • 25. SGD for Deep Learning • SGD variants (Adagrad, RMSprop, Adam …) used to train neural networks. • Use aggressive adaptation with different learning rates for different parameters. • Theory says it shouldn’t work for highly nonconvex problems! • But Adagrad greatly improved the robustness of SGD and Google used it for training large-scale neural nets to recognize cats
  • 26. Variance Reduction for Deep Learning Defazio, Bottou, 2019
  • 27. References Frances Bach Tutorials/Short Courses: https://www.di.ens.fr/~fbach/