ICML 2012 Reading Group:
Scaling Up Coordinate Descent Algorithms for
       Large L1 regularization Problems
               2012-07-28
             Yoshihiko Suhara
              @sleepy_yoshi
Paper Covered
• Scaling Up Coordinate Descent Algorithms for
  Large L1 regularization Problems
  – by C. Scherrer, M. Halappanavar, A. Tewari, D.
    Haglin


• Parallel computation of Coordinate Descent
  – e.g., [Bradley+ 11] Parallel Coordinate Descent for L1-
    Regularized Loss Minimization (ICML 2011)

                                                        2
Overview
• Introduces a generic framework for Parallel
  Coordinate Descent on shared-memory multicore
  machines

• Proposes the following two methods
 – Thread-Greedy Coordinate Descent
 – Coloring-Based Coordinate Descent

• Compares four Parallel CD methods experimentally
 – Thread-Greedy worked surprisingly well

                                       3
Optimization of an L1-Regularized Loss
• L1-regularized loss function

        \min_{\mathbf{w}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(y_i, (\mathbf{X}\mathbf{w})_i\big) + \lambda \|\mathbf{w}\|_1

• where
  – 𝑿 ∈ ℝ^{n×k}: design matrix
  – 𝒘 ∈ ℝ^k: weight vector
  – ℓ(𝑦, ⋅): differentiable convex loss function

• Examples
  – Lasso (L1 + squared error)
  – L1-regularized logistic regression

                                             4
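As a concrete illustration, here is a minimal Python sketch of this objective for the Lasso case (squared loss); the 1/2 factor and the function name are conventions assumed here, not taken from the paper:

```python
import numpy as np

def l1_objective(w, X, y, lam):
    """(1/n) * sum_i 0.5 * (y_i - (Xw)_i)^2 + lam * ||w||_1  (Lasso instance)."""
    n = X.shape[0]
    z = X @ w                                # predictions Xw
    loss = 0.5 * np.sum((y - z) ** 2) / n    # averaged squared-error loss
    return loss + lam * np.sum(np.abs(w))    # plus the L1 penalty
```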
Notation
    𝑿 = [𝑿_1, 𝑿_2, …, 𝑿_j, …, 𝑿_k]        (columns of 𝑿)
    𝒆_j = (0, 0, …, 1, …, 0)^T              (j-th standard basis vector)

        \mathbf{X} = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_i^T \\ \vdots \\ \mathbf{x}_n^T \end{pmatrix}        (rows of 𝑿)


                                5
Supplement: Coordinate Descent
• Also known in Japanese as 座標降下法 (?)
• Performs a line search along the selected coordinate
• Various ways to choose the coordinate
 – e.g., Cyclic Coordinate Descent
• For parallel computation, a subset of all coordinates is selected and updated
  (a minimal sketch follows after this slide)




                                  6
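A minimal sketch of plain cyclic coordinate descent, written here for an unregularized least-squares objective so that the 1-D minimization has a closed form; this is only an illustration of the idea, not code from the paper:

```python
import numpy as np

def cyclic_cd_least_squares(X, y, n_iters=100):
    """Cyclic coordinate descent for (1/2n)||y - Xw||^2 (no L1 term here;
    the L1 update rule appears later in the deck). Each pass performs an
    exact 1-D minimization along every coordinate in turn."""
    n, k = X.shape
    w = np.zeros(k)
    z = X @ w                        # cache Xw, updated incrementally
    col_sq = (X ** 2).sum(axis=0)    # ||X_j||^2 for each column
    for _ in range(n_iters):
        for j in range(k):           # cyclic sweep over coordinates
            if col_sq[j] == 0.0:
                continue
            r = y - z + X[:, j] * w[j]          # residual with coordinate j removed
            w_new = X[:, j] @ r / col_sq[j]     # exact minimizer along coordinate j
            z += X[:, j] * (w_new - w[j])       # keep the cached Xw in sync
            w[j] = w_new
    return w
```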
GenCD: A Generic Framework for
  Parallel Coordinate Descent


                   (for some reason, the slides switch to English from here)



                                 7
Generic Coordinate Descent (GenCD)




                                     8
Step 1: Select
• Select a subset 𝐽 of coordinates
• The selection criterion differs across CD variants
  (a sketch of these variants follows after this slide)
   – cyclic CD (CCD)
   – stochastic CD (SCD)
      • selects a singleton
   – fully greedy CD
      • 𝐽 = {1, … , 𝑘}
   – Shotgun [Bradley+ 11]
      • selects a random subset of a given size

                                                        9
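The Select variants above can be pictured as small functions returning the index set 𝐽; a hedged Python sketch (0-indexed, function names made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def select_cyclic(t, k):
    """CCD: one coordinate, visited in a fixed cyclic order."""
    return [t % k]

def select_stochastic(k):
    """SCD: a single coordinate drawn uniformly at random (a singleton)."""
    return [int(rng.integers(k))]

def select_fully_greedy(k):
    """Fully greedy CD: all coordinates J = {0, ..., k-1}."""
    return list(range(k))

def select_shotgun(k, size):
    """Shotgun [Bradley+ 11]: a random subset of a given size."""
    return list(rng.choice(k, size=size, replace=False))
```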
Step 2: Propose
• The Propose step computes a proposed increment 𝛿_j for
  each 𝑗 ∈ 𝐽
   – this step does not actually change the weights

• In this step, we maintain a vector 𝝋 ∈ ℝ^k, where 𝝋_j is a
  proxy for the objective function evaluated at 𝒘 + 𝛿_j 𝒆_j
   – update 𝝋_j whenever a new proposal is calculated for 𝑗
   – 𝝋 is not necessary if the algorithm accepts all proposals



                                                              10
Step 3: Accept
• In Accept step, the algorithm accepts 𝐽′ ⊆ 𝐽
   –  [Bradley+ 11] show correlations among features can
     lead to divergence if too many coordinates are updated at
     once (see below figure)

• In CCD, SCD, Shotgun, the algorithm allows all
  proposals to be accepted
   – No need to calculate 𝝋




                                                             11
Step 4: Update
• In the Update step, the algorithm updates the weights
  according to the accepted set 𝐽′




 (the vector 𝑿𝒘 is maintained across updates)



                                          12
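Putting the four steps together, a rough Python sketch of the GenCD skeleton; the callables and names are placeholders for illustration, not the authors' implementation:

```python
import numpy as np

def gencd(X, y, w, select, propose, accept, n_steps=100):
    """Hedged sketch of the GenCD loop: Select -> Propose -> Accept -> Update.
    `select`, `propose`, and `accept` are callables standing in for the steps;
    only the control flow mirrors the slides."""
    z = X @ w                                       # maintain the cached Xw
    for t in range(n_steps):
        J = select(t)                               # Step 1: choose coordinates
        deltas = {j: propose(j, w, z) for j in J}   # Step 2: proposed increments
        J_accepted = accept(deltas)                 # Step 3: accepted subset J'
        for j in J_accepted:                        # Step 4: apply updates, keep Xw in sync
            w[j] += deltas[j]
            z += X[:, j] * deltas[j]
    return w
```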
Approximate Minimization (1/2)
• The Propose step calculates a proposed increment
  𝛿_j for each 𝑗 ∈ 𝐽

      \delta_j = \arg\min_{\delta} \; F(\mathbf{w} + \delta \mathbf{e}_j) + \lambda \, |w_j + \delta|

      where  F(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(y_i, (\mathbf{X}\mathbf{w})_i\big)


•  For a general loss function, there is no
  closed-form solution along a given coordinate.
  – Thus, consider approximate minimization

                                                  13
Approximate Minimization (2/2)
• Well-known minimizer (e.g., [Yuan and Lin 10])

      \delta = -\psi\!\left(w_j ;\; \frac{\nabla_j F(\mathbf{w}) - \lambda}{\beta},\; \frac{\nabla_j F(\mathbf{w}) + \lambda}{\beta}\right)

      where  \psi(x; a, b) = \begin{cases} a & \text{if } x < a \\ b & \text{if } x > b \\ x & \text{otherwise} \end{cases}

      (for squared loss 𝛽 = 1, for logistic loss 𝛽 = 1/4)


                                                           14
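A small Python sketch of this minimizer; ψ is just a clip to the interval [a, b], and grad_j stands for ∇_j F(𝒘), assumed to be computed elsewhere:

```python
import numpy as np

def psi(x, a, b):
    """psi(x; a, b): return a if x < a, b if x > b, x otherwise (clip to [a, b])."""
    return np.clip(x, a, b)

def proposed_increment(w_j, grad_j, lam, beta):
    """delta = -psi(w_j; (grad_j - lam)/beta, (grad_j + lam)/beta),
    with beta = 1 for squared loss and beta = 1/4 for logistic loss."""
    return -psi(w_j, (grad_j - lam) / beta, (grad_j + lam) / beta)
```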
Step 2: Propose (Approximated)

   (slide figure: the approximate Propose step, using the coordinate-wise gradient
    \nabla_j F(\mathbf{w}) \approx \frac{1}{n} \sum_i \ell'(y_i, z_i)\, X_{ij}  with cached  \mathbf{z} = \mathbf{X}\mathbf{w}  (?),
    and the resulting decrease in the approximated objective)
                                                                             15
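Assuming the expression on the slide is the coordinate-wise gradient computed from the cached 𝒛 = 𝑿𝒘, a minimal sketch:

```python
import numpy as np

def grad_j(X, y, z, j, loss_prime):
    """Coordinate-wise gradient using the cached z = Xw:
    grad_j F(w) = (1/n) * sum_i loss_prime(y_i, z_i) * X_ij."""
    n = X.shape[0]
    return loss_prime(y, z) @ X[:, j] / n

# e.g., squared loss ell(y, z) = 0.5 * (z - y)^2 gives ell'(y, z) = z - y
squared_loss_prime = lambda y, z: z - y
```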
Experiments



              16
Algorithms (conventional)
• SHOTGUN [Bradley+ 11]
   – Select step: random subset of the columns
   – Accept step: accepts every proposal
       • No need to compute a proxy for the objective
   – convergence is guaranteed only if the # of coordinates selected
     is at most 𝑃* = 𝑘 / (2𝜌)  (*1)


• GREEDY
   – Select step: all coordinates
   – Propose step: each thread generating proposals for some subset
     of the coordinates using approximation
   – Accept step: accepts only the single best proposal among all threads


                                      (*1) 𝜌 is the spectral radius (largest eigenvalue) of 𝑿ᵀ𝑿   17
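A small numpy sketch of the Shotgun bound 𝑃* = 𝑘 / (2𝜌) above, under the assumption that 𝜌 is the spectral radius of 𝑿ᵀ𝑿:

```python
import numpy as np

def shotgun_p_star(X):
    """P* = k / (2 * rho), taking rho as the largest eigenvalue of X^T X
    (an assumption about what "matrix eigenvalue" means on the slide)."""
    k = X.shape[1]
    rho = np.linalg.eigvalsh(X.T @ X).max()   # X^T X is symmetric
    return k / (2.0 * rho)
```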
Comparisons of the Algorithms




                                18
Algorithms (proposed)
• THREAD-GREEDY
   – Select step: random set of coordinates (?)
   – Propose step: each thread generating proposals for some subset of the
     coordinates using approximation
   – Accept step: Each thread accepts the best of the proposals
   – no proof for convergence (however, empirical results are encouraging)

• COLORING
   – Preprocessing: structurally independent features are identified via
     partial distance-2 coloring
   – Select step: a random color is selected
   – Accept step: accepts every proposal
       • since features sharing a color are structurally disjoint
         (a toy coloring sketch follows after this slide)




                                                                           19
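To illustrate the coloring idea only (this is NOT the authors' partial distance-2 coloring algorithm), a naive greedy sketch that groups columns so that columns of one color never share a nonzero row:

```python
import numpy as np

def greedy_feature_coloring(X):
    """Assign a color to each column so that two columns sharing a nonzero row
    get different colors; columns of one color never touch the same example and
    can therefore be updated together. Naive O(k^2) illustration on a dense matrix."""
    n, k = X.shape
    rows_of = [set(np.nonzero(X[:, j])[0]) for j in range(k)]
    colors = [-1] * k
    for j in range(k):
        used = {colors[i] for i in range(j) if rows_of[i] & rows_of[j]}
        c = 0
        while c in used:          # smallest color not used by a conflicting column
            c += 1
        colors[j] = c
    return colors
```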
Implementation and Platform
• Implementation
  – gcc with OpenMP
     • -O3 -fopenmp flags
     • parallel for pragma
      • static scheduling
          – Given n iterations and p threads, each thread gets a contiguous block of
            about n/p iterations (see the sketch after this slide)


• Platform
  – AMD Opteron (Magny-Cours)
     • with 48 cores (12 cores x 4 sockets)
  – 256GB Memory

                                                                               20
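A tiny sketch of what static scheduling means here (each thread gets a contiguous block of roughly n/p iterations); illustrative only, not the OpenMP runtime's exact partitioning:

```python
def static_chunks(n_iters, n_threads):
    """Thread t gets a contiguous block of about n_iters / n_threads iterations."""
    base, rem = divmod(n_iters, n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < rem else 0)   # spread the remainder over early threads
        chunks.append(range(start, start + size))
        start += size
    return chunks

# e.g., static_chunks(10, 4) -> [range(0, 3), range(3, 6), range(6, 8), range(8, 10)]
```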
Datasets




(NNZ: number of non-zero entries)
                        21
Convergence rates




(Not sure why)




                               22
Scalability




              23
Summary
• Presented GenCD, a generic framework for
  expressing parallel coordinate descent
  – Select, Propose, Accept, Update

• Performed convergence and scalability tests for the
  four algorithms
  – but the authors do not favor any of these algorithms
    over the others

• The condition for convergence of the THREAD-
  GREEDY algorithm is an open question
                                                           24
References
• [Yuan and Lin 10] G. Yuan and C. Lin, “A Comparison of Optimization Methods
  and Software for Large-scale L1-regularized Linear Classification”, Journal
  of Machine Learning Research, vol. 11, pp. 3183-3234, 2010.
• [Bradley+ 11] J. K. Bradley, A. Kyrola, D. Bickson, and C. Guestrin, “Parallel
  Coordinate Descent for L1-Regularized Loss Minimization”, In Proc. ICML ‘11,
  2011.




                                                                            25
The End




      26
