
Groups-Keeping Solution Path Algorithm For Sparse Regression

Slides from an internal KDD2017 paper-reading session.


  1. KDD2017 paper introduction: Groups-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping (Data Science Group, 吉永 尊洸)
  2. Paper overview • Genre: sparse modeling, specifically OSCAR, a method that automatically extracts group structure among features • Background/problem: no algorithm has been known that tunes the hyperparameters while keeping the group structure • Main content: proposes OscarGKPath, an algorithm that tunes the parameters while keeping the group structure (note: it only searches in directions that preserve the group structure, so it does not necessarily find the true optimum) • Conclusion: compared with a grid search over all parameters, the proposed algorithm computes far faster without losing accuracy
  3. Contents • Introduction to sparse modeling • Review of OSCAR • Proposal: OscarGKPath • Results • Summary and Discussion
  4. Contents • Introduction to sparse modeling • Review of OSCAR • Proposal: OscarGKPath • Results • Summary and Discussion
  5. Sparse Modeling • A machine learning method for high-dimensional data (genetic data, medical images, ...) • Important task: feature selection • In sparse modeling, features that have non-zero coefficients are called "selected"
  6. Feature Selection • Conventional feature selection: AIC/BIC with stepwise search, equivalent to $L_0$-norm regularization: $\min_\beta \sum_{i=1}^{l} (y_i - x_i^T \beta)^2 + \lambda \sum_j I(\beta_j \neq 0)$, where the second term counts the selected features; this is discontinuous and non-convex. • Lasso: $\min_\beta \sum_{i=1}^{l} (y_i - x_i^T \beta)^2 + \lambda \lVert \beta \rVert_1$, a continuous and convex approximation; the $L_1$-norm is the convex envelope of the $L_0$-norm on $[-1, 1]$. A small sketch follows below.
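To make the $L_0$ vs. $L_1$ contrast concrete, here is a minimal sketch (mine, not from the slides) using scikit-learn's Lasso on synthetic data; the data shape, alpha value, and all variable names are illustrative assumptions.

```python
# Minimal illustration: the L1 penalty drives most coefficients exactly to
# zero, so Lasso performs feature selection. alpha plays the role of lambda.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))      # 100 samples, 20 features
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]    # only 3 features are truly relevant
y = X @ beta_true + 0.1 * rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)
print(np.flatnonzero(model.coef_))  # indices of the "selected" features
```

An exhaustive $L_0$ search over all $2^{20}$ feature subsets would be infeasible here, while the convex $L_1$ relaxation solves in milliseconds.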
  7. Variations of Sparse Modeling • Lasso (the basic method) • Elastic net • SCAD • Adaptive Lasso (respects the consistency of feature selection) • Fused Lasso • Generalized Lasso (a generalization of Lasso) • (Non-overlapping/Overlapping) Group Lasso • Clustered Lasso • OSCAR (extracts group structure)
  8. 8. Contents • Introduction to sparse modeling • Review for OSCAR • Proposal : OscarGKPath • Results • Summary and Discussion
  9. OSCAR (Octagonal Shrinkage and Clustering Algorithm for Regression) • Formulation (variables are normalized and/or standardized): $\min_\beta \frac{1}{2} \sum_{i=1}^{l} (y_i - x_i^T \beta)^2$ s.t. $\lVert \beta \rVert_1 + c \sum_{j>k} \max(|\beta_j|, |\beta_k|) \le t$, where $c \ge 0$ and $t \ge 0$ are tuning parameters. • By the method of Lagrange multipliers, this is equivalent to $F(\beta, \lambda_1, \lambda_2) = \min_\beta \frac{1}{2} \sum_{i=1}^{l} (y_i - x_i^T \beta)^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \sum_{j>k} \max(|\beta_j|, |\beta_k|)$, where $\lambda_1 \ge 0$ and $\lambda_2 \ge 0$ are regularization parameters; the penalty combines an $L_1$-norm with a pairwise $L_\infty$-norm. A direct evaluation of this objective is sketched below.
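As a sanity check on the penalized form $F(\beta, \lambda_1, \lambda_2)$, a direct (unoptimized) evaluation of the OSCAR objective could look like the sketch below; `oscar_objective` is an assumed name, not from the paper.

```python
# OSCAR objective: 0.5 * squared loss + lambda1 * L1 norm
#                  + lambda2 * sum over pairs j > k of max(|b_j|, |b_k|).
import numpy as np

def oscar_objective(beta, X, y, lam1, lam2):
    loss = 0.5 * np.sum((y - X @ beta) ** 2)
    a = np.abs(beta)
    l1 = lam1 * a.sum()
    pairwise = lam2 * sum(max(a[j], a[k])
                          for j in range(len(a)) for k in range(j))
    return loss + l1 + pairwise
```

Note that the pairwise term can be rewritten as a weighted sum over the sorted $|\beta_j|$ (the $j$-th smallest gets weight $j - 1$), which is exactly what the re-formulation on slide 16 exploits.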
  10. Pictorial Image • Solutions for correlated data (e.g., two features) as the regularization parameters $\lambda_1, \lambda_2$ vary [Zeng and Figueiredo, 2013]. (Figure: unit balls of the $L_1$-norm, the $L_\infty$-norm, and the combined $L_1 + L_\infty$ penalty.)
  11. Lasso vs OSCAR • In OSCAR, the grouping structure is built into the formulation. Data: Facebook Comment Dataset.
  12. Tuning the hyperparameters • Solution path (e.g., Lasso): the coefficients traced as a function of the regularization parameter.
  13. Tuning the hyperparameters • For OSCAR, no group-keeping solution path algorithm has been known.
  14. Proposal • OscarGKPath: a group-keeping solution path algorithm for OSCAR.
  15. Contents • Introduction to sparse modeling • Review of OSCAR • Proposal: OscarGKPath • Results • Summary and Discussion
  16. OSCAR revisited • Re-formulation: $\min_\theta \frac{1}{2} \sum_{i=1}^{l} (y_i - \tilde{x}_i^T \theta)^2 + \sum_{g=1}^{G} w_g \theta_g$ s.t. $0 \le \theta_1 < \theta_2 < \cdots < \theta_G$, where $\tilde{x}_i = (\tilde{x}_{i1}, \tilde{x}_{i2}, \cdots, \tilde{x}_{iG})$, $\tilde{x}_{ig} = \sum_{j \in \mathcal{G}_g} \mathrm{sign}(\beta_j) x_{ij}$, $w_g = \sum_{j \in \mathcal{G}_g} (\lambda_1 + (o_j - 1) \lambda_2)$ with $o_j \in \{1, \cdots, d\}$ the rank of $|\beta_j|$ and $\mathcal{G}_g \subseteq \{1, \cdots, d\}$ the $g$-th group, and we define the active set $A = \{g \in \{1, \cdots, G\} \mid \theta_g > 0\}$ and its complement $\bar{A} = \{1, \cdots, G\} \setminus A$. A sketch of the weight computation follows below.
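A small sketch (my own reading of the re-formulation above) of how the group weights $w_g$ could be computed; `group_weights`, `groups`, and `order` are assumed names.

```python
# w_g = sum over features j in group g of (lambda1 + (o_j - 1) * lambda2),
# where o_j is the 1-based rank of |beta_j| among all d coefficients.
import numpy as np

def group_weights(groups, order, lam1, lam2):
    """groups: list of index lists; order: rank o_j for each feature j."""
    return np.array([sum(lam1 + (order[j] - 1) * lam2 for j in g)
                     for g in groups])

# Example: d = 4 features split into two groups of two.
print(group_weights([[0, 1], [2, 3]], [1, 2, 3, 4], lam1=0.5, lam2=0.1))
# -> [1.1 1.5]
```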
  17. Input parameters • Direction of change of $\lambda_1, \lambda_2$: $d = (d_1, d_2)$ with $\Delta\lambda = d \, \Delta\eta$, where $\Delta\eta$ is determined by the algorithm • Accuracy $\epsilon$: the proposed algorithm is an approximate one • Interval of $\eta$: $[\underline{\eta}, \bar{\eta}]$
  18. OscarGKPath: Algorithm • Uses the optimality condition for OSCAR (details abbreviated here) • Uses the "termination condition" on the next slide • Uses the dual problem: it is proved that every solution on the solution path satisfies the duality-gap bound. A schematic sketch of the loop follows below.
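Since the slide only names the ingredients, the following is a schematic sketch of how such a group-keeping path loop could fit together; it is not the authors' pseudocode, and `path_direction` and `max_step` (the latter sketched after the next slide) are hypothetical helpers.

```python
# Schematic group-keeping path loop: advance eta in steps that never break
# the current grouping, recording one solution per step. The paper's
# step-size choice guarantees the duality gap stays below the accuracy eps.
def oscar_gk_path(theta, eta_lo, eta_hi, d):
    eta = eta_lo
    path = [(eta, theta.copy())]
    while eta < eta_hi:
        dtheta = path_direction(theta, d)         # hypothetical: rate of change
        d_eta = max_step(theta, dtheta, eta, eta_hi)
        theta = theta + d_eta * dtheta            # grouping and order preserved
        eta += d_eta
        path.append((eta, theta.copy()))
    return path
```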
  19. Termination condition 1. A regression coefficient becomes zero: $\Delta\eta_A$ 2. The order of the regression coefficients changes: $\Delta\eta_O$ (note: the optimality condition of OSCAR is based on a given order of the coefficients) 3. $\eta$ reaches $\bar{\eta}$: $\bar{\eta} - \eta$. The step taken is $\Delta\eta_{\max} = \min\{\Delta\eta_A, \Delta\eta_O, \bar{\eta} - \eta\}$; see the sketch below.
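Reading the three conditions literally, a sketch of the combined step $\Delta\eta_{\max}$ might look as follows, under the assumption (mine) that the coefficients move linearly in $\eta$ at rate `dtheta` between events; all names are illustrative.

```python
import numpy as np

def max_step(theta, dtheta, eta, eta_hi):
    """Largest step keeping the group structure; theta is sorted ascending
    with positive entries, dtheta is its rate of change in eta."""
    # 1. Delta_eta_A: a coefficient would hit zero
    d_eta_A = min((-t / dt for t, dt in zip(theta, dtheta) if dt < 0),
                  default=np.inf)
    # 2. Delta_eta_O: two adjacent coefficients would collide (order change)
    d_eta_O = min(((theta[g + 1] - theta[g]) / (dtheta[g] - dtheta[g + 1])
                   for g in range(len(theta) - 1)
                   if dtheta[g] > dtheta[g + 1]),
                  default=np.inf)
    # 3. the end of the eta interval
    return min(d_eta_A, d_eta_O, eta_hi - eta)
```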
  20. Contents • Introduction to sparse modeling • Review of OSCAR • Proposal: OscarGKPath • Results • Summary and Discussion
  21. Setup • Datasets: (table in the slide) • 5-fold cross-validation • Directions: $d = (1, 0.5)$, $(1, 1)$, $(1, 2)$ • Range of $\eta$: $-4 \le \log_2 \eta \le 15$ • OscarGKPath: 10 trials • "Batch search": a grid search over 20 uniform grid points linearly spaced by 0.1, $\times 400 \times 5$ runs • Duality-gap criterion: $G(\theta(\eta), d_1 \eta, d_2 \eta) \le \varepsilon = 0.1 \times F(\beta^*, d_1, d_2)$ • Note: in a single trial, "Batch search" produces only a limited solution path.
  22. Batch Search vs OscarGKPath • Shorter computation time • Accuracy is maintained • Data: Right Ventricle Dataset. (Figure: runtime and accuracy panels comparing the grid search and the proposal.)
  23. Contents • Introduction to sparse modeling • Review of OSCAR • Proposal: OscarGKPath • Results • Summary and Discussion
  24. Summary and Discussion • Genre: sparse modeling, specifically OSCAR, a method that automatically extracts group structure among features • Background/problem: no algorithm has been known that tunes the hyperparameters while keeping the group structure • Main content: proposes OscarGKPath, an algorithm that tunes the parameters while keeping the group structure (note: it only searches in directions that preserve the group structure, so it does not necessarily find the true optimum) • Conclusion: compared with a grid search over all parameters, the proposed algorithm computes far faster without losing accuracy • Personal remarks: which grouping is desirable is outside the scope of the algorithm, so the conditions under which it is useful may be limited.
