  1. Sparsity with sign-coherent groups of variables via the cooperative-Lasso. Julien Chiquet (1), Yves Grandvalet (2), Camille Charbonnier (1). (1) Statistique et Génome, CNRS & Université d'Évry Val d'Essonne; (2) Heudiasyc, CNRS & Université de Technologie de Compiègne. SSB, 29 March 2011. arXiv preprint: http://arxiv.org/abs/1103.2697 ; R package scoop: http://stat.genopole.cnrs.fr/logiciels/scoop
  2. Notations. Let $Y$ be the output random variable and $X = (X^1, \dots, X^p)$ the input random variables, where $X^j$ is the $j$th predictor. The data: given a sample $\{(y_i, x_i),\ i = 1, \dots, n\}$ of i.i.d. realizations of $(Y, X)$, denote by $y = (y_1, \dots, y_n)$ the response vector, $x^j = (x^j_1, \dots, x^j_n)$ the vector of data for the $j$th predictor, $X$ the $n \times p$ design matrix whose $j$th column is $x^j$, $D = \{i : (y_i, x_i) \in \text{training set}\}$ and $T = \{i : (y_i, x_i) \in \text{test set}\}$.
  3-4. Generalized linear models. Suppose $Y$ depends linearly on $X$ through a function $g$: $E(Y) = g(X\beta)$. We predict a response $y_i$ by $\hat y_i = g(x_i \hat\beta)$ for any $i \in T$, by solving
  $$\hat\beta = \arg\max_{\beta}\ \ell_D(\beta) = \arg\min_{\beta}\ \sum_{i \in D} L_g(y_i, x_i\beta),$$
  where $L_g$ is a loss function depending on $g$. Typically, if $Y$ is Gaussian and $g = \mathrm{Id}$ (OLS), $L_g(y, x\beta) = (y - x\beta)^2$; if $Y$ is binary and $g : t \mapsto (1 + e^{-t})^{-1}$ (logistic regression), $L_g(y, x\beta) = -\left[ y \cdot x\beta - \log(1 + e^{x\beta}) \right]$; or any negative log-likelihood of an exponential family distribution.
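To make the two example losses concrete, here is a minimal base-R sketch (illustrative only, not the interface of the scoop package); `eta` stands for the linear predictor $x\beta$, and the toy data below are assumptions made up for the example.

```r
## Squared-error loss for a Gaussian response with identity link (OLS)
loss_gaussian <- function(y, eta) (y - eta)^2

## Negative Bernoulli log-likelihood for the logistic link g(t) = 1 / (1 + exp(-t))
loss_logistic <- function(y, eta) -(y * eta - log(1 + exp(eta)))

## Toy empirical risk on a small simulated training set
set.seed(1)
X    <- matrix(rnorm(20), nrow = 10, ncol = 2)
beta <- c(1, -1)
y    <- rbinom(10, size = 1, prob = plogis(drop(X %*% beta)))
mean(loss_logistic(y, drop(X %*% beta)))
```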
  5-6. Estimation and selection at the group level. 1. Structure: the index set $I = \{1, \dots, p\}$ splits into a known partition, $I = \bigcup_{k=1}^K G_k$ with $G_k \cap G_\ell = \emptyset$ for $k \neq \ell$. 2. Sparsity: the support $S$ of $\beta$ has few entries, $S = \{i : \beta_i \neq 0\}$, such that $|S| \ll p$. The group-Lasso estimator (Grandvalet and Canu '98, Bakin '99, Yuan and Lin '06):
  $$\hat\beta^{\text{group}} = \arg\min_{\beta \in \mathbb{R}^p}\ -\ell_D(\beta) + \lambda \sum_{k=1}^K w_k \|\beta_{G_k}\|,$$
  where $\lambda \ge 0$ controls the overall amount of penalty and $w_k > 0$ adapts the penalty between groups (dropped hereafter).
  7-10. Toy example: the prostate dataset. Examines the correlation between the prostate specific antigen and 8 clinical measures for 97 patients: lcavol, log(cancer volume); lweight, log(prostate weight); age; lbph, log(benign prostatic hyperplasia amount); svi, seminal vesicle invasion; lcp, log(capsular penetration); gleason, Gleason score; pgg45, percentage of Gleason scores 4 or 5. Figures: Lasso coefficient paths against $\lambda$ (log scale), a hierarchical clustering of the variables used to define groups, and the corresponding group-Lasso coefficient paths.
  11-12. Application to splice site detection. Predict splice site status (0/1) from a sequence of 7 bases and their interactions: order 0, 7 factors with 4 levels; order 1, $C_7^2$ factors with $4^2$ levels; order 2, $C_7^3$ factors with $4^3$ levels. Using dummy coding for each factor, we form groups. Figures: information content at each of the 7 positions, and the coefficients of the selected groups. Reference: L. Meier, S. van de Geer, P. Bühlmann, 2008. The group-Lasso for logistic regression, JRSS series B.
  13-14. Group-Lasso limitations. 1. Not a single zero should belong to a group with non-zeros: strong group sparsity (Huang and Zhang, '10, arXiv) establishes the conditions under which the group-Lasso outperforms the Lasso, and conversely. 2. No sign-coherence within groups: required if groups gather consonant variables, e.g. groups defined by clusters of positively correlated variables. The cooperative-Lasso: a penalty which assumes a sign-coherent group structure, that is to say, groups which gather either non-positive, non-negative, or null parameters.
  15. Motivation: multiple network inference. Three related experiments, each with its own network inference task. A group is a set of corresponding edges across tasks (e.g., the red or the blue ones): sign-coherence matters! Reference: J. Chiquet, Y. Grandvalet, C. Ambroise, 2010. Inferring multiple graphical structures, Statistics and Computing.
  16-22. Motivation: joint segmentation of aCGH profiles. Starting from the single-profile formulation $\min_{\beta \in \mathbb{R}^p} \|\beta - y\|^2$ s.t. $\sum_i |\beta_i - \beta_{i-1}| < s$, the joint version over $n$ profiles reads
  $$\min_{\beta \in \mathbb{R}^{n \times p}}\ \|\beta - Y\|^2 \quad \text{s.t.} \quad \sum_i \|\beta_i - \beta_{i-1}\| < s,$$
  where $Y$ is an $n \times p$ matrix of $n$ profiles of length $p$ and $\beta_i$ is the size-$n$ vector of the $i$th probes across the $n$ profiles. A group gathers every position $i$ across profiles; sign-coherence may avoid inconsistent variations across profiles. Figure: log-ratios (CNVs) against position on the chromosome. Reference: K. Bleakley and J.-P. Vert, 2010. Joint segmentation of many aCGH profiles using fast group LARS, NIPS.
  23. Outline: Definition; Resolution; Consistency; Model selection; Simulation studies; Sibling probe sets and gene selection.
  24. Outline (start of the Definition section).
  25. The cooperative-Lasso estimator. Definition:
  $$\hat\beta^{\text{coop}} = \arg\min_{\beta \in \mathbb{R}^p} J(\beta), \quad \text{with } J(\beta) = -\ell_D(\beta) + \lambda \|\beta\|_{\text{coop}},$$
  where, for any $v \in \mathbb{R}^p$,
  $$\|v\|_{\text{coop}} = \|v^+\|_{\text{group}} + \|v^-\|_{\text{group}} = \sum_{k=1}^K \left(\|v^+_{G_k}\| + \|v^-_{G_k}\|\right),$$
  with $v^+ = (v_1^+, \dots, v_p^+)$, $v_j^+ = \max(0, v_j)$, and $v^- = (v_1^-, \dots, v_p^-)$, $v_j^- = \max(0, -v_j)$.
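The coop-norm is straightforward to evaluate: split $v$ into its positive and negative parts and sum the groupwise Euclidean norms of both. A minimal base-R sketch of the definition above (illustrative, not the scoop package API), with groups encoded as a factor of length $p$:

```r
coop_norm <- function(v, groups) {
  vplus  <- pmax(v, 0)    # v+ : positive parts
  vminus <- pmax(-v, 0)   # v- : negative parts
  grp_norm <- function(u) sqrt(sum(u^2))
  # sum over groups of ||v+_Gk|| + ||v-_Gk||
  sum(tapply(vplus, groups, grp_norm)) + sum(tapply(vminus, groups, grp_norm))
}

## Example with two groups of two variables:
## ||(1, 1)|| + ||(0, 0.5)|| + ||(1, 0)|| = sqrt(2) + 0.5 + 1
coop_norm(c(1, 1, -1, 0.5), factor(c(1, 1, 2, 2)))
```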
  26-27. A geometric view of sparsity. The penalized problem $\min_{\beta_1, \beta_2} -\ell(\beta_1, \beta_2) + \lambda\,\Omega(\beta_1, \beta_2)$ is equivalent to the constrained problem $\max_{\beta_1, \beta_2} \ell(\beta_1, \beta_2)$ s.t. $\Omega(\beta_1, \beta_2) \le c$. Figures: level sets of the likelihood and the constraint region in the $(\beta_1, \beta_2)$ plane.
  28-31. Ball crafting: group-Lasso. Admissible set for $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)$ with $G_1 = \{1, 2\}$ and $G_2 = \{3, 4\}$: the unit ball $\|\beta\|_{\text{group}} \le 1$. Figures: slices of the ball in the $(\beta_1, \beta_3)$ plane for $\beta_2 \in \{0, 0.3\}$ and $\beta_4 \in \{0, 0.3\}$.
  32-35. Ball crafting: cooperative-Lasso. Same setting, with the unit ball $\|\beta\|_{\text{coop}} \le 1$. Figures: slices of the ball in the $(\beta_1, \beta_3)$ plane for $\beta_2 \in \{0, 0.3\}$ and $\beta_4 \in \{0, 0.3\}$.
  36. Outline (start of the Resolution section).
  37-41. Convex analysis: supporting hyperplanes. A hyperplane supports a set iff the set is contained in one of the half-spaces it defines and has at least one point on the hyperplane. There are supporting hyperplanes at all boundary points of a convex set: they generalize tangents. Figures: supporting hyperplanes of a convex set in the $(\beta_1, \beta_2)$ plane.
  42-45. Convex analysis: dual cone and subgradients. Subgradients generalize normals: $g$ is a subgradient of a convex function at $x$ iff the vector $(g, -1)$ is normal to a supporting hyperplane (of the function's epigraph) at this point. The subdifferential at $x$ is the set of all subgradients at $x$. Figures: supporting hyperplanes and subgradients in the $(\beta_1, \beta_2)$ plane.
  46-48. Optimality conditions. Theorem: a necessary and sufficient condition for the optimality of $\beta$ is that the null vector $0$ belongs to the subdifferential of the convex function $J$:
  $$0 \in \partial_\beta J(\beta) = \{v \in \mathbb{R}^p : v = -\nabla\ell(\beta) + \lambda\theta\},$$
  where $\theta \in \mathbb{R}^p$ belongs to the subdifferential of the coop-norm. Define $\varphi_j(v) = (\operatorname{sign}(v_j)\,v)^+$; then $\theta$ is such that, for all $k \in \{1, \dots, K\}$,
  $$\forall j \in S_k(\beta), \quad \theta_j = \frac{\beta_j}{\|\varphi_j(\beta_{G_k})\|}, \qquad \forall j \in S_k^c(\beta), \quad \|\varphi_j(\theta_{G_k})\| \le 1.$$
  We derive a subset algorithm to solve this problem (which you can enjoy in the paper and the package).
  49-50. Linear regression with orthonormal design. Consider
  $$\hat\beta = \arg\min_\beta\ \tfrac{1}{2}\|y - X\beta\|^2 + \lambda\,\Omega(\beta),$$
  with $X^\top X = I$. Hence $(x^j)^\top(X\beta - y) = \beta_j - \hat\beta^{\text{ols}}_j$ and
  $$\hat\beta = \arg\min_\beta\ \tfrac{1}{2}\|\beta - \hat\beta^{\text{ols}}\|^2 + \lambda\,\Omega(\beta).$$
  We may find a closed form of $\hat\beta$ for, e.g., 1. $\Omega(\beta) = \|\beta\|_{\text{lasso}}$, 2. $\Omega(\beta) = \|\beta\|_{\text{group}}$, 3. $\Omega(\beta) = \|\beta\|_{\text{coop}}$.
  51. Linear regression with orthonormal design: Lasso. For all $j \in \{1, \dots, p\}$,
  $$\hat\beta^{\text{lasso}}_j = \left(1 - \frac{\lambda}{|\hat\beta^{\text{ols}}_j|}\right)_+ \hat\beta^{\text{ols}}_j, \quad \text{i.e. } |\hat\beta^{\text{lasso}}_j| = \left(|\hat\beta^{\text{ols}}_j| - \lambda\right)_+ \text{ (soft-thresholding)}.$$
  Figure: the Lasso estimate as a function of the OLS coefficients.
  52. Linear regression with orthonormal design: group-Lasso. For all $k \in \{1, \dots, K\}$ and $j \in G_k$,
  $$\hat\beta^{\text{group}}_j = \left(1 - \frac{\lambda}{\|\hat\beta^{\text{ols}}_{G_k}\|}\right)_+ \hat\beta^{\text{ols}}_j, \quad \text{so that } \|\hat\beta^{\text{group}}_{G_k}\| = \left(\|\hat\beta^{\text{ols}}_{G_k}\| - \lambda\right)_+.$$
  Figure: the group-Lasso estimate as a function of the OLS coefficients.
  53. Linear regression with orthonormal design: coop-Lasso. For all $k \in \{1, \dots, K\}$ and $j \in G_k$,
  $$\hat\beta^{\text{coop}}_j = \left(1 - \frac{\lambda}{\|\varphi_j(\hat\beta^{\text{ols}}_{G_k})\|}\right)_+ \hat\beta^{\text{ols}}_j, \quad \text{so that } \|\varphi_j(\hat\beta^{\text{coop}}_{G_k})\| = \left(\|\varphi_j(\hat\beta^{\text{ols}}_{G_k})\| - \lambda\right)_+.$$
  Figure: the coop-Lasso estimate as a function of the OLS coefficients.
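Under the orthonormal design assumption, the three estimators reduce to explicit shrinkage rules applied to the OLS coefficients. A base-R sketch transcribing the formulas above (same orthonormality assumption; illustrative only, with made-up inputs):

```r
# Lasso: coordinate-wise soft-thresholding at level lambda
prox_lasso <- function(b_ols, lambda) sign(b_ols) * pmax(abs(b_ols) - lambda, 0)

# Group-Lasso: shrink each group by the factor (1 - lambda / ||b_ols_Gk||)+
prox_group <- function(b_ols, groups, lambda) {
  shrink <- function(b) b * max(1 - lambda / sqrt(sum(b^2)), 0)
  ave(b_ols, groups, FUN = shrink)
}

# Coop-Lasso: apply the group shrinkage separately to the positive and
# negative parts of each group (this reproduces the phi_j-based rule above)
prox_coop <- function(b_ols, groups, lambda) {
  shrink <- function(b) b * max(1 - lambda / sqrt(sum(b^2)), 0)
  ave(pmax(b_ols, 0), groups, FUN = shrink) -
    ave(pmax(-b_ols, 0), groups, FUN = shrink)
}

b_ols  <- c(1.2, 0.3, -0.8, 0.1)
groups <- factor(c(1, 1, 2, 2))
prox_coop(b_ols, groups, lambda = 0.5)   # zeroes the lone positive entry of group 2
```

On this toy input, the positive part of group 2 has norm below $\lambda$ and is killed, while its negative part survives: the two sign-coherent halves of a group are shrunk, and possibly discarded, separately.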
  54. Outline (start of the Consistency section).
  55. Linear regression setup: technical assumptions. (A1) $X$ and $Y$ have finite fourth-order moments: $E\|X\|^4 < \infty$, $E|Y|^4 < \infty$. (A2) The covariance matrix $\Psi = E[XX^\top] \in \mathbb{R}^{p \times p}$ is invertible. (A3) For every $k = 1, \dots, K$, if $\|(\beta_{G_k})^+\| > 0$ and $\|(\beta_{G_k})^-\| > 0$, then $\beta_j \neq 0$ for every $j \in G_k$ (all sign-coherent groups are either included in or excluded from the true support).
  56. Irrepresentability condition. Define $S_k = S \cap G_k$, the support within group $k$ (with $S_k^c$ the null coordinates within $G_k$), and the diagonal matrix $D(\beta)$ with $[D(\beta)]_{jj} = \|(\operatorname{sign}(\beta_j)\,\beta_{G_k})^+\|^{-1}$. Assume there exists $\eta > 0$ such that:
  (A4) for every group $G_k$ including at least one null coefficient,
  $$\max\left(\left\|\left(\Psi_{S_k^c S}\,\Psi_{SS}^{-1}\,D(\beta_S)\,\beta_S\right)^+\right\|,\ \left\|\left(\Psi_{S_k^c S}\,\Psi_{SS}^{-1}\,D(\beta_S)\,\beta_S\right)^-\right\|\right) \le 1 - \eta;$$
  (A5) for every group $G_k$ intersecting the support and including either positive or negative coefficients, letting $\nu_k$ be the sign of these coefficients ($\nu_k = 1$ if $\|(\beta_{G_k})^+\| > 0$, $\nu_k = -1$ if $\|(\beta_{G_k})^-\| > 0$),
  $$\nu_k\,\Psi_{S_k^c S}\,\Psi_{SS}^{-1}\,D(\beta_S)\,\beta_S \preceq 0,$$
  where $\preceq$ denotes componentwise inequality.
  57. Consistency results. Theorem: if assumptions (A1)-(A5) are satisfied for some $\eta > 0$, then for every sequence $\lambda_n$ such that $\lambda_n = \lambda_0 n^{-\gamma}$ with $\gamma \in (0, 1/2)$, $\hat\beta^{\text{coop}} \to \beta$ in probability and $P(S(\hat\beta^{\text{coop}}) = S) \to 1$. Asymptotically, the cooperative-Lasso is unbiased and enjoys exact support recovery (even when there are irrelevant variables within a group).
  58-61. Sketch of the proof. 1. Construct an artificial estimator $\tilde\beta_S$ restricted to the true support $S$ and extend it with zero coefficients on $S^c$. 2. Consider the event $E_n$ on which $\tilde\beta$ satisfies the original optimality conditions; on $E_n$, $\tilde\beta_S = \hat\beta^{\text{coop}}_S$ and $\hat\beta^{\text{coop}}_{S^c} = 0$, by uniqueness. 3. We need to prove that $\lim_{n\to\infty} P(E_n) = 1$. 4. Derive the asymptotic distribution of the derivative of the loss function, $X^\top(y - X\tilde\beta)$, from the CLT on second-order moments and the optimality conditions on $\tilde\beta_S$; the right choice of $\lambda_n$ provides convergence in probability. 5. Assumptions (A4)-(A5) ensure that the limits in probability satisfy the optimality constraints with strict inequalities. 6. As a result, the optimality conditions are satisfied (with non-strict inequalities) with probability tending to one.
  62-64. Illustration. Generate data $y = X\beta + \sigma\varepsilon$ with $\beta = (1, 1, -1, -1, 0, 0, 0, 0)$, groups $G = \{\{1,2\}, \{3,4\}, \{5,6\}, \{7,8\}\}$, $\sigma = 0.1$, $R^2 \approx 0.99$ and $n = 20$; the irrepresentability conditions hold for the coop-Lasso but not for the group-Lasso; averages over 100 simulations. Figures: coefficient paths against $\log_{10}(\lambda)$ with 50% coverage intervals (upper/lower quartiles) for the group-Lasso and the coop-Lasso.
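A sketch of the data-generating mechanism used in this illustration (base R; the slide does not specify the design distribution, so an i.i.d. standard Gaussian design is assumed here):

```r
set.seed(42)
n      <- 20
beta   <- c(1, 1, -1, -1, 0, 0, 0, 0)      # true coefficients
groups <- factor(rep(1:4, each = 2))        # G = {1,2}, {3,4}, {5,6}, {7,8}
sigma  <- 0.1

X <- matrix(rnorm(n * length(beta)), nrow = n)   # assumed standard Gaussian design
y <- drop(X %*% beta) + sigma * rnorm(n)
# The group-Lasso and coop-Lasso paths would then be fitted on (X, y),
# e.g. with the scoop package, and summarized over 100 such replicates.
```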
  65. Outline (start of the Model selection section).
  66-67. Optimism of the training error. The training error: $\overline{\text{err}} = \frac{1}{|D|}\sum_{i \in D} L(y_i, x_i\hat\beta)$. The test error ("extra-sample" error): $\text{Err}_{\text{ex}} = E_{X,Y}[L(Y, X\hat\beta) \mid D]$. The "in-sample" error: $\text{Err}_{\text{in}} = \frac{1}{|D|}\sum_{i \in D} E_Y[L(Y_i, x_i\hat\beta) \mid D]$. Definition (optimism): $\text{Err}_{\text{in}} = \overline{\text{err}} + \text{"optimism"}$.
  68-69. Cp statistics. For squared-error loss (and some other losses),
  $$\text{Err}_{\text{in}} = \overline{\text{err}} + \frac{2}{|D|}\sum_{i \in D} \operatorname{cov}(\hat y_i, y_i).$$
  The amount by which $\overline{\text{err}}$ underestimates the true error depends on how strongly $y_i$ affects its own prediction: the harder we fit the data, the greater the covariance, thereby increasing the optimism (ESL II, 5th printing). Mallows' Cp statistic: for a linear regression fit $\hat y_i$ with $p$ inputs, $\sum_{i \in D}\operatorname{cov}(\hat y_i, y_i) = p\sigma^2$, hence
  $$C_p = \overline{\text{err}} + 2\,\frac{\text{df}}{|D|}\,\hat\sigma^2, \quad \text{with } \text{df} = p.$$
  70-72. Generalized degrees of freedom. Let $\hat y(\lambda) = X\hat\beta(\lambda)$ be the predicted values for a penalized estimator. Proposition (Efron '04 + Stein's lemma '81):
  $$\text{df}(\lambda) = \frac{1}{\sigma^2}\sum_{i \in D}\operatorname{cov}(\hat y_i(\lambda), y_i) = E_y\left[\operatorname{tr}\,\frac{\partial \hat y_\lambda}{\partial y}\right].$$
  For the Lasso, Zou et al. ('07) show that $\text{df}^{\text{lasso}}(\lambda) = \|\hat\beta^{\text{lasso}}(\lambda)\|_0$. Assuming $X^\top X = I$, Yuan and Lin ('06) show for the group-Lasso that the trace term equals
  $$\text{df}^{\text{group}}(\lambda) = \sum_{k=1}^K \mathbf{1}\!\left\{\|\hat\beta^{\text{group}}_{G_k}(\lambda)\| > 0\right\}\left(1 + (p_k - 1)\,\frac{\|\hat\beta^{\text{group}}_{G_k}(\lambda)\|}{\|\hat\beta^{\text{ols}}_{G_k}\|}\right).$$
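These two degrees-of-freedom estimates can be computed directly from the fitted and OLS coefficients under the orthonormal assumption; a base-R sketch transcribing the formulas above (illustrative only):

```r
# Lasso: number of non-zero coefficients
df_lasso <- function(b_lasso) sum(b_lasso != 0)

# Group-Lasso (orthonormal design): sum over active groups of
# 1 + (p_k - 1) * ||b_group_Gk|| / ||b_ols_Gk||
df_group <- function(b_group, b_ols, groups) {
  one_group <- function(bg, bo) {
    ng <- sqrt(sum(bg^2))
    if (ng == 0) return(0)                 # inactive group contributes nothing
    1 + (length(bg) - 1) * ng / sqrt(sum(bo^2))
  }
  sum(mapply(one_group, split(b_group, groups), split(b_ols, groups)))
}
```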
  73. Approximated degrees of freedom for the coop-Lasso. Proposition: assuming that the data are generated according to a linear regression model and that $X$ is orthonormal, the following expression of $\text{df}^{\text{coop}}(\lambda)$ is an unbiased estimate of $\text{df}(\lambda)$:
  $$\text{df}^{\text{coop}}(\lambda) = \sum_{k=1}^K \mathbf{1}\!\left\{\|(\hat\beta^{\text{coop}}_{G_k}(\lambda))^+\| > 0\right\}\left(1 + (p_k^+ - 1)\,\frac{\|(\hat\beta^{\text{coop}}_{G_k}(\lambda))^+\|}{\|(\hat\beta^{\text{ols}}_{G_k})^+\|}\right) + \mathbf{1}\!\left\{\|(\hat\beta^{\text{coop}}_{G_k}(\lambda))^-\| > 0\right\}\left(1 + (p_k^- - 1)\,\frac{\|(\hat\beta^{\text{coop}}_{G_k}(\lambda))^-\|}{\|(\hat\beta^{\text{ols}}_{G_k})^-\|}\right),$$
  where $p_k^+$ and $p_k^-$ are respectively the numbers of positive and negative entries in $\hat\beta^{\text{ols}}_{G_k}$.
  74. Approximated degrees of freedom for the coop-Lasso (variant). Same proposition, with the OLS reference replaced by a ridge estimate with tuning parameter $\gamma$:
  $$\text{df}^{\text{coop}}(\lambda) = \sum_{k=1}^K \mathbf{1}\!\left\{\|(\hat\beta^{\text{coop}}_{G_k}(\lambda))^+\| > 0\right\}\left(1 + \frac{p_k^+ - 1}{1+\gamma}\,\frac{\|(\hat\beta^{\text{coop}}_{G_k}(\lambda))^+\|}{\|(\hat\beta^{\text{ridge}}_{G_k}(\gamma))^+\|}\right) + \mathbf{1}\!\left\{\|(\hat\beta^{\text{coop}}_{G_k}(\lambda))^-\| > 0\right\}\left(1 + \frac{p_k^- - 1}{1+\gamma}\,\frac{\|(\hat\beta^{\text{coop}}_{G_k}(\lambda))^-\|}{\|(\hat\beta^{\text{ridge}}_{G_k}(\gamma))^-\|}\right),$$
  where $p_k^+$ and $p_k^-$ are respectively the numbers of positive and negative entries in $\hat\beta^{\text{ridge}}_{G_k}(\gamma)$.
  75. Approximated information criteria. Following Zou et al., we extend the Cp statistic to an "approximated" AIC,
  $$\text{AIC}(\lambda) = \frac{\|y - \hat y(\lambda)\|^2}{\sigma^2} + 2\,\widetilde{\text{df}}(\lambda),$$
  and from the AIC it is a (small) step to the BIC:
  $$\text{BIC}(\lambda) = \frac{\|y - \hat y(\lambda)\|^2}{\sigma^2} + \log(n)\,\widetilde{\text{df}}(\lambda).$$
  K-fold cross-validation works well but is computationally intensive; it is required when we do not meet the linear regression setup.
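Given any degrees-of-freedom estimate, the approximated criteria are one-liners; a base-R sketch (σ² is assumed known or estimated separately, as on the slide; the path objects in the comment are hypothetical):

```r
aic_approx <- function(y, yhat, df, sigma2) sum((y - yhat)^2) / sigma2 + 2 * df
bic_approx <- function(y, yhat, df, sigma2) sum((y - yhat)^2) / sigma2 + log(length(y)) * df

# Along a path of solutions one would pick the lambda minimizing the criterion,
# e.g. lambda_hat <- lambdas[which.min(bic_values)]
```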
  76. Outline (start of the Simulation studies section).
  77. Revisiting Elastic-Net experiments (1). Generate data $y = X\beta + \sigma\varepsilon$ with $\beta = (\underbrace{0, \dots, 0}_{10}, \underbrace{2, \dots, 2}_{10}, \underbrace{0, \dots, 0}_{10}, \underbrace{2, \dots, 2}_{10})$, groups $G_1 = \{1, \dots, 10\}$, $G_2 = \{11, \dots, 20\}$, $G_3 = \{21, \dots, 30\}$, $G_4 = \{31, \dots, 40\}$, $\sigma = 15$, $\operatorname{corr}(x^i, x^j) = 0.5$, training/validation/test = 100/100/400, averaged over 100 simulations. Figure: boxplots of test MSE for the Lasso, Elastic-Net, group-Lasso and coop-Lasso.
  78. Revisiting Elastic-Net experiments (2). Generate data $y = X\beta + \sigma\varepsilon$ with $\beta = (\underbrace{3, \dots, 3}_{15}, \underbrace{0, \dots, 0}_{25})$, $\sigma = 15$, groups $G_1 = \{1, \dots, 5\}$, $G_2 = \{6, \dots, 10\}$, $G_3 = \{11, \dots, 15\}$, $G_4 = \{16, \dots, 40\}$; $x^j = Z_1 + \varepsilon$ for $j \in G_1$, $x^j = Z_2 + \varepsilon$ for $j \in G_2$, $x^j = Z_3 + \varepsilon$ for $j \in G_3$, with $Z_1, Z_2, Z_3 \sim N(0, 1)$, and $x^j \sim N(0, 1)$ for $j \in G_4$; training/validation/test = 50/50/400, averaged over 100 simulations. Figure: boxplots of test MSE for the Lasso, Elastic-Net, group-Lasso and coop-Lasso.
  79-83. Breiman's setup: simulation settings. A wave-like vector of parameters $\beta$: $p = 90$ variables partitioned into $K = 10$ groups of size $p_k = 9$; 3 (partially) active groups, 6 groups of zeros; in active groups, $\beta_j \propto (h - |5 - j|)$ with $h = 1, \dots, 5$. Figures: $\beta$ for $h = 1, \dots, 5$, giving $|S_k| = 1, 3, 5, 7, 9$ non-zero coefficients in each active group.
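The wave-like coefficient vector is easy to build; a base-R sketch of one reading of the recipe above (assumptions: the active groups are placed first and the within-group profile is the positive part of $h - |5 - j|$ for positions $j = 1, \dots, 9$; the slide's figures may place the active groups elsewhere, and the overall magnitude is rescaled afterwards to reach the target $R^2$):

```r
make_wave_beta <- function(h, K = 10, pk = 9, n_active = 3) {
  wave <- pmax(h - abs(5 - seq_len(pk)), 0)   # triangular bump, peaking at j = 5
  c(rep(wave, n_active), rep(0, (K - n_active) * pk))
}

beta <- make_wave_beta(h = 2)   # h = 2 gives |S_k| = 3 non-zeros per active group
sum(beta != 0)                  # 9 non-zero coefficients in total
```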
  84. Breiman's setup: example of solution paths and signal recovery with the BIC choice. The signal is generated as $y = X\beta + \sigma\varepsilon$ with $\sigma = 1$, $n = 30$ to $500$, $X \sim N(0, \Psi)$ with $\Psi_{ij} = \rho^{|i - j|}$ ($\rho = 0.4$ in the example), and the magnitude of $\beta$ chosen so that $R^2 \approx 0.75$. Remark: the covariance structure is purposely disconnected from the group structure, and none of the support recovery conditions are fulfilled.
  85-88. Breiman's setup: same setting, one-shot sample with $n = 120$. Figures: regularization paths against $\log_{10}(\lambda)$ and true vs. estimated signal (BIC choice of $\lambda$) for the Lasso, the group-Lasso and the coop-Lasso.