Group and Hierarchical
Variable Selection
Hai Nguyen
Bioinformatics center, Kyoto University
hai@kuicr.kyoto-u.ac.jp
haidnguyen0909@gmail.com
Introduction
q Response: 𝑦 = (𝑦$, 𝑦&, … , 𝑦()
*
q predictors : 𝑥, = (𝑥,$, 𝑥,&, …, 𝑥,-)
*
, 𝑖 = 1, . . , 𝑛
qLinear model: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥66 + 𝜀
-
68$
q2-way interaction model: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥,6 + ∑ 𝜃6: 𝑥,6 𝑥,: + 𝜀6;:
-
68$
1) ∑ 𝛽6 𝑥,6
-
68$ :	
  main	
  effect	
  term,	
   𝛽 ∈ ℝ-
2) ∑ 𝜃6: 𝑥,6 𝑥,:6;: : interaction term, 𝜃 ∈ ℝ-J-
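As a rough illustration of the 2-way interaction model, the sketch below builds the design matrix that holds the main-effect columns followed by one product column per pair j < k. The function name `interaction_design` and the toy data are assumptions made for this example, not part of the slides.

```python
import itertools
import numpy as np

def interaction_design(X):
    """Return [main effects | pairwise products x_ij * x_ik for j < k] and the pair list.

    Hypothetical helper for illustration; assumes X has at least two columns.
    """
    n, p = X.shape
    pairs = list(itertools.combinations(range(p), 2))
    # One interaction column per unordered pair (j, k) with j < k.
    Z = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    return np.hstack([X, Z]), pairs

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
D, pairs = interaction_design(X)
print(D.shape)  # p = 3 main effects + 3 interaction columns
print(pairs)
```

With p predictors there are p(p-1)/2 interaction columns, which is why sparsity assumptions matter quickly as p grows.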
Introduction
q Problems to be addressed in high dimensional data:
1) Predictive performance
2) Interpretability
3) Highly correlated variables
Sparsity assumption: # of nonzero coeffs 𝛽6
K
𝑠 and/or interaction 𝜃6:
K
𝑠 is very
few.
Introduction
Variable selection: LASSO
Group selection: Group LASSO
Hierarchical selection: Hierarchical LASSO
Introduction
q Shrinkage methods based on regularization
𝛽M = 𝑎𝑟𝑔𝑚𝑖𝑛R 	
   𝑙 𝛽 + 𝜆 U
|𝛽|$, 	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
   𝐿 𝑎𝑠𝑠𝑜
||𝛽||&
&
, 	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
   𝑅 𝑖𝑑𝑔𝑒
Where 𝑙 𝛽 is the loss function wr.t. 𝛽, e.g., square, logistic, hinge
losses
1) Ridge: prevent overfitting but not variable selection
2) Lasso: variable selection but only select one for each group of
correlated variables.
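The difference between the two penalties can be seen on a single coefficient. The sketch below assumes a one-dimensional squared loss 0.5*(b - z)^2, where z is the unpenalized estimate; this simplification is an assumption for illustration and is not stated on the slide. Under it, the lasso solution is soft-thresholding (exact zeros), while the ridge solution only rescales z.

```python
import numpy as np

def lasso_shrink(z, lam):
    """Minimizer of 0.5*(b - z)**2 + lam*|b|: soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_shrink(z, lam):
    """Minimizer of 0.5*(b - z)**2 + lam*b**2: proportional shrinkage."""
    return z / (1.0 + 2.0 * lam)

z = np.array([3.0, 0.5, -2.0])
b_lasso = lasso_shrink(z, 1.0)   # the small coefficient is set exactly to zero
b_ridge = ridge_shrink(z, 1.0)   # every coefficient shrinks, none becomes zero
print(b_lasso)
print(b_ridge)
```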
Group selection
q Group Lasso (Yuan et al., 2006)
Coefficients are organized into K groups (known in advance):
𝑔$, 𝑔&, …, 𝑔 	
  ⊆ 1,2,… , 𝑝 , disjoint and then the Group-Lasso pelnaty:
𝜆 ∑ 𝑑:||𝛽_`
||&: , 	
  	
  	
  	
  	
  	
  	
  where ||𝛽_`
||& = ∑ 𝛽,
&
,∈_`
q Properties:
1) Group-size = 1 -> LASSO
2) Convex penalty
3) Encourage to select or remove the entire group
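A minimal sketch of the group-lasso penalty and of the block soft-thresholding step that makes a whole group enter or leave together. The default weights d_k = sqrt(group size) are an assumption (a common choice, not specified on the slide), and both function names are hypothetical.

```python
import numpy as np

def group_lasso_penalty(beta, groups, lam, weights=None):
    """lam * sum_k d_k * ||beta_{g_k}||_2 for disjoint index groups."""
    beta = np.asarray(beta, dtype=float)
    if weights is None:
        # Assumption: d_k = sqrt(|g_k|); pass weights explicitly to override.
        weights = [np.sqrt(len(g)) for g in groups]
    return lam * sum(d * np.linalg.norm(beta[g]) for d, g in zip(weights, groups))

def group_soft_threshold(z, t):
    """Proximal operator of t*||.||_2: zero the block if ||z||_2 <= t, else shrink it."""
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= t:
        return np.zeros_like(z)
    return (1.0 - t / norm) * z

beta = np.array([3.0, 4.0, 0.0, 0.0])
pen = group_lasso_penalty(beta, [[0, 1], [2, 3]], lam=1.0, weights=[1.0, 1.0])
kept = group_soft_threshold([3.0, 4.0], 1.0)     # block survives, shrunk as a whole
dropped = group_soft_threshold([0.3, 0.4], 1.0)  # block removed as a whole
```

With singleton groups and unit weights the penalty reduces to the L1 norm, which matches property 1.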
How to do group selection without prior knowledge of group structures?
Group selection: automatic feature group
q Elastic Net (Zou et al., 2005)
A linear combination of ridge and LASSO penalties for group selection
via the penalty:
	
  	
  	
  	
  	
  	
  	
   𝛼 c |𝛽6|
-
68$
+ (1 − 𝛼) c 𝛽6
&
-
68$
q Properties:
1) L1 term leads to a sparse solution
2) L2 term forces highly correlated variables to be averaged
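The averaging effect of the L2 term can be illustrated with two identical predictors: any split (t, 1 - t) of a total coefficient of 1 gives the same fit, so only the penalty decides between splits. The sketch below, with hypothetical names and a made-up grid of splits, shows that the pure L1 penalty is indifferent while the elastic-net penalty is smallest at the equal split.

```python
import numpy as np

def elastic_net_penalty(beta, alpha):
    """alpha * sum|beta_j| + (1 - alpha) * sum beta_j^2."""
    beta = np.asarray(beta, dtype=float)
    return alpha * np.sum(np.abs(beta)) + (1.0 - alpha) * np.sum(beta ** 2)

# Candidate splits of a total coefficient 1 between two identical predictors.
splits = np.linspace(0.0, 1.0, 11)
lasso_vals = np.array([elastic_net_penalty([t, 1.0 - t], alpha=1.0) for t in splits])
enet_vals = np.array([elastic_net_penalty([t, 1.0 - t], alpha=0.5) for t in splits])

best_split = splits[int(np.argmin(enet_vals))]
print(lasso_vals)   # constant: the L1 term cannot choose between splits
print(best_split)   # the L2 term prefers sharing the coefficient equally
```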
Group selection: automatic feature group (cont.)
• OSCAR (Bondell et al., 2008)
A combination of the LASSO penalty and a pairwise $L_\infty$ penalty for each pair of variables:
$\sum_{j=1}^{p} |\beta_j| + c \sum_{j<k} \max\{|\beta_j|, |\beta_k|\}$
• Properties:
1) Encourages equality of coefficients (in absolute value)
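A small sketch of the OSCAR penalty computed two ways: directly over all pairs, and as a weighted sum of the sorted absolute coefficients, where the coefficient with rank r (counting from 0 for the smallest magnitude) gets weight c*r + 1. The two forms agree because each coefficient is the maximum in exactly as many pairs as there are smaller-ranked coefficients. Function names are hypothetical.

```python
import itertools
import numpy as np

def oscar_penalty(beta, c):
    """sum_j |beta_j| + c * sum_{j<k} max(|beta_j|, |beta_k|), pairwise form."""
    a = np.abs(np.asarray(beta, dtype=float))
    pair_max = sum(max(a[j], a[k]) for j, k in itertools.combinations(range(len(a)), 2))
    return float(a.sum() + c * pair_max)

def oscar_penalty_sorted(beta, c):
    """Same penalty written as a weighted L1 norm of the sorted magnitudes."""
    a = np.sort(np.abs(np.asarray(beta, dtype=float)))  # ascending order
    ranks = np.arange(len(a))                           # 0 for the smallest magnitude
    return float(np.sum((c * ranks + 1.0) * a))

beta = [1.0, -3.0, 2.0]
print(oscar_penalty(beta, 0.5), oscar_penalty_sorted(beta, 0.5))
```

The sorted form costs O(p log p) instead of O(p^2), which matters when p is large.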
Group selection: automatic feature group (cont.)
• Fused LASSO (Friedman et al., 2007)
A lasso term plus a fusion penalty:
$\alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|$
• Properties:
1) Encourages sparsity in the differences of coefficients
2) Introduced to account for 1-d correlation of predictors, i.e., predictors with a natural ordering
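To see what the fusion term rewards, the sketch below evaluates the fused-lasso penalty on two made-up coefficient vectors with the same L1 norm: one piecewise constant, one oscillating. The function name and the example vectors are assumptions for illustration.

```python
import numpy as np

def fused_lasso_penalty(beta, alpha):
    """alpha * sum|beta_j| + (1 - alpha) * sum_{j>=2} |beta_j - beta_{j-1}|."""
    beta = np.asarray(beta, dtype=float)
    return float(alpha * np.sum(np.abs(beta))
                 + (1.0 - alpha) * np.sum(np.abs(np.diff(beta))))

piecewise = [0.0, 0.0, 2.0, 2.0, 2.0, 0.0]    # two jumps between neighbors
oscillating = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0]  # five jumps, same L1 norm

p_piecewise = fused_lasso_penalty(piecewise, alpha=0.5)
p_oscillating = fused_lasso_penalty(oscillating, alpha=0.5)
print(p_piecewise, p_oscillating)  # the piecewise-constant profile is cheaper
```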
Group selection: automatic feature group (cont.)
• HORSE (Friedman et al., 2007)
Extension of the fused LASSO to all pairs of variables:
$\alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j<k} |\beta_j - \beta_k|$
• Properties:
1) Encourages sparsity in the differences of coefficients
2) Fused lasso applied to all pairs of variables
Hierarchy selection
q Hierarchy restriction for interaction models
1) Strong hierarchy: 𝜃6: ≠ 0 → 𝛽6 ≠ 0 and 𝛽: ≠ 0 (SH)
2) Weak hierarchy: 𝜃6: ≠ 0 → 𝛽6 ≠ 0 or	
   𝛽: ≠ 0 (WH)
𝛽6 𝛽:
𝜃6:
𝛽s
𝜃:s
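The two hierarchy restrictions can be checked mechanically on a fitted model. The following is a minimal sketch with a hypothetical helper `satisfies_hierarchy`; the tolerance used to decide that a coefficient is nonzero is an assumption.

```python
import numpy as np

def satisfies_hierarchy(beta, theta, strong=True, tol=1e-12):
    """Check strong (and) or weak (or) hierarchy for every nonzero theta_jk, j < k."""
    beta = np.asarray(beta, dtype=float)
    theta = np.asarray(theta, dtype=float)
    main_on = np.abs(beta) > tol
    p = len(beta)
    for j in range(p):
        for k in range(j + 1, p):
            if abs(theta[j, k]) > tol:
                if strong:
                    ok = main_on[j] and main_on[k]
                else:
                    ok = main_on[j] or main_on[k]
                if not ok:
                    return False
    return True

beta = np.array([1.0, 0.0, 0.0])
theta = np.zeros((3, 3))
theta[0, 1] = theta[1, 0] = 0.5   # interaction between an active and an inactive main effect

strong_ok = satisfies_hierarchy(beta, theta, strong=True)   # violated: beta_2 is zero
weak_ok = satisfies_hierarchy(beta, theta, strong=False)    # holds: beta_1 is nonzero
```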
Hierarchy selection
q SHIM (Choi et al., 2010)
Simply reparameterize the coeffsof 2-way interaction model:
𝑦, = 𝛽3 + c 𝛽6 𝑥,6 + c 𝜃6: 𝑥,6 𝑥,: + 𝜀
6;:
-
68$
become: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥,6 + ∑ 𝛾6: 𝛽6 𝛽: 𝑥,6 𝑥,: + 𝜀6;:
-
68$
q Properties:
1) satisfy “strong hierarchy”
2) but “Non-convex”, alternative minimization strategy for optimization.
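Why the reparameterization gives strong hierarchy can be seen directly: the effective interaction coefficient is gamma_jk * beta_j * beta_k, which vanishes as soon as either main effect is zero. A minimal sketch, with a hypothetical helper name and made-up values:

```python
import numpy as np

def shim_theta(beta, gamma):
    """Effective interaction coefficients theta_jk = gamma_jk * beta_j * beta_k for j < k."""
    beta = np.asarray(beta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    # Keep only the strict upper triangle, since the model sums over j < k.
    return np.triu(gamma * np.outer(beta, beta), k=1)

beta = np.array([1.5, 0.0, -2.0])   # the second main effect is excluded
gamma = np.ones((3, 3))             # every interaction is allowed by gamma
theta = shim_theta(beta, gamma)
print(theta)  # only the (1, 3) interaction can be nonzero
```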
Hierarchy selection
q Composite Absolute Penalties (CAP) (Zhao et al., 2009)
Use overlapping group selection to induce hierarchy selection.
Consider X1, X2. Hierarchy X1->X2 can be induced by:
𝑇 𝛽 = ||(𝛽$, 𝛽&)||vw
+ ||(𝛽&)||vx
Hierarchy selection
q Composite Absolute Penalties (Zhao et al., 2009)
Hiearchical structured sparsity for 2-way interaction model can be
obtained by:
𝑇(𝛽, 𝜃) = ∑ {|𝜃6:| + ||(𝛽6, 𝛽:, 𝜃6:)||vy`
}6z:
𝛽6 𝛽:
𝜃6:
𝛽s
𝜃:s
Hierarchy selection
q Hierarchicalinteraction LASSO (Bien et al., 2013)
Addition of convex constraints to the lasso to produce sparse interaction
models inducing hierarchicalconditions. Start with the following:
	
  	
  	
  	
  	
  	
  	
   𝑚𝑖𝑛R,{ 𝑙 𝛽, 𝜃 + 𝜆||𝛽||$ +
𝜆
2
||𝜃||$
s.t. |
𝜃 = 𝜃*
||𝜃6||$ ≤ |𝛽6|
q Properties:
1) Automatically satisfy “strong hierarchy” (𝜃,6 ≠ 0 −> 𝛽, ≠ 0	
  & 𝛽6 ≠ 0)
2) But “Non-convex”
Hierarchy selection
q Hierarchical interaction LASSO (Bien et al., 2013)
Convex relaxation: replace 𝛽 by 𝛽€
− 𝛽o
(𝛽€
, 𝛽o
≥ 0), then:
	
  	
  	
  	
  	
  	
  	
   𝑚 𝑖𝑛R‚
,Rƒ
,{ 𝑙 𝛽€
− 𝛽o
, 𝜃 + 𝜆1*
(𝛽€
+ 𝛽o
) +
𝜆
2
||𝜃||$
s.t.
𝜃 = 𝜃*
||𝜃6||$ ≤ 𝛽6
€
+ 𝛽6
o
𝛽6
€
, 𝛽6
o
≥ 0
q Properties:
1) Still satisfy “strong hierarchy” (𝜃6: ≠ 0 −> 𝛽6 ≠ 0	
  & 𝛽: ≠ 0)
2) Equivalent to : 𝜆	
   ∑ 𝑚𝑎𝑥( 𝛽6 ,|𝜃6|)
-
68$ +
„
&
||𝜃||$
3) Optimization is bit hard due to symmetry constraint, but can use AMMD
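The equivalence in property 2 can be checked numerically. For a fixed beta_j and row norm r = ||theta_j||_1, the smallest feasible value of beta_j^+ + beta_j^- (subject to beta_j^+ - beta_j^- = beta_j, both nonnegative, and sum at least r) is max(|beta_j|, r). The sketch below uses hypothetical helper names and made-up numbers; it illustrates the identity and is not the authors' solver.

```python
import numpy as np

def split_beta(b, r):
    """Smallest-sum nonnegative pair (bp, bm) with bp - bm = b and bp + bm >= r."""
    s = max(abs(b), r)
    return (s + b) / 2.0, (s - b) / 2.0

def hier_lasso_penalty(beta, theta, lam):
    """lam * sum_j max(|beta_j|, ||theta_j||_1) + (lam / 2) * ||theta||_1."""
    beta = np.asarray(beta, dtype=float)
    theta = np.asarray(theta, dtype=float)
    row_l1 = np.abs(theta).sum(axis=1)   # ||theta_j||_1 for each row j
    return float(lam * np.sum(np.maximum(np.abs(beta), row_l1))
                 + 0.5 * lam * np.abs(theta).sum())

beta = np.array([1.0, -0.2])
theta = np.array([[0.0, 0.5],
                  [0.5, 0.0]])
bp, bm = split_beta(beta[1], 0.5)        # the row constraint is active for j = 2
penalty = hier_lasso_penalty(beta, theta, lam=2.0)
print(bp, bm, penalty)
```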
Hierarchy selection
q Hierarchicalinteraction LASSO (Bien et al., 2013)
Removing symmetry constraint, then:
	
  	
  	
  	
  	
  	
  	
   𝑚 𝑖𝑛R,{ 𝑙 𝛽€
− 𝛽o
, 𝜃 + 𝜆1*
(𝛽€
+ 𝛽o
) +
𝜆
2
||𝜃||$
s.t. …
||𝜃6||$ ≤ 𝛽6
€
+ 𝛽6
o
𝛽6
€
, 𝛽6
o
≥ 0
q Properties:
1) Now only satisfy “weak hierarchy” (𝜃,6 ≠ 0 −> 𝛽, ≠ 0	
  & 𝛽6 ≠ 0)
2) “convex”
3) Optimization is easy because of separate 𝛽6
€
+ 𝛽6
o
	
  (Proximal Operator)
Hierarchy selection
q VANISH (Zhao et al., 2009)
1) Linear model: 𝑌 = ∑ 𝛽6 𝑋6 + ∑ 𝜃6: 𝑋6 ∘ 𝑋: +6;: 𝜀
-
68$
2) Nonlinear: 𝑌 = ∑ 𝑓6 + ∑ 𝑓6: +6;: 𝜀
-
68$
3) penalty: 𝑃 𝑓 = 𝜆$ ∑ (||𝑓6||&
+ ∑ ||𝑓6:||&
:z6 )
w
x+𝜆&
-
68$
∑ ||𝑓6:||6;:
Remark: if 𝑓6 = 𝑤6 𝑋6	
  , 𝑗 = 1, … , 𝑝, and X is normalized, then penalty
becomes:
𝑃 𝑤, 𝜃 = 𝜆$ c ||(𝛽6, 𝜃6)||& + 𝜆&
-
68$
c |𝜃6:|
6;:
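For the linear special case in the remark, the penalty can be evaluated directly. The sketch below assumes theta is stored as a symmetric p x p matrix with zero diagonal, so that row j collects the interactions involving variable j; that storage convention and the function name are assumptions for illustration.

```python
import numpy as np

def vanish_linear_penalty(beta, theta, lam1, lam2):
    """lam1 * sum_j ||(beta_j, theta_j)||_2 + lam2 * sum_{j<k} |theta_jk|."""
    beta = np.asarray(beta, dtype=float)
    theta = np.asarray(theta, dtype=float)
    p = len(beta)
    # Group term: one Euclidean norm per variable over its main effect and its interactions.
    group = np.sqrt(beta ** 2 + np.sum(theta ** 2, axis=1))
    # Pairwise term: each interaction counted once (strict upper triangle).
    pair = np.abs(theta[np.triu_indices(p, k=1)]).sum()
    return float(lam1 * group.sum() + lam2 * pair)

beta = np.array([3.0, 0.0, 0.0])
theta = np.zeros((3, 3))
theta[0, 1] = theta[1, 0] = 4.0
value = vanish_linear_penalty(beta, theta, lam1=1.0, lam2=2.0)
print(value)  # groups contribute 5 + 4 + 0, the single interaction contributes 2 * 4
```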
Hierarchy selection
q GRESH (She et al., 2013)
Proposed a general model of previously mentioned regularization of the
following form:
min
•8[R,{]
𝑙 𝛽, 𝜃 + 𝜆$|𝜃|$ + 𝜆& c ||𝛽6, 𝑧(𝜃6)||‘
-
68$
s.t. 𝜃*
= 𝜃
q Remark:
1) If 𝑧 𝜃6 = 𝜃6
*
and 𝑞 = 2, 𝑡ℎ𝑒𝑛	
  it becomes VANISH
2) If 𝑧 𝜃6 = |𝜃6|$ and 𝑞 = ∞, 𝑡ℎ𝑒𝑛	
  it becomes HiLASSO
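The two remarks can be illustrated by writing the penalty once with z and q as arguments. This is a sketch under assumptions: theta is a symmetric matrix whose L1 norm is taken over all entries, and the names `gresh_penalty`, `z_identity`, and `z_l1` are hypothetical.

```python
import numpy as np

def gresh_penalty(beta, theta, lam1, lam2, z, q):
    """lam1 * ||theta||_1 + lam2 * sum_j ||(beta_j, z(theta_j))||_q."""
    beta = np.asarray(beta, dtype=float)
    theta = np.asarray(theta, dtype=float)
    total = lam1 * np.abs(theta).sum()
    for j in range(len(beta)):
        # Stack the main effect with the transformed j-th row of theta.
        v = np.concatenate(([beta[j]], np.atleast_1d(z(theta[j]))))
        total += lam2 * np.linalg.norm(v, ord=q)
    return float(total)

def z_identity(row):
    """z(theta_j) = theta_j: with q = 2 this gives the VANISH-type group term."""
    return row

def z_l1(row):
    """z(theta_j) = ||theta_j||_1: with q = inf this gives max(|beta_j|, ||theta_j||_1)."""
    return np.abs(row).sum()

beta = np.array([3.0, 0.0])
theta = np.array([[0.0, 4.0],
                  [4.0, 0.0]])
vanish_type = gresh_penalty(beta, theta, 1.0, 1.0, z_identity, 2)
hilasso_type = gresh_penalty(beta, theta, 1.0, 1.0, z_l1, np.inf)
print(vanish_type, hilasso_type)
```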
Conclusion
• Group selection
• Hierarchical selection

More Related Content

What's hot

Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile LocaleEtude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile LocaleRaouf Alsaytara
 
近似ベイズ計算によるベイズ推定
近似ベイズ計算によるベイズ推定近似ベイズ計算によるベイズ推定
近似ベイズ計算によるベイズ推定Kosei ABE
 
Double integration final
Double integration finalDouble integration final
Double integration finalroypark31
 
Some properties of two-fuzzy Nor med spaces
Some properties of two-fuzzy Nor med spacesSome properties of two-fuzzy Nor med spaces
Some properties of two-fuzzy Nor med spacesIOSR Journals
 
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed SpacesStrongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed SpacesIOSR Journals
 
Options Portfolio Selection
Options Portfolio SelectionOptions Portfolio Selection
Options Portfolio Selectionguasoni
 
Numerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis methodNumerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis methodAlexander Decker
 
Lecture 3: Stochastic Hydrology
Lecture 3: Stochastic HydrologyLecture 3: Stochastic Hydrology
Lecture 3: Stochastic HydrologyAmro Elfeki
 
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...Joe Suzuki
 
(α ψ)- Construction with q- function for coupled fixed point
(α   ψ)-  Construction with q- function for coupled fixed point(α   ψ)-  Construction with q- function for coupled fixed point
(α ψ)- Construction with q- function for coupled fixed pointAlexander Decker
 
Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology Amro Elfeki
 
A Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent LearningA Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent LearningJoe Suzuki
 
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...Joe Suzuki
 
11.a focus on a common fixed point theorem using weakly compatible mappings
11.a focus on a common fixed point theorem using weakly compatible mappings11.a focus on a common fixed point theorem using weakly compatible mappings
11.a focus on a common fixed point theorem using weakly compatible mappingsAlexander Decker
 
A focus on a common fixed point theorem using weakly compatible mappings
A focus on a common fixed point theorem using weakly compatible mappingsA focus on a common fixed point theorem using weakly compatible mappings
A focus on a common fixed point theorem using weakly compatible mappingsAlexander Decker
 
Some Other Properties of Fuzzy Filters on Lattice Implication Algebras
Some Other Properties of Fuzzy Filters on Lattice Implication AlgebrasSome Other Properties of Fuzzy Filters on Lattice Implication Algebras
Some Other Properties of Fuzzy Filters on Lattice Implication Algebrasijceronline
 

What's hot (20)

Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile LocaleEtude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
 
近似ベイズ計算によるベイズ推定
近似ベイズ計算によるベイズ推定近似ベイズ計算によるベイズ推定
近似ベイズ計算によるベイズ推定
 
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
 
Double integration final
Double integration finalDouble integration final
Double integration final
 
A
AA
A
 
Some properties of two-fuzzy Nor med spaces
Some properties of two-fuzzy Nor med spacesSome properties of two-fuzzy Nor med spaces
Some properties of two-fuzzy Nor med spaces
 
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed SpacesStrongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
 
Options Portfolio Selection
Options Portfolio SelectionOptions Portfolio Selection
Options Portfolio Selection
 
Numerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis methodNumerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis method
 
Galerkin method
Galerkin methodGalerkin method
Galerkin method
 
Lecture 3: Stochastic Hydrology
Lecture 3: Stochastic HydrologyLecture 3: Stochastic Hydrology
Lecture 3: Stochastic Hydrology
 
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
 
(α ψ)- Construction with q- function for coupled fixed point
(α   ψ)-  Construction with q- function for coupled fixed point(α   ψ)-  Construction with q- function for coupled fixed point
(α ψ)- Construction with q- function for coupled fixed point
 
Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology
 
A Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent LearningA Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent Learning
 
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
 
Boolean algebra laws
Boolean algebra lawsBoolean algebra laws
Boolean algebra laws
 
11.a focus on a common fixed point theorem using weakly compatible mappings
11.a focus on a common fixed point theorem using weakly compatible mappings11.a focus on a common fixed point theorem using weakly compatible mappings
11.a focus on a common fixed point theorem using weakly compatible mappings
 
A focus on a common fixed point theorem using weakly compatible mappings
A focus on a common fixed point theorem using weakly compatible mappingsA focus on a common fixed point theorem using weakly compatible mappings
A focus on a common fixed point theorem using weakly compatible mappings
 
Some Other Properties of Fuzzy Filters on Lattice Implication Algebras
Some Other Properties of Fuzzy Filters on Lattice Implication AlgebrasSome Other Properties of Fuzzy Filters on Lattice Implication Algebras
Some Other Properties of Fuzzy Filters on Lattice Implication Algebras
 

Similar to Hierarchical selection

Learning a nonlinear embedding by preserving class neibourhood structure 최종
Learning a nonlinear embedding by preserving class neibourhood structure   최종Learning a nonlinear embedding by preserving class neibourhood structure   최종
Learning a nonlinear embedding by preserving class neibourhood structure 최종WooSung Choi
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelineChenYiHuang5
 
Mimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmMimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmCemal Ardil
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfJunghyun Lee
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Regularisation & Auxiliary Information in OOD Detection
Regularisation & Auxiliary Information in OOD DetectionRegularisation & Auxiliary Information in OOD Detection
Regularisation & Auxiliary Information in OOD Detectionkirk68
 
Support Vector Machine Classifiers
Support Vector Machine ClassifiersSupport Vector Machine Classifiers
Support Vector Machine ClassifiersAerofoil Kite
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfssusera1eccd
 
Regularization and variable selection via elastic net
Regularization and variable selection via elastic netRegularization and variable selection via elastic net
Regularization and variable selection via elastic netKyusonLim
 
Optimization of positive linear systems via geometric programming
Optimization of positive linear systems via geometric programmingOptimization of positive linear systems via geometric programming
Optimization of positive linear systems via geometric programmingMasaki Ogura
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics JCMwave
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsA machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsJCMwave
 
Face verification techniques: how to speed up dataset creation
Face verification techniques: how to speed up dataset creationFace verification techniques: how to speed up dataset creation
Face verification techniques: how to speed up dataset creationDeep Learning Italia
 

Similar to Hierarchical selection (20)

Learning a nonlinear embedding by preserving class neibourhood structure 최종
Learning a nonlinear embedding by preserving class neibourhood structure   최종Learning a nonlinear embedding by preserving class neibourhood structure   최종
Learning a nonlinear embedding by preserving class neibourhood structure 최종
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Mimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmMimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithm
 
Modifed my_poster
Modifed my_posterModifed my_poster
Modifed my_poster
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdf
 
硕士论文
硕士论文硕士论文
硕士论文
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Deepa seminar
Deepa seminarDeepa seminar
Deepa seminar
 
Regression
RegressionRegression
Regression
 
Regularisation & Auxiliary Information in OOD Detection
Regularisation & Auxiliary Information in OOD DetectionRegularisation & Auxiliary Information in OOD Detection
Regularisation & Auxiliary Information in OOD Detection
 
Support Vector Machine Classifiers
Support Vector Machine ClassifiersSupport Vector Machine Classifiers
Support Vector Machine Classifiers
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdf
 
Exponential decay for the solution of the nonlinear equation induced by the m...
Exponential decay for the solution of the nonlinear equation induced by the m...Exponential decay for the solution of the nonlinear equation induced by the m...
Exponential decay for the solution of the nonlinear equation induced by the m...
 
Four Point Gauss Quadrature Runge – Kuta Method Of Order 8 For Ordinary Diffe...
Four Point Gauss Quadrature Runge – Kuta Method Of Order 8 For Ordinary Diffe...Four Point Gauss Quadrature Runge – Kuta Method Of Order 8 For Ordinary Diffe...
Four Point Gauss Quadrature Runge – Kuta Method Of Order 8 For Ordinary Diffe...
 
Regularization and variable selection via elastic net
Regularization and variable selection via elastic netRegularization and variable selection via elastic net
Regularization and variable selection via elastic net
 
Optimization of positive linear systems via geometric programming
Optimization of positive linear systems via geometric programmingOptimization of positive linear systems via geometric programming
Optimization of positive linear systems via geometric programming
 
Linkedin_PowerPoint
Linkedin_PowerPointLinkedin_PowerPoint
Linkedin_PowerPoint
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsA machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics
 
Face verification techniques: how to speed up dataset creation
Face verification techniques: how to speed up dataset creationFace verification techniques: how to speed up dataset creation
Face verification techniques: how to speed up dataset creation
 

More from Dai-Hai Nguyen

Advanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationAdvanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationDai-Hai Nguyen
 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodelsDai-Hai Nguyen
 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GANDai-Hai Nguyen
 
Semi-supervised learning model for molecular property prediction
Semi-supervised learning model for molecular property predictionSemi-supervised learning model for molecular property prediction
Semi-supervised learning model for molecular property predictionDai-Hai Nguyen
 

More from Dai-Hai Nguyen (8)

Advanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationAdvanced machine learning for metabolite identification
Advanced machine learning for metabolite identification
 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodels
 
IBSB tutorial
IBSB tutorialIBSB tutorial
IBSB tutorial
 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GAN
 
Semi-supervised learning model for molecular property prediction
Semi-supervised learning model for molecular property predictionSemi-supervised learning model for molecular property prediction
Semi-supervised learning model for molecular property prediction
 
DL for molecules
DL for moleculesDL for molecules
DL for molecules
 
Seminar
SeminarSeminar
Seminar
 
Collaborative DL
Collaborative DLCollaborative DL
Collaborative DL
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script

Hierarchical selection

  • 1. Group and Hierarchical Variable Selection. Hai Nguyen, Bioinformatics Center, Kyoto University (hai@kuicr.kyoto-u.ac.jp, haidnguyen0909@gmail.com)
  • 2. Introduction q Response: 𝑦 = (𝑦$, 𝑦&, … , 𝑦() * q predictors : 𝑥, = (𝑥,$, 𝑥,&, …, 𝑥,-) * , 𝑖 = 1, . . , 𝑛 qLinear model: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥66 + 𝜀 - 68$ q2-way interaction model: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥,6 + ∑ 𝜃6: 𝑥,6 𝑥,: + 𝜀6;: - 68$ 1) ∑ 𝛽6 𝑥,6 - 68$ :  main  effect  term,   𝛽 ∈ ℝ- 2) ∑ 𝜃6: 𝑥,6 𝑥,:6;: : interaction term, 𝜃 ∈ ℝ-J-
  • 3. Introduction q Problems to be addressed in high dimensional data: 1) Predictive performance 2) Interpretability 3) Highly correlated variables Sparsity assumption: # of nonzero coeffs 𝛽6 K 𝑠 and/or interaction 𝜃6: K 𝑠 is very few.
  • 4. Introduction
    Variable selection (LASSO), group selection (group LASSO), hierarchical selection (hierarchical LASSO)
  • 5. Introduction q Shrinkage methods based on regularization 𝛽M = 𝑎𝑟𝑔𝑚𝑖𝑛R   𝑙 𝛽 + 𝜆 U |𝛽|$,                               𝐿 𝑎𝑠𝑠𝑜 ||𝛽||& & ,                               𝑅 𝑖𝑑𝑔𝑒 Where 𝑙 𝛽 is the loss function wr.t. 𝛽, e.g., square, logistic, hinge losses 1) Ridge: prevent overfitting but not variable selection 2) Lasso: variable selection but only select one for each group of correlated variables.
  • 6. Group selection q Group Lasso (Yuan et al., 2006) Coefficients are organized into K groups (known in advance): 𝑔$, 𝑔&, …, 𝑔  ⊆ 1,2,… , 𝑝 , disjoint and then the Group-Lasso pelnaty: 𝜆 ∑ 𝑑:||𝛽_` ||&: ,              where ||𝛽_` ||& = ∑ 𝛽, & ,∈_` q Properties: 1) Group-size = 1 -> LASSO 2) Convex penalty 3) Encourage to select or remove the entire group How  to  do  group  selection  without  prior  knowledge  of  group  structures?
  • 7. Group selection: automatic feature group
    Elastic Net (Zou et al., 2005)
    A linear combination of ridge and LASSO penalties for group selection, via the penalty:
    $\alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j=1}^{p} \beta_j^2$
    Properties:
    1) The L1 term leads to a sparse solution
    2) The L2 term forces highly correlated variables to be averaged
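A minimal sketch of the Elastic Net penalty as written on this slide, $\alpha \sum_j |\beta_j| + (1-\alpha) \sum_j \beta_j^2$. The function name and the example coefficients are illustrative assumptions, and no overall regularization strength $\lambda$ is included because the slide's formula does not show one.

```python
def elastic_net_penalty(beta, alpha):
    """alpha * ||beta||_1 + (1 - alpha) * ||beta||_2^2, with 0 <= alpha <= 1."""
    l1 = sum(abs(b) for b in beta)      # lasso part, promotes sparsity
    l2_sq = sum(b * b for b in beta)    # ridge part, shrinks correlated coefficients together
    return alpha * l1 + (1.0 - alpha) * l2_sq

beta = [1.0, -2.0, 0.0]
print(elastic_net_penalty(beta, alpha=1.0))  # pure lasso: 3.0
print(elastic_net_penalty(beta, alpha=0.0))  # pure ridge: 5.0
print(elastic_net_penalty(beta, alpha=0.5))  # 4.0
```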
  • 8. Group selection: automatic feature group (cont.)
    OSCAR (Bondell et al., 2008)
    A combination of the LASSO penalty and an $L_\infty$ penalty for each pair of variables:
    $\sum_{j=1}^{p} |\beta_j| + c \sum_{j<k} \max\{|\beta_j|, |\beta_k|\}$
    Properties:
    1) Encourages equality of coefficients
  • 9. Group selection: automatic feature group (cont.)
    Fused LASSO (Friedman et al., 2007)
    A lasso term plus a fusion penalty:
    $\alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|$
    Properties:
    1) Encourages sparsity in the differences of coefficients
    2) Introduced to account for 1-d correlation of predictors
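The fused LASSO penalty above can be sketched as follows. The function name and the two example coefficient vectors are assumptions chosen to show that, for the same L1 norm, a piecewise-constant vector is penalized less than an oscillating one.

```python
def fused_lasso_penalty(beta, alpha):
    """alpha * sum_j |beta_j| + (1 - alpha) * sum_{j>=2} |beta_j - beta_{j-1}|."""
    l1 = sum(abs(b) for b in beta)
    # differences between neighbouring coefficients in the given 1-d order
    fusion = sum(abs(beta[j] - beta[j - 1]) for j in range(1, len(beta)))
    return alpha * l1 + (1.0 - alpha) * fusion

smooth = [1.0, 1.0, 1.0, 0.0, 0.0]   # one jump between neighbours
jagged = [1.0, 0.0, 1.0, 0.0, 1.0]   # four jumps, same L1 norm
print(fused_lasso_penalty(smooth, alpha=0.5))  # 0.5 * 3 + 0.5 * 1 = 2.0
print(fused_lasso_penalty(jagged, alpha=0.5))  # 0.5 * 3 + 0.5 * 4 = 3.5
```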
  • 10. Group selection: automatic feature group (cont.)
    HORSE (Friedman et al., 2007)
    Extension of the fused LASSO to all pairs of variables:
    $\alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j<k} |\beta_j - \beta_k|$
    Properties:
    1) Encourages sparsity in the differences of coefficients
    2) Fused lasso for pairs of variables
  • 11. Hierarchy selection q Hierarchy restriction for interaction models 1) Strong hierarchy: 𝜃6: ≠ 0 → 𝛽6 ≠ 0 and 𝛽: ≠ 0 (SH) 2) Weak hierarchy: 𝜃6: ≠ 0 → 𝛽6 ≠ 0 or   𝛽: ≠ 0 (WH) 𝛽6 𝛽: 𝜃6: 𝛽s 𝜃:s
  • 12. Hierarchy selection q SHIM (Choi et al., 2010) Simply reparameterize the coeffsof 2-way interaction model: 𝑦, = 𝛽3 + c 𝛽6 𝑥,6 + c 𝜃6: 𝑥,6 𝑥,: + 𝜀 6;: - 68$ become: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥,6 + ∑ 𝛾6: 𝛽6 𝛽: 𝑥,6 𝑥,: + 𝜀6;: - 68$ q Properties: 1) satisfy “strong hierarchy” 2) but “Non-convex”, alternative minimization strategy for optimization.
  • 13. Hierarchy selection q Composite Absolute Penalties (CAP) (Zhao et al., 2009) Use overlapping group selection to induce hierarchy selection. Consider X1, X2. Hierarchy X1->X2 can be induced by: 𝑇 𝛽 = ||(𝛽$, 𝛽&)||vw + ||(𝛽&)||vx
  • 14. Hierarchy selection q Composite Absolute Penalties (Zhao et al., 2009) Hiearchical structured sparsity for 2-way interaction model can be obtained by: 𝑇(𝛽, 𝜃) = ∑ {|𝜃6:| + ||(𝛽6, 𝛽:, 𝜃6:)||vy` }6z: 𝛽6 𝛽: 𝜃6: 𝛽s 𝜃:s
  • 15. Hierarchy selection q Hierarchicalinteraction LASSO (Bien et al., 2013) Addition of convex constraints to the lasso to produce sparse interaction models inducing hierarchicalconditions. Start with the following:               𝑚𝑖𝑛R,{ 𝑙 𝛽, 𝜃 + 𝜆||𝛽||$ + 𝜆 2 ||𝜃||$ s.t. | 𝜃 = 𝜃* ||𝜃6||$ ≤ |𝛽6| q Properties: 1) Automatically satisfy “strong hierarchy” (𝜃,6 ≠ 0 −> 𝛽, ≠ 0  & 𝛽6 ≠ 0) 2) But “Non-convex”
  • 16. Hierarchy selection q Hierarchical interaction LASSO (Bien et al., 2013) Convex relaxation: replace 𝛽 by 𝛽€ − 𝛽o (𝛽€ , 𝛽o ≥ 0), then:               𝑚 𝑖𝑛R‚ ,Rƒ ,{ 𝑙 𝛽€ − 𝛽o , 𝜃 + 𝜆1* (𝛽€ + 𝛽o ) + 𝜆 2 ||𝜃||$ s.t. 𝜃 = 𝜃* ||𝜃6||$ ≤ 𝛽6 € + 𝛽6 o 𝛽6 € , 𝛽6 o ≥ 0 q Properties: 1) Still satisfy “strong hierarchy” (𝜃6: ≠ 0 −> 𝛽6 ≠ 0  & 𝛽: ≠ 0) 2) Equivalent to : 𝜆   ∑ 𝑚𝑎𝑥( 𝛽6 ,|𝜃6|) - 68$ + „ & ||𝜃||$ 3) Optimization is bit hard due to symmetry constraint, but can use AMMD
  • 17. Hierarchy selection q Hierarchicalinteraction LASSO (Bien et al., 2013) Removing symmetry constraint, then:               𝑚 𝑖𝑛R,{ 𝑙 𝛽€ − 𝛽o , 𝜃 + 𝜆1* (𝛽€ + 𝛽o ) + 𝜆 2 ||𝜃||$ s.t. … ||𝜃6||$ ≤ 𝛽6 € + 𝛽6 o 𝛽6 € , 𝛽6 o ≥ 0 q Properties: 1) Now only satisfy “weak hierarchy” (𝜃,6 ≠ 0 −> 𝛽, ≠ 0  & 𝛽6 ≠ 0) 2) “convex” 3) Optimization is easy because of separate 𝛽6 € + 𝛽6 o  (Proximal Operator)
  • 18. Hierarchy selection q VANISH (Zhao et al., 2009) 1) Linear model: 𝑌 = ∑ 𝛽6 𝑋6 + ∑ 𝜃6: 𝑋6 ∘ 𝑋: +6;: 𝜀 - 68$ 2) Nonlinear: 𝑌 = ∑ 𝑓6 + ∑ 𝑓6: +6;: 𝜀 - 68$ 3) penalty: 𝑃 𝑓 = 𝜆$ ∑ (||𝑓6||& + ∑ ||𝑓6:||& :z6 ) w x+𝜆& - 68$ ∑ ||𝑓6:||6;: Remark: if 𝑓6 = 𝑤6 𝑋6  , 𝑗 = 1, … , 𝑝, and X is normalized, then penalty becomes: 𝑃 𝑤, 𝜃 = 𝜆$ c ||(𝛽6, 𝜃6)||& + 𝜆& - 68$ c |𝜃6:| 6;:
  • 19. Hierarchy selection q GRESH (She et al., 2013) Proposed a general model of previously mentioned regularization of the following form: min •8[R,{] 𝑙 𝛽, 𝜃 + 𝜆$|𝜃|$ + 𝜆& c ||𝛽6, 𝑧(𝜃6)||‘ - 68$ s.t. 𝜃* = 𝜃 q Remark: 1) If 𝑧 𝜃6 = 𝜃6 * and 𝑞 = 2, 𝑡ℎ𝑒𝑛  it becomes VANISH 2) If 𝑧 𝜃6 = |𝜃6|$ and 𝑞 = ∞, 𝑡ℎ𝑒𝑛  it becomes HiLASSO
  • 20. Conclusion
    • Group selection
    • Hierarchical selection