Robust parametric classification and variable
     selection with minimum distance estimation

              Eric Chi^{a,b,1} with David W. Scott^{a,2}
                      a Department of Statistics,
                            Rice University
                     b Baylor College of Medicine


                          June 17, 2010



1 DOE DE-FG02-97ER25308
2 NSF DMS-09-07491
Outline


   The binary regression problem

   The L2 E Method

   Estimation

   Variable Selection

   Simulations

   Conclusion
Logistic Regression




   Suppose we wish to predict y ∈ {0, 1}^n using X ∈ R^{n×p}.
   The number of features p could be very large.
Univariate Logistic Regression: MLE



   [Figure: univariate logistic regression, MLE fit; Pr(Y = 1) on the vertical axis, X from −6 to 6 on the horizontal axis, with the binary responses plotted at 0 and 1.]
MLE is sensitive to outliers



   [Figure: the same data with an added cluster of outlying 'ones' near X ∈ (−6, −4); Pr(Y = 1) vs. X.]
MLE is sensitive to outliers




   Likelihood-based choice
       Outlier or not, the MLE puts mass wherever the data lie.
       Cost: the MLE also puts mass over regions where there are no data.
MLE is sensitive to outliers
                      [Figure: MLE fit with the outlying 'ones' included; Pr(Y = 1) vs. X.]




   There are no 'ones' between −4 and −2, yet the fitted P(Y = 1 | X ∈ (−4, −2)) increases.

   There are no 'zeros' between 4 and 6, yet the fitted P(Y = 0 | X ∈ (4, 6)) increases.
Outline


   The binary regression problem

   The L2 E Method

   Estimation

   Variable Selection

   Simulations

   Conclusion
The L2 distance as an alternative to the deviance loss.




       g : unknown true density.
       fθ : putative parametric density.
       Find θ that minimizes the ISE

                       θ̂ = argmin_θ ∫ (f_θ(x) − g(x))² dx.
The L2 E Method



       The equivalent empirical criterion:

                θ̂ = argmin_θ [ ∫ f_θ(x)² dx − (2/n) Σ_{i=1}^n f_θ(X_i) ],

       where X_i ∈ R^p is the covariate vector of the i-th observation.
      The L2 Estimator or L2 E [Scott, 2001].
       Familiar quantity: the same criterion appears in smoothing-parameter
       selection for nonparametric density estimation.
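
       To see where the empirical criterion comes from, expand the ISE and
       note that the cross term is an expectation under g, while the ∫ g²
       term does not involve θ:

                ∫ (f_θ(x) − g(x))² dx = ∫ f_θ(x)² dx − 2 ∫ f_θ(x) g(x) dx + ∫ g(x)² dx,

                ∫ f_θ(x) g(x) dx = E_g[ f_θ(X) ] ≈ (1/n) Σ_{i=1}^n f_θ(X_i).

       Dropping the θ-free term and plugging in the sample average gives
       the criterion above.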
Density-power divergence


   The L2 E and MLE are empirical minimizers of two different points
   in a spectrum of divergence measures [Basu et al., 1998].


        d_γ(g, f_θ) = ∫ [ f_θ^{1+γ}(z) − (1 + 1/γ) g(z) f_θ^γ(z) + (1/γ) g^{1+γ}(z) ] dz,

        γ > 0 trades off efficiency for robustness.
        γ = 1 =⇒ L2 loss.
        γ → 0 =⇒ Kullback–Leibler divergence.
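
        As a quick check of the γ = 1 case, substituting γ = 1 into d_γ gives

                d_1(g, f_θ) = ∫ [ f_θ²(z) − 2 g(z) f_θ(z) + g²(z) ] dz = ∫ (f_θ(z) − g(z))² dz,

        which is exactly the integrated squared error used by the L2 E.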
Robustness of the L2 distance


                   θ̂ = argmin_θ ∫ (f_θ(x) − g(x))² dx.

      The L2 distance is zero-forcing:

                         g (x) = 0 forces fθ (x) = 0.

       Puts a premium on avoiding “false positives”.
       L2 E balances putting mass where data are present against
       putting no mass where data are absent.
Partial Densities: An extra degree of freedom


       Expand the search space [Scott, 2001]:

                              ∫ (w f_θ(x) − g(x))² dx.

        Fit a parametric model to only a fraction, w, of the data
        (hopefully the fraction that is well described by the parametric
        model!).

          (θ̂, ŵ) = argmin_{θ,w} [ w² ∫ f_θ(x)² dx − (2w/n) Σ_{i=1}^n f_θ(X_i) ].
Logistic L2 E loss



   Let F(u) = 1/(1 + exp(−u)) be the logistic function. Then

      (β̂, ŵ) = argmin_{β, w ∈ [0,1]}  (w²/n) Σ_{i=1}^n [ F(x_i^T β)² + (1 − F(x_i^T β))² ]
                                     − (2w/n) Σ_{i=1}^n [ y_i F(x_i^T β) + (1 − y_i)(1 − F(x_i^T β)) ].
Two dimensional example


   [Figure: scatter of the three clusters in the (X1, X2) plane; X1 roughly from −5 to 10, X2 from −4 to 4.]
      n = 300 and p = 2.
       Three clusters, each of size 100:
           two are labelled 0,
           one is labelled 1.
   [Figure: fits on the two-dimensional example. Panels: (a) MLE; (b) L2 E A, ŵ = 1.026; (c) L2 E B, ŵ = 0.666; (d) L2 E C, ŵ = 0.668.]
Outline


   The binary regression problem

   The L2 E Method

   Estimation

   Variable Selection

   Simulations

   Conclusion
The optimization problem



   Challenges
       L2 E loss is not convex.
               The Hessian of the L2 E loss is indefinite.
               Standard Newton–Raphson fails.
       Scalability and stability as p increases?

   Solution
       Majorization-Minimization
Majorization-Minimization




   Strategy
   Minimize a surrogate function, the majorization.
   Choose the surrogate such that
       ↓ surrogate =⇒ ↓ objective.
       the surrogate is easier to minimize than the objective.
Majorization-Minimization




   Definition
   Given f and g , real-valued functions on Rp , g majorizes f at x if
    1. g (x) = f (x) and
    2. g (u) ≥ f (u) for all u.
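
    These two properties give the descent guarantee: if x^{(m+1)} decreases
    the surrogate g built at x^{(m)}, then

         f(x^{(m+1)}) ≤ g(x^{(m+1)}) ≤ g(x^{(m)}) = f(x^{(m)}),

    so the original objective never increases along the MM iterates.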
   [Figure (animation, repeated frames): vertical axis 'Lack of fit' (More to Less); horizontal axis 'The spectrum of logistic models', with markers 'very bad', 'optimal', and 'less bad'.]
Quadratic majorization of the logistic L2 E loss

   Fix w. The loss has bounded curvature with respect to β, so we
   majorize the exact second-order Taylor expansion. This yields the update

                  β^{(m+1)} = β^{(m)} − (1/K) (X^T X)^{-1} X^T Z^{(m)},

   where

             K ≥ (1/4) max_{z ∈ [−1,1]} [ (3/2) w z⁴ − z³ − 2wz² + z + w/2 ].

       K controls the step size. Its lower bound is related to the
       maximum curvature of the loss.
       Z (m) is a working response that depends on Y and X β (m) .
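
   A minimal sketch of this update in Python/NumPy. The slide does not give
   the explicit form of the working response Z^(m), so it is passed in via a
   placeholder function; all names here are illustrative:

      import numpy as np

      def mm_update(beta, X, Z, K):
          """One MM step: beta - (1/K) (X'X)^{-1} X' Z."""
          return beta - np.linalg.solve(X.T @ X, X.T @ Z) / K

      def mm_fit(X, y, working_response, K, max_iter=500, tol=1e-8):
          """Iterate the MM update until the coefficients stabilize.

          working_response(beta, X, y) is a hypothetical placeholder that
          builds Z^(m) from y and X @ beta, as described on the slide.
          """
          beta = np.zeros(X.shape[1])
          for _ in range(max_iter):
              Z = working_response(beta, X, y)
              beta_new = mm_update(beta, X, Z, K)
              if np.max(np.abs(beta_new - beta)) < tol:
                  return beta_new
              beta = beta_new
          return beta

   By the majorization property, each iteration can only decrease the L2 E loss.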
Outline


   The binary regression problem

   The L2 E Method

   Estimation

   Variable Selection

   Simulations

   Conclusion
Continuous variable selection with the LASSO



   Minimize

                           “L2 E loss” + λ Σ_{i=1}^p |β_i|.

   The penalized majorization of the loss majorizes the penalized loss, so
   minimize

                   “majorization of L2 E loss” + λ Σ_{i=1}^p |β_i|.
Coordinate Descent



   Suppose X is standardized. Then

                   β_k^{(m+1)} = S( β_k^{(m)} − (1/K) X_{(k)}^T Z^{(m)}, λ ),

   where S is the soft-threshold function

                     S(x, λ) = sign(x) max(|x| − λ, 0).

   Extension to elastic net is straightforward.
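
   A sketch of the soft-threshold function and a single coordinate update,
   with X standardized and Z^(m) again supplied from outside; names are
   illustrative:

      import numpy as np

      def soft_threshold(x, lam):
          """S(x, lambda) = sign(x) * max(|x| - lambda, 0)."""
          return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

      def coordinate_update(beta, k, X, Z, K, lam):
          """Update the k-th coefficient: S(beta_k - (1/K) X_(k)' Z, lambda)."""
          beta = beta.copy()
          beta[k] = soft_threshold(beta[k] - X[:, k] @ Z / K, lam)
          return beta

   Cycling this update over k = 1, …, p (refreshing the working response
   between sweeps) gives one coordinate-descent pass for a fixed λ.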
Heuristic Model Selection



   Regularization Path
   Calculate the penalized regression coefficients for a range of λ values.

   Information Criterion
        For each λ, calculate the deviance loss at the L2 E coefficients and
        add a correction term (AIC or BIC).
       Select model with lowest AIC/BIC value.
       Use number of non-zero penalized regression coefficients for
       degrees of freedom [Zou et al, 2007].
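
   A sketch of the BIC version of this heuristic, assuming a matrix
   beta_path with one column of fitted coefficients per λ; the exact AIC/BIC
   formula is not on the slide, so the deviance + log(n)·df form below is an
   assumption consistent with it:

      import numpy as np

      def binomial_deviance(beta, X, y, eps=1e-12):
          """Deviance loss evaluated at the (L2E-fitted) coefficients."""
          p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), eps, 1.0 - eps)
          return -2.0 * np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

      def select_by_bic(beta_path, lambdas, X, y):
          """Pick the lambda minimizing deviance + log(n) * df, with df equal
          to the number of nonzero penalized coefficients [Zou et al., 2007]."""
          n = X.shape[0]
          scores = [binomial_deviance(beta_path[:, j], X, y)
                    + np.log(n) * np.count_nonzero(beta_path[:, j])
                    for j in range(beta_path.shape[1])]
          best = int(np.argmin(scores))
          return lambdas[best], beta_path[:, best]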
Heuristic Model Selection


   [Figure: left, the L2 E regularization path (coefficients β_j vs. log10(λ), with the number of nonzero coefficients along the top); right, L2 E BIC vs. log10(λ).]
Outline


   The binary regression problem

   The L2 E Method

   Estimation

   Variable Selection

   Simulations

   Conclusion
Simulations: Estimation




      n = 200, p = 4
       X_i | Group 1 ∼ i.i.d. N(µ, σ)
       X_i | Group 2 ∼ i.i.d. N(−µ, σ)
       β = (1, 1/2, 1, 2)
       Y_i | X_i ∼ independent Bern(F(X_i^T β))
      1,000 replicates.
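
   A sketch of one replicate of this design; the split into two equal groups
   and the values of µ and σ are placeholders, since the slide does not fix
   them:

      import numpy as np

      def simulate_replicate(n=200, p=4, mu=1.0, sigma=1.0, rng=None):
          """Covariates N(mu, sigma) in group 1 and N(-mu, sigma) in group 2,
          responses Bernoulli with success probability F(x_i^T beta)."""
          rng = np.random.default_rng() if rng is None else rng
          beta = np.array([1.0, 0.5, 1.0, 2.0])
          half = n // 2
          X = np.vstack([rng.normal(mu, sigma, size=(half, p)),        # Group 1
                         rng.normal(-mu, sigma, size=(n - half, p))])  # Group 2
          prob = 1.0 / (1.0 + np.exp(-X @ beta))                       # F(x_i^T beta)
          y = rng.binomial(1, prob)
          return X, y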
Case 1




   Vary position of 1 outlier.
Distributions of fitted coefficients


   [Figure: boxplots of the fitted coefficients, one panel per coefficient (1–4), by outlier position (−0.25, 1.5, 3, 6, 12, 24) and method (MLE; L2E with w = 1; L2E with w = wopt).]
Estimation




   The MLE regression coefficients are driven to zero (implosion breakdown).
Case 2




   Vary number of outliers at a fixed position.
Distributions of fitted coefficients


   [Figure: boxplots of the fitted coefficients, one panel per coefficient (1–4), by number of outliers (0, 1, 5, 10, 15, 20) and method (MLE; L2E with w = 1; L2E with w = w.opt).]
Simulations: Variable Selection


       n = 200, p = 1000
        X_i | Group 1 ∼ i.i.d. N(µ, σ)
        X_i | Group 2 ∼ i.i.d. N(−µ, σ)
        β = (1, 1, 1, 1, 0, . . . , 0)
        Y_i | X_i ∼ independent Bern(F(X_i^T β))
       1,000 replicates.

   Single Outlier
   A single outlier is moved along a ray starting at the centroid of one
   group, in the direction (1, 1, 1, 1, 0, . . . , 0).
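
   A sketch of how such an outlier can be appended, assuming the relevant
   group centroid is passed in; the choice of label 0 for the outlier is
   illustrative, since the slide does not specify it:

      import numpy as np

      def place_outlier(X, y, t, centroid):
          """Append one point at centroid + t * direction, where direction is
          the unit vector along (1, 1, 1, 1, 0, ..., 0)."""
          direction = np.zeros(X.shape[1])
          direction[:4] = 1.0
          direction /= np.linalg.norm(direction)
          x_out = centroid + t * direction
          return np.vstack([X, x_out]), np.append(y, 0)  # illustrative label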
Average number of correct variables selected


   [Figure: average number of correctly selected variables (expectation, 0–4) vs. outlier relative position (0 to 9.5), under AIC (left) and BIC (right), for MLE, L2E with w = 1, and L2E with w = wopt.]
Average number of incorrect variables selected


   [Figure: average number of incorrectly selected variables (expectation, 0–140) vs. outlier relative position (0 to 9.5), under AIC (left) and BIC (right), for MLE, L2E with w = 1, and L2E with w = wopt.]
Variable Selection




   Implosion breakdown =⇒ reduced SNR =⇒ missed detections
Outline


   The binary regression problem

   The L2 E Method

   Estimation

   Variable Selection

   Simulations

   Conclusion
Summary




     MLE logistic regression is sensitive to implosion breakdown.
     Estimation and variable selection are affected: contaminants
     reduce SNR.
      L2 E is robust because it is zero-forcing.
     Majorization-Minimization + Coordinate Descent facilitate
     fast and stable optimization.
Future work




      Is w worth optimizing over?
      What is the correct AIC or BIC formulation?
      What are the degrees of freedom in the L2 E loss model?
References


      D.W. Scott.
      Parametric statistical modeling by minimum integrated square
      error.
      Technometrics, 43(3):274–285, 2001.
      A. Basu et al.
      Robust and efficient estimation by minimising a density power
      divergence.
      Biometrika, 85(3):549–559, 1998.
      H. Zou et al.
      On the “degrees of freedom” of the lasso.
      Annals of Statistics, 35(5):2173–2192, 2007.

Accelerate your Kubernetes clusters with Varnish Caching
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 

Robust parametric classification and variable selection with minimum distance estimation

  • 7. MLE is sensitive to outliers. Likelihood-based choice: outlier or not, the MLE puts mass wherever data lie. Cost: the MLE also puts mass over regions where there is no data.
  • 8. MLE is sensitive to outliers. [Plot: fitted Pr(Y = 1) versus X after adding outliers.] There are no 'ones' between −4 and −2, yet P(Y = 1 | X ∈ (−4, −2)) ↑. There are no 'zeros' between 4 and 6, yet P(Y = 0 | X ∈ (4, 6)) ↑.
  • 9. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 10. The L2 distance as an alternative to the deviance loss. g: unknown true density; f_θ: putative parametric density. Find θ that minimizes the ISE: $\hat{\theta} = \operatorname*{argmin}_{\theta} \int \left( f_{\theta}(x) - g(x) \right)^2 dx$.
  • 11. The L2E Method. The equivalent empirical criterion is $\hat{\theta} = \operatorname*{argmin}_{\theta} \left[ \int f_{\theta}(x)^2 \, dx - \frac{2}{n} \sum_{i=1}^{n} f_{\theta}(X_i) \right]$, where X_i ∈ R^p is the covariate vector of the i-th observation. This is the L2 Estimator, or L2E [Scott, 2001]. Familiar quantity: smoothing parameter selection in non-parametric density estimation.
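The criterion is easy to evaluate for simple models. Below is a minimal sketch of L2E estimation for a univariate Gaussian N(µ, σ²), using the closed form ∫ f_θ(x)² dx = 1/(2σ√π); the function names, the optimizer, and the contaminated toy sample are my own illustrative choices, not part of the talk.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def l2e_criterion(params, x):
    """Empirical L2E criterion for a univariate Gaussian N(mu, sigma^2)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                                # keep sigma positive
    integral_term = 1.0 / (2.0 * sigma * np.sqrt(np.pi))     # closed-form int f_theta^2 dx
    mean_density = norm.pdf(x, loc=mu, scale=sigma).mean()   # (1/n) sum f_theta(X_i)
    return integral_term - 2.0 * mean_density

# Toy data: 90% from N(0, 1), 10% contamination far from the bulk.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 90), rng.normal(10.0, 1.0, 10)])

fit = minimize(l2e_criterion, x0=np.array([np.median(x), 0.0]), args=(x,))
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
# (mu_hat, sigma_hat) should track the clean component rather than the contaminated mean.
```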
  • 12. Density-power divergence. The L2E and the MLE are empirical minimizers of two different points in a spectrum of divergence measures [Basu et al., 1998]: $d_{\gamma}(g, f_{\theta}) = \int \left[ f_{\theta}^{1+\gamma}(z) - \left( 1 + \frac{1}{\gamma} \right) g(z) f_{\theta}^{\gamma}(z) + \frac{1}{\gamma} g^{1+\gamma}(z) \right] dz$, where γ > 0 trades off efficiency for robustness: γ = 1 ⟹ L2 loss; γ → 0 ⟹ Kullback-Leibler divergence.
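The two endpoints of the spectrum can be checked numerically. The sketch below integrates d_γ on a grid for two Gaussian densities, comparing γ = 1 with the squared L2 distance and a small γ with the Kullback-Leibler divergence; the densities, grid, and function names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def dpd(gamma, g, f, z):
    """Numerically integrate the density power divergence d_gamma(g, f) on grid z."""
    integrand = (f**(1 + gamma)
                 - (1 + 1 / gamma) * g * f**gamma
                 + (1 / gamma) * g**(1 + gamma))
    return np.trapz(integrand, z)

z = np.linspace(-10, 10, 20001)
g = norm.pdf(z, 0.0, 1.0)        # "true" density
f = norm.pdf(z, 0.5, 1.2)        # putative model density

l2_squared = np.trapz((f - g)**2, z)
kl = np.trapz(g * np.log(g / f), z)
print(dpd(1.0, g, f, z), l2_squared)   # gamma = 1 recovers the squared L2 distance
print(dpd(1e-3, g, f, z), kl)          # small gamma approaches Kullback-Leibler
```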
  • 13. Robustness of the L2 distance. $\hat{\theta} = \operatorname*{argmin}_{\theta} \int \left( f_{\theta}(x) - g(x) \right)^2 dx$. The L2 distance is zero-forcing: g(x) = 0 forces f_θ(x) = 0. It puts a premium on avoiding "false positives". L2E balances mass where data are present versus no mass where data are absent.
  • 14. Partial Densities: an extra degree of freedom. Expand the search space [Scott, 2001]: $\int \left( w f_{\theta}(x) - g(x) \right)^2 dx$. Fit a parametric model to only a fraction w of the data (hopefully the fraction described well by the parametric model!): $(\hat{\theta}, \hat{w}) = \operatorname*{argmin}_{\theta, w} \left[ w^2 \int f_{\theta}(x)^2 \, dx - \frac{2w}{n} \sum_{i=1}^{n} f_{\theta}(X_i) \right]$.
  • 15. Logistic L2E loss. Let F(u) = 1/(1 + exp(−u)) be the logistic function; then $(\hat{\beta}, \hat{w}) = \operatorname*{argmin}_{\beta,\, w \in [0,1]} \left[ \frac{w^2}{n} \sum_{i=1}^{n} \left\{ F(x_i^T \beta)^2 + \left( 1 - F(x_i^T \beta) \right)^2 \right\} - \frac{2w}{n} \sum_{i=1}^{n} \left\{ y_i F(x_i^T \beta) + (1 - y_i)\left( 1 - F(x_i^T \beta) \right) \right\} \right]$.
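As a sanity check on the formula, here is a short sketch that evaluates the logistic L2E loss for given (β, w); the argument names are mine, and this is only the objective, not the talk's fitting routine.

```python
import numpy as np

def logistic_l2e_loss(beta, w, X, y):
    """Logistic L2E loss as written above.

    X : (n, p) design matrix, y : (n,) array of 0/1 labels,
    beta : (p,) coefficients, w : weight in [0, 1].
    """
    F = 1.0 / (1.0 + np.exp(-X @ beta))                        # F(x_i^T beta)
    integral_term = w**2 * np.mean(F**2 + (1.0 - F)**2)
    data_term = 2.0 * w * np.mean(y * F + (1.0 - y) * (1.0 - F))
    return integral_term - data_term
```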
  • 16. Two-dimensional example. [Scatterplot of X2 versus X1.] n = 300 and p = 2. Three clusters, each of size 100: two are labelled 0, one is labelled 1.
  • 17. [Fitted models on the X1–X2 scatterplot: (a) MLE; (b) L2E A, ŵ = 1.026.]
  • 18. [Fitted models on the X1–X2 scatterplot: (c) L2E B, ŵ = 0.666; (d) L2E C, ŵ = 0.668.]
  • 19. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 20. The optimization problem. Challenges: the L2E loss is not convex; the Hessian of the L2E loss can be indefinite, so standard Newton-Raphson fails; how do we keep the fit scalable and stable as p increases? Solution: Majorization-Minimization.
  • 21. Majorization-Minimization. Strategy: minimize a surrogate function, the majorization. Choose the surrogate so that decreasing the surrogate decreases the objective, and so that the surrogate is easier to minimize than the objective.
  • 22. Majorization-Minimization. Definition: given real-valued functions f and g on R^p, g majorizes f at x if 1. g(x) = f(x) and 2. g(u) ≥ f(u) for all u.
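The definition translates directly into a descent scheme. Below is a schematic MM loop, under the assumption that mm_step returns the minimizer of a surrogate majorizing the objective at the current iterate; both function names are mine.

```python
def mm_minimize(objective, mm_step, beta0, tol=1e-8, max_iter=500):
    """Generic majorization-minimization loop (schematic).

    If mm_step(beta) minimizes a surrogate that majorizes `objective` at beta,
    then objective(mm_step(beta)) <= objective(beta), so the iterates descend.
    """
    beta, prev = beta0, objective(beta0)
    for _ in range(max_iter):
        beta = mm_step(beta)
        curr = objective(beta)
        if prev - curr < tol:          # monotone decrease has stalled
            break
        prev = curr
    return beta
```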
  • 23–32. [Animated figure, repeated across ten slides: "The spectrum of logistic models," with a lack-of-fit axis running from More to Less and models marked as very bad, less bad, and optimal.]
  • 33. Quadratic majorization of the logistic L2E loss. The loss has bounded curvature with respect to β. Fix w and majorize the exact second-order Taylor expansion: $\beta^{(m+1)} = \beta^{(m)} - \frac{1}{K} (X^T X)^{-1} X^T Z^{(m)}$, where $K \ge \frac{1}{4} \max_{z \in [-1,1]} \left( \frac{3}{2} w z^4 - z^3 - 2 w z^2 + z + \frac{w}{2} \right)$. K controls the step size; its lower bound is related to the maximum curvature of the loss. Z^{(m)} is a working response that depends on Y and Xβ^{(m)}.
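A rough sketch of one such step follows. Here I take Z^{(m)} to be the per-observation derivative of the averaged logistic L2E loss with respect to the linear predictor, so that the gradient of the loss equals X^T Z; the talk's exact working response and scaling may differ, so treat this only as a structural illustration.

```python
import numpy as np

def mm_update(beta, w, X, y, K):
    """One update of the form beta - (1/K) (X^T X)^{-1} X^T Z (a sketch)."""
    u = X @ beta
    F = 1.0 / (1.0 + np.exp(-u))
    s = 2.0 * y - 1.0
    # Derivative of the averaged loss w.r.t. u_i (my derivation, not the slide's formula):
    Z = 2.0 * w * F * (1.0 - F) * (w * (2.0 * F - 1.0) - s) / len(y)
    return beta - np.linalg.solve(X.T @ X, X.T @ Z) / K
```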
  • 34. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 35. Continuous variable selection with the LASSO. Minimize "L2E loss" $+\ \lambda \sum_{i=1}^{p} |\beta_i|$. A penalized majorization of the loss majorizes the penalized loss, so minimize "majorization of L2E loss" $+\ \lambda \sum_{i=1}^{p} |\beta_i|$ instead.
  • 36. Coordinate Descent. Suppose X is standardized; then $\beta_k^{(m+1)} = S\left( \beta_k^{(m)} - \frac{1}{K} X_{(k)}^T Z^{(m)},\ \lambda \right)$, where S is the soft-threshold function S(x, λ) = sign(x) max(|x| − λ, 0). The extension to the elastic net is straightforward.
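For reference, the soft-threshold operator and the single-coordinate update are only a few lines of code; X[:, k] stands in for X_(k), and the working response Z is assumed to be available from the current majorization (a sketch, not the talk's implementation).

```python
import numpy as np

def soft_threshold(x, lam):
    """S(x, lambda) = sign(x) * max(|x| - lambda, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def update_coordinate(beta, k, Z, X, K, lam):
    """Update the k-th coefficient as in the display above (X standardized)."""
    return soft_threshold(beta[k] - X[:, k] @ Z / K, lam)
```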
  • 37. Heuristic Model Selection. Regularization path: calculate the penalized regression coefficients for a range of λ values. Information criterion: for each λ, calculate the deviance loss using the L2E coefficients and add a correction term (AIC and BIC); select the model with the lowest AIC/BIC value, using the number of non-zero penalized regression coefficients as the degrees of freedom [Zou et al., 2007].
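A minimal sketch of the selection step, assuming the deviance losses and non-zero coefficient counts have already been computed along the λ path; the BIC penalty log(n)·df shown here is the usual one, and the function name is mine.

```python
import numpy as np

def select_by_bic(deviances, dfs, n, lambdas):
    """Pick the lambda minimizing BIC = deviance + log(n) * df."""
    bic = np.asarray(deviances) + np.log(n) * np.asarray(dfs)
    return lambdas[int(np.argmin(bic))]
```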
  • 38. Heuristic Model Selection. [Two panels versus log10(λ): the L2E BIC curve and the coefficient paths β_j.]
  • 39. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 40. Simulations: Estimation. n = 200, p = 4; X_i | Group 1 ∼ i.i.d. N(µ, σ); X_i | Group 2 ∼ i.i.d. N(−µ, σ); β = (1, 1/2, 1, 2); Y_i | X_i ∼ Bern(F(X_i^T β)), independently; 1,000 replicates.
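A sketch of one replicate of this design; the slide does not give numeric values for µ and σ, so the values below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 4
mu, sigma = 1.0, 1.0                              # assumed values for illustration
beta = np.array([1.0, 0.5, 1.0, 2.0])

group = rng.integers(0, 2, size=n)                # assign each observation to a group
centers = np.where(group[:, None] == 0, mu, -mu)  # +mu for Group 1, -mu for Group 2
X = rng.normal(loc=centers, scale=sigma, size=(n, p))
prob = 1.0 / (1.0 + np.exp(-X @ beta))            # F(X_i^T beta)
y = rng.binomial(1, prob)                         # Y_i | X_i ~ Bernoulli
```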
  • 41. Case 1: vary the position of one outlier.
  • 42. Distributions of fitted coefficients. [Boxplots of the fitted values of β1–β4 at each outlier position (−0.25, 1.5, 3, 6, 12, 24), comparing MLE, L2E with w = 1, and L2E with w = wopt.]
  • 43. Estimation. The MLE regression coefficients are driven to zero (implosion breakdown).
  • 44. Case 2: vary the number of outliers at a fixed position.
  • 45. Distributions of fitted coefficients. [Boxplots of the fitted values of β1–β4 by the number of outliers (0, 1, 5, 10, 15, 20), comparing MLE, L2E with w = 1, and L2E with w = wopt.]
  • 46. Simulations: Variable Selection. n = 200, p = 1000; X_i | Group 1 ∼ i.i.d. N(µ, σ); X_i | Group 2 ∼ i.i.d. N(−µ, σ); β = (1, 1, 1, 1, 0, . . . , 0); Y_i | X_i ∼ Bern(F(X_i^T β)), independently; 1,000 replicates. A single outlier is moved along a ray that starts at the centroid of one group and points in the direction (1, 1, 1, 1, 0, . . . , 0).
  • 47. Average number of correct variables selected. [Two panels, AIC and BIC: expected number of correctly selected variables (0–4) versus outlier relative position (0–9.5), for MLE, L2E with w = 1, and L2E with w = wopt.]
  • 48. Average number of incorrect variables selected. [Two panels, AIC and BIC: expected number of incorrectly selected variables versus outlier relative position (0–9.5), for MLE, L2E with w = 1, and L2E with w = wopt.]
  • 49. Variable Selection. Implosion breakdown ⟹ reduced SNR ⟹ missed detections.
  • 50. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 51. Summary MLE logistic regression is sensitive to implosion breakdown. Estimation and variable selection are affected: contaminants reduce SNR. L2 E is robust because it is zero forcing. Majorization-Minimization + Coordinate Descent facilitate fast and stable optimization.
  • 52. Future work Is w worth optimizing over? What is the correct AIC or BIC formulation? What are the degrees of freedom in the L2 E loss model?
  • 53. References. D. W. Scott. Parametric statistical modeling by minimum integrated square error. Technometrics, 43(3):274–285, 2001. A. Basu et al. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3):549–559, 1998. H. Zou et al. On the "degrees of freedom" of the lasso. Annals of Statistics, 35(5):2173–2192, 2007.