On Foundations of Parameter Estimation for Generalized Partial Linear Models with B–Splines and Continuous Optimization
The presentation by Gerhard Wilhelm Weber, Pakize Taylan and Lian Liu.

Presentation Transcript

  • 5th International Summer School Achievements and Applications of Contemporary Informatics, Mathematics and Physics, National University of Technology of the Ukraine, Kiev, Ukraine, August 3-15, 2010. On Foundations of Parameter Estimation for Generalized Partial Linear Models with B–Splines and Continuous Optimization. Gerhard-Wilhelm WEBER, Institute of Applied Mathematics, METU, Ankara, Turkey; Faculty of Economics, Business and Law, University of Siegen, Germany; Center for Research on Optimization and Control, University of Aveiro, Portugal; Universiti Teknologi Malaysia, Skudai, Malaysia. Pakize TAYLAN, Department of Mathematics, Dicle University, Diyarbakır, Turkey. Lian LIU, Roche Pharma Development Center in Asia Pacific, Shanghai, China.
  • Outline • Introduction • Estimation for Generalized Linear Models • Generalized Partial Linear Model (GPLM) • Newton-Raphson and Scoring Methods • Penalized Maximum Likelihood • Penalized Iteratively Reweighted Least Squares (P-IRLS) • An Alternative Solution for (P-IRLS) with CQP • Solution Methods • Linear Model + MARS, and Robust CMARS • Conclusion
  • Introduction The class of Generalized Linear Models (GLMs) has gained popularity as a statistical modeling tool. This popularity is due to: • the flexibility of GLMs in addressing a variety of statistical problems, and • the availability of software (Stata, SAS, S-PLUS, R) to fit the models. The class of GLMs is an extension of traditional linear models that allows: • the mean of a dependent variable to depend on a linear predictor through a nonlinear link function, and • the probability distribution of the response to be any member of an exponential family of distributions. Many widely used statistical models belong to the GLM class: o linear models with normal errors, o logistic and probit models for binary data, o log-linear models for multinomial data.
  • Introduction Many other useful statistical models, such as those with Poisson, binomial, Gamma or normal distributions, can be formulated as GLMs by the selection of an appropriate link function and response probability distribution. A GLM looks as follows:
    $\eta_i = H(\mu_i) = x_i^T \beta,$
    where • $\mu_i = E(Y_i)$: expected value of the response variable $Y_i$, • $H$: smooth monotonic link function, • $x_i$: observed value of the explanatory variables for the $i$-th case, • $\beta$: vector of unknown parameters.
  • Introduction • Assumptions: the $Y_i$ are independent and can have any distribution from the exponential family, with density
    $Y_i \sim f_{Y_i}(y_i, \theta_i, \phi) = \exp\left\{ \dfrac{y_i\theta_i - b_i(\theta_i)}{a_i(\phi)} + c_i(y_i, \phi) \right\} \quad (i = 1, 2, \dots, n);$
    here $a_i$, $b_i$, $c_i$ are arbitrary "scale" functions, and $\theta_i$ is called the natural parameter. • General expressions for the mean and variance of the dependent variable $Y_i$:
    $\mu_i = E(Y_i) = b_i'(\theta_i), \qquad \mathrm{Var}(Y_i) = V(\mu_i)\,\phi_i, \qquad V(\mu_i) = b_i''(\theta_i), \qquad \phi_i = a_i(\phi) := \phi / \omega_i.$
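For illustration, a minimal Python sketch of these identities for the Poisson special case, where $b(\theta) = e^{\theta}$ and $a(\phi) = 1$ (the simulation, sample size and helper names are illustrative assumptions):

```python
import numpy as np

# Poisson in exponential-family form: f(y; theta) = exp{ y*theta - b(theta) + c(y) },
# with cumulant function b(theta) = exp(theta) and dispersion a(phi) = 1.
def b(theta):
    return np.exp(theta)

theta = 0.7          # natural parameter, so mu = b'(theta) = exp(theta)
eps = 1e-5

# Numerical first and second derivatives of the cumulant function b.
b_prime = (b(theta + eps) - b(theta - eps)) / (2 * eps)
b_second = (b(theta + eps) - 2 * b(theta) + b(theta - eps)) / eps**2

# Empirical mean and variance of Poisson samples for comparison.
rng = np.random.default_rng(0)
y = rng.poisson(lam=np.exp(theta), size=200_000)

print("b'(theta)  =", b_prime,  "  empirical mean =", y.mean())   # both close to exp(0.7)
print("b''(theta) =", b_second, "  empirical var  =", y.var())    # both close to exp(0.7)
```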
  • Estimation for GLM • Estimation and inference for GLMs are based on the theory of maximum likelihood estimation and on a least-squares approach:
    $l(\beta) := \sum_{i=1}^{n} \left( \dfrac{y_i\theta_i - b_i(\theta_i)}{a_i(\phi)} + c_i(y_i, \phi) \right).$
    • The dependence of the right-hand side on $\beta$ is solely through the dependence of the $\theta_i$ on $\beta$. • Score equations:
    $\sum_{i=1}^{n} \dfrac{\partial \mu_i}{\partial \eta_i}\, V_i^{-1} (y_i - \mu_i)\, x_{ij} = 0, \qquad \eta_i = \sum_{j} x_{ij}\beta_j, \quad x_{i0} = 1 \quad (i = 1, 2, \dots, n;\ j = 0, 1, \dots, m).$
    • The solution of the score equations is given by the Fisher scoring procedure, based on a Newton-Raphson algorithm.
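A minimal numpy sketch of the Fisher scoring iteration for these score equations, assuming a Poisson response with canonical log link (for which Fisher scoring and Newton-Raphson coincide); the data and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])   # x_{i0} = 1 plus m covariates
beta_true = np.array([0.5, 0.8, -0.4, 0.2])
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(m + 1)
for _ in range(25):                      # Fisher scoring / IRLS
    eta = X @ beta                       # linear predictor eta_i = sum_j x_ij beta_j
    mu = np.exp(eta)                     # inverse of the log link
    # For the canonical link, d mu / d eta = V(mu) = mu, so the score equations reduce to
    # X^T (y - mu) = 0 and the Fisher information is X^T diag(mu) X.
    score = X.T @ (y - mu)
    info = X.T @ (mu[:, None] * X)
    step = np.linalg.solve(info, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

print("estimated beta:", np.round(beta, 3))
```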
  • Generalized Partial Linear Models (GPLMs) • Particular semiparametric models are the Generalized Partial Linear Models (GPLMs): they extend the GLMs in that the usual parametric terms are augmented by a single nonparametric component,
    $E(Y \mid X, T) = G\{ X^T\beta + \gamma(T) \},$
    where • $\beta = (\beta_1, \dots, \beta_m)^T$ is a vector of parameters, and • $\gamma(\cdot)$ is a smooth function, which we try to estimate by splines. • Assumption: an $m$-dimensional random vector $X$ that represents (typically discrete) covariates, and a $q$-dimensional random vector $T$ of continuous covariates, which comes from a decomposition of the explanatory variables. Other interpretations of $\gamma(\cdot)$: role of the environment, expert opinions, Wiener processes, etc.
  • Newton-Raphson and Scoring Methods The Newton-Raphson algorithm is based on a quadratic Taylor series approximation. • An important statistical application of the Newton-Raphson algorithm is maximum likelihood estimation:
    $l(\vartheta, y) \approx l^{a}(\vartheta, y) := l(\vartheta^0, y) + (\vartheta - \vartheta^0)^T \dfrac{\partial l(\vartheta, y)}{\partial \vartheta}\Big|_{\vartheta^0} + \dfrac{1}{2} (\vartheta - \vartheta^0)^T \dfrac{\partial^2 l(\vartheta, y)}{\partial \vartheta\, \partial \vartheta^T}\Big|_{\vartheta^0} (\vartheta - \vartheta^0),$
    where $\vartheta^0$ is a starting value and $l(\vartheta, y) = \log L(\vartheta, y)$ is the log-likelihood function of $\vartheta$ based on the observed data $y = (y_1, y_2, \dots, y_n)^T$. • Next, determine the new iterate $\vartheta^1$ from $\partial l^{a}(\vartheta, y) / \partial \vartheta = 0$:
    $\vartheta^1 := \vartheta^0 - C^{-1} r, \qquad r := \dfrac{\partial l(\vartheta, y)}{\partial \vartheta}, \qquad C := \dfrac{\partial^2 l(\vartheta, y)}{\partial \vartheta\, \partial \vartheta^T}.$
    • Fisher's scoring method replaces $C$ by its expectation $E(C)$.
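A one-parameter illustration of the update $\vartheta^1 := \vartheta^0 - C^{-1} r$ in Python, assuming Poisson observations with mean $e^{\vartheta}$ (a hypothetical example; additive constants of $l$ are dropped):

```python
import numpy as np

# Newton-Raphson for the MLE of theta in a Poisson model with mean exp(theta):
#   l(theta; y) = sum_i ( y_i * theta - exp(theta) )   (constants dropped)
rng = np.random.default_rng(2)
y = rng.poisson(lam=3.0, size=500)

theta = 0.0                                   # starting value theta^0
for it in range(50):
    r = y.sum() - y.size * np.exp(theta)      # score   r = dl/dtheta
    C = -y.size * np.exp(theta)               # Hessian C = d^2 l / dtheta^2
    step = -r / C                             # theta^1 := theta^0 - C^{-1} r
    theta += step
    if abs(step) < 1e-12:
        break

print("theta_hat =", theta, " exp(theta_hat) =", np.exp(theta), " ybar =", y.mean())
```

In this particular example the Hessian does not depend on $y$, so $E(C) = C$ and Fisher's scoring method produces the same iterates.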
  • Penalized Maximum Likelihood • Penalized maximum likelihood criterion for the GPLM:
    $j(\beta, \gamma) := l(\eta, y) - \dfrac{1}{2}\, \lambda \int_a^b \big( \gamma''(t) \big)^2\, dt.$
    • $l$: log-likelihood of the linear predictor; the second term penalizes the integrated squared curvature of $\gamma(t)$ over the given interval $[a, b]$. • $\lambda$: smoothing parameter controlling the trade-off between accuracy of the data fitting and its smoothness (stability, robustness or regularity). • Maximization of $j(\beta, \gamma)$ is carried out with B-splines through the local scoring algorithm. For this, we write a degree-$k$ B-spline with knots at the values $t_i$ $(i = 1, 2, \dots, n)$ for $\gamma(t)$:
    $\gamma(t) = \sum_{j=1}^{v} \theta_j\, B_{j,k}(t),$
    where the $\theta_j$ are coefficients and the $B_{j,k}$ are degree-$k$ B-spline basis functions.
  • Penalized Maximum Likelihood • The degree-zero and degree-$k$ B-spline bases are defined by
    $B_{j,0}(t) = \begin{cases} 1, & t_j \le t < t_{j+1}, \\ 0, & \text{otherwise}, \end{cases} \qquad B_{j,k}(t) = \dfrac{t - t_j}{t_{j+k} - t_j}\, B_{j,k-1}(t) + \dfrac{t_{j+k+1} - t}{t_{j+k+1} - t_{j+1}}\, B_{j+1,k-1}(t) \quad (k \ge 1).$
    • We write $\gamma(t) := (\gamma(t_1), \dots, \gamma(t_n))^T$ and define an $n \times v$ matrix $B$ by $B_{ij} := B_j(t_i)$; then, $\gamma(t) = B\theta$ with $\theta = (\theta_1, \theta_2, \dots, \theta_v)^T$. • Further, define a $v \times v$ matrix $K$ by
    $K_{kl} := \int_a^b B_k''(t)\, B_l''(t)\, dt.$
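A direct Python transcription of this recursion (half-open knot intervals and zero terms for coincident knots are assumed conventions, not stated on the slide):

```python
import numpy as np

def bspline_basis(j, k, t, knots):
    """B_{j,k}(t) via the zero- and k-degree recursion of the slide."""
    if k == 0:
        return np.where((knots[j] <= t) & (t < knots[j + 1]), 1.0, 0.0)
    left, right = 0.0, 0.0
    if knots[j + k] > knots[j]:              # skip terms with a zero-length knot span
        left = (t - knots[j]) / (knots[j + k] - knots[j]) * bspline_basis(j, k - 1, t, knots)
    if knots[j + k + 1] > knots[j + 1]:
        right = (knots[j + k + 1] - t) / (knots[j + k + 1] - knots[j + 1]) \
                * bspline_basis(j + 1, k - 1, t, knots)
    return left + right

# Design matrix B with B_ij = B_{j,k}(t_i), as on the slide.
knots = np.linspace(0.0, 1.0, 12)            # simple equidistant knot sequence
k = 3                                        # cubic B-splines
t = np.linspace(0.0, 0.999, 50)              # evaluation points t_i
v = len(knots) - k - 1                       # number of basis functions
B = np.column_stack([bspline_basis(j, k, t, knots) for j in range(v)])
print(B.shape)                               # (50, v)
```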
  • Penalized Maximum Likelihood • Then, the criterion $j(\beta, \gamma)$ can be written as
    $j(\beta, \gamma) = l(\eta, y) - \dfrac{1}{2}\, \lambda\, \theta^T K \theta.$
    • If we insert the least-squares estimate $\hat\theta = (B^T B)^{-1} B^T \gamma(t)$, we get
    $j(\beta, \gamma) = l(\eta, y) - \dfrac{1}{2}\, \lambda\, \gamma^T M \gamma, \qquad M := B (B^T B)^{-1} K (B^T B)^{-1} B^T.$
    • Now, we will find $\hat\beta$ and $\hat\gamma$ to solve the optimization problem of maximizing $j(\beta, \gamma)$. • Let $H(\mu) = \eta(X, t) = g_1 + g_2$, with $g_1 := X\beta$ and $g_2 := \gamma(t)$.
  • Penalized Maximum Likelihood • To maximize $j(\beta, \gamma)$ with respect to $g_1$ and $g_2$, we solve the following system of equations:
    $\dfrac{\partial j(\beta, \gamma)}{\partial g_1} = \left( \dfrac{\partial \eta}{\partial g_1} \right)^T \dfrac{\partial l(\eta, y)}{\partial \eta} = 0, \qquad \dfrac{\partial j(\beta, \gamma)}{\partial g_2} = \left( \dfrac{\partial \eta}{\partial g_2} \right)^T \dfrac{\partial l(\eta, y)}{\partial \eta} - \lambda M g_2 = 0,$
    which we treat by the Newton-Raphson method. • These system equations are nonlinear in $\beta$ and $g_2$. We linearize them around a current guess $\eta^0$:
    $\dfrac{\partial l(\eta, y)}{\partial \eta} \approx \dfrac{\partial l(\eta, y)}{\partial \eta}\Big|_{\eta^0} + \dfrac{\partial^2 l(\eta, y)}{\partial \eta\, \partial \eta^T}\Big|_{\eta^0} (\eta - \eta^0) = 0.$
  • Penalized Maximum Likelihood • We use this equation in the system of equations:
    $\begin{pmatrix} C & C \\ C & C + \lambda M \end{pmatrix} \begin{pmatrix} g_1^1 - g_1^0 \\ g_2^1 - g_2^0 \end{pmatrix} = \begin{pmatrix} r \\ r - \lambda M g_2^0 \end{pmatrix}, \qquad r := \dfrac{\partial l(\eta, y)}{\partial \eta}, \quad C := -\dfrac{\partial^2 l(\eta, y)}{\partial \eta\, \partial \eta^T},$
    where $(g_1^0, g_2^0) \to (g_1^1, g_2^1)$ is a Newton-Raphson step and $C$ and $r$ are evaluated at $\eta^0$. • More simple form:
    $(A^*) \qquad \begin{pmatrix} C & C \\ S_B & I \end{pmatrix} \begin{pmatrix} g_1^1 \\ g_2^1 \end{pmatrix} = \begin{pmatrix} C \\ S_B \end{pmatrix} h, \qquad h := \eta^0 + C^{-1} r, \quad S_B := (C + \lambda M)^{-1} C,$
    which can be resolved for
    $g_1^1 = X \beta^1, \qquad g_2^1 = S_B (h - g_1^1).$
  • Penalized Maximum Likelihood • $\hat\beta$ and $\hat\gamma$ can be found explicitly, without iteration (no inner backfitting loop):
    $\hat g_1 = X \hat\beta = X \{ X^T C (I - S_B) X \}^{-1} X^T C (I - S_B) h, \qquad \hat g_2 = \hat\gamma = S_B (h - X \hat\beta).$
    • Here, $X$ is the regression matrix of the input data $x_i$, $S_B$ performs a weighted B-spline smoothing on the variable $t_i$, with weights given by $C = -\partial^2 l(\eta, y) / \partial \eta\, \partial \eta^T$, and $h$ is the adjusted dependent variable.
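A numpy sketch of these two closed-form updates, assuming the weight matrix $C$, the adjusted dependent variable $h$, the B-spline design matrix $B$, the penalty matrix $K$ and the smoothing parameter $\lambda$ of the current iteration are given; the synthetic quantities below are illustrative stand-ins only:

```python
import numpy as np

def gplm_update(X, B, K, C, h, lam):
    """Explicit GPLM step:  g1 = X beta_hat,  g2 = gamma_hat = S_B (h - X beta_hat)."""
    n = X.shape[0]
    BtB_inv = np.linalg.inv(B.T @ B)
    M = B @ BtB_inv @ K @ BtB_inv @ B.T           # penalty transported to gamma = B theta
    S_B = np.linalg.solve(C + lam * M, C)         # weighted B-spline smoother matrix
    A = np.eye(n) - S_B                           # I - S_B
    beta_hat = np.linalg.solve(X.T @ C @ A @ X, X.T @ C @ A @ h)
    gamma_hat = S_B @ (h - X @ beta_hat)
    return beta_hat, gamma_hat

# Illustrative call with synthetic stand-ins (shapes only, no statistical meaning):
rng = np.random.default_rng(3)
n, v = 60, 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])
B = np.abs(rng.normal(size=(n, v)))               # stand-in for a B-spline design matrix
D2 = np.diff(np.eye(v), 2, axis=0)
K = D2.T @ D2                                     # rough stand-in for the curvature penalty
C = np.eye(n)                                     # weights of the current iteration
h = rng.normal(size=n)                            # adjusted dependent variable
beta_hat, gamma_hat = gplm_update(X, B, K, C, h, lam=1.0)
print(beta_hat.shape, gamma_hat.shape)
```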
  • Penalized Maximum Likelihood • From the updated $\hat\eta = \hat g_1 + \hat g_2$, the outer loop must be iterated to update $\hat\mu$ and, hence, $h$ and $C$; then, the loop is repeated until sufficient convergence is achieved. Step-size optimization is performed by $\eta(\omega) = \omega\, \eta^1 + (1 - \omega)\, \eta^0$, and we turn to maximizing $j(\eta(\omega))$. • Standard results on the Newton-Raphson procedure ensure local convergence. • Asymptotic properties of the estimate:
    $\hat\eta = R_B (\hat\eta + C^{-1} r) = R_B h, \qquad r = \dfrac{\partial l(\eta, y)}{\partial \eta}\Big|_{\hat\eta},$
    where $R_B$ is the weighted additive fit operator. If we replace $h$, $R_B$ and $C$ by their asymptotic versions $h_0$, $R_B^0$ and $C_0$, then we get the covariance matrix for $\hat\eta$.
  • Penalized Maximum Likelihood
    $\mathrm{Cov}(\hat\eta) \approx R_B^0\, C_0^{-1}\, (R_B^0)^T \approx R_B\, C^{-1}\, R_B^T \quad (\text{"}\approx\text{": asymptotically}), \qquad \mathrm{Cov}(\hat g_s) \approx R_{B_s}\, C^{-1}\, R_{B_s}^T \quad (s = 1, 2).$
    • Here, $h \approx h_0$ has mean $\eta_0$ and variance $C_0^{-1} \approx C^{-1}$, and $R_{B_j}$ is the matrix that produces $\hat g_j$ from $h$, based on B-splines. • Furthermore, $\hat\eta$ is asymptotically normally distributed with covariance matrix $R_B^0\, C_0^{-1}\, (R_B^0)^T$.
  • Penalized Iteratively Reweighted Least Squares (P-IRLS) The penalized likelihood is maximized by penalized iteratively reweighted least squares: the $(p+1)$-st estimate $\eta^{[p+1]}$ of the linear predictor is found by minimizing
    $(B^*) \qquad \big\| C^{[p]} \big( h^{[p]} - \eta \big) \big\|^2 + \lambda \int_a^b \big( \gamma''(t) \big)^2\, dt, \qquad \eta_i^{[p]} = X_i^T \hat\beta + \hat\gamma(t_i), \quad \mu_i^{[p]} = H^{-1}(\eta_i^{[p]}),$
    where $h^{[p]}$ is the iteratively adjusted dependent variable, given by
    $h_i^{[p]} := \eta_i^{[p]} + H'(\mu_i^{[p]})\, (y_i - \mu_i^{[p]});$
    here, $H'$ denotes the derivative of $H$ with respect to $\mu$, and $C^{[p]}$ is a diagonal weight matrix with entries
    $C_{ii}^{[p]} := 1 \big/ \big\{ V(\mu_i^{[p]})\, H'(\mu_i^{[p]})^2 \big\},$
    where $V(\mu_i^{[p]})$ is proportional to the variance of $Y_i$ according to the current estimate $\mu_i^{[p]}$.
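A small Python sketch of how $h^{[p]}$ and $C^{[p]}$ are formed at each P-IRLS step, assuming the special case $H = \log$ (so $H'(\mu) = 1/\mu$ and $V(\mu) = \mu$, as for a Poisson response); the penalized least-squares solve that follows is covered on the next slides:

```python
import numpy as np

def pirls_working_quantities(eta, y):
    """Adjusted dependent variable h^[p] and diagonal weights C^[p] for H = log."""
    mu = np.exp(eta)                 # mu_i = H^{-1}(eta_i)
    Hprime = 1.0 / mu                # H'(mu_i) for the log link
    h = eta + Hprime * (y - mu)      # h_i = eta_i + H'(mu_i) (y_i - mu_i)
    Cdiag = 1.0 / (mu * Hprime**2)   # C_ii = 1 / { V(mu_i) H'(mu_i)^2 }  (= mu_i here)
    return h, np.diag(Cdiag)
```

With these quantities, the weighted penalized least-squares problem $(B^*)$ is solved for the new linear predictor, and the loop is repeated.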
  • Penalized Iteratively Reweighted Least Squares (P-IRLS) • If we use $\gamma(t) = B\theta$ in $(B^*)$, we rewrite it as
    $\big\| C^{[p]} \big( h^{[p]} - X\beta - B\theta \big) \big\|^2 + \lambda\, \theta^T K \theta.$
    • Following Green and Yandell (1985), we suppose that $K$ is of rank $z \le v$. Two matrices $J$ and $T$ can be formed such that $J^T K J = I$, $T^T K T = 0$ and $J^T T = 0$, where $J$ and $T$ have $v$ rows and full column ranks $z$ and $v - z$, respectively. Then, rewriting $\theta$ as
    $(C^*) \qquad \theta = J\delta + T\xi$
    with vectors $\delta$, $\xi$ of dimensions $z$ and $v - z$, respectively, $(B^*)$ becomes
    $\Big\| C^{[p]} \Big( h^{[p]} - (X,\, BT) \begin{pmatrix} \beta \\ \xi \end{pmatrix} - BJ\delta \Big) \Big\|^2 + \lambda\, \delta^T \delta.$
  • Penalized Iteratively Reweighted Least Squares (P-IRLS) • Using a Householder decomposition, the minimization can be split by separating the solution with respect to $(\beta^T, \xi^T)^T$ from the one with respect to $\delta$:
    $(D^*) \qquad Q_1^T C^{[p]} (X,\, BT) = R, \qquad Q_2^T C^{[p]} (X,\, BT) = 0,$
    where $Q = (Q_1,\, Q_2)$ is orthogonal and $R$ is nonsingular, upper triangular and of full rank $m + v - z$. Then, we get the bilevel minimization problem of
    $(E^*_{\mathrm{upper}}) \qquad \Big\| Q_1^T C^{[p]} h^{[p]} - R \begin{pmatrix} \beta \\ \xi \end{pmatrix} - Q_1^T C^{[p]} BJ\delta \Big\|^2 \qquad \text{(upper level)}$
    with respect to $(\beta^T, \xi^T)^T$, given $\delta$ based on minimizing
    $(E^*_{\mathrm{lower}}) \qquad \Big\| Q_2^T C^{[p]} h^{[p]} - Q_2^T C^{[p]} BJ\delta \Big\|^2 + \lambda\, \delta^T \delta \qquad \text{(lower level)}.$
  • Penalized Iteratively Reweighted Least Squares (P-IRLS) • The term $(E^*_{\mathrm{upper}})$ can be set to 0. • If we put
    $w := Q_2^T C^{[p]} h^{[p]}, \qquad V := Q_2^T C^{[p]} BJ,$
    then $(E^*_{\mathrm{lower}})$ becomes the problem of minimizing
    $\| w - V\delta \|^2 + \lambda\, \delta^T \delta,$
    which is a ridge regression problem. Its solution is
    $(E^*) \qquad \hat\delta = \big( V^T V + \lambda I \big)^{-1} V^T w.$
    The other parameters can be found from the upper level as
    $\begin{pmatrix} \hat\beta \\ \hat\xi \end{pmatrix} = R^{-1} Q_1^T C^{[p]} \big( h^{[p]} - BJ\hat\delta \big).$
    • Now, we can compute $\hat\theta$ using $(C^*)$ and, finally, $\eta^{[p+1]} = X\hat\beta + B\hat\theta$.
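A numpy sketch of this split solve for one P-IRLS step, using numpy.linalg.qr (a Householder-based QR factorization) for the decomposition $(D^*)$; the matrices $J$ and $T$ from the Green-Yandell reparametrization and all other inputs are assumed to be given:

```python
import numpy as np

def pirls_split_solve(X, B, J, Tmat, C, h, lam):
    """Lower-level ridge solve for delta, then upper-level back-substitution for (beta, xi)."""
    A = C @ np.hstack([X, B @ Tmat])                 # C^[p] (X, BT)
    Q, Rfull = np.linalg.qr(A, mode="complete")      # Householder-based QR
    p = A.shape[1]
    Q1, Q2 = Q[:, :p], Q[:, p:]
    R = Rfull[:p, :]

    # Lower level: ridge regression for delta.
    w = Q2.T @ C @ h                                 # Q2^T C^[p] h^[p]
    V = Q2.T @ C @ B @ J                             # Q2^T C^[p] B J
    delta = np.linalg.solve(V.T @ V + lam * np.eye(V.shape[1]), V.T @ w)

    # Upper level: set the upper residual to zero and back-substitute.
    beta_xi = np.linalg.solve(R, Q1.T @ C @ (h - B @ J @ delta))
    beta, xi = beta_xi[:X.shape[1]], beta_xi[X.shape[1]:]
    theta = J @ delta + Tmat @ xi                    # (C*): theta = J delta + T xi
    return beta, theta
```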
  • An Alternative Solution for (P-IRLS) with CQP • Both the penalized maximum likelihood and the P-IRLS method contain a smoothing parameter $\lambda$. This parameter can be estimated by • Generalized Cross Validation (GCV), or • minimization of an UnBiased Risk Estimator (UBRE). • A different method to solve P-IRLS uses Conic Quadratic Programming (CQP). Use a Cholesky decomposition of the $v \times v$ matrix $K$ in $(B^*)$, $K = U^T U$. Then, with $\vartheta = (\beta^T, \theta^T)^T$, $(B^*)$ becomes
    $(F^*) \qquad \big\| W\vartheta - v \big\|^2 + \lambda\, \big\| U\theta \big\|^2, \qquad W = C^{[p]} (X,\, B), \quad v = C^{[p]} h^{[p]}.$
    • The regression problem $(F^*)$ can be reinterpreted as
    $(H^*) \qquad \min_{\vartheta}\ G(\vartheta), \quad G(\vartheta) := \big\| W\vartheta - v \big\|^2, \qquad \text{where} \quad g(\vartheta) \le 0, \quad g(\vartheta) := \big\| U\theta \big\|^2 - M \quad (M \ge 0).$
  • An Alternative Solution for (P-IRLS) with CQP • Then, our optimization problem $(H^*)$ is equivalent to
    $\min_{t, \vartheta}\ t, \qquad \text{where} \quad \big\| W\vartheta - v \big\|^2 \le t^2, \quad t \ge 0, \quad \big\| U\theta \big\|^2 \le M.$
    Here, $W$ and $U$ are $n \times (m+v)$ and $v \times v$ matrices, and $\vartheta$ and $v$ are $(m+v)$- and $n$-vectors, respectively. • This means:
    $(I^*) \qquad \min_{t, \vartheta}\ t, \qquad \text{where} \quad \big\| W\vartheta - v \big\| \le t, \quad \big\| U\theta \big\| \le \sqrt{M}.$
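The epigraph form $(I^*)$ can be handed directly to a conic solver. A minimal sketch using cvxpy (the choice of cvxpy and of its default conic solver is an assumption, not something prescribed by the slides), with $W$, $v$, $U$, the bound $M$ and the number $m$ of linear-part coefficients taken as given:

```python
import numpy as np
import cvxpy as cp

def solve_I_star(W, v, U, M, m):
    """(I*):  min t  s.t.  ||W vartheta - v|| <= t,  ||U theta|| <= sqrt(M),
    where vartheta = (beta^T, theta^T)^T and theta occupies the last coordinates."""
    t = cp.Variable()
    vartheta = cp.Variable(W.shape[1])                      # stacked (beta, theta)
    constraints = [cp.norm(W @ vartheta - v) <= t,          # first second-order cone
                   cp.norm(U @ vartheta[m:]) <= np.sqrt(M)] # cone on the spline part
    prob = cp.Problem(cp.Minimize(t), constraints)
    prob.solve()                                            # conic / interior point solver
    return vartheta.value, prob.value
```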
  • An Alternative Solution for (P-IRLS) with CQP • A Conic Quadratic Programming (CQP) problem reads
    $\min_{x}\ c^T x, \qquad \text{where} \quad \big\| D_i x - d_i \big\| \le p_i^T x - q_i \quad (i = 1, 2, \dots, k);$
    our problem is a CQP with
    $c = (1,\, 0_{m+v}^T)^T, \quad x = (t,\, \vartheta^T)^T = (t,\, \beta^T,\, \theta^T)^T, \quad D_1 = (0_n,\, W), \quad d_1 = v, \quad p_1 = (1, 0, \dots, 0)^T, \quad q_1 = 0,$
    $D_2 = (0_{v \times (m+1)},\, U), \quad d_2 = 0_v, \quad p_2 = 0_{m+v+1}, \quad q_2 = -\sqrt{M}; \qquad k = 2.$
    • We first reformulate $(I^*)$ as a primal problem:
    $\min\ t, \quad \text{such that} \quad \chi := \begin{pmatrix} 0_n & W \\ 1 & 0_{m+v}^T \end{pmatrix} \begin{pmatrix} t \\ \vartheta \end{pmatrix} + \begin{pmatrix} -v \\ 0 \end{pmatrix}, \qquad \zeta := \begin{pmatrix} 0_v & 0_{v \times m} & U \\ 0 & 0_m^T & 0_v^T \end{pmatrix} \begin{pmatrix} t \\ \beta \\ \theta \end{pmatrix} + \begin{pmatrix} 0_v \\ \sqrt{M} \end{pmatrix},$
    $\chi \in L^{n+1}, \qquad \zeta \in L^{v+1},$
  • An Alternative Solution for (P-IRLS) with CQP with ice-cream (or second-order, or Lorentz) cones
    $L^{l+1} := \Big\{ x = (x_1, \dots, x_{l+1})^T \in \mathbb{R}^{l+1} \ \Big|\ x_{l+1} \ge \sqrt{x_1^2 + x_2^2 + \dots + x_l^2} \Big\}.$
    • The corresponding dual problem is
    $\max\ (v^T,\, 0)\, \omega_1 + (0_v^T,\, -\sqrt{M})\, \omega_2$
    $\text{such that} \quad \begin{pmatrix} 0_n^T & 1 \\ W^T & 0_{m+v} \end{pmatrix} \omega_1 + \begin{pmatrix} 0_{(m+1) \times v} & 0_{m+1} \\ U^T & 0_v \end{pmatrix} \omega_2 = \begin{pmatrix} 1 \\ 0_{m+v} \end{pmatrix}, \qquad \omega_1 \in L^{n+1}, \quad \omega_2 \in L^{v+1}.$
  • Solution Methods • Polynomial-time algorithms are requested. – Usually, only local information on the objective and the constraints is given. – Such algorithms cannot utilize a priori knowledge of the problem's structure. – CQPs belong to the well-structured convex problems. • Interior point algorithms: – They exploit the structure of the problem. – They yield better complexity bounds. – They exhibit much better practical performance.
  • Outlook Important new class of GPLMs: $E(Y \mid X, T) = G\big( X^T\beta + \gamma(T) \big)$, e.g., GPLM(X, T) = LM(X) + MARS(T).
    [Figure: MARS basis functions $c^-(x, \tau) = [-(x - \tau)]_+$ and $c^+(x, \tau) = [+(x - \tau)]_+$ with knot $\tau$; CMARS.]
  • Outlook Robust CMARS (RCMARS):
    [Figure: data points $T_j$ with confidence intervals of a given semi-length; observations falling outside these intervals are outliers.]
  • References
    [1] Aster, A., Borchers, B., and Thurber, C., Parameter Estimation and Inverse Problems, Academic Press, 2004.
    [2] Craven, P., and Wahba, G., Smoothing noisy data with spline functions, Numerische Mathematik 31 (1979), 377-403.
    [3] De Boor, C., A Practical Guide to Splines, Springer Verlag, 2001.
    [4] Dongarra, J.J., Bunch, J.R., Moler, C.B., and Stewart, G.W., LINPACK Users' Guide, SIAM, Philadelphia, 1979.
    [5] Friedman, J.H., Multivariate adaptive regression splines, The Annals of Statistics 19, 1 (1991), 1-141.
    [6] Green, P.J., and Yandell, B.S., Semi-parametric generalized linear models, Lecture Notes in Statistics 32 (1985).
    [7] Hastie, T.J., and Tibshirani, R.J., Generalized Additive Models, Chapman and Hall, New York, 1990.
    [8] Kincaid, D., and Cheney, W., Numerical Analysis: Mathematics of Scientific Computing, Pacific Grove, 2002.
    [9] Müller, M., Estimation and testing in generalized partial linear models – a comparative study, Statistics and Computing 11 (2001), 299-309.
    [10] Nelder, J.A., and Wedderburn, R.W.M., Generalized linear models, Journal of the Royal Statistical Society A 135 (1972), 370-384.
    [11] Nemirovski, A., Lectures on Modern Convex Optimization, Israel Institute of Technology, http://iew3.technion.ac.il/Labs/Opt/opt/LN/Final.pdf.
  • References
    [12] Nesterov, Y.E., and Nemirovskii, A.S., Interior-Point Polynomial Algorithms in Convex Programming, SIAM, 1994.
    [13] Ortega, J.M., and Rheinboldt, W.C., Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
    [14] Renegar, J., A Mathematical View of Interior-Point Methods in Convex Optimization, SIAM, 2001.
    [15] Scheid, F., Numerical Analysis, McGraw-Hill Book Company, New York, 1968.
    [16] Taylan, P., Weber, G.-W., and Beck, A., New approaches to regression by generalized additive models and continuous optimization for modern applications in finance, science and technology, Optimization 56, 5-6 (2007), pp. 1-24.
    [17] Taylan, P., Weber, G.-W., and Liu, L., On foundations of parameter estimation for generalized partial linear models with B-splines and continuous optimization, in: Proceedings of PCO 2010, 3rd Global Conference on Power Control and Optimization, February 2-4, 2010, Gold Coast, Queensland, Australia.
    [18] Weber, G.-W., Akteke-Öztürk, B., İşcanoğlu, A., Özöğür, S., and Taylan, P., Data Mining: Clustering, Classification and Regression, four lectures given at the Graduate Summer School on New Advances in Statistics, Middle East Technical University, Ankara, Turkey, August 11-24, 2007 (http://www.statsummer.com/).
    [19] Wood, S.N., Generalized Additive Models: An Introduction with R, Chapman and Hall, New York, 2006.
  • Thank you very much for your attention! http://www3.iam.metu.edu.tr/iam/images/7/73/Willi-CV.pdf