Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics/

  1. Stepwise Logistic Regression. Lecture for FMI Students, 27.05.2010, Alexander Efremov. © Experian Limited 2007. All rights reserved. Experian and the marks used herein are service marks or registered trademarks of Experian Limited. Other product and company names mentioned herein may be the trademarks of their respective owners. No part of this copyrighted work may be reproduced, modified, or distributed in any form or manner without the prior written permission of Experian Limited. Confidential and proprietary.
  2. Agenda
     - Introduction
       - Applications of the Logistic Regression
       - System Identification & Stepwise Regression
     - Part I. Logistic Regression Model Development
       - Logistic Model
       - Maximum Likelihood Estimator
       - Potential Problems
       - Model Analysis and Validation
     - Part II. Stepwise Logistic Regression (SWR)
       - Basic Idea
       - SWR Algorithm
       - Potential Problems
     - Summary
  4. Introduction: Applications of the Logistic Regression
     - Medicine: diagnostics, modeling of disease growth, treatment effect
     - Psychology: learning process modeling, evaluation of psychological tests
     - Economics: risk analysis, investigation of countries' debt, occupational choices
     - Marketing: product consumption, effect of retailers' actions
     - Criminology: risk factors for committing a criminal act
     - Sociology: employment, graduation, vote analysis
     - Ecology: modeling population growth
     - Linguistics: language changes
     - Chemistry: reaction models
     - Media: news effects, copycat reaction
     - Finance: credit scoring, fraud detection
     - Physics, Biology, etc.
     - Slide graphic label: The Logistic Model
  5. Introduction: System Under Investigation. Diagram: Individuals /raw data/ => System => Model.
  6. Introduction: System Identification Stages (figure only).
  9. Part I. Logistic Regression Model Development: Logistic Model. Figure panels: Linear relation; Logistic relation.
  10. Part I. Logistic Regression Model Development: Logistic Model
      - Logistic relation, general form: $\hat{y}_k = \dfrac{e^{M_k}}{1 + e^{M_k}} = \dfrac{1}{1 + e^{-M_k}}$
      - "Linear" logistic regression model: $M_k = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k}$, hence $\hat{y}_k = \dfrac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}}$ and $\ln\dfrac{\hat{y}_k}{1 - \hat{y}_k} = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k}$
      - Notation
        - $k$: index of the current individual, $k = 1, \dots, N$
        - $N$: number of observations
        - $y_k$: dependent variable /prob. of good/
        - $\hat{y}_k$: model output /predicted prob. of good/
        - $\theta_0$: intercept
        - $\theta_i$: the (i+1)-th model parameter
        - $x_{i,k}$: the i-th independent variable, $i = 1, \dots, n$
  11. Part I. Logistic Regression Model Development: Logistic Model, Notation
      - Parameters vector: $\theta = [\theta_0 \;\; \theta_1 \;\; \dots \;\; \theta_n]^T \in R^{n+1}$
      - Regression vector: $\varphi_k = [1 \;\; x_{1,k} \;\; \dots \;\; x_{n,k}]^T \in R^{n+1}$
      - Logistic model: $\hat{y}_k = \dfrac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}} = \dfrac{1}{1 + e^{-\varphi_k^T \theta}}$
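The vector form of the logistic model can be sketched in a few lines of Python. This is only an illustration: the function names `predict` and `logit` and the numeric values of `theta` and `x` are hypothetical and do not come from the slides.

```python
import math

def predict(theta, x):
    """Model output y_hat = 1 / (1 + exp(-phi^T theta)) for one individual.

    theta: [theta_0, theta_1, ..., theta_n]; x: [x_1, ..., x_n].
    """
    phi = [1.0] + list(x)                        # regression vector [1, x_1, ..., x_n]
    m = sum(t * p for t, p in zip(theta, phi))   # linear predictor M_k = phi^T theta
    return 1.0 / (1.0 + math.exp(-m))

def logit(p):
    """Log-odds ln(p / (1 - p)); it recovers M_k from y_hat."""
    return math.log(p / (1.0 - p))

theta = [-1.0, 0.5, 2.0]   # hypothetical parameter values
x = [2.0, 0.25]            # hypothetical individual
y_hat = predict(theta, x)
```

Here the linear predictor is -1 + 0.5 * 2 + 2 * 0.25 = 0.5, so `logit(y_hat)` returns 0.5 up to rounding, which is the "linear" form of the model.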
  12. Part I. Logistic Regression Model Development: Residual
      - The residual: $y_k = \dfrac{1}{1 + e^{-\varphi_k^T \theta}} + e_k = \hat{y}_k + e_k$, so $e_k = y_k - \hat{y}_k = \begin{cases} 1 - \hat{y}_k, & \text{for } y_k = 1 \\ -\hat{y}_k, & \text{for } y_k = 0 \end{cases}$
      - Sources of uncertainty
        - Unavailable significant factors
        - Simplified relations
        - Time-varying performance
        - Database errors
        - Fraud
  14. Part I. Logistic Regression Model Development: Maximum Likelihood Estimator, Cost Function
      - Model output: $\hat{y}_k = P(y_k = 1 \mid \varphi_k)$
      - Likelihood contribution: $l_{\theta,k} = \hat{y}_k^{\,y_k} (1 - \hat{y}_k)^{1 - y_k}$
      - Likelihood function: $L_\theta = \prod_{k=1}^{N} l_{\theta,k}$
      - Log-likelihood function: $\ln L_\theta = \sum_{k=1}^{N} \left( y_k \ln \hat{y}_k + (1 - y_k) \ln(1 - \hat{y}_k) \right)$
      - Maximum likelihood criterion: $\max_\theta \ln L_\theta \Leftrightarrow \min_\theta \, -2 \ln L_\theta$
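As a small illustration of the cost function, the sketch below evaluates -2 ln L for given outcomes and predicted probabilities. The function name `neg2_log_likelihood` and the sample values are assumptions made for this example only.

```python
import math

def neg2_log_likelihood(y, y_hat):
    """Return -2 ln L for binary outcomes y and predicted probabilities y_hat."""
    log_l = 0.0
    for yk, pk in zip(y, y_hat):
        # ln of the likelihood contribution y_hat^y * (1 - y_hat)^(1 - y)
        log_l += yk * math.log(pk) + (1 - yk) * math.log(1.0 - pk)
    return -2.0 * log_l

y = [1, 0, 1, 1]                                      # hypothetical outcomes
cost_flat = neg2_log_likelihood(y, [0.5, 0.5, 0.5, 0.5])
cost_good = neg2_log_likelihood(y, [0.9, 0.1, 0.8, 0.7])
```

With every prediction at 0.5 the cost equals 8 ln 2; predictions that lean toward the observed outcomes, as in `cost_good`, give a smaller value, which is what the maximum likelihood criterion rewards.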
  15. Part I. Logistic Regression Model Development: Maximum Likelihood Estimator. Figure: Cost Function /-2 Log L/ for a Real Life Case.
  16. Part I. Logistic Regression Model Development: Maximum Likelihood Estimator
      - Estimates update: $\hat\theta^{(i+1)} = \hat\theta^{(i)} + \Delta\theta^{(i)}$, with $\Delta\theta^{(i)} = ?$
      - Cost function: $f_{\hat\theta}^{(i)} = -\ln L_{\hat\theta}^{(i)}$
      - Gradient: $g^{(i)} = \nabla^T f_{\hat\theta}^{(i)}$
      - Hessian: $H^{(i)} = \nabla^2 f_{\hat\theta}^{(i)}$
      - Taylor series expansion: $f_{\hat\theta + \Delta\theta}^{(i)} = f_{\hat\theta}^{(i)} + g^{(i)T} \Delta\theta^{(i)} + \tfrac{1}{2} \Delta\theta^{(i)T} H^{(i)} \Delta\theta^{(i)} + O_3$
      - Cost function models
        - Linear model: $M^{(i)} = f_{\hat\theta}^{(i)} + g^{(i)T} \Delta\theta^{(i)}$
        - Quadratic model: $M^{(i)} = f_{\hat\theta}^{(i)} + g^{(i)T} \Delta\theta^{(i)} + \tfrac{1}{2} \Delta\theta^{(i)T} H^{(i)} \Delta\theta^{(i)}$
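The open question on this slide, which step to take, has a standard answer for the quadratic model. The derivation below is a sketch that assumes the Hessian is positive definite, so that the quadratic model has a unique minimum; the step length $\alpha$ is the usual damping factor.

```latex
\begin{aligned}
M^{(i)}(\Delta\theta) &= f^{(i)}_{\hat\theta} + g^{(i)T}\Delta\theta
  + \tfrac{1}{2}\,\Delta\theta^{T} H^{(i)} \Delta\theta, \\
\nabla_{\Delta\theta} M^{(i)} &= g^{(i)} + H^{(i)}\Delta\theta = 0
  \;\Longrightarrow\; \Delta\theta^{(i)} = -\bigl(H^{(i)}\bigr)^{-1} g^{(i)}, \\
\hat\theta^{(i+1)} &= \hat\theta^{(i)} - \alpha\,\bigl(H^{(i)}\bigr)^{-1} g^{(i)},
  \qquad \alpha = 1 \text{ for the pure Newton step.}
\end{aligned}
```

The linear model has no minimum of its own, which is why first-order methods only use the direction $-g^{(i)}$ and must choose the step length $\alpha$ separately.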
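  17. Part I. Logistic Regression Model Development: Maximum Likelihood Estimator
      - Gradient: $g = \left[ \dfrac{\partial f}{\partial\theta_0} \;\; \dfrac{\partial f}{\partial\theta_1} \;\; \cdots \;\; \dfrac{\partial f}{\partial\theta_n} \right]^T \in R^{n+1}$
      - Hessian: $H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial\theta_0^2} & \dfrac{\partial^2 f}{\partial\theta_0 \partial\theta_1} & \cdots & \dfrac{\partial^2 f}{\partial\theta_0 \partial\theta_n} \\ \dfrac{\partial^2 f}{\partial\theta_1 \partial\theta_0} & \dfrac{\partial^2 f}{\partial\theta_1^2} & \cdots & \dfrac{\partial^2 f}{\partial\theta_1 \partial\theta_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial\theta_n \partial\theta_0} & \dfrac{\partial^2 f}{\partial\theta_n \partial\theta_1} & \cdots & \dfrac{\partial^2 f}{\partial\theta_n^2} \end{bmatrix} \in R^{(n+1) \times (n+1)}$
      - First-order methods /e.g. Steepest Descent/: $\Delta\theta = -\alpha g$
      - Second-order methods /e.g. Newton-Raphson/: $\Delta\theta = -\alpha H^{-1} g$
      - Figures (one per method): iteration path from $\theta^{(0)}$ through steps 1, 2 to $\theta^*$, with $\theta_{opt}$ marked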
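For the logistic cost function f = -ln L the gradient and Hessian have well-known closed forms, g = -Σ (y_k - ŷ_k) φ_k and H = Σ ŷ_k (1 - ŷ_k) φ_k φ_k^T. The slides do not spell these out, so the sketch below is an illustration under that assumption; the function names and the small data set are hypothetical.

```python
import math

def sigmoid(m):
    return 1.0 / (1.0 + math.exp(-m))

def gradient_and_hessian(theta, X, y):
    """Gradient g and Hessian H of f = -ln L at theta.

    X holds one row [x_1, ..., x_n] per individual; the intercept
    column of ones is added here.
    """
    p = len(theta)
    g = [0.0] * p
    H = [[0.0] * p for _ in range(p)]
    for row, yk in zip(X, y):
        phi = [1.0] + list(row)
        y_hat = sigmoid(sum(t * v for t, v in zip(theta, phi)))
        w = y_hat * (1.0 - y_hat)
        for a in range(p):
            g[a] -= (yk - y_hat) * phi[a]          # g = -sum (y - y_hat) * phi
            for b in range(p):
                H[a][b] += w * phi[a] * phi[b]     # H = sum w * phi * phi^T
    return g, H

X = [[0.0], [1.0], [2.0], [3.0]]   # hypothetical single regressor
y = [0, 1, 0, 1]
g, H = gradient_and_hessian([0.0, 0.0], X, y)
```

At theta = 0 every prediction is 0.5, so the weights are all 0.25 and the result can be checked by hand: g = [0, -1] and H = [[1, 1.5], [1.5, 3.5]].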
  18. Part I. Logistic Regression Model Development: Maximum Likelihood Estimator. Four figure panels, each showing the iteration path from $\theta^{(0)}$ through steps 1, 2 to $\theta^*$, with $\theta_{opt}$ marked:
      - Steepest Descent: $\Delta\theta = -\alpha g$
      - Newton-Raphson (NR): $\Delta\theta = -\alpha H^{-1} g$
      - NR with Line Search: $\Delta\theta = -\alpha^* H^{-1} g$
      - NR with Quadratic Interpolation: $\Delta\theta = -\alpha^* H^{-1} g$
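A minimal sketch of Newton-Raphson with a line search follows. The line search here is simple step halving, chosen for brevity; the slides do not say which line search or interpolation rule the lecture uses, so this is a stand-in. All function names and the data are hypothetical.

```python
import math

def sigmoid(m):
    return 1.0 / (1.0 + math.exp(-m))

def cost_grad_hess(theta, Phi, y):
    """Return f = -ln L, its gradient g and its Hessian H at theta."""
    p = len(theta)
    f, g = 0.0, [0.0] * p
    H = [[0.0] * p for _ in range(p)]
    for phi, yk in zip(Phi, y):
        y_hat = sigmoid(sum(t * v for t, v in zip(theta, phi)))
        f -= yk * math.log(y_hat) + (1 - yk) * math.log(1.0 - y_hat)
        for a in range(p):
            g[a] -= (yk - y_hat) * phi[a]
            for b in range(p):
                H[a][b] += y_hat * (1.0 - y_hat) * phi[a] * phi[b]
    return f, g, H

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def newton_raphson(Phi, y, max_iter=50, tol=1e-10):
    theta = [0.0] * len(Phi[0])
    f, g, H = cost_grad_hess(theta, Phi, y)
    for _ in range(max_iter):
        if max(abs(v) for v in g) < tol:
            break
        direction = solve(H, [-v for v in g])    # Newton direction -H^{-1} g
        alpha = 1.0
        while alpha > 1e-8:                      # step halving until f decreases
            trial = [t + alpha * d for t, d in zip(theta, direction)]
            f_new, g_new, H_new = cost_grad_hess(trial, Phi, y)
            if f_new < f:
                theta, f, g, H = trial, f_new, g_new, H_new
                break
            alpha /= 2.0
        else:
            break                                # no improving step was found
    return theta, f

# Hypothetical data: the two classes overlap, so a finite optimum exists.
Phi = [[1.0, float(x)] for x in range(6)]
y = [0, 0, 1, 0, 1, 1]
theta_hat, f_min = newton_raphson(Phi, y)
```

The linear system is solved directly instead of forming the inverse Hessian, and the overlapping classes in the sample data keep the estimates finite.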
  20. Part I. Logistic Regression Model Development: Potential Problems
      - Numerical problems: matrix inversion in $\hat\theta^{(i+1)} = \hat\theta^{(i)} - \alpha H^{-1} g$, hence SVD, EVD, QR, etc.
      - Local minima (figure: $-2 \ln L$)
      - Model overfitting (figure: $y_k$, $y_{1,k}$, $y_{2,k}$ against $k$)
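One way to see the numerical problem is to test whether the Hessian can be factorized safely before it is used in the update. The sketch below uses a Cholesky factorization, which is not among the decompositions named on the slide (SVD, EVD, QR); it is swapped in here only because it is short. The function name, the tolerance and both matrices are hypothetical.

```python
import math

def cholesky_or_none(H, rel_tol=1e-10):
    """Return lower-triangular L with H = L L^T, or None when a pivot is not
    safely positive (H is singular or not positive definite)."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= rel_tol * H[i][i]:
                    return None              # pivot too small: refuse to continue
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

# Hessian of a model with an intercept and one regressor.
H_ok = [[1.5, 3.75], [3.75, 13.75]]
# The same regressor entered twice: the last two rows and columns coincide.
H_collinear = [[1.5, 3.75, 3.75],
               [3.75, 13.75, 13.75],
               [3.75, 13.75, 13.75]]

L_ok = cholesky_or_none(H_ok)
L_bad = cholesky_or_none(H_collinear)
```

`L_ok` is a valid factor while `L_bad` is `None`: a duplicated (perfectly collinear) variable makes the Hessian singular, so the update cannot be computed by plain inversion.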
  22. Part I. Logistic Regression Model Development: Frequently Used Statistics for Model Analysis
      - Individual estimate measures
        - Standard error: $\sigma_{\hat\theta_i} = \sqrt{[\operatorname{diag}(H^{-1})]_i}$
        - Wald statistic: $W_i = \dfrac{\hat\theta_i^2}{\sigma_{\hat\theta_i}^2} \sim \chi_1^2$
        - p-value: $\Pr(\chi_1^2 > W_i)$ (figure: $\chi^2$ density with $W_i$ and the p-value area marked)
      - Overall model measures
        - Coefficient of determination ($R^2$)
          - Generalized $R^2$: $R^2 = 1 - e^{\frac{2}{N} \left( \ln L_{\hat\theta_0} - \ln L_{\hat\theta} \right)}$
          - Generalized max-rescaled $R^2$: $R_m^2 = \dfrac{R^2}{R_s}$, with $R_s = 1 - e^{\frac{2}{N} \ln L_{\hat\theta_0}}$
        - Cost function: $f_{\hat\theta}^{(i)} = -2 \ln L_{\hat\theta}^{(i)}$
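The individual and overall measures can be illustrated with the standard library alone. For one degree of freedom the chi-square tail probability equals erfc(sqrt(W / 2)), which avoids any statistics package. The function names and all numeric inputs below are hypothetical; `log_l0` stands for the log-likelihood of the intercept-only model.

```python
import math

def wald_test(theta_i, std_err_i):
    """Wald statistic and its p-value from a chi-square with 1 d.o.f."""
    w = (theta_i / std_err_i) ** 2
    p_value = math.erfc(math.sqrt(w / 2.0))      # Pr(chi2_1 > w)
    return w, p_value

def generalized_r2(log_l0, log_l, n_obs):
    """Generalized R^2 and its max-rescaled version."""
    r2 = 1.0 - math.exp(2.0 * (log_l0 - log_l) / n_obs)
    r2_max = 1.0 - math.exp(2.0 * log_l0 / n_obs)
    return r2, r2 / r2_max

w, p = wald_test(0.8, 0.4)                       # hypothetical estimate and standard error
r2, r2_rescaled = generalized_r2(6 * math.log(0.5), -3.0, 6)
```

With an estimate of 0.8 and a standard error of 0.4 the Wald statistic is 4, with a p-value of about 0.046. In the second call the intercept-only model predicts 0.5 for six observations, so the maximum attainable R^2 is 0.75 and the rescaled value is larger than the plain one.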
  23. Part I. Logistic Regression Model Development: Frequently Used Statistics for Model Analysis
      - Modified criteria
        - Akaike Information Criterion (AIC): $AIC_{\hat\theta} = -2 \ln L_{\hat\theta} + 2n$
        - Schwarz Criterion (SC): $SC_{\hat\theta} = -2 \ln L_{\hat\theta} + n \ln(N - 1)$
        - Minimum Description Length (MDL), Final Prediction Error (FPE), etc.
        - Figure: AIC and $-2 \ln L$
      - Model validation: data split into development and validation samples
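The two modified criteria are simple to compute once the log-likelihood is known. The sketch below follows the formulas as they appear on this slide, including the ln(N - 1) penalty in SC; the function names and the input values are hypothetical.

```python
import math

def aic(log_l, n):
    """Akaike Information Criterion: -2 ln L + 2 n."""
    return -2.0 * log_l + 2.0 * n

def schwarz(log_l, n, n_obs):
    """Schwarz Criterion as written on the slide: -2 ln L + n ln(N - 1)."""
    return -2.0 * log_l + n * math.log(n_obs - 1)

# Hypothetical comparison: a larger model must improve ln L enough to pay its penalty.
small = aic(-120.0, 3)
large = aic(-119.5, 6)
sc_small = schwarz(-120.0, 3, 500)
```

Here the larger model gains only 0.5 in log-likelihood for three extra terms, so its AIC is worse (251 against 246), which is the kind of trade-off these criteria are meant to expose.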
  26. Part II. Stepwise Logistic Regression: Basic Idea
      - $x_o$, $x_e$: sets of all variables out of / entered in the model
      - $x_{oi}$: the most significant variable among those out of the model; $x_{ei}$: the least significant variable among those entered
      - SLE: Significance Level to Enter
      - SLS: Significance Level to Stay
      - Figure label: SWR
  27. Part II. Stepwise Logistic Regression: Basic Idea (figure). Available information.
  28. Part II. Stepwise Logistic Regression: Basic Idea (figure). Step 1: Initialization.
  29. Part II. Stepwise Logistic Regression: Basic Idea (figure). Steps 1, 2: Forward Selection.
  30. Part II. Stepwise Logistic Regression: Basic Idea (figure). Steps 1, 2, 3: Forward Selection.
  31. Part II. Stepwise Logistic Regression: Basic Idea (figure). Steps 2, 3: Backward Elimination.
  33. Part II. Stepwise Logistic Regression: Step 0. Initialization
      - Logistic model: $\hat{y}_k = \dfrac{1}{1 + e^{-\varphi_k^T \theta}}$
      - 1. Intercept model: $\varphi_k = 1$, $\theta \in R$
      - 2. Full model: $\varphi_k = [1 \;\; x_{1,k} \;\; \dots \;\; x_{n,k}]^T$, $\theta \in R^{n+1}$
      - 3. One-factor model
        - Check for enter
          - Score chi-square for all potential models: $S_i = g_i^T H_i^{-1} g_i$
          - Maximum score chi-square: $\ell_1 = \arg\max_i S_i$
          - p-value & threshold: $\text{p-value}_{\ell_1} < SLE$
        - Model determination (optimization): $\varphi_k = [1 \;\; x_{\ell_1,k}]^T$, $\theta \in R^2$
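The check for enter in Step 0 can be sketched for the special case where the current model is the intercept model. There the fitted probability is the sample mean of y, and the score chi-square of a candidate reduces to a 2x2 computation. The function names, the candidate variables and the SLE value of 0.15 are assumptions made for this example.

```python
import math

def score_chi_square(x, y):
    """Score statistic S = g^T H^{-1} g for adding x to the intercept model."""
    n_obs = len(y)
    p = sum(y) / n_obs                     # prediction of the fitted intercept model
    w = p * (1.0 - p)
    g0 = -sum(yk - p for yk in y)          # zero at the intercept-model optimum
    g1 = -sum((yk - p) * xk for xk, yk in zip(x, y))
    h00, h01 = w * n_obs, w * sum(x)
    h11 = w * sum(xk * xk for xk in x)
    det = h00 * h11 - h01 * h01
    # closed form of g^T H^{-1} g for a symmetric 2x2 Hessian
    return (h11 * g0 * g0 - 2.0 * h01 * g0 * g1 + h00 * g1 * g1) / det

def p_value(stat):
    """Upper tail of a chi-square distribution with 1 d.o.f."""
    return math.erfc(math.sqrt(stat / 2.0))

y = [0, 0, 1, 0, 1, 1]
candidates = {                             # hypothetical candidate variables
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
}
scores = {name: score_chi_square(x, y) for name, x in candidates.items()}
best = max(scores, key=scores.get)         # variable with the maximum score chi-square
SLE = 0.15                                 # assumed threshold, not from the slides
enters = p_value(scores[best]) < SLE
```

For this data `x1` has the larger score (2.8 against roughly 0.67) and its p-value of about 0.09 is below the assumed SLE, so it would form the one-factor model. No optimization is needed for the candidates that are rejected, which is the practical appeal of the score test.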
  34. Part II. Stepwise Logistic Regression: Step 1. Forward Selection
      - 1. Check for enter
        - Score chi-square of all potential models: $S_i = g_i^T H_i^{-1} g_i$
        - Maximum score chi-square: $\ell_i = \arg\max_i S_i$
        - p-value & threshold: $\text{p-value}_{\ell_i} < SLE$
      - 2. Model determination (optimization): $\varphi_k = [1 \;\; x_{\ell_1,k} \;\; \dots \;\; x_{\ell_i,k}]^T$, $\theta \in R^{i+1}$
      - 3. Statistics for model analysis
        - Individual estimate measures: standard error; Wald statistic & p-value
  35. Part II. Stepwise Logistic Regression: Step 1. Forward Selection
      - 3. Statistics for model analysis (part 2)
        - Overall model measures: coefficients of determination; cost function
        - Modified criteria: Akaike Information Criterion (AIC); Schwarz Criterion (SC)
  36. Part II. Stepwise Logistic Regression: Stepwise Logistic Regression (figure: SWR).
  37. Part II. Stepwise Logistic Regression: Step 2. Backward Elimination
      - 1. Check for leave
        - Wald statistic & p-value of all potential models
        - p-value & threshold: $\max \text{p-value}_{\ell_i} > SLL$
      - 2. Model determination (optimization): $\varphi_k = [1 \;\; x_{\ell_1,k} \;\; \dots \;\; x_{\ell_{j-1},k} \;\; x_{\ell_{j+1},k} \;\; \dots \;\; x_{\ell_i,k}]^T$, $\theta \in R^{i}$
      - 3. Statistics for model analysis
        - Individual estimate measures: standard error; Wald statistic & p-value
  38. Part II. Stepwise Logistic Regression: Step 2. Backward Elimination
      - 3. Statistics for model analysis (part 2)
        - Overall model measures: coefficients of determination; cost function
        - Modified criteria: Akaike Information Criterion (AIC); Schwarz Criterion (SC)
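The initialization, forward selection and backward elimination steps can be tied together in one loop. The sketch below is a simplified illustration, not the lecture's implementation: it uses plain Newton-Raphson with a fixed number of iterations, a single leave threshold `sls`, a hard cap on the number of passes, and a stop rule for a variable that is entered and removed in the same pass. All names, thresholds and data are hypothetical.

```python
import math

def sigmoid(m):
    return 1.0 / (1.0 + math.exp(-m))

def design(X, cols):
    """Regression vectors [1, x_c1, ..., x_ci] for the chosen columns."""
    return [[1.0] + [row[c] for c in cols] for row in X]

def grad_hess(theta, Phi, y):
    """Gradient and Hessian of f = -ln L at theta."""
    p = len(theta)
    g = [0.0] * p
    H = [[0.0] * p for _ in range(p)]
    for phi, yk in zip(Phi, y):
        y_hat = sigmoid(sum(t * v for t, v in zip(theta, phi)))
        for a in range(p):
            g[a] -= (yk - y_hat) * phi[a]
            for b in range(p):
                H[a][b] += y_hat * (1.0 - y_hat) * phi[a] * phi[b]
    return g, H

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit(Phi, y, iters=25):
    """Plain Newton-Raphson (alpha = 1); returns theta and the Hessian there."""
    theta = [0.0] * len(Phi[0])
    for _ in range(iters):
        g, H = grad_hess(theta, Phi, y)
        theta = [t + s for t, s in zip(theta, solve(H, [-v for v in g]))]
    return theta, grad_hess(theta, Phi, y)[1]

def chi2_p(stat):
    """Upper-tail probability of a chi-square variable with 1 d.o.f."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def stepwise(X, y, sle=0.05, sls=0.05):
    entered = []
    for _ in range(2 * len(X[0])):               # hard cap on the number of passes
        # Forward selection: score chi-square of every variable still out.
        theta, _ = fit(design(X, entered), y)
        best, best_s = None, 0.0
        for c in range(len(X[0])):
            if c in entered:
                continue
            g, H = grad_hess(theta + [0.0], design(X, entered + [c]), y)
            s = sum(gi * di for gi, di in zip(g, solve(H, g)))
            if s > best_s:
                best, best_s = c, s
        if best is None or chi2_p(best_s) >= sle:
            break
        entered.append(best)
        # Backward elimination: Wald p-value of every entered variable.
        theta, H = fit(design(X, entered), y)
        p_vals = []
        for i in range(1, len(theta)):           # index 0 is the intercept
            unit = [1.0 if j == i else 0.0 for j in range(len(theta))]
            var_i = solve(H, unit)[i]            # i-th diagonal entry of H^{-1}
            p_vals.append(chi2_p(theta[i] ** 2 / var_i))
        worst = max(range(len(p_vals)), key=lambda i: p_vals[i])
        if p_vals[worst] > sls:
            removed = entered.pop(worst)
            if removed == best:                  # entered and removed at once: stop
                break
    return entered

# Hypothetical data: column 0 carries signal, column 1 only marks the two halves.
first = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
second = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
X = [[float(x), 1.0] for x in range(10)] + [[float(x), 0.0] for x in range(10)]
y = first + second
selected = stepwise(X, y)
```

On this data the first column enters and stays, while the second column never passes the check for enter, so `selected` contains only column 0. The sketch has no protection against perfectly separable data or singular Hessians.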
  40. Part II. Stepwise Logistic Regression: Potential Problems in the Stepwise Regression
      - Local minima & initial conditions
      - Numerical problems /SVD, EVD, QR, etc./
      - Model overfitting
  41. Summary
      - Introduction
        - Applications of the Logistic Regression
        - System Identification & Stepwise Regression
      - Part I. Logistic Regression Model Development
        - Logistic Model
        - Maximum Likelihood Estimator
        - Potential Problems
        - Model Analysis and Validation
      - Part II. Stepwise Logistic Regression (SWR)
        - Basic Idea
        - SWR Algorithm
        - Potential Problems
      - Summary
  42. Stepwise Logistic Regression. Lecture for FMI Students, 27.05.2010, Alexander Efremov. Thank You! http://anp.tu-sofia.bg/aefremov/index.htm
