
Logistic Regression


Introduces logistic regression and maximum-likelihood inference with gradient descent, compares L1 and L2 regularization, and covers the generalized linear model.


  1. Machine learning workshop
     guodong@hulu.com
     Machine learning introduction
     Logistic regression
     Feature selection
     Boosting, tree boosting
     See more machine learning posts: http://dongguo.me
  2. Overview of machine learning
     Machine Learning → Unsupervised Learning / Supervised Learning / Semi-supervised Learning
     Supervised Learning → Classification (e.g. logistic regression) / Regression
  3. How to choose a suitable model?
     Characteristic                           Naïve Bayes  Trees  K Nearest Neighbor  Logistic Regression  Neural Networks  SVM
     Computational scalability                     3         3           1                    3                   1           1
     Interpretability                              2         2           1                    2                   1           1
     Predictive power                              1         1           3                    2                   3           3
     Natural handling of "mixed"-type data         1         3           1                    1                   1           1
     Robustness to outliers in input space         3         3           3                    3                   1           1
     (<Elements of Statistical Learning>, 2nd ed., p. 351; 3 = best, 1 = worst)
  4. Why a model can't perform perfectly on unseen data
     • Expected risk
     • Empirical risk
     • Choose a function family for the prediction function
     • Error
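In standard notation (the slide's own formulas are not preserved in this transcript), the two risks for a loss L and data distribution P are:

```latex
% Expected risk: average loss over the true data distribution P(x, y)
R(f) = \int L\bigl(f(x), y\bigr)\, dP(x, y)

% Empirical risk: average loss over the n training samples
R_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(f(x_i), y_i\bigr)
```

The gap between the two is why a model fit by minimizing empirical risk need not perform as well on unseen data.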
  5. Logistic regression
  6. Outline
     • Introduction
     • Inference
     • Regularization
     • Experiments
     • More
       – Multinomial LR
       – Generalized linear model
     • Application
  7. Logit function and logistic function
     • Logit function
     • Logistic function: inverse logit
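Written out, for p ∈ (0, 1) and z ∈ ℝ:

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}} = \operatorname{logit}^{-1}(z)
```

The logit maps a probability to the whole real line; the logistic function inverts it.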
  8. Logistic regression
     • Prediction function
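The standard form of the prediction function, for feature vector x and weight vector w:

```latex
P(y = 1 \mid x; w) = \sigma(w^{\top} x) = \frac{1}{1 + e^{-w^{\top} x}}
```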
  9. Inference with maximum likelihood (1)
     • Likelihood
     • Inference
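For training samples (x_i, y_i) with y_i ∈ {0, 1} and p_i = σ(wᵀx_i), the likelihood and its log are conventionally written:

```latex
L(w) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i}
\qquad
\ell(w) = \sum_{i=1}^{n} \bigl[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \bigr]
```

Differentiating gives the gradient used on the next slide: ∂ℓ/∂w_j = Σ_i (y_i − p_i) x_{ij}.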
 10. Inference with maximum likelihood (2)
     • Inference (cont.)
     • Use gradient descent
     • Stochastic gradient descent
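The stochastic-gradient update can be sketched in plain Python (a minimal illustration, not the workshop's own implementation; function and variable names are assumed, with `step` and `error` echoing the snippet on slide 15):

```python
import math

def sigmoid(z):
    """Logistic function, numerically stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sgd_logistic(samples, labels, step=0.1, epochs=200):
    """Stochastic gradient ascent on the log-likelihood:
    for each sample, w_j += step * (y - p) * x_j."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            error = y - p              # d(log-likelihood)/d(w.x) for this sample
            for j, xj in enumerate(x):
                w[j] += step * error * xj
    return w

# Toy data: label is 1 iff the first feature is positive; second feature is a bias.
samples = [[2.0, 1.0], [1.0, 1.0], [-1.0, 1.0], [-2.0, 1.0]]
labels = [1, 1, 0, 0]
w = sgd_logistic(samples, labels)
```

In practice one adds a decaying step size and a regularization term, as the later slides do.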
 11. Regularization
     • Penalize large weights to avoid overfitting
       – L2 regularization
       – L1 regularization
 12. Regularization: maximum a posteriori
     • MAP
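In symbols, MAP estimation maximizes the posterior rather than the likelihood alone:

```latex
% Posterior is proportional to likelihood times prior; take logs and maximize
\hat{w}_{\mathrm{MAP}}
  = \arg\max_{w} \; p(w \mid D)
  = \arg\max_{w} \; \bigl[ \ln p(D \mid w) + \ln p(w) \bigr]
```

The choice of prior p(w) determines the penalty: the next two slides use Gaussian and Laplace priors.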
 13. L2 regularization: Gaussian prior
     • Gaussian prior
     • MAP
     • Gradient descent step
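With an independent zero-mean Gaussian prior on each weight, the MAP objective picks up a squared-norm penalty, and the per-sample step becomes (η the step size, λ the regularization strength, constants absorbed):

```latex
p(w_j) \propto \exp\!\left(-\frac{w_j^2}{2\sigma^2}\right)
\;\Rightarrow\;
\max_w \; \ell(w) - \frac{\lambda}{2}\,\lVert w \rVert_2^2,
\qquad
w_j \leftarrow w_j + \eta \bigl[ (y - p)\, x_j - \lambda\, w_j \bigr]
```

This matches the L2 update in the slide-15 snippet, with η = step and λ = reguParam.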
 14. L1 regularization: Laplace prior
     • Laplace prior
     • MAP
     • Gradient descent step
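With a zero-mean Laplace prior the penalty is the L1 norm, and the step uses the sign of the weight (the w_j = 0 case is handled by the clip-at-zero logic on slide 15):

```latex
p(w_j) \propto \exp\!\left(-\frac{|w_j|}{b}\right)
\;\Rightarrow\;
\max_w \; \ell(w) - \lambda\,\lVert w \rVert_1,
\qquad
w_j \leftarrow w_j + \eta \bigl[ (y - p)\, x_j - \lambda \operatorname{sgn}(w_j) \bigr]
```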
 15. Implementation
     • L2 LR
       _weightOfFeatures[fea] += step * (feaValue * error - reguParam * _weightOfFeatures[fea]);
     • L1 LR
       if (_weightOfFeatures[fea] > 0) {
           _weightOfFeatures[fea] += step * (feaValue * error) - step * reguParam;
           if (_weightOfFeatures[fea] < 0)
               _weightOfFeatures[fea] = 0;   // clip at zero instead of crossing it
       } else if (_weightOfFeatures[fea] < 0) {
           _weightOfFeatures[fea] += step * (feaValue * error) + step * reguParam;
           if (_weightOfFeatures[fea] > 0)
               _weightOfFeatures[fea] = 0;   // clip at zero instead of crossing it
       } else {
           _weightOfFeatures[fea] += step * (feaValue * error);
       }
 16. L2 vs. L1
     • L2 regularization
       – Almost all weights end up non-zero
       – Less suitable when training samples are scarce
     • L1 regularization
       – Produces sparse parameter vectors
       – More suitable when most features are irrelevant
       – Handles scarce training samples better
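The sparsity contrast can be seen with a small simulation of the two update rules from slide 15 (a pure-Python sketch; the data, parameter values, and function names are made up for illustration):

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(samples, labels, penalty, step=0.1, regu=0.05, epochs=100):
    """SGD with either the L2 shrinkage or the L1 clip-at-zero update."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                if penalty == "l2":
                    w[j] += step * (xj * error - regu * w[j])
                elif w[j] > 0:
                    w[j] += step * xj * error - step * regu
                    if w[j] < 0:
                        w[j] = 0.0          # clip instead of crossing zero
                elif w[j] < 0:
                    w[j] += step * xj * error + step * regu
                    if w[j] > 0:
                        w[j] = 0.0          # clip instead of crossing zero
                else:
                    w[j] += step * xj * error
    return w

# One informative feature, four pure-noise features, one bias.
random.seed(0)
samples = [[s] + [random.choice([-1.0, 1.0]) for _ in range(4)] + [1.0]
           for s in [2.0, 1.0, -1.0, -2.0] * 5]
labels = [1 if x[0] > 0 else 0 for x in samples]
w_l1 = train(samples, labels, "l1")
w_l2 = train(samples, labels, "l2")
zeros_l1 = sum(1 for v in w_l1 if v == 0.0)
zeros_l2 = sum(1 for v in w_l2 if v == 0.0)
```

Under the clipped L1 update, noise weights are repeatedly driven to exactly zero during training, while L2 only shrinks them toward zero; whether a given weight ends exactly at zero depends on the data order.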
 17. Experiments
     • Dataset
       – Goal: gender prediction
       – Dataset: training samples (431k), test samples (167k)
     • Compared algorithms
       – A: gradient descent with L1 regularization
       – B: gradient descent with L2 regularization
       – C: OWL-QN (L-BFGS-based optimization with L1 regularization)
     • Parameter choices
       – Regularization value
       – Step (learning rate)
       – Decay ratio
       – Stopping condition
         • Max iterations (50) || AUC change <= 0.0005
 18. Experiments (cont.)
     • Experiment results
       Parameters and metrics       GD with L1     GD with L2     OWL-QN
       'Best' regularization term   0.001~0.005    0.0002~0.001   1
       Best step                    0.05           0.02~0.05      –
       Best decay ratio             0.85           0.85           –
       Iterations                   26             20~26          48
       Non-zero / all features      10492/10938    10938/10938    6629/10938
       AUC                          0.8470         0.8463         0.8467
 19. Multinomial logistic regression
     • Prediction function
     • Inference with maximum likelihood
     • Gradient descent step (L2)
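The standard softmax form for K classes, with one weight vector w_k per class (η the step size, λ the L2 strength):

```latex
P(y = k \mid x) = \frac{\exp(w_k^{\top} x)}{\sum_{j=1}^{K} \exp(w_j^{\top} x)},
\qquad
w_k \leftarrow w_k + \eta \bigl[ (\mathbb{1}[y = k] - p_k)\, x - \lambda\, w_k \bigr]
```

For K = 2 this reduces to the binomial logistic regression of slide 8.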
 20. More link functions
     • Inference with maximum likelihood
     • Link function
     • Link functions for the binomial distribution
       – Logit function
       – Probit function
       – Log-log function
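Writing these out, each maps p ∈ (0, 1) to ℝ; Φ is the standard normal CDF, and the slide's "log-log" is most commonly the complementary log-log in the binomial setting:

```latex
g(p) = \ln\frac{p}{1-p} \quad\text{(logit)}
\qquad
g(p) = \Phi^{-1}(p) \quad\text{(probit)}
\qquad
g(p) = \ln\bigl(-\ln(1-p)\bigr) \quad\text{(complementary log-log)}
```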
 21. Generalized linear model
     • What is a GLM
       – Generalization of linear regression
       – Connects the linear model to the response variable via a link function
       – Allows more distributions for the response variable
     • Typical GLMs
       – Linear regression, logistic regression, Poisson regression
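The common structure: a link function g connects the mean of the response to the linear predictor,

```latex
g\bigl(\mathbb{E}[y \mid x]\bigr) = w^{\top} x
```

so the identity link with a Gaussian response gives linear regression, the logit link with a Bernoulli response gives logistic regression, and the log link with a Poisson response gives Poisson regression.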
 22. Application
     • Yahoo
       – <Personalized Click Prediction in Sponsored Search>, WSDM '10
     • Microsoft
       – <Scalable Training of L1-Regularized Log-Linear Models>, ICML '07
     • Baidu
       – Contextual ads CTR prediction
         • http://www.docin.com/p-376254439.html
     • Hulu
       – Demographic targeting
       – Other ad-targeting projects
       – Customer churn prediction
       – More…
 23. Reference
     • 'Scalable Training of L1-Regularized Log-Linear Models', ICML '07
       – http://www.docin.com/p-376254439.html#
     • 'Generative and Discriminative Classifiers: Naïve Bayes and Logistic Regression', by Mitchell
     • 'Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance', ICML '04
 24. Recommended resources
     • Machine Learning open class, by Andrew Ng
       – //10.20.0.130/TempShare/Machine-Learning Open Class
     • http://www.cnblogs.com/vivounicorn/archive/2012/02/24/2365328.html
     • Logistic regression implementation [link]
       – //10.20.0.130/TempShare/guodong/Logistic regression Implementation/
       – Supports binomial and multinomial LR with L1 and L2 regularization
     • OWL-QN
       – //10.20.0.130/TempShare/guodong/OWL-QN/
 25. Thanks
