Regression

Speaker notes:
  • Figure 1: scatter(1:20,10+(1:20)+2*randn(1,20),'k','filled'); a=axis; a(3)=0; axis(a);
  • Link with perceptron
  • A linear program is fast, but much slower than solving a linear system
  • Link with kNN

    1. Regression. CS294 Practical Machine Learning, Romain Thibaux, 09/18/06
    2. Outline
       • Ordinary Least Squares regression: derivation from minimizing the sum of squares; probabilistic interpretation; online version (LMS)
       • Overfitting and Regularization
       • Numerical stability
       • L1 Regression
       • Kernel Regression, Spline Regression
       • Multivariate Adaptive Regression Splines (MARS)
    3. Classification (reminder): X → Y
       • X can be anything: continuous (ℝ, ℝ^d, …), discrete ({0,1}, {1,…,k}, …), structured (tree, string, …), …
       • Y is discrete: {0,1} binary; {1,…,k} multi-class; tree etc. structured
    4. Classification (reminder): X can be anything: continuous (ℝ, ℝ^d, …), discrete ({0,1}, {1,…,k}, …), structured (tree, string, …), …
    5. Classification (reminder): same X; methods covered include Perceptron, Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, and the kernel trick
    6. Regression: X → Y
       • X can be anything: continuous (ℝ, ℝ^d, …), discrete ({0,1}, {1,…,k}, …), structured (tree, string, …), …
       • Y is continuous: ℝ, ℝ^d
    7. Examples
       • Voltage → Temperature
       • Processes, memory → Power consumption
       • Protein structure → Energy [next week]
       • Robot arm controls → Torque at effector
       • Location, industry, past losses → Premium
    8. Linear regression. Given examples (x_i, y_i), i = 1, …, n, predict y given a new point x. [figure: scatter plots of example data; Temperature axis] [start Matlab demo lecture2.m]
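       The demo file lecture2.m itself is not part of this deck; a minimal Matlab sketch of comparable data, reusing the line y = 10 + x and the noise level from the speaker notes above:

           x = (1:20)';                    % inputs
           y = 10 + x + 2*randn(20,1);     % noisy points around the line y = 10 + x
           scatter(x, y, 'k', 'filled');   % same scatter plot as the speaker notes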
    9. Linear regression. [figure: fitted model with the predictions marked; Temperature axis]
    10. Ordinary Least Squares (OLS). The error, or "residual", is the gap between the observation y_i and the prediction ŷ(x_i); OLS minimizes the sum squared error Σ_i (y_i − ŷ(x_i))². [figure: residuals between observations and the fitted line]
    11. Minimize the sum squared error. Setting the gradient of the sum squared error to zero yields a linear equation in the parameters, i.e. a linear system (reconstruction below).
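       The equations on this slide were lost in extraction; a standard reconstruction in LaTeX (θ denotes the weight vector):

           \min_\theta E(\theta) = \sum_{i=1}^n (y_i - \theta^\top x_i)^2, \qquad
           \nabla_\theta E = -2 \sum_{i=1}^n (y_i - \theta^\top x_i)\, x_i = 0
           \;\Longrightarrow\; \Big(\sum_i x_i x_i^\top\Big)\theta = \sum_i x_i y_i,
           \ \text{i.e.}\ X^\top X\, \theta = X^\top y.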
    12. Alternative derivation. Stack the examples into an n×d matrix X and an n-vector y; the minimizer solves the normal equations XᵀXθ = Xᵀy. Solve the system (it's better not to invert the matrix).
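       In Matlab the solve is one line (X and y as above; a sketch, not the lecture's own code):

           theta = (X'*X) \ (X'*y);   % solve the normal equations; no explicit inverse
           theta = X \ y;             % better conditioned: backslash solves least squares directly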
    13. LMS Algorithm (Least Mean Squares). Online algorithm: after seeing example (x_i, y_i), update θ ← θ + α (y_i − θᵀx_i) x_i, where α is the learning rate.
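       A minimal sketch of one LMS pass (X is n×d, y is n×1; the value of alpha is an assumption):

           alpha = 0.01;                % learning rate (assumed value)
           theta = zeros(d, 1);
           for i = 1:n
               xi    = X(i, :)';        % current example
               theta = theta + alpha * (y(i) - theta' * xi) * xi;   % LMS update
           end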
    14. Beyond lines and planes. Replace x by features φ(x), e.g. polynomials; everything is the same, and the model is still linear in θ. [figure: polynomial fit to scatter data]
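       For instance, cubic features of a scalar input keep the fit linear in θ (the feature choice here is illustrative):

           Phi   = [ones(size(x)) x x.^2 x.^3];   % nonlinear in x, linear in theta
           theta = Phi \ y;                       % exactly the same least-squares solve
           hold on; plot(x, Phi * theta); hold off;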
    15. Geometric interpretation: the fitted values Xθ are the orthogonal projection of y onto the column space of X. [Matlab demo] [figure: 3-D view of the projection]
    16. Ordinary Least Squares [summary]. Given examples (x_i, y_i), i = 1, …, n, let X be the n×d matrix with rows x_iᵀ (for example rows [1 x_i] for a line) and let y be the vector of targets. Minimize ‖Xθ − y‖² by solving XᵀXθ = Xᵀy; predict ŷ = θᵀx.
    17. Probabilistic interpretation. Assume y = θᵀx + ε with ε ~ N(0, σ²); least squares then maximizes the likelihood of the data (reconstruction below). [figure: Gaussian noise around the regression line]
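       Reconstructed likelihood argument in LaTeX (the Gaussian noise model is the slide's assumption):

           \log \prod_{i=1}^n p(y_i \mid x_i, \theta)
             = \mathrm{const} - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \theta^\top x_i)^2,
           \quad\text{where}\quad
           p(y_i \mid x_i, \theta) = \tfrac{1}{\sqrt{2\pi}\,\sigma}
             \exp\big(-\tfrac{(y_i - \theta^\top x_i)^2}{2\sigma^2}\big),

       so maximizing the likelihood is exactly minimizing the sum squared error.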
    18. Assumptions vs. Reality. [figure: Voltage vs. Temperature, Intel sensor network data]
    19. Overfitting. [Matlab demo] [figure: degree 15 polynomial fit]
    20. Ridge Regression (Regularization). Minimize ‖Xθ − y‖² + λ‖θ‖² with λ "small", by solving (XᵀX + λI)θ = Xᵀy. [figure: effect of regularization (degree 19)]
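       A minimal ridge solve in Matlab (the value of lambda is an assumption):

           lambda = 1e-3;                              % regularization strength ("small")
           theta  = (X'*X + lambda*eye(d)) \ (X'*y);   % regularized normal equations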
    21. Probabilistic interpretation. Place a Gaussian prior on θ; the posterior is proportional to likelihood × prior, and maximizing it (the MAP estimate) recovers ridge regression (see below).
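       Reconstructed MAP computation (writing τ² for the prior variance is my notation):

           \theta \sim \mathcal{N}(0, \tau^2 I): \quad
           \log p(\theta \mid X, y) = \mathrm{const}
             - \frac{1}{2\sigma^2} \sum_i (y_i - \theta^\top x_i)^2
             - \frac{1}{2\tau^2} \|\theta\|^2,

       so the MAP estimate is ridge regression with λ = σ²/τ².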
    22. Numerical Accuracy. The condition number of XᵀX is the square of the condition number of X (sketch after this list).
       • We want covariates as perpendicular as possible, and roughly the same scale
       • Regularization
       • Preconditioning
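       A quick check of the conditioning point, plus a simple column-scaling preconditioner (a sketch using the same X as before):

           cond(X)                           % condition number of the design matrix
           cond(X'*X)                        % roughly cond(X)^2: normal equations amplify ill-conditioning
           Xs = X * diag(1 ./ max(abs(X)));  % rescale each covariate to a comparable scale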
    23. Errors in Variables (Total Least Squares): when x is noisy too, minimize perpendicular distance to the line rather than vertical distance. [figure: perpendicular residuals]
    24. Sensitivity to outliers. Squared error gives high weight to outliers: the influence function grows linearly with the residual, so one bad point can drag the whole fit. [figure: influence function; "Temperature at noon" data]
    25. L1 Regression. Minimize Σ_i |y_i − θᵀx_i| instead: the influence function is bounded, and the problem can be cast as a linear program (sketch below).
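       One standard LP encoding, sketched with linprog from Matlab's Optimization Toolbox (the variable stacking [θ; t] is an assumption): minimize Σ t_i subject to −t ≤ Xθ − y ≤ t.

           f = [zeros(d,1); ones(n,1)];    % objective: sum of absolute residuals t
           A = [ X, -eye(n);               %   X*theta - y <= t
                -X, -eye(n)];              % -(X*theta - y) <= t
           b = [y; -y];
           sol   = linprog(f, A, b);       % solve the linear program
           theta = sol(1:d);

       As the speaker notes say, a linear program is fast, but much slower than solving a linear system.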
    26. Kernel Regression. [figure: kernel regression fit (sigma = 1)]
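       A Nadaraya-Watson style smoother with a Gaussian kernel produces a fit like the figure (the exact estimator on the slide is an assumption; sigma = 1 per the caption):

           sigma = 1;
           xq = linspace(min(x), max(x), 200);              % query points
           yq = zeros(size(xq));
           for j = 1:numel(xq)
               w     = exp(-(x - xq(j)).^2 / (2*sigma^2));  % kernel weights
               yq(j) = sum(w .* y) / sum(w);                % locally weighted average
           end
           plot(xq, yq);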
    27. Spline Regression: regression on each interval. [figure: independent piecewise fits]
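       A sketch of fitting an independent line on each interval (the knot locations are read off the figure axis and otherwise an assumption):

           knots = [5200 5400 5600 5800];                % interval boundaries (assumed)
           hold on;
           for k = 1:numel(knots)-1
               in = x >= knots(k) & x < knots(k+1);      % points falling in interval k
               th = [ones(nnz(in),1) x(in)] \ y(in);     % independent line per interval
               xx = linspace(knots(k), knots(k+1), 50)';
               plot(xx, [ones(50,1) xx] * th);
           end
           hold off;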
    28. Spline Regression: with equality constraints, so adjacent pieces agree at the knots. [figure: continuous piecewise fit]
    29. Spline Regression: with L1 cost. [figure: spline fit with L1 cost]
    30. Heteroscedasticity: the noise variance is not constant across inputs. [figure: #requests per minute vs. Time (days)]
    31. MARS (Multivariate Adaptive Regression Splines) … on the board …
    32. Further topics
       • Generalized Linear Models
       • Gaussian process regression
       • Local Linear regression
       • Feature Selection [next class]
