
# Regression

• Figure 1: `scatter(1:20,10+(1:20)+2*randn(1,20),'k','filled'); a=axis; a(3)=0; axis(a);`
• A linear program is fast, but much slower than solving a linear system

1. Regression. CS294 Practical Machine Learning, Romain Thibaux, 09/18/06.
2. Outline
   - Ordinary Least Squares regression
     - Derivation from minimizing the sum of squares
     - Probabilistic interpretation
     - Online version (LMS)
   - Overfitting and Regularization
   - Numerical stability
   - L1 Regression
   - Kernel Regression, Spline Regression
   - Multivariate Adaptive Regression Splines (MARS)
3. Classification (reminder): X → Y
   - X can be anything:
     - continuous (ℝ, ℝᵈ, …)
     - discrete ({0,1}, {1,…,k}, …)
     - structured (tree, string, …)
     - …
   - Y is discrete:
     - {0,1}: binary
     - {1,…,k}: multi-class
     - tree, etc.: structured
4. Classification (reminder): X can be anything (continuous, discrete, structured, …).
5. Classification (reminder): methods include the Perceptron, Logistic Regression, Support Vector Machines, Decision Trees, Random Forests, and the kernel trick.
6. Regression: X → Y
   - X can be anything (continuous, discrete, structured, …)
   - Y is continuous: ℝ or ℝᵈ
7. Examples
   - Voltage → Temperature
   - Processes, memory → Power consumption
   - Protein structure → Energy [next week]
   - Robot arm controls → Torque at effector
   - Location, industry, past losses → Premium
8. Linear regression. Given examples, predict given a new point. [start Matlab demo lecture2.m] [Figure: scatter plots of temperature data]
9. Linear regression. Prediction. [Figure: fitted line over the temperature data]
10. Ordinary Least Squares (OLS). The error or “residual” is the difference between observation and prediction; minimize the sum squared error. [Figure: residuals between points and the fitted line]
11. Minimize the sum squared error. Setting the gradient of the sum of squares to zero yields a linear equation in the weights, i.e. a linear system.
12. Alternative derivation. Stack the n examples of dimension d into a matrix and solve the resulting system directly (it’s better not to invert the matrix).
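The slides' equations are only hinted at in the scrape, so here is a sketch of the OLS fit in NumPy rather than the lecture's Matlab; the data follow the model from the Figure 1 speaker note, and the variable names (`X`, `y`, `w`) are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = np.arange(1, n + 1, dtype=float)
y = 10 + x + 2 * rng.standard_normal(n)   # same model as the Figure 1 note

X = np.column_stack([np.ones(n), x])      # design matrix with intercept column
# Normal equations: (X^T X) w = X^T y. As the slide advises, solve the
# linear system rather than inverting X^T X.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)                                  # roughly [10, 1]
```

The same `np.linalg.solve` call is what the later slides' "minimize by solving" refers to: the minimization reduces to one d-by-d linear solve.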
13. LMS Algorithm (Least Mean Squares). An online algorithm: the weights are updated after each example, rather than by solving one batch system.
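The scrape lost the slide's update rule; the standard LMS (Widrow-Hoff) update is w ← w + α (yᵢ − w·xᵢ) xᵢ, sketched below. The step size α, iteration count, and data are my own choices, not the lecture's.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(1, 21, dtype=float)
y = 10 + x + 2 * rng.standard_normal(20)
X = np.column_stack([np.ones_like(x), x])

w = np.zeros(2)
alpha = 0.001                      # small fixed step size for stability
for _ in range(2000):              # repeated passes over the data
    for xi, yi in zip(X, y):
        # Online update: move w a little toward reducing this example's error.
        w += alpha * (yi - w @ xi) * xi
print(w)                           # approaches the batch OLS solution
```

With a fixed step size the iterate hovers near, not exactly at, the batch least-squares solution; a decaying step size would converge exactly.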
14. Beyond lines and planes. Everything is the same with nonlinear features of the input (e.g. polynomials): the model is still linear in the weights. [Figure: curved fit]
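A sketch of this idea with a polynomial basis: replace x by the features (1, x, x²) and run the same least-squares solve; the fit is curved but still linear in the weights. The quadratic and its coefficients are my own noiseless illustration.

```python
import numpy as np

x = np.linspace(0, 10, 50)
y = 3 + 2 * x - 0.5 * x**2               # noiseless quadratic for illustration

Phi = np.vander(x, 3, increasing=True)   # feature columns: 1, x, x^2
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)                                 # recovers [3, 2, -0.5]
```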
15. Geometric interpretation. [Matlab demo] [Figure: 3-D illustration]
16. Ordinary Least Squares [summary]. Given n examples of dimension d: let X collect the inputs (for example with a constant feature for the intercept) and y the outputs; minimize the sum squared error by solving the resulting linear system; predict on a new point with the fitted weights.
17. Probabilistic interpretation. Least squares maximizes the likelihood under a Gaussian noise model. [Figure]
18. Assumptions vs. Reality. [Figure: Intel sensor network data, Voltage vs. Temperature]
19. Overfitting. [Matlab demo] [Figure: degree-15 polynomial fit]
20. Ridge Regression (Regularization). Minimize the sum squared error plus a penalty on the size of the weights, with a “small” penalty coefficient; the minimizer is again found by solving a linear system. [Figure: effect of regularization on a degree-19 polynomial fit]
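A sketch of the ridge solve: the penalized objective leads to the system (ΦᵀΦ + λI) w = Φᵀy, which is the standard form; the degree, the value of λ, and the data are my own choices, not the slides'. Regularization visibly shrinks the wild coefficients of the unregularized high-degree fit.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 20, 21)
y = 10 + x + 2 * rng.standard_normal(x.size)

Phi = np.vander(x / x.max(), 16, increasing=True)  # degree-15 basis, rescaled inputs
w_ols, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # unregularized: huge coefficients
lam = 1e-3                                         # "small" penalty coefficient
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(16), Phi.T @ y)
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))  # ridge norm is far smaller
```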
21. Probabilistic interpretation. The posterior is proportional to likelihood × prior; ridge regression corresponds to placing a Gaussian prior on the weights.
22. Numerical Accuracy. The condition number of the system governs accuracy. We want covariates as perpendicular as possible, and on roughly the same scale. Remedies: regularization, preconditioning.
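A small illustration of the "same scale" point: rescaling the columns (a simple preconditioner) drops the condition number of XᵀX by orders of magnitude. The data and the scale mismatch are my own example.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 2))
X[:, 1] *= 1000                        # badly scaled second covariate

kappa_bad = np.linalg.cond(X.T @ X)
Xs = X / X.std(axis=0)                 # put columns on the same scale
kappa_good = np.linalg.cond(Xs.T @ Xs)
print(kappa_bad, kappa_good)           # condition number falls dramatically
```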
23. Errors in Variables (Total Least Squares). [Figure]
24. Sensitivity to outliers. Squared error gives high weight to outliers; the influence function is unbounded. [Figure: Temperature at noon]
25. L1 Regression. Minimizing the sum of absolute errors can be cast as a linear program; the influence function is bounded.
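The slide casts L1 regression as a linear program; as a pure-NumPy sketch I instead use iteratively reweighted least squares (IRLS), a common way to approximate the L1 fit with repeated linear solves. The data, with one gross outlier, are my own; note how little the outlier moves the L1 fit compared to what squared error would do.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)
y = 10 + x
y[-1] += 100                              # one gross outlier
X = np.column_stack([np.ones_like(x), x])

w = np.linalg.solve(X.T @ X, X.T @ y)     # start from the (outlier-skewed) OLS fit
for _ in range(100):
    r = np.abs(y - X @ w)
    s = 1.0 / np.maximum(r, 1e-8)         # reweight: large residuals get small weight
    W = X.T * s
    w = np.linalg.solve(W @ X, W @ y)     # weighted least-squares step
print(w)                                  # near [10, 1] despite the outlier
```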
26. Kernel Regression. [Figure: kernel regression, sigma = 1]
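A sketch of Nadaraya-Watson kernel regression with a Gaussian kernel (matching the figure's sigma = 1): each prediction is a weighted average of training outputs, weighted by a kernel in the distance to the query. The training data are my own illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x_train = np.linspace(0, 20, 40)
y_train = np.sin(x_train / 3) + 0.1 * rng.standard_normal(40)

def kernel_predict(x_query, x_train, y_train, sigma=1.0):
    # Gaussian weight for every (query, training point) pair.
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / sigma) ** 2)
    return (w @ y_train) / w.sum(axis=1)   # locally weighted average

x_query = np.array([5.0, 10.0, 15.0])
pred = kernel_predict(x_query, x_train, y_train)
print(pred)                                # tracks sin(x/3) at the query points
```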
27. Spline Regression. Separate regression on each interval. [Figure]
28. Spline Regression. With equality constraints at the interval boundaries. [Figure]
29. Spline Regression. With an L1 cost. [Figure]
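The per-interval fits with equality constraints above are equivalent to one unconstrained least-squares fit in a truncated-linear (hinge) basis max(0, x − knot), sketched here; the knots and the piecewise-linear data are my own example, not the slides'.

```python
import numpy as np

x = np.linspace(0, 10, 50)
y = np.where(x < 5, x, 5 + 3 * (x - 5))   # slope changes from 1 to 3 at x = 5

knots = [2.5, 5.0, 7.5]
# Basis: intercept, x, and one hinge per knot; the fit is automatically
# continuous, so no explicit equality constraints are needed.
Phi = np.column_stack([np.ones_like(x), x] +
                      [np.maximum(0, x - k) for k in knots])
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)                                  # the hinge at 5.0 gets weight 2 (= 3 - 1)
```

The same hinge functions reappear as the building blocks of MARS on the next slides.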
30. Heteroscedasticity: the noise variance is not constant across inputs. [Figure: #requests per minute vs. Time (days)]
31. MARS (Multivariate Adaptive Regression Splines) … on the board …
32. Further topics
   - Generalized Linear Models
   - Gaussian process regression
   - Local Linear regression
   - Feature Selection [next class]