Curve fitting



  1. Machine Learning, Srihari. Basic Concepts in Machine Learning. Sargur N. Srihari
  2. Introduction to ML: Topics
     1. Polynomial Curve Fitting
     2. Probability Theory of Multiple Variables
     3. Maximum Likelihood
     4. Bayesian Approach
     5. Model Selection
     6. Curse of Dimensionality
  3. Polynomial Curve Fitting. Sargur N. Srihari
  4. Simple Regression Problem
     • Observe a real-valued input variable x
     • Use x to predict the value of a target variable t
     • Synthetic data generated from sin(2πx)
     • Random noise added to the target values
     [Figure: target variable t plotted against input variable x]
  5. Notation
     • N observations of x: x = (x1, …, xN)^T with targets t = (t1, …, tN)^T
     • Goal is to exploit the training set to predict the value of t from x
     • Inherently a difficult problem
     • Probability theory allows us to make a prediction
     • Data generation: N = 10 inputs spaced uniformly in the range [0, 1]; targets generated from sin(2πx) by adding small Gaussian noise (such noise is typical, arising from unobserved variables)
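The data-generation recipe on the slide can be sketched in NumPy. A minimal sketch; the noise standard deviation (0.1) and the random seed are assumptions, since the slide only says "small Gaussian noise":

```python
import numpy as np

# N = 10 inputs spaced uniformly in [0, 1], targets from sin(2*pi*x)
# plus small Gaussian noise. Seed and noise scale are assumptions.
rng = np.random.default_rng(0)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=N)
```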
  6. Polynomial Curve Fitting
     • Polynomial function: y(x, w) = w0 + w1 x + w2 x^2 + … + wM x^M = Σ_{j=0}^{M} w_j x^j
     • M is the order of the polynomial
     • Is a higher value of M better? We'll see shortly!
     • Coefficients w0, …, wM are denoted by the vector w
     • y is a nonlinear function of x, but a linear function of the coefficients w
     • Such models are called linear models
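The polynomial y(x, w) = Σ_j w_j x^j can be evaluated with a Vandermonde design matrix; a small sketch (the function name `poly` is mine):

```python
import numpy as np

def poly(x, w):
    """Evaluate y(x, w) = sum_{j=0}^{M} w_j * x**j.

    Nonlinear in the input x, but linear in the coefficients w.
    """
    # Columns are x^0, x^1, ..., x^M
    X = np.vander(np.atleast_1d(x), N=len(w), increasing=True)
    return X @ np.asarray(w, dtype=float)
```

For example, with w = (1, 2, 3), y(2) = 1 + 2·2 + 3·4 = 17.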
  7. Error Function
     • Sum of squares of the errors between the predictions y(x_n, w) for each data point x_n and the corresponding target value t_n:
       E(w) = ½ Σ_{n=1}^{N} {y(x_n, w) − t_n}^2
     • The factor ½ is included for later convenience
     • Solve by choosing the value of w for which E(w) is as small as possible
     [Figure: red line is the best polynomial fit]
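The sum-of-squares error above translates directly to code; a sketch (function name is mine):

```python
import numpy as np

def sum_of_squares_error(w, x, t):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2, with y(x, w) = sum_j w_j x^j."""
    X = np.vander(np.asarray(x, dtype=float), N=len(w), increasing=True)
    return 0.5 * np.sum((X @ np.asarray(w) - np.asarray(t)) ** 2)
```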
  8. Minimization of Error Function
     • The error function is quadratic in the coefficients w
     • Since y(x, w) = Σ_{j=0}^{M} w_j x^j, the derivatives with respect to the coefficients are linear in the elements of w:
       ∂E(w)/∂w_i = Σ_{n=1}^{N} {y(x_n, w) − t_n} x_n^i = Σ_{n=1}^{N} {Σ_{j=0}^{M} w_j x_n^j − t_n} x_n^i
     • Setting the derivatives equal to zero gives Σ_{n=1}^{N} Σ_{j=0}^{M} w_j x_n^{i+j} = Σ_{n=1}^{N} t_n x_n^i
     • This set of M + 1 linear equations (i = 0, …, M) is solved for the elements of w*
     • Thus the error function has a unique minimum, denoted w*; the resulting polynomial is y(x, w*)
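The M + 1 normal equations above are exactly (X^T X) w* = X^T t with design matrix X[n, j] = x_n^j, so they can be solved in a few lines; a sketch (function name is mine):

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Solve the M + 1 normal equations
       sum_n sum_j w_j x_n^(i+j) = sum_n t_n x_n^i,  i = 0, ..., M,
    which in matrix form read (X^T X) w* = X^T t with X[n, j] = x_n^j.
    """
    X = np.vander(np.asarray(x, dtype=float), N=M + 1, increasing=True)
    return np.linalg.solve(X.T @ X, X.T @ np.asarray(t, dtype=float))
```

When the targets are themselves polynomial in x, the fit recovers the generating coefficients exactly.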
  9. Choosing the Order M
     • Model comparison, or model selection
     • Red lines are the best fits for M = 0, 1, 3, 9 with N = 10
     • M = 0 and M = 1 give poor representations of sin(2πx); M = 3 is the best fit; M = 9 over-fits, again a poor representation of sin(2πx)
  10. Generalization Performance
     • Consider a separate test set of 100 points
     • For each value of M evaluate E(w*) = ½ Σ_{n=1}^{N} {y(x_n, w*) − t_n}^2, with y(x, w*) = Σ_{j=0}^{M} w*_j x^j, for the training data and the test data
     • Use the RMS error E_RMS = √(2 E(w*)/N)
       – Division by N allows data sets of different size N to be compared on an equal footing
       – The square root ensures E_RMS is measured in the same units as t
     • Small M: poor performance due to inflexible polynomials; M = 9 means ten degrees of freedom, tuned exactly to the 10 training points (wild oscillations in the polynomial)
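The RMS error E_RMS = √(2 E(w*)/N) in code; a sketch (function name is mine):

```python
import numpy as np

def rms_error(w, x, t):
    """E_RMS = sqrt(2 * E(w) / N): dividing by N puts data sets of different
    size on an equal footing; the square root restores the units of t."""
    X = np.vander(np.asarray(x, dtype=float), N=len(w), increasing=True)
    E = 0.5 * np.sum((X @ np.asarray(w) - np.asarray(t)) ** 2)
    return np.sqrt(2.0 * E / len(x))
```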
  11. Values of Coefficients w* for Different Polynomials of Order M
     • As M increases, the magnitude of the coefficients increases
     • At M = 9 the coefficients are finely tuned to the random noise in the target values
  12. Increasing the Size of the Data Set
     • N = 15, 100
     • For a given model complexity, the over-fitting problem becomes less severe as the size of the data set increases
     • The larger the data set, the more complex a model we can afford to fit to the data
     • Heuristic: the number of data points should be no less than 5 to 10 times the number of adaptive parameters in the model
  13. Least Squares is a Case of Maximum Likelihood
     • It is unsatisfying to limit the number of parameters according to the size of the training set
     • It is more reasonable to choose the model complexity according to the complexity of the problem
     • The least squares approach is a specific case of maximum likelihood
       – Over-fitting is a general property of maximum likelihood
     • A Bayesian approach avoids the over-fitting problem
       – The number of parameters can greatly exceed the number of data points
       – The effective number of parameters adapts automatically to the size of the data set
  14. Regularization of Least Squares
     • Allows relatively complex models to be used with data sets of limited size
     • Add a penalty term to the error function to discourage the coefficients from reaching large values:
       Ẽ(w) = ½ Σ_{n=1}^{N} {y(x_n, w) − t_n}^2 + (λ/2) ‖w‖^2, where ‖w‖^2 = w^T w = w0^2 + w1^2 + … + wM^2
     • λ determines the relative importance of the regularization term compared with the error term
     • Can be minimized exactly in closed form
     • Known as shrinkage in statistics and as weight decay in neural networks
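The closed-form minimizer of the regularized error is (X^T X + λI) w* = X^T t, i.e. ridge regression; a sketch (function name is mine):

```python
import numpy as np

def fit_regularized(x, t, M, lam):
    """Minimize E~(w) = 1/2 sum_n (y(x_n, w) - t_n)^2 + (lam/2) ||w||^2.

    Setting the gradient to zero gives the closed form
    (X^T X + lam * I) w* = X^T t  (shrinkage / weight decay).
    """
    X = np.vander(np.asarray(x, dtype=float), N=M + 1, increasing=True)
    A = X.T @ X + lam * np.eye(M + 1)
    return np.linalg.solve(A, X.T @ np.asarray(t, dtype=float))
```

With λ = 0 this reduces to ordinary least squares; increasing λ shrinks the coefficient vector toward zero.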
  15. Effect of the Regularizer
     • M = 9 polynomials fitted using the regularized error function
     [Figure: fits with no regularizer (λ = 0), an optimal regularizer, and a large regularizer (λ = 1)]
  16. Impact of Regularization on Error
     • λ controls the effective complexity of the model and hence the degree of over-fitting
       – Analogous to the choice of M
     • Suggested approach:
       – Training set: determine the coefficients w for different values of M or λ
       – Validation set (hold-out set): optimize the model complexity (M or λ)
     [Figure: training and test error for the M = 9 polynomial as a function of λ]
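The suggested train/validation procedure can be sketched end to end. A minimal sketch: the λ grid, noise level, and seed are assumptions, since the slide only names the procedure:

```python
import numpy as np

# Hold-out selection: fit an M = 9 polynomial on a small training set,
# pick the regularization weight lambda on a separate validation set.
rng = np.random.default_rng(0)

def make_data(n):
    x = np.linspace(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=n)

def fit(x, t, M, lam):
    X = np.vander(x, N=M + 1, increasing=True)
    return np.linalg.solve(X.T @ X + lam * np.eye(M + 1), X.T @ t)

def rms(w, x, t):
    X = np.vander(x, N=len(w), increasing=True)
    return np.sqrt(np.mean((X @ w - t) ** 2))

x_tr, t_tr = make_data(10)      # training set: determines w
x_val, t_val = make_data(100)   # validation (hold-out) set: picks lambda
M = 9
grid = [np.exp(k) for k in range(-20, 1)]  # assumed grid over lambda
best_lam = min(grid, key=lambda lam: rms(fit(x_tr, t_tr, M, lam), x_val, t_val))
```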
  17. Concluding Remarks on Regression
     • The approach suggests partitioning the data into a training set to determine the coefficients w
     • A separate validation set (hold-out set) is used to optimize the model complexity M or λ
     • More sophisticated approaches are not as wasteful of training data
     • A more principled approach is based on probability theory
     • Classification is a special case of regression where the target variable takes discrete values