
SJUT/Mat210/Regression/Intro 2013-14S2



  1. St. John's University of Tanzania, MAT210 NUMERICAL ANALYSIS, 2013/14 Semester II. REGRESSION: Introduction. Kaw, Chapter 6.01-6.02
  2. Degrees of Freedom
     ● Regression is related to interpolation; degrees of freedom underlies the link
     ● Degrees of freedom:
       – Mechanical view: # of independent parameters that define the configuration of a system
       – Statistical view: # of values in the calculation of a statistic that may vary
       – Mathematical view: # of dimensions of a vector subspace
  3. Freedom ↔ Constraints
     ● So, degrees of freedom indicate something about "varying independently"
     ● Consider the ideal gas law: PV = nRT
       – Setting P, V and n dictates T
       – 3 degrees of freedom
     ● A constraint reduces the degrees of freedom
       – For fixed n, PV ~ T
     ● (x, y, z) defines a 3D vector; the constraint z = 0 restricts it to the x,y plane, where (x, y) defines a 2D vector
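The ideal gas law example on this slide can be sketched in a few lines: once the three free parameters P, V and n are set, T is dictated. The specific numbers below are illustrative, not from the slides.

```python
# Sketch of "constraints remove freedom" with the ideal gas law PV = nRT:
# choosing the 3 degrees of freedom (P, V, n) fixes the fourth quantity T.

R = 8.314  # gas constant, J/(mol*K)

def temperature(P, V, n):
    """T is fully determined once P, V and n are chosen."""
    return P * V / (n * R)

# Illustrative values: 1 atm, ~24.8 L, 1 mol of gas.
T = temperature(P=101325.0, V=0.0248, n=1.0)
print(round(T, 1))  # roughly room temperature, in kelvin
```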
  4. So What?
     ● Two points define a line; three points define a parabola; an infinite family of parabolas passes through any two points
     ● In other words:
       – n+1 points define a unique nth order polynomial
       – An nth order polynomial has n+1 degrees of freedom
     ● Two points under-define a parabola
       – n points under-define an nth order polynomial
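The "three points define a parabola" claim can be checked directly: with exactly 3 constraints and 3 parameters, the fit is exact. The sample points below are made up for illustration.

```python
# Sketch: n+1 points determine a unique degree-n polynomial.
import numpy as np

# Three arbitrary points -> a unique parabola (degree 2, 3 degrees of freedom).
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 0.0, 3.0])

coeffs = np.polyfit(x, y, deg=2)  # 3 constraints, 3 parameters: exact fit
p = np.poly1d(coeffs)

# The fitted parabola passes through every point (residuals ~ 0):
print(np.allclose(p(x), y))  # True
```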
  5. (figure; no recoverable text)
  6. Nothing fits exactly
     ● More than n+1 points means it may not be possible to fit an nth order polynomial that passes through each and every point
     ● The points are constraints, and there are just too many of them
     ● With n + a constraints and n + 1 degrees of freedom:
       – a = 1: one unique polynomial
       – a < 1: an infinite family of polynomials
       – a > 1: over-constrained; what is to be done?
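The over-constrained case (a > 1) can be demonstrated with a parabola and 4 points: in general no degree-2 polynomial passes through all of them. The points below are chosen arbitrarily.

```python
# Sketch: with more than n+1 points, a degree-n polynomial generally
# cannot pass through every point; the fit is only a compromise.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.5, 2.0])  # 4 points, deliberately not on a parabola

coeffs = np.polyfit(x, y, deg=2)    # parabola: only 3 degrees of freedom
p = np.poly1d(coeffs)

# The residuals are not all zero: no parabola hits all 4 points.
print(np.allclose(p(x), y))  # False
```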
  7. Regression
     ● When there are extra constraints, we need to find the best polynomial
     ● What is best?
       – Best means minimizing the gap between the polynomial and all the points
     ● The process of finding the polynomial with the best fit is regression
     ● Regression is part of the broader idea of curve fitting
       – Any function may be fitted, but still with n+1 parameters
  8. Iterative Curve Fitting
     http://en.wikipedia.org/w/index.php?title=Curve_fitting&oldid=613302547
  9. Regression Analysis
     ● A statistical process
     ● Estimating relationships among variables:
       y = β0 + β1·x1 + β2·x2 + ⋯ + βn·xn + ε
       – y = response variable
       – xi = predictor variables
       – β0, β1, β2, ⋯, βn = regression coefficients
       – ε = error due to variability in the data
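Estimating the coefficients β0, β1, β2 of the model above can be sketched with an ordinary least-squares solve. The data here are synthetic, generated noise-free from known coefficients purely to illustrate the model form.

```python
# Sketch of estimating beta in y = b0 + b1*x1 + b2*x2 + eps by least squares.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=20)
x2 = rng.uniform(0, 10, size=20)
beta_true = np.array([2.0, 0.5, -1.5])  # b0, b1, b2 (assumed, for the demo)
y = beta_true[0] + beta_true[1] * x1 + beta_true[2] * x2  # no noise here

# Design matrix: a column of ones for the intercept b0, then the predictors.
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_hat, beta_true))  # True (noise-free data)
```

With real data the error term ε is nonzero, and beta_hat only approximates beta_true; the least-squares solve still minimizes the sum of squared residuals.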
  10. Example
      ● Cooking a potato: heat is applied for some specific time
      ● Uncooked portion = y (untransformed starch)
      ● Is y a linear function of time (t) and temperature (θ) of cooking? Alternatively:
        – y = β0 + β1·t + β2·θ + ε
        – y = β0 + β1·t·θ + β2·θ + ε
        – y = β0 + β1·t + β2·t·θ + β3·θ + ε
  11. Uses
      ● Prediction
      ● Model specification
      ● Parameter estimation
      ● Good predictions are not possible if the model is not correctly specified and the parameters are not accurate
      ● Parameter estimation is the most demanding: not only must the model be correctly specified, the predictions must also be accurate and the data must allow for good estimation
  12. Abuses
      ● Extrapolation
      ● Generalization
      ● Causation
      ● Regression analysis can only aid in confirmation or refutation of a causal model
        – The model must have a theoretical basis
        – Regression analysis cannot prove causality; it can only substantiate or contradict causal assumptions
        – Nothing more!
  13. Danger of Extrapolation
  14. The Heart of Regression
      ● "Best" is defined as minimizing the sum of the squares of the residuals
      ● A residual is the difference between the actual and the predicted value
      ● Squaring allows picking the "best of the best"
        – Many candidate fits may have a total signed residual of 0
        – Squaring prevents positive errors from cancelling negative errors
      ● Regression tools allow finding that least-squares fit
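The residual and sum-of-squares ideas on this slide can be sketched with a simple line fit. The data points are made up; note that the signed residuals of a least-squares fit sum to (essentially) zero even when the fit is imperfect, which is exactly why the squared sum is the criterion.

```python
# Sketch of the least-squares criterion: the "best" line minimizes the sum
# of squared residuals; signed residuals can cancel, squared ones cannot.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.1, 2.9])  # roughly linear, made-up data

slope, intercept = np.polyfit(x, y, deg=1)
predicted = slope * x + intercept
residuals = y - predicted

ssr = np.sum(residuals ** 2)  # the quantity regression minimizes

# Signed residuals sum to ~0 even though the fit is not exact;
# only the sum of squares distinguishes good fits from bad ones.
print(abs(np.sum(residuals)) < 1e-8, ssr > 0)  # True True
```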
