Transformation of variables


Transformation of variables in Regression Analysis

  2. 2. CONTENTS: Introduction; Objectives; Kinds of transformations; Rules of thumb with transformations; Transformations to achieve linearity; Methods of transformation of variables (logarithmic, square root, power, inverse, reciprocal and cube root transformations); Precautions with transformations; References
  4. 4. Data do not always come in a form that is immediately suitable for analysis. We often have to transform the variables before carrying out the analysis. In some instances it can help us better examine a distribution.
  5. 5. OBJECTIVES  To achieve normality.  To stabilize the variance.  To ensure linearity. It often becomes necessary to fit a linear regression model to the transformed rather than the original variables. Transformation of a variable can change its distribution from a skewed distribution to a normal distribution (bell-shaped, symmetric about its centre).
  6. 6. KINDS OF TRANSFORMATIONS  Linear transformations  Nonlinear transformations
  7. 7. Linear transformation: A linear transformation preserves linear relationships between variables. Therefore, the magnitude of the correlation between x and y is unchanged after a linear transformation (a negative multiplier only flips its sign). Examples of a linear transformation of variable x: - multiplying x by a constant. - dividing x by a constant. - adding a constant to x.
  8. 8.  Nonlinear transformation : A nonlinear transformation changes (increases or decreases) linear relationships between variables and, thus, changes the correlation between variables. Examples of a nonlinear transformation of variable x can be taken as - square root of x - log of x - power of x - reciprocal of x
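The contrast between the two kinds of transformation can be checked numerically. The sketch below is only an illustration: the data and the helper name `pearson_r` are hypothetical, not taken from the slides. It computes the correlation between x and y before and after a linear and a logarithmic transformation of x.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is exactly linear in x.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * v + 1.0 for v in x]

r_raw = pearson_r(x, y)
# Linear transformation of x: multiply by a positive constant and add a constant.
r_linear = pearson_r([3.0 * v + 10.0 for v in x], y)
# Nonlinear transformation of x: natural log.
r_log = pearson_r([math.log(v) for v in x], y)

print(r_raw, r_linear, r_log)
```

With these values the first two correlations agree, while the log transformation lowers the correlation because it bends a relationship that was already a straight line.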
  9. 9. RULES OF THUMB WITH TRANSFORMATIONS  Transformations on a dependent variable will change the distribution of error terms in a model. Thus, incompatibility of model errors with an assumed distribution can sometimes be remedied with transformations of the dependent variable.  Nonlinearity between the dependent variable and an independent variable can often be linearized by transforming the independent variable. Transformations on an independent variable often do not change the distribution of error terms.
  10. 10. RULES OF THUMB WITH TRANSFORMATIONS  When a relationship between a dependent and independent variable requires extensive transformations to meet linearity and error distribution requirements, there are often alternative methods for estimating the parameters of the relation, namely nonlinear regression and generalized regression models.  Confidence intervals computed on transformed variables need to be transformed back to the original units of interest.  Models should be compared only on the original units of the dependent variable, not the transformed units. Thus goodness-of-fit tests for prediction and similar measures should be calculated using the original units.
  11. 11. THANK YOU
  12. 12. HOW TO PERFORM A TRANSFORMATION TO ACHIEVE LINEARITY? Transforming a data set to enhance linearity is a multi-step, trial-and-error process.  Conduct a standard regression analysis on the raw data.  Construct a residual plot.  If the plot pattern is random, there is no need to transform the data.  If the plot pattern is not random, continue.  Compute the coefficient of determination (R^2).  Choose a transformation method.  Transform the independent variable, the dependent variable, or both, as needed.
  13. 13.  Conduct a regression analysis using the transformed variables.  Compute the coefficient of determination (R^2) based on the transformed variables.  If the transformed R^2 is greater than the raw-score R^2, the transformation was successful. Congratulations!  If not, try a different transformation method.  The best transformation method (exponential model, square root model, reciprocal model, etc.) will depend on the nature of the original data. The only way to determine which method is best is to try each and compare the results (i.e., residual plots, correlation coefficients).
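The trial-and-error loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the presenters' procedure: the helper names `fit_line` and `compare_transformations`, the candidate list, and the data are all hypothetical, and only the dependent variable is transformed. It automates the R^2 comparison only; the residual plots still have to be inspected by eye.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, r2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    r2 = sxy ** 2 / (sxx * syy)
    return b0, b1, r2

# Hypothetical candidate transformations of the dependent variable (y > 0 assumed).
CANDIDATES = {
    "raw": lambda y: y,
    "square root": math.sqrt,
    "log": math.log,
    "reciprocal": lambda y: 1.0 / y,
}

def compare_transformations(xs, ys):
    """Return {name: R^2} for a straight-line fit of each transformed y on x."""
    return {name: fit_line(xs, [f(y) for y in ys])[2]
            for name, f in CANDIDATES.items()}

# Hypothetical data generated from y = exp(0.5 * x), so the log model should fit best.
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(0.5 * x) for x in xs]

scores = compare_transformations(xs, ys)
best = max(scores, key=scores.get)
print(best, scores)
```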
  14. 14. METHODS OF TRANSFORMATION OF VARIABLES  Logarithmic Transformation  Square root Transformation  Power Transformation  Inverse Transformation  Reciprocal Transformation  Cube root Transformation  Exponential Transformation
  15. 15. LOGARITHMIC TRANSFORMATION The most frequently used transformation is the logarithmic transformation. Logarithmically transforming variables in a regression model is a very common way to handle situations where a nonlinear relationship exists between the independent and dependent variables. Logarithmic transformations are also a convenient means of transforming a highly skewed variable into one that is more nearly normal.
  16. 16. SIMPLE EXAMPLES  For instance, if we plot the histogram of expenses, we see a significant right skew in the data, meaning that many of the cases are bunched at lower values:
  17. 17. If we plot the histogram of the logarithm of expenses, however, we see a distribution that looks much more like a normal distribution - Plot of histogram after applying log transformation
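The histograms from the slides are not reproduced here, but the same effect can be seen in a skewness statistic. The sketch below uses made-up expense figures and a hypothetical helper `skewness` (the population moment coefficient); it is an illustration, not the data behind the original plots.

```python
import math

def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical expenses: most cases are small, a few are very large.
expenses = [12, 15, 18, 20, 22, 25, 30, 35, 40, 60, 90, 150, 400, 1200]
log_expenses = [math.log(v) for v in expenses]

skew_raw = skewness(expenses)
skew_log = skewness(log_expenses)
print(skew_raw, skew_log)
```

For this sample the skewness stays positive after the log transformation but is considerably smaller, which is the numerical counterpart of a histogram that looks closer to a bell shape.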
  18. 18. If the relationship between x and y is of the form y = a x^b, taking the log of both sides transforms it into a linear form: ln(y) = ln(a) + b ln(x), or Y = b0 + b1 X, with the transformations Y = ln(y), X = ln(x), b0 = ln(a), b1 = b.
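As a quick check of the log-log linearization, the sketch below fits a straight line to ln(y) against ln(x) and back-transforms the intercept. The data are hypothetical and generated exactly from y = 3 x^2, and `fit_line` is an illustrative helper, so the fit should recover a = 3 and b = 2 up to rounding error.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for Y = b0 + b1*X; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical data generated exactly from y = a * x**b with a = 3, b = 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 * v ** 2 for v in x]

# Regress Y = ln(y) on X = ln(x).
b0, b1 = fit_line([math.log(v) for v in x], [math.log(v) for v in y])

a_hat = math.exp(b0)  # back-transform the intercept: a = exp(b0)
b_hat = b1            # the slope is the exponent b
print(a_hat, b_hat)
```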
  19. 19. SQUARE ROOT TRANSFORMATIONS The square root, x to x^(1/2) = sqrt(x), is a transformation with a moderate effect on distribution shape; it is weaker than the logarithmic transformation. It is also used for reducing right skewness, and has the advantage that it can be applied to zero values. Note that the square root of an area has the units of a length. It is commonly applied to count data, especially if the values are mostly rather small.
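The point about zero values is easy to demonstrate. In the sketch below the counts are hypothetical; the square root is defined for every value, while the natural log fails as soon as it meets a zero.

```python
import math

# Hypothetical count data that includes zeros.
counts = [0, 0, 1, 1, 2, 3, 4, 9, 16]

# The square root is defined at zero, so every count can be transformed.
sqrt_counts = [math.sqrt(c) for c in counts]

# math.log(0) raises ValueError, so the log transformation cannot be
# applied to these data without first adding a constant.
try:
    log_counts = [math.log(c) for c in counts]
except ValueError:
    log_counts = None

print(sqrt_counts, log_counts)
```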
  20. 20. EXAMPLE  The table below shows data for the independent and dependent variables, x and y, respectively.  x : 1 2 3 4 5 6 7 8 9  y : 2 1 6 14 15 30 40 74 75  When we apply a linear regression to the untransformed raw data, the plot of the residuals shows a non-random pattern (a U-shaped curve), which suggests that the data are nonlinear.
  21. 21. Plot of residuals
  22. 22. Suppose we repeat the analysis, using a square root model to transform the dependent variable. For this model, we use the square root of y, rather than y, as the dependent variable. Using the transformed data, our regression equation is y't = b0 + b1 x, where yt = transformed dependent variable, equal to the square root of y; y't = predicted value of the transformed dependent variable yt; x = independent variable; b0 = y-intercept of the transformation regression line; b1 = slope of the transformation regression line.
  23. 23. The table below shows the transformed data we analyzed.  x : 1 2 3 4 5 6 7 8 9  yt : 1.41 1.00 2.45 3.74 3.87 5.48 6.32 8.60 8.66  Since the transformation was based on the square root model (yt = the square root of y), the transformation regression equation can be expressed in terms of the original units of variable y as y' = (b0 + b1 x)^2, where y' = predicted value of y in its original units; x = independent variable; b0 = y-intercept of the transformation regression line; b1 = slope of the transformation regression line.
  24. 24. Plot of residuals for transformed variables
  25. 25. The residual plot shows residuals based on predicted raw scores from the transformation regression equation. The plot suggests that the transformation to achieve linearity was successful. The pattern of residuals is random, suggesting that the relationship between the independent variable (x) and the transformed dependent variable (square root of y) is linear. And the coefficient of determination was 0.96 with the transformed data versus only 0.88 with the raw data. Hence the transformed data resulted in a better model.
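The two coefficients of determination quoted above can be reproduced from the x and y values given in the example. The sketch below uses only those nine data points; the helper name `r_squared` is an illustrative choice, and the residual plots themselves are not redrawn.

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination for a straight-line least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Data from the example.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]

# Square root model: regress sqrt(y) on x.
yt = [math.sqrt(v) for v in y]

r2_raw = r_squared(x, y)
r2_sqrt = r_squared(x, yt)
print(round(r2_raw, 2), round(r2_sqrt, 2))  # 0.88 0.96
```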
  26. 26. THANK YOU
  27. 27. POWER TRANSFORMATIONS In many cases data are drawn from a highly skewed distribution that is not well described by one of the common statistical families. A simple power transformation may map the data to a common distribution such as the Gaussian or Gamma distribution. A suitable model can then be fitted to the transformed data, and a distribution for the original data obtained by inverting the transformation, treating it as a function of a random variable. Formally, the power transform fλ(x) is defined as follows for non-negative data [formula not reproduced in this transcript], where λ, the exponent, is a real-valued parameter.
  28. 28. The reason for the specific definition above is that it is continuous in λ; that is, the mapping fλ(x) defined above is continuous in both x and λ. When the data can take negative values, the simplest extension is to shift all values to the right by adding a number large enough that all values are non-negative. Power transformations are only effective if the ratio of the largest data value to the smallest data value is large.
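The defining formula did not survive in this transcript. One definition that is consistent with the continuity remark above is the Box-Cox form, written out below as an assumption rather than as the formula the slide actually showed. The limit on the right is what makes the logarithm the natural choice at λ = 0; the form requires strictly positive x.

```latex
f_\lambda(x) =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[1ex]
\ln x, & \lambda = 0,
\end{cases}
\qquad x > 0,
\qquad
\lim_{\lambda \to 0} \frac{x^{\lambda} - 1}{\lambda}
= \lim_{\lambda \to 0} \frac{e^{\lambda \ln x} - 1}{\lambda}
= \ln x .
```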
  29. 29. BOX-COX POWER TRANSFORMATION  It is one form of power transformation.  It can be used as a remedial action to make the data approximately normal. The following are some common Box-Cox transformations for values of lambda between -2 and 2.
  30. 30. COMMON BOX-COX TRANSFORMATIONS  λ = -2 : 1/x^2  λ = -1 : 1/x  λ = -0.5 : 1/sqrt(x)  λ = 0 : log(x)  λ = 0.5 : sqrt(x)  λ = 1 : x  λ = 2 : x^2
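The table above can be read as a single rule: raise x to the power λ, and use the logarithm when λ is 0. A minimal sketch follows, with a hypothetical helper name `simple_power` and an arbitrary positive test value; it mirrors the table's simplified forms, not a scaled Box-Cox formula.

```python
import math

def simple_power(x, lam):
    """Simplified power transformation from the table: x**lam, or log(x) at lam = 0.

    Assumes x > 0.
    """
    if lam == 0:
        return math.log(x)
    return x ** lam

lambdas = [-2, -1, -0.5, 0, 0.5, 1, 2]

# Transform the arbitrary value x = 4 with every lambda in the table.
row = {lam: simple_power(4.0, lam) for lam in lambdas}
for lam in lambdas:
    print(lam, row[lam])
```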
  31. 31. INVERSE TRANSFORMATIONS To take the inverse of a number x is to compute 1/x. This essentially makes very small numbers very large and very large numbers very small, so the transformation reverses the order of your scores. Thus, one must be careful to reflect, or reverse, the distribution prior to applying an inverse transformation. To reflect, one multiplies the variable by -1 and then adds a constant to bring the minimum value back to a positive value. Then, once the inverse transformation is complete, the ordering of the values will be identical to that of the original data.
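The effect of reflecting before inverting can be seen on a handful of numbers. The sketch below is an illustration with hypothetical scores; the choice of a constant that puts the reflected minimum at exactly 1.0 is an assumption made for this example, and `ranks` is a small helper written for it.

```python
def ranks(values):
    """Rank position (0 = smallest) of each value, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

# Hypothetical positive scores, already in increasing order.
scores = [1.0, 2.0, 4.0, 10.0]

# Plain inverse: the largest score becomes the smallest value.
inverse = [1.0 / s for s in scores]

# Reflect first: multiply by -1, then add a constant so the minimum is 1.0
# (the value 1.0 is an assumption of this sketch).
shift = max(scores) + 1.0
reflected = [-s + shift for s in scores]
reflected_inverse = [1.0 / r for r in reflected]

print(ranks(scores), ranks(inverse), ranks(reflected_inverse))
```

The plain inverse reverses the ranks, while reflecting first and then inverting leaves the ranks in the original order.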
  32. 32. Computing the inverse transformation
  33. 33. SPECIFYING THE TRANSFORM VARIABLE NAME AND FORMULA First, in the Target Variable text box, type a name for the inverse transformation variable, e.g. “innetime”. Second, since there is no built-in function for computing the inverse, type the formula directly into the Numeric Expression text box. Third, click the OK button to complete the compute request.
  34. 34. THE TRANSFORMED VARIABLE The transformed variable that we asked SPSS to compute is shown in the data editor, in a column to the right of the other variables in the dataset.
  35. 35. OTHER TRANSFORMATIONS Reciprocal transformation: the reciprocal, x to 1/x. It cannot be applied to zero values. Although it can be applied to negative values, it is not useful unless all values are positive. Cube root transformation: the cube root, x to x^(1/3). This is a fairly strong transformation with a substantial effect on distribution shape, although it is weaker than the logarithmic transformation. It is also used for reducing right skewness, and has the advantage that it can be applied to zero and negative values. Note that the cube root of a volume has the units of a length. It is commonly applied to rainfall data.
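Applying the cube root to negative values needs a little care in code. In Python, raising a negative float to the power 1/3 yields a complex number, so the sketch below takes the root of the absolute value and restores the sign. The helper name `cube_root` and the data are hypothetical.

```python
import math

def cube_root(x):
    """Real cube root that also works for zero and negative x."""
    # abs(x) ** (1/3) is the root of the magnitude; copysign restores the sign of x.
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

# Hypothetical data with negative, zero, and positive values.
values = [-27.0, -1.0, 0.0, 8.0, 1000.0]
roots = [cube_root(v) for v in values]
print(roots)
```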
  36. 36. PRECAUTIONS WITH USING TRANSFORMATIONS OF VARIABLES  Although transformations can result in improvement of a specific modelling assumption, such as linearity or homoscedasticity, they can often result in the violation of others. Thus, transformations must be used in an iterative fashion, with continued checking of other modelling assumptions as transformations are made.  Another difficulty arises when the response or dependent variable Y is transformed. In these cases a model results that is a statistical expression of the dependent variable in a form that was not of primary interest in the initial investigation, such as the log of Y, the square root of Y, or the inverse of Y. When comparing statistical models, the comparisons should always be made on the original untransformed scale of Y.
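Comparing models on the original scale of Y means back-transforming the predictions before measuring the error. The sketch below does this for the square root example from the earlier slides (x = 1..9, y = 2, 1, 6, ...), using the sum of squared errors in the original units as the yardstick. The helper names and the choice of that yardstick are assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def sse(actual, predicted):
    """Sum of squared errors between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Data from the square root example.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]

# Model A: straight line fitted to the raw y values.
a0, a1 = fit_line(x, y)
pred_raw = [a0 + a1 * v for v in x]

# Model B: straight line fitted to sqrt(y), then squared to return to original units.
b0, b1 = fit_line(x, [math.sqrt(v) for v in y])
pred_sqrt = [(b0 + b1 * v) ** 2 for v in x]

# Both errors are measured in the original units of y.
sse_raw = sse(y, pred_raw)
sse_sqrt = sse(y, pred_sqrt)
print(sse_raw, sse_sqrt)
```

For these data the back-transformed square root model has the smaller error on the original scale, so it remains the better model under this comparison.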
  37. 37. REFERENCES  Neter, John, Michael Kutner, Christopher Nachtsheim, and William Wasserman. “Applied Linear Statistical Models”, 4th Edition.
  38. 38. THANK YOU