Multiple linear regression

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

2 comments

Comments 1 - 2 of 2 previous next Post a comment

Post a comment
Embed Video
Edit your comment Cancel

Notes on slide 1

7126/6667 Survey Research & Design in Psychology Semester 1, 2009, University of Canberra, ACT, Australia James T. Neill Home page: http://ucspace.canberra.edu.au/display/7126 Lecture page: http://en.wikiversity.org/wiki/Survey_research_methods_and_design_in_psychology http://en.wikiversity.org/wiki/Multiple_linear_regression Image source: http://commons.wikimedia.org/wiki/File:Vidrarias_de_Laboratorio.jpg License: Public domain Description: Introduces and explains the use of linear regression and multiple linear regression in the context of psychology. This lecture is accompanied by a computer-lab based tutorial, the notes for which are available here: http://ucspace.canberra.edu.au/display/7126/Tutorial+-+Multiple+linear+regression ---- Note: Some slides don't show on Notes view – need to be cut and pasted manually into new slides

3 Favorites

Multiple linear regression - Presentation Transcript

  1. Lecture 8 Survey Research & Design in Psychology James Neill, 2009 Multiple Linear Regression
  2. Overview
    • Correlation (Review)‏
    • Regression
    • MLR
      • Assumptions
      • Types
      • R , coefficients
      • Partial correlations
      • Equation
  3. Correlation (Review) Linear relation between two or more variables
  4. Correlation is shared variance
  5. Linear correlation
    • Linear relations between continuous variables
    • Line of best fit on a scatterplot
    • Doesn't provide prediction equation.
  6. Correlation - Cross-products -ve cross-products -ve cross-products +ve cross- products +ve cross- products
  7. Correlation – Key points
    • Covariance is the sum of cross-products
    • Correlation is the standardised sum of cross-products, ranging from -1 to 1 (sign indicates direction, value indicates size)
    • Coefficient of determination ( r 2 ) indicates % of shared variance
    • Correlation does not necessarily equal causality
  8. Purposes of correlational statistics
  9. Regression Predicts/explains a DV based on linear relations with one or more IVs
  10. What is regression analysis?
    • An extension of correlation
    • A way of measuring the relationship between two or more variables.
    • Used to calculate the extent to which one variable changes (DV) when other variable(s) change (IV(s)).
    • Used to help understand possible causal effects of one variable on another.
  11. Purposes of regression Explanatory - e.g., hours of study -> academic grades Predictive - e.g., demographics -> life expectancy (demographics e.g., age, gender, diet, exercise, genetics … )
  12. What is linear regression (LR)? Involves:
    • one predictor (IV) and
    • one outcome (DV)‏
    Explains a relationship using a straight line fitted to the data.
  13. Least squares criterion
  14. Type of data
    • DV = Continuous (Interval or Ratio)‏
    • IV = Continuous or Dichotomous (may need to create dummy variables)‏
  15. Dummy variables
    • Regression can use continuous or dichotmous IVs.
    • To “dummy code” is to convert a more complex variable into several dichotomous variables.
    • Dichotmous = either having (x = 0) or not having (x = 1) a characteristic
  16. Dummy variables
    • Dummy variables are dichotomous variables created from a variable with a higher level of measurement
    • e.g., a 3-level categorical variable Religion (1 = Christian; 2 = Muslim; 3 = Atheist) -> can be recoded into three dummy variables
      • Christian (0 = no; 1 = yes)
      • Muslim (0 = no; 1 = yes)
      • Atheist (0 = no; 1 = yes)
  17. Example: Cigarettes & coronary heart disease IV = Cigarette consumption DV = Coronary Heart Disease IV = Cigarette consumption Example from Landwehr & Watkins (1987), cited in Howell (2004, pp. 216-218) and accompanying lecture notes.
  18. Example: Cigarettes & coronary heart disease
    • IV = Av. # of cigs per adult per day
    • DV = CHD mortality rate (deaths per 10,000 per year due to CHD)
    • Unit of analysis = Country
    • Research question: How fast does CHD mortality rise with a one unit increase in smoking?
  19. Data
  20. Scatterplot with Line of Best Fit
  21. Linear regression equation Variables:
    • Y hat (DV) = predicted values of Y (annual rate of CHD mortality)‏
    • X (IV) = mean # of cigarettes per adult per day per country
    Co-efficients:
    • b = slope = rate of increase/decrease of Y hat for each unit increase in X
    • a = Y-intercept = level of Y when X is 0 .
  22. Linear regression formula
  23. Y = bX + a + e X = IV values Y = DV values a = Y- axis intercept b = slope of line of best fit (regression coefficient) e = error Linear regression equation
  24. Regression coefficients - SPSS
  25. Making a prediction
    • Assume that we want to predict CHD mortality when cigarette consumption is 6.
    • We predict 14.61 people/10,000 in that country will die of coronary heart disease.
  26. Accuracy of prediction
    • Finnish smokers smoke 6 cigarettes/adult/day
    • We predict 14.61 deaths/10,000
    • They actually have 23 deaths/10,000
    • Our error (“residual”) = 23 - 14.61 = 8.39
  27. Residual Prediction
  28. Explained variance
    • r = .71
    • r 2 = .71 2 =.51
    • Approximately 50% in variability of incidence of CHD mortality is associated with variability in smoking.
  29. Hypothesis testing
    • Null hypotheses:
      • b = 0
      • a = 0
      • population correlation (  ) = 0
  30. Testing Slope and Intercept a b
  31. Example Does a tendency to ‘ignore problems’ (IV) predict level of ‘psychological distress’ (DV)?
  32. Line of best fit seeks to minimise sum of squared residuals
  33. Ignoring Problems accounts for ~10% of the variation in Psychological Distress
  34. It is unlikely that the population relationship between Ignoring Problems and Psychological Distress is 0%.
  35. PD = 119 -9.5*Ignore There is a sig. a or constant (Y-intercept). Ignoring Problems is a significant predictor of Psychological Distress
  36. a = 119 b = -9.5 Distress = 119 - 9.5*Ignore e = error
  37. Linear regression summary
    • Linear regression is for explaining or predicting the linear relationship between two variables
    • Y hat = b x + a ( b is the slope; a is the Y -intercept)
  38. Multiple Linear Regression Linear relations between two or more IVs and a single DV
    • ~50% of the variance in CHD mortality could be explained by cigarette smoking (using LR)
    • Strong effect - but what about the other 50% (‘unexplained’ variance)?
    • e.g., exercise and cholesterol?
    • Single predictor: LR Multiple predictors: MLR
    LR  MLR example: Cigarettes & coronary heart disease
  39. Linear Regression X Y Multiple Linear Regression X 1 X 2 X 3 Y X 4 X 5 Linear regression summary
  40. Correlation Regression Partial Correlation MLR Y Y X X 1 X 2
  41. What is MLR?
    • Use of several IVs to predict a DV
    • Provides a measure of overall fit ( R )
    • Makes adjustments for inter-relationships among predictors
      • e.g. IVs = height, gender DV = weight
    • Weights each predictor
  42. Research question Do:
    • # of cigarettes / day (IV 1 )
    • exercise (IV 2 ) and
    • cholesterol (IV 3 ) predict
    • CHD mortality (DV)?
    Cigarettes Exercise CHD Mortality Cholesterol
  43. Research question To what extent do personality factors (IVs) predict income (DV) over a lifetime? Extraversion Neuroticism Income Psychoticism
  44. Research question “Does the number of years of psychological study (IV1) and the number of years of counseling experience (IV2) predict clinical psychologists’ effectiveness in treating mental illness (DV)?” Study Experience Effectiveness
  45. 3-way scatterplot
  46. Y = b 1 x 1 + b 2 x 2 +.....+ b i x i + a + e
    • Y = observed dependent scores
    • b i = unstandardised regression coefficients (the B s in SPSS)
    • x 1 to x i = independent variable scores
    • a = Y axis intercept
    • e = error (residual)
    Regression equation
  47. Multiple correlation coefficient ( R )
    • Equivalent of r , but taking into account multiple predictors
    • Capitalise (i.e., R )
    • Always positive
  48. Coefficient of determination ( R 2 )
    • Or “squared multiple correlation coefficient”
    • Usually report R 2 instead of R
    • R 2 = % of variance in DV explained by combined effects of the IVs
    • Analogous to r 2
  49. Interpretation of R 2 R 2 = .30 is good for social sciences
    • R 2 = .10 = small
    • R 2 = .20 = medium
    • R 2 = .30 = large
  50. Adjusted R 2
    • Adjusted R 2 : Used for estimating explained variance in a population.
    • As number of predictors approaches N, R 2 is inflated
    • Hence report R 2 and adjusted R 2 , particularly for small N and where results are to be generalised
    • If N is small, take more note of adjusted R 2
  51. Regression coefficients Y = b 1 x 1 + b 2 x 2 +.....+ b i x i + a + e
    • Intercept ( a )
    • Slopes ( b ):
      • Unstandardised
      • Standardised
    • Slopes are the weighted loading of IV, adjusted for the other IVs in the model.
  52. Unstandardised regression coefficients
    • B = unstandardised regression coefficient
    • Used for regression equations
    • Used for predicting Y scores
    • Can’t be compared with one another unless all IVs are measured on the same scale
  53. Standardised regression coefficients
    • Beta ( b or  ) = standardised regression coefficient
    • Used for comparing the relative strength of predictors
    •  = r in LR but this is only true in MLR when the IVs are uncorrelated.
  54. Relative importance of IVs
    • Which IVs are the most important?
    • Compare the standardised regression coefficients (  ’s)
  55. Example “ Does ‘ignoring problems’ (IV 1 ) and ‘worrying’ (IV 2 ) predict ‘psychological distress’ (DV)”
  56.  
  57. .32 .52 .34 Y X 1 X 2
  58.  
  59.  
  60.  
  61.  
  62. .18 .32 .46 .52 .34 Y X 1 X 2
  63. Linear Regression Psych. Distress = 119 - 9.50*Ignore R 2 = .11 Multiple Linear Regression Psych. Distress = 139 - .4.7*Ignore - 11.5*Worry R 2 = .30 Prediction equations
  64. Confidence interval for the slope The est. average consumption of oil is reduced by between 4.7 gallons to 6.17 gallons per increase of 1 0 F. B 1  -5.44 The 95% CI: -6.17  B 1  -4.70
  65. Confidence interval for the slope Mental Health (PD) is reduced by between 8.5 and 14.5 units per increase of Worry units. Mental Health (PD) is reduced by between 1.2 and 8.2 units per increase in Ignore the Problem units.
  66. Example Effect of violence, stress, social support on internalising behaviour problems Kliewer, Lepore, Oskin, & Johnson, (1998) Internalising behaviour problems e.g., withdrawing, anxiety, inhibited, and depressed behaviours
  67. Study
    • Participants were children:
      • 8 - 12 yrs
      • Lived in high-violence areas, USA
    • Hypothesis: Violence and stress   internalising behavior, whereas social support would   internalising behaviour.
  68. Variables
    • Predictors
      • Degree of witnessing violence
      • Measure of life stress
      • Measure of social support
    • Outcome
      • Internalising behaviour (e.g., depression, anxiety symptoms) – measured using the Child Behavior Checklist (CBCL)
  69. Correlations
  70. R 2
  71. Test for overall significance
    • Shows if there is a linear relationship between all of the X variables taken together and Y
    • Hypothesis:
    H 0 :  1 =  2 = … =  p = 0 (No linear relationships)‏ H 1 : At least one  i  0 (At least one independent variable effects Y)‏
  72. Test for overall significance
    • Sig. test of R 2 given by ANOVA table
  73. Test for significance: Individual variables Shows if there is a linear relationship between each variable X i and Y . Hypotheses: H 0 :  i = 0 (No linear relationship)‏ H 1 :  i  0 (Linear relationship between X i and Y)‏
  74. Regression coefficients
  75. Regression equation
    • A separate coefficient or slope for each variable
    • An intercept (here its called b 0 )
  76. Interpretation
    • Slopes for Witness and Stress are +ve; slope for Social Support is -ve.
    • (Ignoring Stress and Social Support), a one unit increase in Witness would produce .038 unit increase in Internalising symptoms.
  77. Predictions If Witness = 20, Stress = 5, and SocSupp = 35, then we would predict that internalising symptoms would be..... .012.
  78.  
  79. Variables
    • IVs:
      • Human & Built Capital (Human Development Index)
      • Natural Capital (Ecosystem services per km 2 )
      • Social Capital (Press Freedom)
    • DV = Life satisfaction
    • Units of analysis: Countries ( N = 57; mostly developed countries, e.g., in Europe and America)
    • There are moderately strong positive and statistically significant linear relations between the IVs and the DV
    • The IVs have small to moderate positive inter-correlations.
    • R 2 = .35
    • Two sig. IVs (not Social Capital - dropped)
    • R 2 = .72 (after dropping 6 outliers)
  80.  
  81.  
  82. Types of MLR
    • Standard or direct (simultaneous)
    • Hierarchical or sequential
    • Stepwise (forward & backward)
    • Forward addition
    • Backward elimination
    • All predictor variables are entered together
    • Allows assessment of the relationship between all predictor variables and the criterion ( Y ) variable if there is good theoretical reason for doing so.
    • Manual technique & commonly used
    Direct or Standard
    • IVs are entered in blocks or stages.
      • Researcher defines order of entry for the variables, based on theory.
      • May enter ‘nuisance’ variables first to ‘control’ for them, then test ‘purer’ effect of next block of important variables.
    • R 2 change - additional variance in Y explained at each stage of the regression.
      • F test of R 2 change.
    Hierarchical (Sequential)
  83. Partial correlations Partial correlation - r p between X and Y after controlling for (partialling out) the influence of a 3rd variable from both X and Y. e.g., Does: # years of marriage predict (IV) marital satisfaction (DV) after # children is controlled for?
    • The ‘best’ predictor variables are entered, one by one, if they reach a criteria (e.g., p < .05)
    • Best predictor = IV with the highest r with Y
    • Computer-driven – controversial
    Forward selection
    • All predictor variables are entered, then the weakest predictors are removed, one by one, if they meet a criteria (e.g., p > .05)
    • Worst predictor = x with the lowest r with Y
    • Computer-driven – controversial
    Backward elimination
    • Combines both forward & backward.
    • At each step, variables may be entered or removed if they meet certain criteria.
    • Useful for developing the best prediction equation from the smallest no. of variables.
    • Redundant predictors removed.
    • Computer-driven – controversial
    Stepwise
  84. Which method?
    • Standard : To assess impact of all IVs simultaneously
    • Hierarchical : To test specific hypotheses derived from theory
    • Stepwise : If goal is accurate statistical prediction – computer driven
  85. Assumptions
    • IVs = metric (interval or ratio) or dichotomous
    • DV = metric (interval or ratio)
    • Linear relations exist between IVs & DVs
    • IVs are not overly correlated with one another (multicollinearity- e.g., over .7)
    • Normality enhances solution
    • Homoscedasticity
    • Ratio of cases to IVs; total N :
      • Min. 5:1; > 20 cases total
      • Ideal 20:1; > 100 cases total
  86. Dealing with outliers
    • Extreme cases should be deleted or modified.
    • Univariate outliers - detected via initial data screening
    • Bivariate outliers – detected via scatterplots
    • Multivariate outliers - unusual combination of predictors…
  87. Multivariate outliers
    • Can use Mahalanobis' distance or Cook’s D as a MV outlier screening procedure
    • A case may be within normal range on all variables, but represent a multivariate outlier which unduly influences multivariate test results
  88. Multivariate outliers
    • e.g., a person who:
      • Is 19 years old
      • Has 3 children
      • Has an undergraduate degree
    • Identify & check unusual cases
  89. Multivariate outliers
    • Mahalanobis distance is distributed as  2 with d. of f. equal to no. of predictors (  = .001)
    • If any cases have a Mahalanobis distance greater than critical level -> multivariate outlier.
    • Cook’s D - identifies influential variables, values >1 are considered unusual.
  90. Normality & homoscedasticity Normality
    • If non-normality, there will be heteroscedasticity
    Homoscedasticity
    • Variance around regression line is same throughout the distribution
    • Even spread in residual plots
  91. Homoscedasticity Heteroscedasticity is not fatal - but it weakens the analysis.
  92. Multicollinearity
    • Multicollinearity - high correlations (e.g., over .7) between IVs.
    • Singularity - perfect correlations among IVs.
    • This leads to unstable regression coefficients.
  93. Multicollinearity Detect via:
    • Correlation matrix - are there large correlations among IVs?
    • Tolerance statistics - if < .3 then exclude that variable.
    • Variance Inflation Factor (VIF) - looking for < 3, otherwise exclude variable.
  94. Causality
    • Like correlation, regression does not tell us about the causal relationship between variables.
    • In many analyses, the IVs and DVs could be switched – therefore, it is important:
      • Take a theoretical position
      • Acknowledge alternative explanations
  95. General MLR strategy
    • Check assumptions
    • Conduct MLR – choose type
    • Interpret the output
    • Develop a regression equation
  96. 1. Check assumptions
    • Assumptions ( X s not correlated, X-Y linear relations, normal distributions, homoscedasticity)‏
    • Check histograms (normality)‏
    • Check scatterplots (linearity & outliers)‏
    • Check correlation table (linearity & collinearity)‏
    • Check influential outlying cases (mv outliers)‏
    • Check residual plots
  97. 2. Conduct MLR Conduct a MLR:
    • Standard
    • Hierarchical
    • Stepwise
    • Forward
    • Backward
  98. 3. Interpret the results
    • Overall amount of Y predicted ( R, R 2, Adjusted R 2, the statistical significance of R )
    • Changes in R and F change (if hierarchical)
    • Coefficients for IVs Standardised and unstandardised regression coefficients for IVs in each model ( b , B ).
    • Relations between X predictors ( r )
    • 0-order and partial correlations for each IV in each model (draw a Venn diagram)
  99. 4. Regression equation
    • MLR is usually for explanation, sometimes prediction
    • If useful, develop a regression equation for the final model.
    • Interpret constant and slopes.
  100. References Howell, D. C. (2004). Chapter 9: Regression. In D. C. Howell.. Fundamental statistics for the behavioral sciences (5th ed.) (pp. 203-235). Belmont, CA: Wadsworth. Kliewer, W., Lepore, S.J., Oskin, D., & Johnson, P.D. (1998) The role of social and cognitive processes in children’s adjustment to community violence. Journal of Consulting and Clinical Psychology, 66 , 199-209. Landwehr, J.M. & Watkins, A.E. (1987) Exploring Data : Teacher’s Edition . Palo Alto, CA: Dale Seymour Publications. Vemuri, A. W., & Constanza, R. (2006). The role of human, social, built, and natural capital in explaining life satisfaction at the country level: Toward a National Well-Being Index (NWI). Ecological Economics , 58 (1), 119-133.

+ University of CanberraUniversity of Canberra, 2 years ago

custom

9188 views, 3 favs, 3 embeds more stats

Introduces and explains the use of multiple linear more

More info about this document

© All Rights Reserved

Go to text version

  • Total Views 9188
    • 8345 on SlideShare
    • 843 from embeds
  • Comments 2
  • Favorites 3
  • Downloads 389
Most viewed embeds
  • 836 views on http://ucspace.canberra.edu.au
  • 6 views on http://burakcihanurkmez.blogspot.com
  • 1 views on http://64.233.183.104

more

All embeds
  • 836 views on http://ucspace.canberra.edu.au
  • 6 views on http://burakcihanurkmez.blogspot.com
  • 1 views on http://64.233.183.104

less

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

Cancel
File a copyright complaint
Having problems? Go to our helpdesk?

Categories