72. Confidence interval for the slope. The 95% CI: -6.17 ≤ β1 ≤ -4.70 (point estimate b1 = -5.44). The estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each increase of 1 °F.
73. Confidence interval for the slope. Mental Health is reduced by between 8.5 and 14.5 units per unit increase in Worry. Mental Health is reduced by between 1.2 and 8.2 units per unit increase in Ignore the Problem.
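As a rough illustration of where such an interval comes from, the sketch below computes b ± t × SE(b) for a simple regression. The data are made up (they are not the oil or mental health data from the slides), the function name is hypothetical, and the critical t value (3.182 for df = 3) is assumed to be looked up from a t table by the caller.

```python
import math

def slope_confidence_interval(x, y, t_crit):
    """Return (b, lower, upper) for the least-squares slope.

    t_crit is the two-tailed critical t value for n - 2 degrees of
    freedom; it is assumed to be supplied by the caller from a t table.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ss_res / (n - 2) / sxx)   # standard error of the slope
    return b, b - t_crit * se_b, b + t_crit * se_b

# Made-up illustration data: temperature (x) and gallons used (y)
temps = [10, 20, 30, 40, 50]
gallons = [300, 250, 190, 140, 95]
b, lower, upper = slope_confidence_interval(temps, gallons, t_crit=3.182)
print(f"b = {b:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

If the interval excludes zero, as in both slide examples, the slope is significantly different from zero at the corresponding alpha level.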
74. Example – Effect of violence, stress, social support on internalizing behavior
Assumed knowledge: A solid understanding of linear correlation.
Regression tends not to be used for Exploratory or Descriptive purposes.
Linear Regression attempts to explain a relationship using a straight line fit to the data and then extending that line to predict future values.
A line of best fit can be drawn using any method, e.g., by eye or hand. Another way is to use the Method of Least Squares, a formula which minimizes the sum of the squared vertical deviations of the points from the line. See also: http://www.hrma-agrh.gc.ca/hr-rh/psds-dfps/dafps_basic_stat2_e.asp#D
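A minimal sketch of the least squares idea, using made-up data and hypothetical function names: the fitted line is compared against an "eyeballed" line (intercept 0.5, slope 2.0, chosen arbitrarily) to show that the least squares line gives the smaller sum of squared vertical deviations.

```python
def least_squares_line(x, y):
    """Return (a, b): the intercept and slope that minimise the sum of
    squared vertical deviations of the points from the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_deviations(x, y, a, b):
    """Sum of squared vertical distances between each point and the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2.2, 3.9, 6.1, 8.0, 9.8]

a, b = least_squares_line(x, y)
fitted = sum_squared_deviations(x, y, a, b)
eyeballed = sum_squared_deviations(x, y, a=0.5, b=2.0)  # arbitrary hand-drawn line
print(f"least squares: Y = {a:.2f} + {b:.2f}X, SS = {fitted:.3f}")
print(f"eyeballed line SS = {eyeballed:.3f}")
```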
Example from Landwehr & Watkins (1987), cited in Howell (2004, pp. 216-218) and the accompanying PowerPoint lecture notes.
Compare the formula for b to the formula for r.
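For reference, one common way of writing the two formulas side by side is given below. The notation (cov for the covariance, s for the standard deviations) is an assumption and may differ from the slide being discussed.

```latex
b = \frac{\operatorname{cov}_{XY}}{s_X^{2}},
\qquad
r = \frac{\operatorname{cov}_{XY}}{s_X \, s_Y},
\qquad\text{so}\qquad
b = r \, \frac{s_Y}{s_X}.
```

Both share the covariance in the numerator; they differ only in the denominator, which is why b and r always have the same sign.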
Answers are not exact due to rounding error and desire to match SPSS.
The intercept is labeled “constant.” Slope is labeled by name of predictor variable.
The variance of these residuals is indicated by the standard error in the regression coefficients table.
Significance tests of the slope and intercept are given as t-tests. The t values in the second column from the right are tests on the slope and intercept. The associated p values are next to them. The slope is significantly different from zero, but the intercept is not.
Ignoring problems is a coping strategy for dealing with stress.
R = correlation [multiple correlation in MLR]
R² = % of variance explained
Adjusted R² = % of variance explained, a reduced estimate with bigger adjustments for small samples
In this case, Ignoring Problems accounts for ~10% of the variation in Psychological Distress.
The MLR ANOVA table provides a significance test of R. It is NOT a “normal ANOVA” (test of mean differences); it tests whether a significant (non-zero) amount of variance is explained (the null hypothesis is zero variance explained). In this case a significant amount of Psychological Distress variance is explained by Ignoring Problems, F(1, 218) = 25.78, p < .01.
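The F statistic and the proportion of variance explained are two views of the same quantity. As a sketch (function names are hypothetical), the reported F and degrees of freedom can be converted back to R² and to adjusted R² using the standard relationships F = (R²/df1) / ((1 - R²)/df2) and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n - 1 = df1 + df2 and n - k - 1 = df2.

```python
def r_squared_from_f(f, df1, df2):
    """Recover R^2 from an overall regression F test."""
    return (f * df1) / (f * df1 + df2)

def adjusted_r_squared(r2, df1, df2):
    """Shrink R^2 for the number of predictors (df1) and sample size."""
    return 1 - (1 - r2) * (df1 + df2) / df2

# Values reported in the notes: F(1, 218) = 25.78
r2 = r_squared_from_f(25.78, df1=1, df2=218)
adj_r2 = adjusted_r_squared(r2, df1=1, df2=218)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

The result is consistent with the roughly 10% of variance attributed to Ignoring Problems, and the adjustment is small because the sample is fairly large relative to one predictor.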
Multiple regression coefficient table
Analyses the relationship of each IV with the DV.
For each IV, examine B, Beta, t and sig.
B = unstandardised regression coefficient [use in prediction equations]
Beta (β) = standardised regression coefficient [use to compare predictors with one another]
t-test & sig. show how likely a DV-IV relationship of this size would be to arise by chance alone
Y = a + bX + e
X = predictor value (IV) = (ignore problems)
Y = predicted value (DV) = (psychological distress; note that high scores indicate good mental health, i.e., absence of distress)
a = Y-axis intercept (starting level of psych. distress, i.e., when X is 0)
b = unstandardised regression coefficient, i.e., B in SPSS (slope of the line of best fit: the average rate at which Y changes with a one-unit change in X)
e = error
Detailed Overview: Readings; LR vs MLR; MLR Questions; Multiple R; Interpreting MLR; Prediction Equations; Partial Correlations; Determining the relative importance of IVs; Types of MLR; Dummy Variables; Assumptions; Residuals; General MLR Strategy; Summary
In MLR there are multiple predictor X variables (IVs) and a single predicted Y (DV).
Figure 11.2 Three-dimensional plot of teaching evaluation data (Howell, 2004, p. 248)
The MLR equation has multiple regression coefficients and a constant (intercept).
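Written out in the same notation as the simple regression equation (a for the constant, b for unstandardised coefficients, e for error), the MLR equation for k predictors can be sketched as follows; the subscripts are an assumed notation.

```latex
Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e
```

Each b is the average change in Y for a one-unit change in that X, holding the other predictors constant.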
The coefficient of determination is a measure of how well the regression line represents the data. If the regression line passes exactly through every point on the scatterplot, it would be able to explain all of the variation and R² would be 1. The further the line is away from the points, the less it is able to explain. If the scatterplot is completely random and there is zero relationship between the IVs and the DV, then R² will be 0.
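The two extremes described above can be checked with a small sketch that uses the usual definition R² = 1 - SS(residual) / SS(total). The data and function name are made up for illustration.

```python
def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_residual / ss_total

y = [2, 4, 6, 8]

# Line passes exactly through every point: all variation is explained
perfect = r_squared(y, [2, 4, 6, 8])

# Predictions no better than the mean of y: no variation is explained
no_relationship = r_squared(y, [5, 5, 5, 5])

print(perfect, no_relationship)
```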
β = r in LR, but this is only true in MLR when the IVs are uncorrelated.
If IVs are uncorrelated (usually not the case) then you can simply use the correlations between the IVs and the DV to determine the strength of the predictors. If the IVs are standardised (usually not the case), then the unstandardised regression coefficients (B) can be compared to determine the strength of the predictors. If the IVs are measured using the same scale (sometimes the case), then the unstandardised regression coefficients (B) can meaningfully be compared.
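To see why B depends on the measurement scale while the standardised coefficient does not, the sketch below converts B to Beta using the standard relationship Beta = B × (SD of X / SD of Y). The data are made up and the function name is hypothetical.

```python
import statistics

def standardised_beta(b, x, y):
    """Convert an unstandardised slope B into a standardised Beta."""
    return b * statistics.stdev(x) / statistics.stdev(y)

# Made-up data: the same predictor recorded in metres and in centimetres
height_m = [1, 2, 3, 4, 5]
height_cm = [v * 100 for v in height_m]
outcome = [2, 4, 6, 8, 10]

# B changes with the unit of measurement (2 per metre, 0.02 per centimetre)...
beta_m = standardised_beta(2.0, height_m, outcome)
beta_cm = standardised_beta(0.02, height_cm, outcome)

# ...but Beta is the same in both cases
print(beta_m, beta_cm)
```

This is why B values are only directly comparable when the IVs share a scale, whereas Beta values can be compared across IVs measured in different units.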
It is a good idea to get into the habit of drawing Venn diagrams to represent the degree of linear relationship between variables.
The partial correlation between Worry and Distress is .46, considerably larger than the partial correlation between Ignore and Distress (.18), so Worry uniquely explains considerably more variance in Distress.
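For a single control variable, a partial correlation can be computed from the three zero-order correlations with the standard first-order formula. The sketch below uses hypothetical correlations (not the Worry/Ignore/Distress values, whose zero-order correlations are not given here).

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with Z partialled out of both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations for illustration
overlapping = partial_correlation(r_xy=0.5, r_xz=0.3, r_yz=0.4)

# If Z is unrelated to both X and Y, the partial equals the zero-order r
unrelated = partial_correlation(r_xy=0.5, r_xz=0.0, r_yz=0.0)

print(f"{overlapping:.3f}, {unrelated:.3f}")
```

Squaring a partial correlation gives the proportion of the remaining variance in Y (after removing Z) that X explains.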
95% CI
Kliewer, Lepore, Oskin, & Johnson (1998). Image: http://cloudking.com/artists/noa-terliuc/family-violence.php
Data available at www.duxbury.com/dhowell/StatPages/More_Stuff/Kliewer.dat
CBCL = Child Behavior Checklist. Predictors are largely independent of each other. Stress and Witnessing Violence are significantly correlated with Internalizing.
R² has the same interpretation as r². 13.5% of the variability in Internal is accounted for by variability in Witness, Stress, and SocSupp.
The t tests show that two slopes (Violence and Stress) are positive and significant. The slope for SocSupp is negative and not significant. However, the size of its effect is not much different from the two significant effects.
Re the 2nd point: the same holds true for other predictors.
Vemuri & Costanza (2006).
Nigeria, India, Bangladesh, Ghana, China and Philippines were treated as outliers and excluded from the analysis.
E.g., some treatment variables may be less expensive, and these could be entered first to find out whether there is additional justification for the more expensive treatments.
If IVs are correlated then you should also examine the difference between the zero-order and partial correlations. Image: http://www.gseis.ucla.edu/courses/ed230bc1/notes1/con1.html
IVs = metric (interval or ratio) or dichotomous, e.g., age and gender
DV = metric (interval or ratio), e.g., pulse
Linear relations exist between IVs & DVs, e.g., check scatterplots
IVs are not overly correlated with one another (e.g., over .7); if so, apply cautiously
Assumptions for correlation apply, e.g., watch out for outliers, non-linear relationships, etc.
Homoscedasticity: similar/even spread of data around the line of best fit throughout the distribution
For more on assumptions, see http://www.visualstatistics.net/web%20Visual%20Statistics/Visual%20Statistics%20Multimedia/correlation_assumtions.htm
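The check that IVs are not overly correlated can be sketched as a simple screen over all pairs of predictors. The variable names and data below are made up, the helper functions are hypothetical, and the .7 cut-off is the rule of thumb from the notes above rather than a fixed standard.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_correlated_ivs(ivs, threshold=0.7):
    """Return (name_a, name_b, r) for each IV pair with |r| above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(ivs.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged

# Made-up predictor data for illustration
ivs = {
    "age": [20, 25, 30, 35, 40, 45],
    "experience": [1, 5, 11, 14, 20, 26],
    "sleep": [7, 6, 8, 5, 7, 6],
}
flagged = flag_correlated_ivs(ivs)
print(flagged)
```

Any flagged pair suggests the two IVs carry largely overlapping information, so their individual coefficients should be interpreted cautiously.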
Image: Textbook scan
Regression can establish a correlational link, but cannot determine causation.