Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Like this presentation? Why not share!

No Downloads

Total views

1,280

On SlideShare

0

From Embeds

0

Number of Embeds

56

Shares

0

Downloads

37

Comments

0

Likes

2

No embeds

No notes for slide

- 1. T tests, ANOVAs and regression Tom Jenkins Ellen MeierottoSPM Methods for Dummies 2007
- 2. Why do we need t tests?
- 3. Objectives Types of error Probability distribution Z scores T tests ANOVAs
- 4. Error Null hypothesis Type 1 error (α): false positive Type 2 error (β): false negative
- 5. Normal distribution
- 6. Z scores Standardised normal distribution µ = 0, σ = 1 Z scores: 0, 1, 1.65, 1.96 Need to know population standard deviation Z=(x-μ)/σ for one point compared to pop.
- 7. T tests Comparing means 1 sample t 2 sample t Paired t
- 8. Different sample variances
- 9. 2 sample t tests x1 − x 2 t = s x1 −x2 2 2Pooled standard s1 s 2 = +error of the mean s x1 − x2 n1 n2
- 10. 1 sample t test
- 11. The effect of degrees offreedom on t distribution
- 12. Paired t tests
- 13. T tests in SPM: Did the observed signalchange occur by chance or is it stat.significant? Recall GLM. Y= X β + ε β1 is an estimate of signal change over time attributable to the condition of interest Set up contrast (cT) 1 0 for β1: 1xβ1+0xβ2+0xβn/s.d Null hypothesis: cTβ=0 No significant effect at each voxel for condition β1 Contrast 1 -1 : Is the difference between 2 conditions significantly non-zero? t = cTβ/sd[cTβ] – 1 sided
- 14. ANOVA Variances not means Total variance= model variance + error variance Results in F score- corresponding to a p valueVariance n ∑ ( xi − x ) 2 F test = Model variance /Error variances2 = i =1 n −1
- 15. Partitioning the varianceGroup Group Group Group Group Group1 2 1 2 1 2 Total = Model + Error (Between groups) (Within groups)
- 16. T vs F tests F tests- any differences between multiple groups, interactions Have to determine where differences are post-hoc SPM- T- one tailed (con) SPM- F- two tailed (ess)
- 17. Conclusions T tests describe how unlikely it is that experimental differences are due to chance Higher the t score, smaller the p value, more unlikely to be due to chance Can compare sample with population or 2 samples, paired or unpaired ANOVA/F tests are similar but use variances instead of means and can be applied to more than 2 groups and other more complex scenarios
- 18. Acknowledgements MfD slides 2004-2006 Van Belle, Biostatistics Human Brain Function Wikipedia
- 19. Correlation and Regression
- 20. Topics Covered: Is there a relationship between x and y? What is the strength of this relationship Pearson’s r Can we describe this relationship and use it to predict y from x? Regression Is the relationship we have described statistically significant? F- and t-tests Relevance to SPM GLM
- 21. Relationship between x and y Correlation describes the strength and direction of a linear relationship between two variables Regression tells you how well a certain independent variable predicts a dependent variable CORRELATION ≠ CAUSATION In order to infer causality: manipulate independent variable and observe effect on dependent variable
- 22. Scattergrams Y Y Y Y Y Y X X XPositive correlation Negative correlation No correlation
- 23. Variance vs. Covariance Do two variables change together? nVariance ~ ∑(x i − x) 2 S = 2 x i =1 DX * DX n nCovariance ~ ∑(x i − x)( yi − y ) DX * DY cov( x, y ) = i =1 n
- 24. Covariance n ∑(x i − x)( yi − y ) cov( x, y ) = i =1 n When X and Y : cov (x,y) = pos. When X and Y : cov (x,y) = neg. When no constant relationship: cov (x,y) =0
- 25. Example Covariance76 x y xi − x yi − y ( xi − x )( yi − y )5 0 3 -3 0 04 2 2 -1 -1 132 3 4 0 1 01 4 0 1 -3 -30 6 6 3 3 9 0 1 2 3 4 5 6 7 x=3 y=3 ∑= 7 n ∑(x i − x)( yi − y )) 7 What does thiscov( x, y ) = i =1 = = 1.4 number tell us? n 5
- 26. Example of how covariance value relies on variance High variance data Low variance data Subject x y x error * y x y X error * y error error 1 101 100 2500 54 53 9 2 81 80 900 53 52 43 61 60 100 52 51 14 51 50 0 51 50 05 41 40 100 50 49 16 21 20 900 49 48 47 1 0 2500 48 47 9Mean 51 50 51 50Sum of x error * y error : 7000 Sum of x error * y error : 28Covariance: 1166.67 Covariance: 4.67
- 27. Pearson’s R − ∞ ≤ cov( x, y ) ≤ ∞ Covariance does not really tell us anything Solution: standardise this measure Pearson’s R: standardise by adding std to equation: cov( x, y ) rxy = sx s y
- 28. Basic assumptions Normal distributions Variances are constant and not zero Independent sampling – no autocorrelations No errors in the values of the independent variable All causation in the model is one-way (not necessary mathematically, but essential for prediction)
- 29. Pearson’s R: degree of linear dependence n n ∑ (x i − x)( yi − y ) ∑ ( x − x)( y − y) i icov( x, y ) = i =1 rxy = i =1 n nsx s y −1 ≤ r ≤ 1 n ∑Z xi * Z yi rxy = i =1 n
- 30. Limitations of r ˆ r is actually r r = true r of whole population ˆ r = estimate of r based on data r is very sensitive to extreme values: 5 4 3 2 1 0 0 1 2 3 4 5 6
- 31. In the real world… r is never 1 or –1 interpretations for correlations in psychological research (Cohen)Correlation Negative PositiveSmall -0.29 to -0.10 00.10 to 0.29Medium -0.49 to -0.30 0.30 to 0.49Large -1.00 to -0.50 0.50 to 1.00
- 32. Regression Correlation tells you if there is an association between x and y but it doesn’t describe the relationship or allow you to predict one variable from the other. To do this we need REGRESSION!
- 33. Best-fit Line Aim of linear regression is to fit a straight line, ŷ = ax + b, to data that gives best prediction of y for any value of x This will be the line that ŷ = ax + b minimises distance between slope intercept data and fitted line, i.e. the residuals ε = ŷ, predicted value = y i , true value ε = residual error
- 34. Least Squares Regression To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line) Model line: ŷ = ax + b a = slope, b = intercept Residual (ε) = y - ŷ Sum of squares of residuals = Σ (y – ŷ)2 we must find values of a and b that minimise Σ (y – ŷ)2
- 35. Finding b First we find the value of b that gives the min sum of squares b ε b ε b Trying different values of b is equivalent to shifting the line up and down the scatter plot
- 36. Finding a Now we find the value of a that gives the min sum of squares b b b Trying out different values of a is equivalent to changing the slope of the line, while b stays constant
- 37. Minimising sums of squares Need to minimise Σ(y–ŷ)2 ŷ = ax + b so need to minimise: sums of squares (S) Σ(y - ax - b)2 If we plot the sums of squares for all different values of a and b we get a parabola, because it is a squared term Gradient = 0 min S Values of a and b So the min sum of squares is at the bottom of the curve, where the gradient is zero.
- 38. The maths bit So we can find a and b that give min sum of squares by taking partial derivatives of Σ(y - ax - b)2 with respect to a and b separately Then we solve these for 0 to give us the values of a and b that give the min sum of squares
- 39. The solution Doing this gives the following equations for a and b: r sy r = correlation coefficient of x and y a= s sy = standard deviation of y x sx = standard deviation of x You can see that: A low correlation coefficient gives a flatter slope (small value of a) Large spread of y, i.e. high standard deviation, results in a steeper slope (high value of a) Large spread of x, i.e. high standard deviation, results in a flatter slope (high value of a)
- 40. The solution cont. Our model equation is ŷ = ax + b This line must pass through the mean so: y = ax + b b = y – ax We can put our equation into this giving: r sy r = correlation coefficient of x and y b=y- x sy = standard deviation of y sx sx = standard deviation of x The smaller the correlation, the closer the intercept is to the mean of y
- 41. Back to the model We can calculate the regression line for any data, but the important question is: How well does this line fit the data, or how good is it at predicting y from x?
- 42. How good is our model? ∑(y – y)2 SSy Total variance of y: sy2 = = n-1 dfy Variance of predicted y values (ŷ): ∑(ŷ – y)2 SSpred This is the variance sŷ2 = = explained by our n-1 dfŷ regression model Error variance: This is the variance of the error between our predicted y values ∑(y – ŷ)2 SSer and the actual y values, and serror = 2 = thus is the variance in y that is n-2 dfer NOT explained by the regression model
- 43. How good is our model cont. Total variance = predicted variance + error variance sy2 = sŷ2 + ser2 Conveniently, via some complicated rearranging sŷ2 = r2 sy2 r2 = sŷ2 / sy2 so r2 is the proportion of the variance in y that is explained by our regression model
- 44. How good is our model cont. Insert r2 sy2 into sy2 = sŷ2 + ser2 and rearrange to get: ser2 = sy2 – r2sy2 = sy2 (1 – r2) From this we can see that the greater the correlation the smaller the error variance, so the better our prediction
- 45. Is the model significant? i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean? F-statistic: complicated rearranging sŷ2 r2 (n - 2)2 F(df ,df ) = =......= ŷ er ser2 1 – r2 And it follows that: r (n - 2) So all we need to(because F = t 2) t(n-2) = know are r and n ! √1 – r2
- 46. General Linear Model Linear regression is actually a form of the General Linear Model where the parameters are a, the slope of the line, and b, the intercept. y = ax + b +ε A General Linear Model is just any model that describes the data in terms of a straight line
- 47. Multiple regression Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y The different x variables are combined in a linear way and each has its own regression coefficient: y = a1x1+ a2x2 +…..+ anxn + b + ε The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y. i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
- 48. SPM Linear regression is a GLM that models the effect of one independent variable, x, on ONE dependent variable, y Multiple Regression models the effect of several independent variables, x1, x2 etc, on ONE dependent variable, y Both are types of General Linear Model GLM can also allow you to analyse the effects of several independent x variables on several dependent variables, y1, y2, y3 etc, in a linear combination This is what SPM does and will be explained soon…

No public clipboards found for this slide

×
### Save the most important slides with Clipping

Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.

Be the first to comment