# Statisticsforbiologists colstons

Published in: Technology, Education

1. BIOLOGY
2. Introduction
    - Biological studies deal with organisms, which show variation
    - We cannot rely on a single measurement, so we must take a sample
    - This sample of data must be summarised and analysed to find out whether it is reliable
3. Summarising data
    - MEAN: the sum of the sample values divided by the sample size, $\bar{x} = \sum x / n$
    - MEDIAN: the middle number in a list arranged in rank order, e.g. 7 in 2, 5, 7, 7, 8, 23, 31
    - MODE: the measurement that occurs most frequently, e.g. 7 in 2, 5, 7, 7, 8, 23, 31
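
The three summaries above can be checked with Python's standard `statistics` module. This is only a small sketch using the example list from the slide; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Example list taken from the slide
values = [2, 5, 7, 7, 8, 23, 31]

sample_mean = mean(values)      # sum of the values divided by how many there are
sample_median = median(values)  # middle value once the list is in rank order
sample_mode = mode(values)      # value that occurs most frequently

print(sample_mean, sample_median, sample_mode)
```

Note how the two large values (23 and 31) pull the mean well above the median and mode.
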
4. Distribution curves
    - A visual summary of data
    - They can be produced as follows:
        1. Collect data
        2. Split the results into equal-size classes
        3. Make a tally chart
        4. Plot a histogram of frequency against size class
    - Data can show a normal distribution or a skewed distribution
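
The tally and histogram steps could be sketched as below. The data set and the helper name `tally_by_class` are made up for illustration, and the "histogram" is just a row of `#` marks per size class.

```python
from collections import Counter

def tally_by_class(values, class_width):
    """Count how many values fall into each equal-width size class.

    Each class is keyed by its lower bound, e.g. 20 for the class 20-29
    when class_width is 10.
    """
    counts = Counter((v // class_width) * class_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical measurements (not from the slides)
heights = [12, 15, 21, 22, 24, 29, 31, 33, 35, 38, 41, 47]
class_width = 10
classes = tally_by_class(heights, class_width)

# Text histogram: one '#' per observation in the class
for lower, freq in classes.items():
    print(f"{lower:>3}-{lower + class_width - 1:<3} {'#' * freq}")
```
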
5. Distribution curves: normal distribution
    - Symmetrical, bell-shaped curve around the mean
    - Use parametric tests to analyse the data
    - [Figure: frequency chart of normally distributed data]
6. Distribution curves: skewed data
    - Asymmetrical curve around the mode
    - Use non-parametric tests to analyse the data
    - [Figure: frequency chart of skewed data]
7. Standard deviation
    - Standard deviation (SD) is a measure of the spread of the data
    - [Figure labels: Large SD, Small SD]
8. Standard deviation
    - A high SD indicates data that vary greatly from the mean
    - A low SD indicates data that vary little from the mean
    - For normally distributed data, about 68% of all values lie within the range mean ± 1 SD
    - About 95% of all values lie within mean ± 2 SD
9. SD and confidence limits
    - [Figure: distribution curve with the 68% and 95% ranges marked]
10. Calculating SD
    - Can only be used for normally distributed data
    - Calculate as follows (see hand-out):
        1. Sum the values of $x^2$, i.e. $\sum x^2$
        2. Sum the values of $x$, then square the total, i.e. $(\sum x)^2$
        3. Divide $(\sum x)^2$ by $n$
        4. Subtract this from $\sum x^2$ and divide the result by $n$
        5. Take the square root of this
11. Calculating SD

    $$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}}$$
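
The calculation on slides 10 and 11 can be followed step by step in code. This is a sketch of the formula as the slides state it, which divides by $n$; many textbooks divide by $n - 1$ for a sample instead, and that would give a slightly larger value. The function name `slide_sd` and the data are illustrative.

```python
from math import sqrt

def slide_sd(values):
    """Standard deviation using the slide's formula (divides by n, not n - 1)."""
    n = len(values)
    sum_of_squares = sum(x * x for x in values)   # sum of x^2
    square_of_sum = sum(values) ** 2              # (sum of x)^2
    return sqrt((sum_of_squares - square_of_sum / n) / n)

# Hypothetical data chosen so the answer is a round number
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = slide_sd(data)
print(sd)
```

Because it divides by $n$, this should agree with `statistics.pstdev` from the standard library rather than `statistics.stdev`.
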
12. Confidence limits
    - 95% of all values lie within 2 SD of the mean
    - Any value which lies outside this range is said to be significantly different from the others
    - We say that we are working to 95% confidence limits, or to a 5% significance level
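
As a rough illustration of the 2 SD rule, the sketch below lists the values in a sample that fall outside mean ± 2 SD. The data are invented, the helper name `outside_two_sd` is hypothetical, and the SD divides by $n$ to match the formula used in these slides.

```python
from statistics import mean, pstdev

def outside_two_sd(values):
    """Return the values lying outside mean +/- 2 SD (SD computed with n)."""
    m = mean(values)
    sd = pstdev(values)
    lower, upper = m - 2 * sd, m + 2 * sd
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements with one unusually large value
measurements = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
unusual = outside_two_sd(measurements)
print(unusual)
```
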
13. Comparison tests
    - To compare two samples of data we look at the overlap between the two distribution curves
    - This depends on:
        - The distance between the two mean values
        - The spread of each sample (standard deviation)
    - The greater the overlap, the more similar the two samples are
14. Comparison tests
    - [Figure: distribution curves for Sample 1 and Sample 2, with each mean and the overlap labelled]
15. Comparison tests
    - When the SD is small, the overlap is less
    - [Figure: distribution curves for Sample 1 and Sample 2, with the overlap labelled]
16. The null hypothesis
    - In order to compare two sets of data we must first assume that there is no difference between them
    - This is called the null hypothesis
    - We must also produce an alternative hypothesis, which states that there is a difference
17. The t-test
    - Used to compare the overlap of two sets of data
    - Samples must show a normal distribution
    - Sample size (n) should be greater than 30
    - This tests for differences between two sets of data
18. The t-test
    - To calculate t:
        - Check that the data are normally distributed by drawing a tally chart
        - Work out the difference in means, $|\bar{x}_1 - \bar{x}_2|$
        - Calculate the variance divided by the sample size for each set of data (this is $s^2 / n$)
        - Put these into the equation for t
19. The t-test

    $$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
20. The t-test
    - Compare the value of t with the critical value at $n_1 + n_2 - 2$ degrees of freedom
    - Use a probability value of 5%
    - If t is greater than the critical value we can reject the null hypothesis:
        - there is a significant difference between the two sets of data
        - there is less than a 5% probability that a difference this large arose by chance
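
The t calculation from slides 18 to 20 can be sketched as follows. The function name `slide_t` and both samples are hypothetical, and the samples are kept short for readability even though the slides recommend n greater than 30. The variance here divides by $n$, to stay consistent with the SD formula on slide 11. That is an assumption, since many textbooks use $n - 1$. The critical value still has to be looked up in a t-table.

```python
from math import sqrt
from statistics import mean, pvariance

def slide_t(sample1, sample2):
    """Return (t, degrees_of_freedom) using the formula from the slides.

    pvariance divides by n, matching the slide's SD formula (an assumption;
    swap in statistics.variance to divide by n - 1 instead).
    """
    n1, n2 = len(sample1), len(sample2)
    difference = abs(mean(sample1) - mean(sample2))
    spread = sqrt(pvariance(sample1) / n1 + pvariance(sample2) / n2)
    return difference / spread, n1 + n2 - 2

# Hypothetical samples
sample_a = [2, 4, 4, 4, 5, 5, 7, 9]
sample_b = [5, 7, 7, 7, 8, 8, 10, 12]
t, dof = slide_t(sample_a, sample_b)
print(t, dof)
```

If t exceeds the tabulated critical value for `dof` degrees of freedom at the 5% level, the null hypothesis is rejected.
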
21. Mann-Whitney U-test
    - Compares two sets of data
    - Data can be skewed
    - Sample size can be small: 5 < n < 30
    - For details refer to a stats book
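
The slides leave the details of this test to a stats book, so the following is only a sketch of one common way to obtain the U statistic, not the deck's own method. Every value in one sample is compared with every value in the other, a tie counts as half, and the smaller of the two totals is taken. The function name and data are hypothetical.

```python
def mann_whitney_u(sample1, sample2):
    """Return the Mann-Whitney U statistic (the smaller of U1 and U2)."""
    u1 = 0.0
    for x in sample1:
        for y in sample2:
            if x > y:
                u1 += 1.0      # x outranks y
            elif x == y:
                u1 += 0.5      # ties count as half
    u2 = len(sample1) * len(sample2) - u1
    return min(u1, u2)

# Hypothetical samples with no overlap, so U is as small as it can be
group_a = [1, 2, 3, 4, 5, 6]
group_b = [7, 8, 9, 10, 11, 12]
u = mann_whitney_u(group_a, group_b)
print(u)
```

Unlike t, a smaller U indicates less overlap, so the null hypothesis is rejected when U is less than or equal to the tabulated critical value.
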
22. Chi squared
    - Some data is categoric
    - This means that it belongs to one or more categories
    - Examples include:
        - eye colour
        - presence or absence data
        - texture of seeds
    - For these we use a chi squared test, $\chi^2$
    - This tests for an association between two or more variables
23. Chi squared
    - Draw a contingency table
    - These are the observed values

    | | Blue eyes | Green eyes | Row totals |
    |---|---|---|---|
    | Fair hair | a | b | a+b |
    | Ginger hair | c | d | c+d |
    | Column totals | a+c | b+d | a+b+c+d |
24. Chi squared
    - Now work out the expected values, where:

    $$E = \frac{(\text{Row total}) \times (\text{Column total})}{\text{Grand total}}$$
25. Chi squared

    | | Blue eyes | Green eyes | Row totals |
    |---|---|---|---|
    | Fair hair | (a+b)(a+c) / (a+b+c+d) | (a+b)(b+d) / (a+b+c+d) | a+b |
    | Ginger hair | (c+d)(a+c) / (a+b+c+d) | (c+d)(b+d) / (a+b+c+d) | c+d |
    | Column totals | a+c | b+d | a+b+c+d |
26. Chi squared
    - For each box work out $\frac{(O-E)^2}{E}$
    - Find the sum of these to get $\chi^2$:

    $$\chi^2 = \sum \frac{(O-E)^2}{E}$$
27. Chi squared
    - Compare $\chi^2$ with the critical value at the 5% significance level
    - There will be (no. rows - 1) x (no. columns - 1) degrees of freedom
    - If $\chi^2$ is greater than the critical value we can say that the variables are associated with one another in some way
    - We reject the null hypothesis
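
The chi squared steps (expected values, then the sum of $(O-E)^2/E$, then degrees of freedom) can be sketched for any contingency table given as a list of rows. The counts below are invented and the function name is illustrative.

```python
def chi_squared(observed):
    """Return (chi2, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected value: (row total x column total) / grand total
            e = row_totals[i] * column_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows are fair and ginger hair, columns are blue and green eyes
table = [[30, 10],
         [10, 30]]
chi2, dof = chi_squared(table)
print(chi2, dof)
```

For 1 degree of freedom the commonly tabulated 5% critical value is 3.84, so a result this large would lead to rejecting the null hypothesis.
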
28. Spearman Rank
    - Two sets of data may show a correlation
    - The data can be plotted on a scatter graph
    - [Figure: three scatter graphs labelled negative correlation, positive correlation and no correlation]
29. Spearman Rank
    - We calculate the correlation by assigning a rank to the values:

    | Data 1 | Rank | Data 2 | Rank |
    |---|---|---|---|
    | 12 | | 24 | |
    | 14 | | 29 | |
    | 18 | | 29 | |
    | 18 | | 38 | |
30. Spearman Rank: in Data 1, 12 is the lowest value, so we call it rank 1.
31. Spearman Rank: 14 is the 2nd lowest value, so we call it rank 2.
32. Spearman Rank: the two values of 18 should be ranks 3 and 4, but they are the same. We find the average of 3 and 4 and give them both this rank.
33. Spearman Rank: (3 + 4)/2 = 3.5, so both values of 18 are given rank 3.5.
34. Spearman Rank: the values in Data 2 are ranked similarly.
35. Spearman Rank: in Data 2, 24 is the lowest value, so it is rank 1.
36. Spearman Rank: the two values of 29 are given the average of ranks 2 and 3, which is 2.5.
37. Spearman Rank: 38 is the remaining value, so it is rank 4. The completed table:

    | Data 1 | Rank | Data 2 | Rank |
    |---|---|---|---|
    | 12 | 1 | 24 | 1 |
    | 14 | 2 | 29 | 2.5 |
    | 18 | 3.5 | 29 | 2.5 |
    | 18 | 3.5 | 38 | 4 |
38. Spearman Rank
    - Find the difference $D$ between each pair of ranks
    - Square this difference
    - Sum the $D^2$ values
    - Calculate the Spearman Rank Correlation Coefficient $r_s$:

    $$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
39. Spearman Rank
    - Compare $r_s$ with the critical value at the 5% level
    - If it is greater than the critical value (ignoring the sign) then we reject the null hypothesis: there is a significant correlation between the two sets of data
    - If the value is positive there is a positive correlation
    - If it is negative there is a negative correlation
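
The ranking walk-through and the $r_s$ formula can be reproduced with the four pairs of values from the slides. This is a minimal sketch with illustrative function names: tied values share the average of the ranks they would otherwise occupy, as on the slides.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1); tied values share their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first_rank = ordered.index(v) + 1   # rank of the first occurrence
        ties = ordered.count(v)             # how many times the value occurs
        rank_of[v] = first_rank + (ties - 1) / 2
    return [rank_of[v] for v in values]

def spearman_rs(data1, data2):
    """Spearman Rank Correlation Coefficient: 1 - 6*sum(D^2) / (n(n^2 - 1))."""
    ranks1 = average_ranks(data1)
    ranks2 = average_ranks(data2)
    n = len(data1)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Values from the slides
data_1 = [12, 14, 18, 18]
data_2 = [24, 29, 29, 38]
print(average_ranks(data_1))   # ranks 1, 2, 3.5, 3.5
print(average_ranks(data_2))   # ranks 1, 2.5, 2.5, 4
rs = spearman_rs(data_1, data_2)
print(rs)
```

Here the rank differences are 0, -0.5, 1 and -0.5, so $\sum D^2 = 1.5$ and $r_s = 1 - 9/60 = 0.85$. With only four pairs this is for illustration; the value would still need comparing with a table of critical values.
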
40. Quick guide: is your data interval data, or is it categoric data (it can only be placed in a number of categories)? Interval / Categoric
41. Quick guide: are you looking for a correlation between two sets of data, e.g. the rate of photosynthesis and light intensity? Yes / No
42. Quick guide: use the chi squared test
43. Quick guide: use the Spearman Rank test
44. Quick guide: are you comparing data from two populations? Yes / No
45. Quick guide: is your data normally distributed? Yes / No
46. Quick guide: use a t-test
47. Quick guide: use a Mann-Whitney U test
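
The quick guide can be read as a small decision tree. The sketch below encodes one reading of it, inferred from the order of the slides rather than stated by them: categoric data leads to chi squared, a correlation question leads to Spearman Rank, and a two-population comparison leads to a t-test or Mann-Whitney U test depending on normality. The function and parameter names are hypothetical, and the slides do not say what to do if none of the questions applies, so that case returns `None`.

```python
def choose_test(categoric, looking_for_correlation=False,
                comparing_two_populations=False, normally_distributed=False):
    """Suggest a test by following the quick guide's questions in order."""
    if categoric:
        return "chi squared test"
    if looking_for_correlation:
        return "Spearman Rank test"
    if comparing_two_populations:
        if normally_distributed:
            return "t-test"
        return "Mann-Whitney U test"
    return None  # not covered by the quick guide

print(choose_test(categoric=False, comparing_two_populations=True,
                  normally_distributed=True))
```
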