# Biostatistics ii

## on Aug 17, 2012

## Biostatistics ii: Presentation Transcript

• SYLLABUS REQUIREMENTS: Students should know how to work out the t-test and the Chi-Squared test and how to interpret them (excluding the expectation of working out standard deviation or other long calculations).
  SUPPLEMENTARY NOTE ON STATISTICS: The following are conditions for using various statistical tests.
  t-Test (Independent samples)
  1. Interval level data.
  2. Independent samples.
  3. Populations should be approximately normally distributed.
  4. Populations should have approximately the same standard deviation.
  5. Samples contain less than 30 values each.
  Degrees of freedom (df) for the two samples is the total number of samples minus two.
  t-Test (Matched samples)
  1. Matched paired samples.
  2. Interval level data.
  3. Population of differences should be normally distributed.
  4. Samples contain less than 30 values.
  Degrees of freedom = df = (number of pairs of values) - 1
  Chi-Squared Test
  1. Nominal level data.
  2. The expected frequency should not fall below 5 in more than 20% of the cells.
  Degrees of freedom = df = (number of columns) - 1
• Two statistical tests in syllabus: 1. Chi-Squared (χ²) Test. 2. t-test: independent samples and matched samples. Statistics is the art and science of making sense out of data.
• Let us discover when, why and how to use these statistical tests.
• When? Applied to results obtained from an experiment. E.g. students investigate the effect of UV light on seed germination. Not irradiated: 8/10 germinated. Irradiated: 3/10 germinated.
• Why are statistical tests applied? To test whether a result occurred by chance or not. E.g. a student counted the number of visits made by butterflies in 1 h to flowers of three different colours: 6 visits, 100 visits and 20 visits. Can the student conclude that butterflies prefer blue flowers?
• Result could have occurred by chance
• Which test? Chi-Squared (χ²) Test or t-test
• Type of data: Chi-Squared (χ²) Test: categorical / nominal level. t-test: interval level.
• Categorical / Nominal Level of Data: from the Latin nomen, meaning name. Data is grouped under a 'label'. Examples: woodlice in humid and in dry areas (choice chamber); blood grouping in people.
• Examples of Categorical Level of Data:

| Eye colour | Blue | Brown | Green | Other |
|---|---|---|---|---|
| Number of people | 25 | 100 | 55 | 20 |

| Tree type | Oak | Pine | Olive |
|---|---|---|---|
| Number of insects | 30 | 48 | 8 |

• Interval Level of Data: accurate measurements of a variable; the data is continuous and has units of measurement, e.g. length, weight, temperature.
• t-test: assesses whether the means of two groups are statistically different from each other, e.g. heart beats per minute:

| At rest | After exercise |
|---|---|
| 70 | 120 |
| 68 | 106 |
| 73 | 134 |
| 70 | 100 |
| 67 | 116 |
| Mean: 69.6 | Mean: 115.2 |
• Two formulae for t-test: Independent samples, e.g. comparing the amount of sugars in two types of apple. Matched samples, e.g. comparing the size of leaves at two positions on the tree.
• t-test (independent samples): readings are taken on two different organisms or situations, e.g. mean height of boys and girls; mean length of plant stems grown in the light and in darkness.
• t-test (matched samples): 1. One person's pre-test and post-test score, e.g. taking the time to recognise a picture upside down and in normal orientation. 2. One person in a group matched to another person in another group, e.g. husband and wife, identical twins.
• t-test (matched samples): A student wanted to find out if the area of moss (cm²) growing on the North and South facing sides of trees in a local wood differs.

| Tree | A | B | C | D | E | F | G | H | I | J | K | L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Area of moss (cm²), North side | 44 | 44 | 46 | 47 | 48 | 50 | 51 | 52 | 52 | 57 | 62 | 67 |
| Area of moss (cm²), South side | 36 | 39 | 39 | 43 | 49 | 49 | 51 | 54 | 58 | 60 | 61 | 72 |

• e.g. Independent samples: blood pressure of ventral aorta (mm Hg) in spur dogfish and lesser spotted dogfish.

| Species | Blood pressure of ventral aorta (mm Hg) |
|---|---|
| Spur dogfish | 36, 29, 30, 30, 31, 34, 27, 37, 37, 30, 32, 32 |
| Lesser spotted dogfish | 22, 20, 24, 14, 17, 17, 21, 20, 19, 18, 18, 25 |
• What is a hypothesis? A suggested explanation for an observation.
• Hypothesis Testing is a method for deciding if an observed effect or result occurs by chance alone.
• Are there more woodlice in the humid area by chance, or because of their characteristics? Humid: 9. Dry: 1.
• The Scientific Method
• Hypothesis Testing A factory discharges chemical waste into a river A researcher wants to investigate whether the pollution is stunting the fish growth A local angler tells her that since the factory has opened, the fish he caught became smaller than average Angler’s explanation (hypothesis) is that pollution from factory has stunted the fish
• To decide if the results of anexperiment occur by chance or not, the researcher declares: A null hypothesis (NH) The hypothesis actually tested An alternative hypothesis (AH) The other hypothesis, assumed true if NH is false
• The NH states that there will be NODIFFERENCE between the groups as a result of the treatmentTHE AH indicates there WILL be a difference between the groups
• How to State the NH & AH: NH: There is no significant difference between the mean height of girls and boys. AH: There is a significant difference between the mean height of girls and boys.
• Write a suitable NH: A researcher wanted to find out whether light intensity has an effect on the rate of photosynthesis in Elodea. There is no significant difference in the rate of photosynthesis when the light intensity is varied.
• Write a suitable NH: A researcher wanted to find out whether alcohol has an effect on memory. He did this by finding out the number of words remembered after drinking water and then again after drinking alcohol. There is no significant difference between the number of words remembered after drinking water or alcohol. OR: Alcohol has no effect on memory.
• To ACCEPT or REJECT the NH: When NH is ACCEPTED, there is no difference between the groups. When NH is REJECTED, there is a difference between the groups, i.e. the treatment made a difference.
• Hypothesis testing provides a means of making decisions under uncertainty.
• Whether NH is accepted or rejected is based on whether the result of a statistical test performed on the results of the experiment is larger or smaller than the value corresponding to a preset level of probability.
• Probability: Probability is the scientific way of stating the degree of confidence we have in predicting something. Suppose a bag contains brown and green marbles and we extract 10, finding more brown than green. Thus, we can say that the bag has more brown than green, but we can't be certain.
• Suppose we extract another 10 marbles and again get more brown than green. We are now more confident, but how confident would we have to be to satisfy ourselves that there are more brown than green marbles? The answer is 95%, that is, a 5% chance of being wrong.
• The chosen percentage probability is called the level of significance or the critical probability.
• By convention, the critical probability for rejecting the NH is 5% (i.e. P = 0.05).
• You are given the critical values from a table and must choose the appropriate one. Part of a table showing the critical values of the χ² test:

| Degrees of freedom (df) | P = 0.05 | P = 0.025 | P = 0.01 | P = 0.005 | P = 0.001 |
|---|---|---|---|---|---|
| 1 | 3.84 | 5.02 | 6.63 | 7.88 | 10.83 |
| 2 | 5.99 | 7.38 | 9.21 | 10.60 | 13.81 |
| 3 | 7.81 | 9.35 | 11.34 | 12.84 | 16.27 |
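Choosing the appropriate critical value amounts to a two-key lookup: first the degrees of freedom, then the level of significance. A minimal sketch, with the figures copied from the slide's table and hypothetical names (`CHI2_CRITICAL`, `critical_value`) chosen for illustration:

```python
# Critical values of chi-squared copied from the slide's table,
# keyed by degrees of freedom, then by level of significance (P).
CHI2_CRITICAL = {
    1: {0.05: 3.84, 0.025: 5.02, 0.01: 6.63, 0.005: 7.88, 0.001: 10.83},
    2: {0.05: 5.99, 0.025: 7.38, 0.01: 9.21, 0.005: 10.60, 0.001: 13.81},
    3: {0.05: 7.81, 0.025: 9.35, 0.01: 11.34, 0.005: 12.84, 0.001: 16.27},
}


def critical_value(df, p=0.05):
    """Return the tabulated critical value for the given df and P.

    P defaults to 0.05, the conventional level of significance.
    """
    return CHI2_CRITICAL[df][p]


print(critical_value(1))        # row df = 1, column P = 0.05
print(critical_value(3, 0.01))  # row df = 3, column P = 0.01
```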
• Degrees of freedom (df) are related to the size of the samples studied formulae depend on the test being used: Chi-squared test (2 ) df = (number of columns) – 1 Colour of flower Red Purple Yellow Number of bee visits 75 51 20 df = 3 – 1 = 2
• xB A Degrees of freedom (df) t-test (independent samples) df = (total number of samples) - 2 df = (7 + 7) – 2 Radicle lengths / cm df = 14 – 2 = 12 Treatment A Treatment B 4.1 8.1 8.4 9.0 9.2 8.1 6.0 7.8 6.4 5.3 5.3 7.7 4.1 9.8 Mean A = 6.21 Mean B = 7.97
• xB A Degrees of freedom (df) t-test (matched samples) df = (number of pairs of values) - 1 Specimen A B C D E F G H Rate of heart 28 30 30 31 32 33 34 36 beat at 5C Rate of heart 39 40 39 45 46 37 47 39 beat at 10C df = 8 – 1 df = 7
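The three degrees-of-freedom rules can be written as one-line functions. This is a sketch with hypothetical function names; the example calls reuse the sample sizes from the three slides above.

```python
def df_chi_squared(number_of_classes):
    """Chi-squared test: df = (number of columns, i.e. classes of data) - 1."""
    return number_of_classes - 1


def df_independent_t(n_first, n_second):
    """Independent-samples t-test: df = (total number of values) - 2."""
    return n_first + n_second - 2


def df_matched_t(number_of_pairs):
    """Matched-samples t-test: df = (number of pairs of values) - 1."""
    return number_of_pairs - 1


print(df_chi_squared(3))       # three flower colours
print(df_independent_t(7, 7))  # seven radicle lengths per treatment
print(df_matched_t(8))         # eight specimens measured twice
```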
• OVERVIEW: 1. How to present a statistical test. 2. χ² (Chi-squared) test. 3. t-test.
• The order of writing up a statistical test:
  1. NH (Null Hypothesis)
  2. AH (Alternative Hypothesis)
  3. Name of test, including any assumptions about the populations
  4. Level of significance (used to indicate the chance that we are wrong in rejecting the NH)
  5. Calculations
  6. Conclusion (accept or reject NH)
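• If the calculated value is less than the critical value, the result is said to be not significant.
• E.g. the calculated value = 0.23 is less than the critical value = 2.69: the result is due to chance.
• If the calculated value is larger than the critical value, the result is said to be statistically significant.
• E.g. the calculated value = 15.87 is larger than the critical value = 2.69: the result is not due to chance.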
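The decision rule on these slides can be sketched as a small function. The function name and the returned wording are illustrative. Two points are assumptions rather than anything the slides state: the sign of a t value is ignored (only its size is compared), and a calculated value exactly equal to the critical value is treated as not significant.

```python
def decide(calculated, critical):
    """Compare a calculated test statistic with the critical value.

    ASSUMPTION: only the magnitude of the statistic matters, and a tie
    counts as 'not significant'; the slides do not cover either case.
    """
    if abs(calculated) > critical:
        return "reject NH: result is statistically significant"
    return "accept NH: result is not significant"


print(decide(0.23, 2.69))
print(decide(15.87, 2.69))
```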
• OVERVIEW: 1. How to present a statistical test. 2. χ² (Chi-squared) test. 3. t-test.
• Chi-Squared (χ²) Test: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed frequencies / values, E = expected frequencies / values, and $\sum$ = the 'sum of'.
• Chi-Squared (χ²) Test: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed values and E = expected values.
  Checklist: use the χ² test when the following conditions are satisfied:
  1. Categorical level data (categorical = nominal).
  2. The expected frequency should not fall below 5 in more than 20% of the cells.
  Number of degrees of freedom (df) = (number of columns - 1) [Note that columns refer to classes of data]
• The expected frequency should not fall below 5 in more than 20% of the cells. Expected frequencies: 10, 3, 28, 1, 45. 5 cells = 100%, so 1 cell = 20%.
• The expected frequency should not fall below 5 in more than 20% of the cells. Expected frequencies: 10, 3, 28, 1, 45. Two cells (3 and 1) are below 5, and 2 cells = 40%. The expected frequency is below 5 in 40% of the cells.
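The 20% rule can be checked mechanically by counting the cells whose expected frequency is below 5. A minimal sketch, assuming a hypothetical helper name and using the five expected frequencies from the slide:

```python
def expected_frequencies_ok(expected, minimum=5, max_fraction=0.20):
    """True if no more than 20% of cells have an expected frequency below 5."""
    small_cells = sum(1 for e in expected if e < minimum)
    return small_cells / len(expected) <= max_fraction


expected = [10, 3, 28, 1, 45]
small = [e for e in expected if e < 5]
print(f"{len(small)} of {len(expected)} cells are below 5")
print("Condition satisfied:", expected_frequencies_ok(expected))
```

With these figures two of the five cells fall below 5, so the condition for using the test is not met.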
• Example 1: Comparing categories of a single sample. As part of an investigation into the foraging habits of bees (Bombus monticola), the number of visits made to two types of plant, Vaccinium vitis-idaea and Erica tetralix, were recorded in the table below; these numbers are called the observed frequencies (O).

| Type of plant | Vaccinium vitis-idaea | Erica tetralix |
|---|---|---|
| Number of visits (observed frequencies, O) | 75 | 51 |

• Null hypothesis: There is no significant difference in the number of visits to each type of plant. Alternative hypothesis: There is a difference in the number of visits to each type of plant.
• How to calculate the expected values. If the NH is true: expected number of visits to each type of plant = 50% of total. Total number of visits: 75 + 51 = 126. No. of visits to V. vitis-idaea: 50% of 126 = 63. No. of visits to E. tetralix: 50% of 126 = 63.
• Calculation:

| | Observed frequency (O) | Expected frequency (E) | Difference (O - E) | (O - E)² / E |
|---|---|---|---|---|
| Vaccinium vitis-idaea | 75 | 63 | 75 - 63 = 12 | (75 - 63)² / 63 = 2.29 |
| Erica tetralix | 51 | 63 | 51 - 63 = -12 | (51 - 63)² / 63 = 2.29 |

  $\chi^2 = \sum \frac{(O - E)^2}{E}$ = 2.29 + 2.29 = 4.58
  df = (number of classes of data) - 1 = 2 - 1 = 1
• The critical value (χ²crit) corresponding to 1 df and a 5% level of significance is 3.84. The calculated value is 4.58. The CALCULATED VALUE is greater than the critical value, χ²crit. Reject the NH and accept the AH.
• Conclusion: there is a difference in the number of visits to the two species of plant; the result is not by chance.

| Type of plant | Vaccinium vitis-idaea | Erica tetralix |
|---|---|---|
| Number of visits | 75 | 51 |
• Example 2: Comparing the Data Obtained from a Genetics Experiment with the Outcome Predicted using Mendelian Ratios. Important: Apply chi-squared to test outcomes of a genetic cross.
• One tall and one dwarf pure-breeding pea plant were crossed to produce F1 generation plants. Two of these F1 generation plants were crossed to produce F2 generation plants. 300 seeds of these F2 generation plants were grown on, of which 292 survived, comprising 215 tall and 77 dwarf plants. According to Mendelian laws, the ratio of tall to dwarf plants should be 3:1. Use the Chi-squared test (χ²) with a 5% level of significance to determine if the data is consistent with Mendelian laws, i.e. whether the Mendelian ratio fits the data.
• NH: There is no significant difference between the data obtained and the Mendelian ratio. This can also be stated as: NH: Ratio of tall to dwarf plants is 3:1 (data is consistent with Mendelian laws).
• AH: Ratio of tall to dwarf plants is not 3:1 (data is not consistent with Mendelian laws). Expected frequency of tall plants: $\frac{3}{4} \times 292 = 219$. Expected frequency of dwarf plants: $\frac{1}{4} \times 292 = 73$.
• Calculation:

| Observed frequency (O) | Expected frequency (E) | Difference (O - E) | (O - E)² / E |
|---|---|---|---|
| 215 | 219 | 215 - 219 = -4 | (-4)² / 219 = 0.07 |
| 77 | 73 | 77 - 73 = 4 | (4)² / 73 = 0.22 |

  $\chi^2 = \sum \frac{(O - E)^2}{E}$ = 0.07 + 0.22 = 0.29
  df = 2 - 1 = 1
  χ²crit = 3.84 at 5% level of significance
  NH is accepted as χ² = 0.29 is less than χ²crit = 3.84, i.e. data is consistent with Mendelian laws.
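When the NH is a Mendelian ratio, the expected frequencies come from sharing the total out in that ratio. A minimal sketch of that step for the pea-plant data; the helper name `expected_from_ratio` is made up for illustration.

```python
def expected_from_ratio(total, ratio):
    """Share `total` out according to `ratio`, e.g. (3, 1) for a 3:1 ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


observed = [215, 77]                              # tall, dwarf
expected = expected_from_ratio(sum(observed), (3, 1))

# Contribution of each class to chi-squared: (O - E)^2 / E
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(terms)

print("Expected:", expected)
print("Terms:", [round(t, 2) for t in terms])
print(f"chi-squared = {chi2:.2f}")
```

The same helper works for any ratio with more classes, such as (9, 3, 3, 1).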
• EXAMPLES: χ² test. 1. Homozygous recessive pea plants were crossed with heterozygous round peas. 150 offspring were obtained, of which 81 were round and 69 wrinkled. Is this significantly different from the Mendelian ratio of 1:1? [χ²crit = 3.84 (1 df, P = 0.05)] NH: Data is consistent with Mendelian ratio. AH: Data is not consistent with Mendelian ratio. Expected frequency of each phenotype: $\frac{150}{2} = 75$.
• Calculation:

| Phenotype of offspring | Number of offspring, observed | Number of offspring, expected | (O - E)² / E |
|---|---|---|---|
| Round | 81 | 75 | 0.48 |
| Wrinkled | 69 | 75 | 0.48 |

  χ² = 0.48 + 0.48 = 0.96 [χ²crit = 3.84 (1 df, P = 0.05)]
  Since the calculated value for χ² is (less than / greater than) the critical value at the 5% level, the null hypothesis is (rejected / accepted).
• 2. The gene for coat colour in dogs has an allele for dark coat colour dominant over the allele for albino colour, whilst the gene for hair length has an allele for short hair dominant over the allele for long hair. The expected ratio of offspring phenotypes is 9:3:3:1, assuming the genes are unlinked. Use the χ² test to determine whether the data below is consistent with this ratio. [χ²crit = 7.81 (3 df, P = 0.05)]

| Phenotype | Dark / short | Dark / long | Albino / short | Albino / long |
|---|---|---|---|---|
| Number of offspring | 187 | 56 | 61 | 20 |

• NH: There is no significant difference between the data obtained and the Mendelian ratio. This can also be stated as: NH: Data is consistent with Mendelian ratio. AH: Data is not consistent with Mendelian ratio.
• Expected frequencies:

| Phenotype | Dark / short | Dark / long | Albino / short | Albino / long | Total |
|---|---|---|---|---|---|
| Number of offspring | 187 | 56 | 61 | 20 | 324 |
| Expected ratio | 9 | 3 | 3 | 1 | 16 |

| Phenotype of offspring | Observed | Expected | How to get the expected |
|---|---|---|---|
| Dark / short | 187 | 182.25 | $\frac{9}{16} \times 324$ |
| Dark / long | 56 | 60.75 | $\frac{3}{16} \times 324$ |
| Albino / short | 61 | 60.75 | $\frac{3}{16} \times 324$ |
| Albino / long | 20 | 20.25 | $\frac{1}{16} \times 324$ |

• Calculation:

| Phenotype of offspring | Number of offspring, observed | Number of offspring, expected | (O - E)² / E |
|---|---|---|---|
| Dark / short | 187 | 182.25 | 0.124 |
| Dark / long | 56 | 60.75 | 0.371 |
| Albino / short | 61 | 60.75 | 0.001 |
| Albino / long | 20 | 20.25 | 0.003 |

  χ² = 0.124 + 0.371 + 0.001 + 0.003 = 0.499 [χ²crit = 7.81 (3 df, P = 0.05)]
  Since the calculated value for χ² is (less than / greater than) the critical value at the 5% level, the null hypothesis is (rejected / accepted).
• 3. An investigation to determine whether woodlice prefer dark conditions to light was carried out in a choice chamber. Half of the choice chamber was covered in black paper and the other half left in light. Ten woodlice were introduced into the choice chamber. The number of woodlice in each side was counted after thirty minutes. The experiment was repeated five times and the results are shown below.
• Results:

| Side | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
|---|---|---|---|---|---|
| Dark | 7 | 9 | 10 | 8 | 6 |
| Light | 3 | 1 | 0 | 2 | 4 |

  Use the χ² test to determine whether light affects the distribution of woodlice. [χ²crit = 3.84 (1 df, P = 0.05)]
  NH: There is no significant difference between the number of woodlice in the dark and in the light. [Woodlice distribution is not affected by light.]
  AH: There is a significant difference…………..
• Totals: Dark 7 + 9 + 10 + 8 + 6 = 40. Light 3 + 1 + 0 + 2 + 4 = 10.

| | Observed frequency (O) | Expected frequency (E) | Difference (O - E) | (O - E)² / E |
|---|---|---|---|---|
| Dark | 40 | 25 | 15 | 9 |
| Light | 10 | 25 | -15 | 9 |

  χ² = 9 + 9 = 18 [χ²crit = 3.84 (1 df, P = 0.05)]
  Since the calculated value for χ² is (less than / greater than) the critical value at the 5% level, the null hypothesis is (rejected / accepted).
• OVERVIEW: 1. How to present a statistical test. 2. χ² (Chi-squared) test. 3. t-test.
• Formula for the t-test (independent samples): $t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\frac{S_T^2}{n_T} + \frac{S_C^2}{n_C}}}$, where $\bar{x}_T$ is the mean of a set of data T, $\bar{x}_C$ is the mean of a set of data C, $S_T$ is the standard deviation for a set of data T, $S_C$ is the standard deviation for a set of data C, and $n_T$ and $n_C$ are the number of samples in sets of data T and C respectively.
• t-Test (Independent Samples) Checklist: use the t-test when the following conditions are satisfied: 1. Interval level data. 2. Independent samples. 3. Populations should be approximately normally distributed.
• t-Test (Independent Samples) Checklist: use the t-test when the following conditions are satisfied: 4. Populations should have approximately the same standard deviation.
• t-Test (Independent Samples) Checklist: use the t-test when the following conditions are satisfied: 5. Samples contain less than 30 values each. Number of insects on Austrian pine: 14, 4, 1, 11, 13, 30, 41, 7, 3, 7 (10 values). Number of insects on horse chestnut: 0, 11, 1, 7, 4, 1, 1, 7, 2, 2 (10 values).
• t-Test (Independent Samples): Number of degrees of freedom (df) = (total number of values in both samples - 2). Number of insects / cm² on Austrian pine: 14, 4, 1, 11, 13, 30, 41, 7, 3, 7. Number of insects / cm² on horse chestnut: 0, 11, 1, 7, 4, 1, 1, 7, 2, 2. df = 20 - 2 = 18
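The independent-samples formula, t = (mean of T - mean of C) / sqrt(S_T²/n_T + S_C²/n_C), can be sketched in code using the insect counts from the slides. The syllabus supplies the standard deviation rather than asking for it, so computing it here is an addition for illustration only. It is also an assumption that S is the sample standard deviation (n - 1 divisor); the slides do not say which form is intended, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_independent(sample_t, sample_c):
    """t = (mean_T - mean_C) / sqrt(S_T^2 / n_T + S_C^2 / n_C)."""
    n_t, n_c = len(sample_t), len(sample_c)
    # ASSUMPTION: S^2 is the sample variance (n - 1 divisor).
    var_t, var_c = variance(sample_t), variance(sample_c)
    return (mean(sample_t) - mean(sample_c)) / sqrt(var_t / n_t + var_c / n_c)


austrian_pine = [14, 4, 1, 11, 13, 30, 41, 7, 3, 7]
horse_chestnut = [0, 11, 1, 7, 4, 1, 1, 7, 2, 2]

t = t_independent(austrian_pine, horse_chestnut)
df = len(austrian_pine) + len(horse_chestnut) - 2

print(f"t = {t:.2f}, df = {df}")
```

The calculated t would then be compared with the critical value of t for 18 degrees of freedom, read from a table of t values.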
• Formula for t-Test (Matched Samples): $t = \frac{\bar{d}\,\sqrt{n - 1}}{s}$, where $\bar{d}$ is the mean of the differences, n is the number of samples, and s is the standard deviation.
• t-Test (Matched Samples) Checklist: use the t-test (matched samples) when the following conditions are satisfied: 1. Matched (paired) samples. 2. Interval level data. 3. Population of differences should be approximately normally distributed. 4. Samples contain less than 30 values each. Number of degrees of freedom (df) = (number of pairs of values) - 1
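A matched-samples calculation can be sketched with the heart-rate readings at 5 °C and 10 °C from the degrees-of-freedom slide earlier. The sketch reads the slide's formula as t = (mean difference × sqrt(n - 1)) / s and takes s to be the population standard deviation of the differences (n divisor). Both readings are assumptions, since the formula is hard to make out in the transcript. This form is algebraically the same as (mean difference × sqrt(n)) / s when s is the sample standard deviation. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def t_matched(first, second):
    """t = mean(d) * sqrt(n - 1) / s for paired readings.

    ASSUMPTION: s is the population standard deviation (n divisor)
    of the differences d = second - first.
    """
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    s = pstdev(differences)
    return mean(differences) * sqrt(n - 1) / s


rate_5c = [28, 30, 30, 31, 32, 33, 34, 36]    # specimens A to H at 5 °C
rate_10c = [39, 40, 39, 45, 46, 37, 47, 39]   # same specimens at 10 °C

t = t_matched(rate_5c, rate_10c)
df = len(rate_5c) - 1

print(f"t = {t:.2f}, df = {df}")
```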
• What does the t-test do? Tests if the mean values of two groups are statistically different. Number of insects / cm² on Austrian pine: 14, 4, 1, 11, 13, 30, 41, 7, 3, 7 (mean 13.1). Number of insects / cm² on horse chestnut: 0, 11, 1, 7, 4, 1, 1, 7, 2, 2 (mean 3.6). Do insects really prefer the Austrian pine or is the result occurring by chance?
• Austrian pine Horse chestnut Mean: 13.1 Mean: 3.6
• What information is needed to work out the
• Mean; standard deviation squared; number of samples in set. Number in quadrat on Austrian pine: 14, 4, 1, 11, 13, 30, 41, 7, 3, 7; mean = 131/10 = 13.1; n = 10. Number in quadrat on horse chestnut: 0, 11, 1, 7, 4, 1, 1, 7, 2, 2; mean = 36/10 = 3.6; n = 10. Standard deviation is given. The SYLLABUS states that you are not required to work it out.
• These three cases help you understand why the standard deviation is important to consider: the difference between the means is the same in all three BUT
• the spread of values is different. In which case is it most probable that the mean values of two groups are statistically different?
• Consider the area of OVERLAP. Means are statistically different where overlap is least.
• $t = \frac{\text{difference between group means}}{\text{variability of groups}}$
• NH: No reaction occurs.
• NH: REJECTED.