# Biostatistics ii4june

### Biostatistics ii4june

1. 1. Inferential Statistics Session-II 2009 Dr. Arshad Sabir A.P
2. 2. Issues in epidemiological research
   a. Studies undertaken to assess population characteristics such as age, vaccination status, prevalence of malnutrition, or KAP of contraception. These are sample based, and variation occurs normally.
      Issue: To what extent are the study findings a true estimate of the reference population?
   b. Studies that compare groups to study associations (cases and controls, exposed and unexposed, efficacy of a drug, etc.).
      Issue: Do the observed differences hold true for the total population? Differences observed may be due to sampling, since variation occurs normally.
   Variations: normal (biological), real, experimental.
   Issue: We want to be as precise as possible.
3. 3. Central limit theorem (CLT)
   • Suppose we want to know the weight of the adult population of Rawalpindi city.
   • Multiple (say 1000) large (n > 30) random samples are taken, and the mean weight is calculated for each.
   • We will then have 1000 mean weights ($\bar{X}_1, \dots, \bar{X}_n$).
   • If all the sample means are plotted as a frequency distribution curve, it will follow a normal distribution pattern, known as the "sampling distribution of means".
   • 68% of sample means will fall within ±1 SD of the mean of this distribution, 95% within ±2 SD, and 99.7% within ±3 SD.
   • The mean of this distribution is almost equal to the population mean ($\bar{X} \approx \mu$).
   • The SD of this distribution is known as the "standard error" (SE).
4. 4. CLT
   • Formula for SE: $SE = SD / \sqrt{n}$
   • SE is a unit of measure of the variability that can arise due to sampling (sampling variation).
   • SE is based upon the normal distribution, so it follows the rules of the normal distribution curve.
   • In practice we take only one sample and use its SD to estimate the standard error.
   • So we can be 95% confident that the population mean lies within the sample mean ± 2 SE; the chance of its falling beyond this range is only ≤ 5%.
   • SE is a measure of "chance variation", i.e. normal variation from sample to population or between two samples or groups.
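The relationship SE = SD / √n can be checked by simulation. The sketch below is only an illustration: the population values (mean 70 kg, SD 10 kg), the sample size of 50, and the function name `simulate_sample_means` are assumptions made for the example, not figures from the slides.

```python
import random
import statistics

def simulate_sample_means(pop_mean, pop_sd, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return the mean of each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

# Hypothetical population: mean weight 70 kg, SD 10 kg (assumed values).
means = simulate_sample_means(pop_mean=70.0, pop_sd=10.0, n=50, num_samples=1000)

observed_se = statistics.stdev(means)   # SD of the 1000 sample means
theoretical_se = 10.0 / 50 ** 0.5       # SE = SD / sqrt(n)

print(f"mean of sample means: {statistics.mean(means):.2f}")
print(f"SD of sample means:   {observed_se:.2f}")
print(f"SD / sqrt(n):         {theoretical_se:.2f}")
```

The SD of the simulated sample means should come out close to SD / √n, and their mean close to the population mean.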
5. 5. Confidence limits and confidence interval
   • When assessing the population mean on the basis of one sample, we have the sample mean $\bar{X}$ and the sample SD (from which the SE is calculated). The sample mean is not exactly equal to µ.
   • According to the CLT, SE is a tool to measure the variation that can arise due to sampling.
   • With the help of the SE we can construct a range of values around the sample mean within which the population mean would fall with a certain degree of confidence.
   • These limits, worked out on both sides of the sample mean on the basis of the CLT, are called "confidence limits" (CLs), and the range between them is known as the "confidence interval" (CI).
   • Formula for the population mean (95% CI): $\mu_{CI} = \bar{X} \pm 2\,SE = \bar{X} \pm 2\,(SD / \sqrt{n})$
6. 6. Estimation of a population parameter from a sample statistic
   • As per the CLT, 95% of sample means will be within the confidence limits µ ± 2 SE.
   • A 95% confidence interval means there is a 95% probability that the population mean (µ) lies within 2 SE below or above the sample mean, and a 5% probability that it lies outside this interval (P = 0.05). We can say that we are 95% confident in making this statement.
   • The CI is related to the sample size (n): the larger the sample, the narrower the CI for a given confidence level.
7. 7. Estimation of population parameters (say, the mean) from sample statistics
   A large SE gives a wide range of estimate (CI), and a small SE a narrow one. We desire a precise estimate. SE depends upon two factors:
   • Variability: how dispersed the attribute is in the actual population (reflected by σ). If the SD is large, the estimate will be wide (this is an inherent property and cannot be changed); if it is small, the estimate will be close to the true value.
   • Sample size: a small sample (n) is adequate to estimate µ when there is little or no variability, but larger samples are needed to accommodate higher variability in the data.
   This relationship of SD to sample size (n) is expressed as the standard error: $SE = SD / \sqrt{n}$
8. 8. Exercise
   • 16 kg is the mean weight of 3-year-old children obtained from a sample of 11 from a village (SD = 2 kg).
   • How good is this estimate (sampling variation)?
   • To what extent is this mean representative of the actual population mean?
   • SE = 2 / √11 = 0.6 kg
   • 95% CI = 16 ± 2 × 0.6 = 14.8 kg to 17.2 kg
   Role of sample size: if n = 20, SE = 2 / √20 = 0.45 kg, and 95% CI = 16 ± 2 × 0.45 = 15.1 kg to 16.9 kg.
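The exercise above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `mean_confidence_interval` is made up for illustration, and the multiplier 2 follows the slides' rule of thumb rather than the exact normal value 1.96.

```python
import math

def mean_confidence_interval(mean, sd, n, multiplier=2.0):
    """Return (SE, lower limit, upper limit) for a sample mean."""
    se = sd / math.sqrt(n)  # SE = SD / sqrt(n)
    return se, mean - multiplier * se, mean + multiplier * se

# Exercise data: mean 16 kg, SD 2 kg, for n = 11 and n = 20.
for n in (11, 20):
    se, low, high = mean_confidence_interval(16.0, 2.0, n)
    print(f"n={n}: SE={se:.2f} kg, 95% CI={low:.1f} to {high:.1f} kg")
```

Increasing n from 11 to 20 shrinks the SE and therefore narrows the interval, which is the point of the "role of sample size" remark.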
9. 9. Standard error of proportion (SEP)
   • Similarly, the normal distribution of sample proportions around the population proportion may be expressed arithmetically in terms of the SE of proportion, with confidence limits (central limit theorem).
   • SEP is also a measure of variation due to sampling.
   • 95% of sample proportions will lie within the limits P ± 2 SEP of the population proportion (95% CLs).
   • Samples with proportions larger or smaller than this range will be rare (only 5%), and such values are taken as statistically significant at the 5% level of significance.
   • Formula: $SEP = \sqrt{p \times q / n}$
   12/10/12 Dr. Arshad Sabir
10. 10. 95% CI for a proportion (percentage, categorical variable): Exercise 2
    • In a sample of 120 T.B. patients drawn from the country, 23.3% (28) had compliance with treatment.
    • Does this finding hold true for the whole population?
    • Standard error for proportion/percentage (SEP): p = one of the percentages (23.3%); q = 100 − p = the other percentage = 100 − 23.3 = 76.7%.
    • $SEP = \sqrt{p \times q / n} = \sqrt{23.3 \times 76.7 / 120} \approx 3.9$
    • 95% CI for the proportion = p ± 2 × SEP = 23.3 ± (2 × 3.9) = 15.5% to 31.1%
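The same calculation can be sketched in Python. The function name `proportion_confidence_interval` is hypothetical, and percentages (not fractions) are used to match the slide. Because the code does not round intermediate values, its output may differ slightly from hand-rounded figures.

```python
import math

def proportion_confidence_interval(p_percent, n, multiplier=2.0):
    """Return (SEP, lower limit, upper limit), all in percentage points."""
    q_percent = 100.0 - p_percent
    sep = math.sqrt(p_percent * q_percent / n)  # SEP = sqrt(p * q / n)
    return sep, p_percent - multiplier * sep, p_percent + multiplier * sep

# Exercise data: 23.3% compliance in a sample of 120 patients.
sep, low, high = proportion_confidence_interval(23.3, 120)
print(f"SEP = {sep:.2f}, 95% CI = {low:.1f}% to {high:.1f}%")
```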
11. 11. Standard error of difference between two proportions [SE(p1 − p2)] (two samples)
    Essentials:
    1. Samples are large.
    2. Samples are selected at random.
    $Z = \dfrac{\text{observed difference } (p_1 - p_2)}{\text{standard error of difference } SE(p_1 - p_2)}$
    If the observed difference is more than 2 SE, it is a statistically significant (real) difference at the 5% level of significance; otherwise it is a "normal" difference.
12. 12. Calculation of SE of difference between two proportions [SE(p1 − p2)]
    SE(p1 − p2) is the square root of the sum of the squares of the SEs of the two proportions:
    $SE(p_1 - p_2) = \sqrt{\dfrac{p_1 \times q_1}{n_1} + \dfrac{p_2 \times q_2}{n_2}}$
    $Z = \dfrac{\text{observed difference } (p_1 - p_2)}{\text{SE of the difference } SE(p_1 - p_2)}$, significant at 5% LOS when Z ≥ 2.
13. 13. SE of difference between two proportions: Exercise
    Mortality in pyomeningitis was 30% with B. Penicillin and 20% with Ceftriaxone, in a sample of 100 in each case.
    $SE(p_1 - p_2) = \sqrt{\dfrac{30 \times 70}{100} + \dfrac{20 \times 80}{100}} = \sqrt{37} = 6.08$
    $Z = \dfrac{\text{obs. diff.}}{\text{SE of diff.}} = \dfrac{30 - 20}{6.08} = 1.64$ (critical value is 2)
    Z is less than 2 (95% confidence limits), hence the difference is not significant at 95% confidence limits, i.e. at the 5% level of significance.
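A short sketch of the same two-proportion comparison follows. The function name `two_proportion_z` is an assumption for illustration; it works in percentages and applies the slides' cut-off of 2.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Return (SE of the difference, Z) for two percentages p1 and p2."""
    se_diff = math.sqrt(p1 * (100.0 - p1) / n1 + p2 * (100.0 - p2) / n2)
    z = (p1 - p2) / se_diff
    return se_diff, z

# Exercise data: mortality 30% vs 20%, with 100 patients in each group.
se_diff, z = two_proportion_z(30.0, 100, 20.0, 100)
verdict = "significant" if abs(z) >= 2 else "not significant"
print(f"SE = {se_diff:.2f}, Z = {z:.2f}: {verdict} at 5% LOS")
```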
14. 14. Uses of SEP
    1. To find confidence limits for the population proportion (P) when only the sample proportion (p) is known.
    2. To determine whether a sample was drawn from a known population when the population proportion is known: Z = (p − P) / SEP (p should be within 2 SEP of P at 5% LOS).
    3. To find the standard error of the difference between two proportions (significant or not significant).
    4. To find the size of the sample: $n = 4pq / L^2$, where L is the margin of error (say 5% of proportion p [0.05]).
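As a rough sketch of use 4, the sample-size formula n = 4pq / L² can be evaluated as below. The function name and the input values (an expected proportion of 30% and an allowable error of 5 percentage points) are assumptions chosen for illustration; the slides do not give a worked example, and L is treated here as an absolute margin in percentage points.

```python
import math

def sample_size_for_proportion(p_percent, margin_percent):
    """n = 4pq / L^2, with p, q and L all in percentage points."""
    q_percent = 100.0 - p_percent
    n = 4.0 * p_percent * q_percent / margin_percent ** 2
    return math.ceil(n)  # round up to a whole number of subjects

# Hypothetical inputs: expected proportion 30%, allowable error 5 points.
print(sample_size_for_proportion(30.0, 5.0))
```

The factor 4 is the square of the multiplier 2 used for 95% confidence limits, so the formula follows from setting L = 2 × SEP and solving for n.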
15. 15. Decision making in health
    1. Standard error of mean
    2. Standard error of difference between two means
    3. Student's t-test
    4. Standard error of proportion
    5. Standard error of the difference between two proportions
    6. Chi-square test
16. 16. Testing a statistical hypothesis
    A "hypothesis" is a statement that is tested under the assumption that it is true. In statistical testing two hypotheses are formulated:
    1. Null hypothesis (H0): there is no difference between the characteristics of two samples, or both are from the same population (the "no difference" hypothesis).
    2. Alternate hypothesis (HA): the sample value is "significantly" different from the population value or from the other sample value (the hypothesis of significant difference).
17. 17. Hypothesis testing
    H0 is against the claim of the researcher. The researcher desires to reject H0, and in reaching a decision may commit one of two errors:
    • Type I error (alpha error): rejecting H0 when it was actually true (no significant difference exists).
    • Type II error (beta error): accepting H0 when it was not true (a significant difference does exist).
18. 18. Hypothesis testing

    | Decision based on study results | True situation: difference exists | True situation: difference does not exist |
    | --- | --- | --- |
    | Difference exists: H0 rejected | Correct decision | Type I error (α error) |
    | Difference does not exist: H0 accepted | Type II error (β error) | Correct decision |
19. 19. Tests of significance
    • Can a study result be considered a result that indeed exists in the study population from which the sample was drawn?
    • Are the differences observed due to chance (normal) variation, or are they true differences due to the play of some external factor (significantly different)?
    • These tests are mathematical procedures by which the likelihood (probability) of the observed study results (differences) occurring by chance is found.
    • Power of the test: its ability to detect differences between groups if such differences actually exist.
20. 20. Tests of significance
    When two or more groups are compared, the possibilities are:
    – There is no difference [null hypothesis accepted].
    – There is some difference:
      • Slight difference (normal or chance difference).
      • Large (significant) difference, not explainable by chance, that may be due to the play of some external factor.
    The extent to which an observed difference counts as "normal", and beyond which it is not normal (significant), is decided on the basis of certain cut-off values obtained by applying a statistical test or procedure. The selection of tests depends upon the type of data.
21. 21. Level of significance
    • Study results are sample based.
    • We can never be 100% sure about a study result (there are many sources of variation).
    • By convention we accept results if we have 95% confidence in them (a difference exists), i.e. if the chance of having obtained the results by chance (actually no difference) is less than 5%.
    • We allow a 5% level of accepting results that might have occurred by chance. This is called the level of significance (LOS) or level of alpha.
22. 22. Level of significance (α) and P-value
    • The probability of committing an α error (getting the results by chance, or wrongly rejecting H0) is fixed before the start of the experiment. This is the LOS; a maximum level is fixed, usually at 0.01 (1%) or 0.05 (5%).
    • The P-value is obtained after completing the experiment. It is derived (from a table) after applying a suitable statistical test to the study results. It is not fixed, and may take any value more than, less than, or equal to the LOS (5%).
    • The obtained P-value is compared with the LOS. If it is equal to or less than 0.05 (e.g. 0.03, 0.02 or 0.01), we reject H0 and accept HA; if it is more than the fixed LOS (e.g. 0.06, 0.1 or 0.5), we accept H0.
23. 23. Important tests of significance

    | Data | Information | n / condition | Test |
    | --- | --- | --- | --- |
    | Qualitative (categorical, nominal) | Frequencies as percentages or proportions | Small (less than 40) | Fisher exact test |
    | | | Large (more than 40) | Chi-square test etc. |
    | Quantitative (numeric; interval or ratio-scale data) | Means | Two groups | Student's t-test |
    | | | Multiple groups | F-test (ANOVA) |
    | | | If a linear relationship is suspected | Pearson's correlation coefficient |
24. 24. Chi-square test (χ²)
    Essentials: used to find out whether the observed differences between proportions of events in two or more groups may be considered statistically significant.
    • It was developed by Karl Pearson.
    • Non-parametric test: not based on a normal (or any other) distribution of the variable under study.
    • Used for qualitative, discrete data in frequencies or proportions (not in percentages).
    • Involves calculation of a quantity called chi-square (χ²).
    • The test is based on measuring the difference between observed frequencies and expected frequencies.
25. 25. Steps of applying the chi-square test (χ²)
    An assumption of no difference (null hypothesis) is made, which is then proved or disproved with the χ² test.
    Steps:
    – Fix a level of significance (0.05) for the tabulated P-value.
    – Enter the study data in the table as observed frequencies (O).
    – Calculate the expected frequency (E) for each cell: $E = (RT \times CT) / GT$ (row total × column total / grand total).
    – Calculate the χ² value of each cell: $(O - E)^2 / E$.
    – Add up the results of all cells: $\chi^2_{cal} = \sum (O - E)^2 / E$.
    – DF = (C − 1) × (R − 1) (it is 1 in a 2 × 2 table).
    – Compare the $\chi^2_{cal}$ value with the $\chi^2_{tab}$ value in the table at the pre-decided LOS for the given DF. If it is equal to or larger than the table value, the P-value for this data is smaller than the LOS, and we reject H0 and accept HA; otherwise we accept H0.
26. 26. Is the use of ANS associated with shorter distance?

    | Distance from ANS | Used ANS | Not used ANS | Total |
    | --- | --- | --- | --- |
    | Less than 10 km | O = 51 (E = 44.4) | O = 29 (E = 35.6) | 80 |
    | 10 km or more | O = 35 (E = 41.6) | O = 40 (E = 33.4) | 75 |
    | Total | 86 | 69 | 155 |

    E (expected) values are calculated on the supposition (H0) of no difference in utilization of ANS between the two groups of women.
    $\chi^2_{cal} = \dfrac{(51 - 44.4)^2}{44.4} + \dfrac{(29 - 35.6)^2}{35.6} + \dfrac{(35 - 41.6)^2}{41.6} + \dfrac{(40 - 33.4)^2}{33.4} = 4.55$
    $\chi^2_{cal}$ at 1 DF = 4.55, while $\chi^2_{tab}$ at 0.05 LOS and 1 DF is 3.84. As the calculated value is larger than the table value, the P-value in this case is less than 0.05; hence the difference observed is significant and H0 is rejected.
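The expected frequencies and the chi-square value for this table can be computed as in the sketch below. The function name `chi_square` is hypothetical and only the standard library is used. Because the expected values are not rounded to one decimal place, the result differs slightly in the second decimal from a hand calculation.

```python
def chi_square(observed):
    """Return (chi-square, DF) for a table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # E = RT x CT / GT
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1) x (C - 1)
    return chi2, df

# Rows: less than 10 km, 10 km or more. Columns: used ANS, not used ANS.
chi2, df = chi_square([[51, 29], [35, 40]])
print(f"chi-square = {chi2:.2f} at {df} DF (table value at 0.05 LOS: 3.84)")
```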
27. 27. Chi-square (χ²) as a test of "goodness of fit"
    • The ratio of male to female births is universally expected to be 1:1 (50% to 50%).
    • The observed ratio in a village was M = 52 and F = 48.
    • Is the difference normal or significant?

    | | Male | Female |
    | --- | --- | --- |
    | Observed frequency | 52 | 48 |
    | Expected frequency | 50 (50%) | 50 (50%) |

    $\chi^2 = \dfrac{(52 - 50)^2}{50} + \dfrac{(48 - 50)^2}{50} = \dfrac{8}{50} = 0.16$
28. 28. Chi-square (χ²) as a test of "goodness of fit"
    • Degrees of freedom = number of classes − 1 = K − 1 = 2 − 1 = 1, or DF = (R − 1) × (C − 1).
    • At 5% LOS the table value of χ² is 3.841, while the calculated value ($\chi^2_{cal} = 0.16$) is much lower. Hence the observed difference in births is normal (by chance) and not significant.
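The goodness-of-fit calculation for the birth example is small enough to sketch directly. The function name `goodness_of_fit` is an assumption for illustration; the table value 3.841 is the one quoted in the slide.

```python
def goodness_of_fit(observed, expected):
    """Return (chi-square, DF) comparing observed with expected frequencies."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of classes minus 1
    return chi2, df

# Observed births: 52 male, 48 female; expected under a 1:1 ratio: 50 and 50.
chi2_gof, df_gof = goodness_of_fit([52, 48], [50, 50])
verdict_gof = "significant" if chi2_gof >= 3.841 else "not significant"
print(f"chi-square = {chi2_gof:.2f} at {df_gof} DF: {verdict_gof}")
```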
29. 29. Student's t-test
    • Numerical data (mean values)
    • Normal variable; compares two groups
    • Random sampling
    Steps:
    1. Calculate the t-value (from the data).
    2. Choose a level of significance (LOS), usually 0.05, which is the accepted probability of having the difference by chance (P-value).
    3. Determine DF (= sum of the two sample sizes minus 2).
    4. Locate the table t-value corresponding to the LOS (indicated at the top of the column) at the given degrees of freedom. If the calculated t-value is equal to or larger than the table value, the P-value is significant, i.e. less than the chosen LOS; hence H0 is rejected.
30. 30. How to calculate the t-value
    1. Calculate the means of the two groups ($\bar{x}_1$ and $\bar{x}_2$).
    2. Calculate the difference between the means of the two groups ($\bar{x}_1 - \bar{x}_2$).
    3. Calculate the standard deviation of each study group (SD1 and SD2).
    4. Calculate the standard error for both groups (SE1 and SE2): $SE = SD / \sqrt{n}$.
    5. Formula for the t-value: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{SD_1^2 / n_1 + SD_2^2 / n_2}}$
31. 31. Exercise

    | Delivery outcome | n | Mean ht. | SD |
    | --- | --- | --- | --- |
    | Normal birth wt. | 60 | 156 cm | 3.1 |
    | LBW | 52 | 152 cm | 2.8 |

    H0: there is no difference in the mean heights of the two groups. The difference may be by chance, but the acceptable LOS is 0.05 (p < .05).
    $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{SD_1^2 / n_1 + SD_2^2 / n_2}} = \dfrac{156 - 152}{\sqrt{3.1^2 / 60 + 2.8^2 / 52}} = \dfrac{4}{0.558} \approx 7.2$
    The calculated value of t is about 7.2; the table value of t at DF 110 and 0.05 LOS is 1.98. The calculated t-value is larger than the table value, hence the difference is significant and H0 is rejected.
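The t-value for this exercise can be checked with a short sketch that plugs the means, SDs and sample sizes listed in the exercise table into the formula from the previous slide. The function name `t_value` is hypothetical, and the table value 1.98 is the one quoted in the slide.

```python
import math

def t_value(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, DF) for the difference between two group means."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se_diff
    df = n1 + n2 - 2  # sum of the two sample sizes minus 2
    return t, df

# Table data: normal birth weight (n=60, 156 cm, SD 3.1), LBW (n=52, 152 cm, SD 2.8).
t, df_t = t_value(156.0, 3.1, 60, 152.0, 2.8, 52)
verdict_t = "significant" if abs(t) >= 1.98 else "not significant"
print(f"t = {t:.2f} at {df_t} DF: {verdict_t} at 0.05 LOS")
```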
32. 32. OSPE questions
    Example 1: In a sample (n = 1000), obesity was found in 20% of men and 30% of women. Does the difference reflect an actual difference in the total population, or has it occurred by chance? Calculate the SE of the difference between the two proportions at 5% LOS.
    Example 2: The average B.P. of bank cashiers is 170, compared with 150 for PRO staff. Is the difference normal, or is it real, due to the play of some external factor (stress)? Calculate the SE of the difference between the two means at 5% LOS.