Medical statistics Basic concept and applications [Square one]

Provides the basic concepts and applications of biostatistics using a practical model coupled with the essential theoretical background.

Transcript

  • 1. Medical Statistics 2013 Dr Tarek Tawfik Amin
  • 2. Introduction. Questions: Why statistics? The process. The resources.
  • 3. How? • Book: Statistics at Square One, 11th ed. (Campbell and Swinscow) • SPSS practical sessions: PASW guide • Practical sessions using SPSS v. 17.0
  • 4. "Statistics": an overview (diagram): population (parameters) and sample (statistics); data, statistical analysis, interpretation, information; applications such as reference ranges and research.
  • 5. Statistical analysis (diagram): data consist of variables, which are either qualitative (categorical: nominal or ordinal) or quantitative (numerical: discrete or continuous, on an interval/ratio scale). Statistical analysis is descriptive (tables, graphs, measures) or inferential; the choice depends on the sample(s) and the objectives of the analysis.
  • 6. I. Descriptive statistics. Goals: summarizing, overview, data checking.
  • 7. Extract of the diabetes data file (diabIB), first 15 patients (SPSS data view; #NULL! = missing value):
      PATNR  AGE  SEX  SMOKE  HEIGHT  WEIGHT  SBP  SBP2  INSUL  CHOL  HBA1C  DIABDUR  DEAD
      1      57   0    0      177     98      140  154   0      6.30  7.62   5        #NULL!
      2      74   1    0      172     69      150  145   1      5.10  8.30   11       0
      3      38   1    0      155     70      120  126   0      6.50  11.00  2        #NULL!
      4      73   1    0      165     72      180  157   0      5.80  7.00   21       0
      5      53   1    2      174     109     140  119   1      6.80  10.60  7        0
      6      74   1    0      171     83      151  145   0      6.25  7.62   7        0
      7      81   0    2      175     60      140  113   0      6.50  6.40   6        0
      8      86   1    0      164     59      140  158   0      5.20  5.30   4        0
      9      78   0    1      171     83      151  148   0      5.60  5.90   1        1
      10     78   1    0      171     83      151  159   1      5.00  8.00   23       1
      11     91   0    0      171     83      151  140   0      4.30  9.70   4        1
      12     77   0    2      176     87      170  198   0      6.40  6.60   7        2
      13     77   1    0      171     83      151  152   0      5.20  4.90   26       1
      14     84   0    0      171     62      160  148   0      7.00  7.80   8        1
      15     72   1    0      154     63      145  148   0      6.20  7.80   0        1
  • 8. I. Tables. Tables can summarize counts and frequencies (categorical variables) or measures (numerical variables).
      Frequency table, SEX (valid cases):
                 Frequency   Percent   Valid Percent   Cumulative Percent
      male       145         52.2      52.2            52.2
      female     133         47.8      47.8            100.0
      Total      278         100.0     100.0
      Contingency table, for comparison of two or more variables: smoking history * SEX crosstabulation (count):
                 never   stopped smoking   yes   Total
      male       26      64                55    145
      female     110     14                9     133
      Total      136     78                64    278
  • 9. Table 3. Daily servings of calcium- and vitamin D-rich foods in relation to body mass index classification of the included adults
      Food items (servings/day)*, median (mean ± SD)   Obese (N=91)           Non-obese (N=125)      P value (a)
      Milk                                             0.52 (0.71 ± 0.3)      0.65 (0.88 ± 0.7)      0.031
      Milk beverage                                    0.45 (0.59 ± 0.4)      0.35 (0.53 ± 0.4)      0.279
      Milk in cereals                                  0.20 (0.33 ± 0.2)      0.50 (0.58 ± 0.4)      0.001
      Milk in coffee or tea                            0.15 (0.25 ± 0.6)      0.20 (0.23 ± 0.6)      0.790
      Total milk                                       0.90 (1.03 ± 0.3)      1.20 (1.34 ± 0.7)      0.001
      Yoghurt                                          0.10 (0.12 ± 0.6)      0.20 (0.14 ± 0.5)      0.790
      Cheese                                           0.20 (0.24 ± 0.9)      0.20 (0.29 ± 0.8)      0.661
      Ice cream                                        0.15 (0.14 ± 0.6)      0.06 (0.09 ± 0.3)      0.422
      Total dairy                                      0.25 (0.45 ± 0.6)      0.30 (0.43 ± 0.7)      0.826
      Tuna (canned)                                    0.05 (0.03 ± 0.1)      0.03 (0.04 ± 0.3)      0.761
      Fish                                             0.15 (0.19 ± 0.7)      0.10 (0.18 ± 0.5)      0.902
      Half-cooked fish                                 0.06 (0.11 ± 0.5)      0.25 (0.27 ± 0.6)      0.029
      Shrimp/oyster                                    0.05 (0.08 ± 0.1)      0.05 (0.06 ± 0.1)      0.149
      Eggs                                             0.85 (0.81 ± 1.1)      0.80 (0.76 ± 0.7)      0.797
      Liver (including chicken livers)                 0.02 (0.04 ± 0.4)      0.05 (0.06 ± 0.3)      0.834
      Others                                           0.20 (0.23 ± 0.3)      0.40 (0.55 ± 0.5)      0.549
      Dietary vitamin D (IU/day), median (mean ± SD)   111.6 (118.1 ± 73.5)   123.7 (132.2 ± 67.4)   0.034
      Low dietary intake (c) (<200 IU/day), No. (%)    56 (62.2)              47 (37.6)              0.003 (b)
      Dietary calcium (mg/day), median (mean ± SD)     660.0 (698.8 ± 261.9)  692.0 (717.9 ± 245.9)  0.223
      Low calcium intake (d) (<1000 mg/day), No. (%)   51 (56.7)              49 (39.2)              0.011 (b)
  • 10. Assignment I. Table 1. Basic characteristics of the patients examined (N=278), 1996
      Baseline characteristic                                            Total (N=278)
      1- Men (%)                                                         52.2
      2- Insulin users (%)                                               25.5
      3- Smokers (%)                                                     23.0
      4- Ex-smokers (%)                                                  28.1
      5- Non-smokers (%)                                                 48.9
      6- Age in years (mean ± SD)                                        67.24 ± 11.74
      7- Systolic blood pressure at starting point, mmHg (mean ± SD)     151.20 ± 22.00
      8- Systolic blood pressure at two years, mmHg (mean ± SD)          153.83 ± 29.1
      9- Duration of diabetes (median, quartiles 1-3)                    6.0 (2.75-12.25)
      10- Missed values                                                  0.0
  • 11. II. Graphs. Goals: impression, comparison, data checking, clustering, trend.
  • 12. II. Graphs. The selection of graphs depends on: 1- types of variables (categorical or numerical), 2- number of variables, 3- objectives. Examples for categorical variables:
      [Figure 1: Outcomes of the included diabetic patients (1996): alive, died from CVD, other cause of death, missing]
      [Figure 2: Smoking status of the included diabetic patients (bar chart, percent): never, stopped smoking, yes]
  • 13. For numerical variables: [Figure 3: Total cholesterol level (mmol/l) in diabetic patients, 1996 (histogram); Mean = 6.25, Std. Dev = 1.33, N = 278]
  • 14. [Figure 4: Systolic blood pressure at starting point among diabetic patients, 1996 (mmHg) (boxplot by sex; male N = 145, female N = 133; outlying cases labelled 28, 247, 99, 68, 67)]
  • 15. [Figure 6: Total cholesterol level in relation to gender and smoking status among diabetic patients, 1996 (error bars: 95% CI of total cholesterol, mmol/l, by smoking history never / stopped smoking / yes); male N = 26, 64, 55; female N = 110, 14, 9]
  • 16. [Figure 7: Duration of diabetes among the included patients, 1996 (years): checking for normality (histogram with normal curve); Mean = 7.9, Median = 6.0, Mode = 1, Std. Dev = 6.96, N = 278; outliers, mode, median and mean marked]
  • 17. III. Measures (numerical variables). Central tendency (how the data aggregate around a central point): mean, median, mode, percentiles. Dispersion (how the data vary): range (max - min), interquartile range, variance, standard deviation, coefficient of variation.
  • 18. Central tendency. Mean = sum of the observations / their number, $\bar{x} = (x_1 + x_2 + \dots + x_n)/n$; affected by extreme values. Mode = the most frequently occurring value in a set of observations. Median = the middle value that divides the ordered data set into two halves (50/50); not affected by extreme values.
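  • 19. Ages of two samples: 3, 7, 37 (median = 7; mean = (3 + 7 + 37)/3 = 15.7) and 3, 7, 11 (median = 7; mean = (3 + 7 + 11)/3 = 7). The single extreme value (37) changes the mean but not the median.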
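A minimal Python sketch of the comparison above, using only the standard-library `statistics` module; the helper name `summarize` is illustrative, not from the slides. It shows the extreme value 37 pulling the mean up while the median stays at 7.

```python
from statistics import mean, median

def summarize(ages):
    """Return (mean rounded to 1 decimal, median) for a list of ages."""
    return round(mean(ages), 1), median(ages)

# The two samples from the slide: one with an extreme value (37), one without.
print(summarize([3, 7, 37]))  # mean 15.7, median 7
print(summarize([3, 7, 11]))  # mean 7, median 7
```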
  • 20. Dispersion. Ordered data: 1, 1, 6, 8, 10, 16, 17, 23, 43, 53. Range = 53 - 1 = 52 (affected by extreme values). 25% of the data lie below the 25th percentile (1st quartile) = 6; the median (50th percentile) = 13; 75% of the data lie below the 75th percentile (3rd quartile) = 23. Interquartile range (IQR) = 3rd quartile - 1st quartile = 23 - 6 = 17; the IQR is not affected by extreme values.
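Quartile definitions differ between textbooks and software (for these ten values, Python's `statistics.quantiles` with its default method returns 4.75 and 28.0 rather than 6 and 23). The sketch below assumes the median-of-halves rule, which reproduces the quartiles on the slide; `quartiles_by_halves` is a hypothetical helper name.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1, median and Q3, taking Q1/Q3 as the medians of the lower and
    upper halves of the ordered data (for odd n the middle value is left
    out of both halves in this sketch)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(lower), median(data), median(upper)

values = [1, 1, 6, 8, 10, 16, 17, 23, 43, 53]
q1, med, q3 = quartiles_by_halves(values)
print("range =", max(values) - min(values))      # range = 52
print("Q1 =", q1, "median =", med, "Q3 =", q3)   # Q1 = 6 median = 13.0 Q3 = 23
print("IQR =", q3 - q1)                          # IQR = 17
```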
  • 21. Standard deviation and variance. A sample of 3 people, ages in years: 3, 7, 17. Mean age = (3 + 7 + 17)/3 = 9. Deviations from the mean: -6, -2, +8. The sum of the differences between the individual values and the mean is 0, so the mean deviation is 0. To overcome this, we square the differences and divide by n - 1: variance $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(3 - 9)^2 + (7 - 9)^2 + (17 - 9)^2}{3 - 1} = \frac{36 + 4 + 64}{2} = 52$. The amount of dispersion around the mean is 52 years² (the wrong scale), so we convert back to the usual (natural) scale using the standard deviation: $s = \sqrt{52} = \pm 7.2$ years.
  • 22. The sample disperses around the mean (9 years) by 7.2 years in both directions.
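The variance and SD calculation for the three ages can be checked step by step with a short Python sketch (standard library only). It also confirms that `statistics.variance` and `statistics.stdev` use the same n - 1 denominator.

```python
from math import sqrt
from statistics import stdev, variance

ages = [3, 7, 17]
m = sum(ages) / len(ages)                  # mean = 9.0
deviations = [x - m for x in ages]         # [-6.0, -2.0, 8.0]; they sum to 0
ss = sum(d ** 2 for d in deviations)       # 36 + 4 + 64 = 104
var = ss / (len(ages) - 1)                 # 104 / 2 = 52.0 (years squared)
sd = sqrt(var)                             # back to years: about 7.21

print(sum(deviations), var, round(sd, 1))  # 0.0 52.0 7.2
# The standard-library functions use the same n - 1 denominator:
print(variance(ages), round(stdev(ages), 1))
```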
  • 23. Description of a binary (dichotomous) variable. o A binary variable has only two outcomes, e.g. diseased or not diseased. o The proportion of the population that is diseased at a certain point in time is called the prevalence. o The occurrence of new cases (over a period of time) is called the incidence.
  • 24. Dichotomous variables: Prevalence = all cases (new or old) / population at risk. Incidence = new cases / total population at risk.
  • 25. Probability and odds. o Odds: another way of expressing chance. o In a population of 1000, 200 have a certain disease. o When we randomly take one person out, the probability that this person is diseased = 200/1000 = 0.2 (this is the probability). o The chance expressed as odds that this person is diseased = probability of having the disease / probability of not having the disease. o Odds = P/(1 - P) = 0.2/0.8 = 1/4, i.e. the odds are 1 to 4.
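A small Python sketch of the probability-to-odds conversion; the function names are illustrative, and the inverse conversion P = odds/(1 + odds) is added here for completeness rather than taken from the slide.

```python
def probability_to_odds(p):
    """Odds = P / (1 - P)."""
    return p / (1 - p)

def odds_to_probability(odds):
    """Inverse conversion: P = odds / (1 + odds)."""
    return odds / (1 + odds)

p = 200 / 1000                    # probability of disease = 0.2
odds = probability_to_odds(p)     # 0.2 / 0.8 = 0.25, i.e. odds of 1 to 4
print(p, odds, odds_to_probability(odds))
```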
  • 26. The following table depicts the outcomes of an isoniazid/placebo trial among children with HIV (death within 6 months):
                  Dead (within 6 months)   Alive   Total
      Placebo     21                       110     131
      Isoniazid   11                       121     132
      What is the risk of dying? Placebo: risk = 21/131 = 0.160; isoniazid: risk = 11/132 = 0.083.
      Absolute risk difference (ARD) = risk in placebo - risk in isoniazid = 0.077
      Relative risk (RR) = risk in placebo / risk in isoniazid = 0.160/0.083 ≈ 1.93
      Relative risk reduction (RRR) = (risk in placebo - risk in isoniazid) / risk in placebo = 0.48
      Number needed to treat (NNT) = 1/ARD = 1/0.077 ≈ 13
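The trial measures can be reproduced with a short Python sketch; `trial_measures` is a hypothetical helper, and rounding the NNT up to the next whole patient is a common convention assumed here. With unrounded risks it gives ARD ≈ 0.077, RR ≈ 1.92, RRR ≈ 0.48 and NNT = 13; an RR computed from the rounded risks 0.160 and 0.083 comes out slightly higher.

```python
import math

def trial_measures(dead_ctrl, n_ctrl, dead_trt, n_trt):
    """Risk-based effect measures for a two-arm trial (control vs treatment)."""
    risk_ctrl = dead_ctrl / n_ctrl
    risk_trt = dead_trt / n_trt
    ard = risk_ctrl - risk_trt        # absolute risk difference
    rr = risk_ctrl / risk_trt         # relative risk
    rrr = ard / risk_ctrl             # relative risk reduction
    nnt = math.ceil(1 / ard)          # number needed to treat, rounded up
    return risk_ctrl, risk_trt, ard, rr, rrr, nnt

# Placebo: 21 deaths out of 131; isoniazid: 11 deaths out of 132
for value in trial_measures(21, 131, 11, 132):
    print(round(value, 3))
```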
  • 27. Odds ratio (OR). o An odds ratio (OR) is a measure of association between an exposure and an outcome. o The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure. o Odds ratios are most commonly used in case-control studies; however, they can also be used in cross-sectional and cohort study designs (with some modifications and/or assumptions).
  • 28. Basic structure of a case-control design (diagram): from the population, a sample of diseased subjects (cases) and disease-free subjects (controls) is taken at the starting point (present time); their exposure to the factor (exposed or unexposed) is traced back into the past, and the odds ("chance") of exposure are calculated and compared between the two groups.
  • 29. Calculation: case-control study (2 × 2 table)
                    Diseased (cases)   Not diseased (controls)   Total
      Exposed       a                  b                         a+b
      Non-exposed   c                  d                         c+d
      Odds ratio = (a/c) ÷ (b/d) = ad/bc, i.e. the odds of exposure among the diseased / the odds of exposure among the non-diseased.
      OR = 1: exposure does not affect the odds of the outcome. OR > 1: exposure is associated with higher odds of the outcome. OR < 1: exposure is associated with lower odds of the outcome.
  • 30. Odds ratio: case-control study
                    Lung cancer   No lung cancer   Total
      Smoking       a = 80        b = 30           110
      None          c = 20        d = 70           90
      OR = (80 × 70)/(30 × 20) = 5600/600 = 9.3, or (80/20) ÷ (30/70) = 9.3
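A one-function Python sketch of the cross-product formula, assuming the cell labels a-d used in the case-control tables (a, b = exposed cases and controls; c, d = unexposed cases and controls). The function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc.
    a: exposed cases, b: exposed controls, c: unexposed cases, d: unexposed controls."""
    return (a * d) / (b * c)

# Smoking (exposure) and lung cancer, case-control data from the slide
print(round(odds_ratio(80, 30, 20, 70), 2))  # 9.33
```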
  • 31. Basic structure of a cohort study (diagram): from the population, a sample of disease-free subjects is taken at the starting point (present time) and classified as exposed or unexposed to the factor; both groups are followed into the future to see who develops the disease (a among the exposed, c among the unexposed) and who stays disease-free (b, d). The incidence of disease is compared between the groups, and the relative risk is calculated for the exposure.
  • 32. Relative risk (RR), cohort design
                            Breast cancer   No breast cancer   Total
      Mammography positive  a = 10          b = 90             100
      Mammography negative  c = 20          d = 100,080        100,100
      RR = [a/(a+b)] ÷ [c/(c+d)] = (10/100) ÷ (20/100,100) = 0.1/0.0002 ≈ 500
  • 33. Cohort study: the relative risk (RR)
                    Lung cancer   No lung cancer   Total
      Smokers       18            582              600
      Non-smokers   6             1194             1200
      Risk for smokers = 18/600 = 0.03; risk for non-smokers = 6/1200 = 0.005; RR = 0.03/0.005 = 6
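The cohort calculation as a short Python sketch, assuming the cell layout of the cohort tables (a, b = exposed with and without the disease; c, d = unexposed with and without the disease); `relative_risk` is an illustrative name.

```python
def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)].
    a, b: exposed who did / did not develop the disease; c, d: the same for the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Smokers vs non-smokers, lung cancer (cohort data from the slide)
print(round(relative_risk(18, 582, 6, 1194), 2))  # 6.0
```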
  • 34. Case-control study: the odds ratio (OR)
                    Lung cancer   No lung cancer   Total
      Smokers       80            30               110
      Non-smokers   20            70               90
      Odds for smokers = 80/30 = 2.67; odds for non-smokers = 20/70 = 0.29; OR = (80 × 70)/(30 × 20) = 9.33
  • 35. Assignment I. Table 1. Basic characteristics of the patients examined (N=278), 1996
      Baseline characteristic                                            Total (N=278)
      1- Men (%)                                                         52.2
      2- Insulin users (%)                                               25.5
      3- Smokers (%)                                                     23.0
      4- Ex-smokers (%)                                                  28.1
      5- Non-smokers (%)                                                 48.9
      6- Age in years (mean ± SD)                                        67.24 ± 11.74
      7- Systolic blood pressure at starting point, mmHg (mean ± SD)     151.20 ± 22.00
      8- Systolic blood pressure at two years, mmHg (mean ± SD)          153.83 ± 29.1
      9- Duration of diabetes (median, quartiles 1st-3rd)                6.0 (2.75-12.25)
      10- Missed values                                                  --
  • 36. 2a Smoking history (all subjects) [bar chart, percent: never 49, stopped smoking 28, yes 23]
  • 37. 2b Smoking history by sex [bar chart, percent: male: never 18, stopped smoking 44, yes 38; female: never 83, stopped smoking 11, yes 7]
  • 38. 3a Age by sex using a bar chart (mean used as summary) [bar chart: mean age (years), axis 64 to 70, male vs female]
  • 39. 3b Boxplot of age by sex [boxplot: age (years); male N = 145, female N = 133; case 195 flagged]. This graph checks the data distribution and looks for outliers.
  • 40. 4a Height of the included subjects [histogram, height (cm): Mean = 170.5, Median = 170.55 cm, Std. Dev = 8.89, N = 278]
  • 41. 4b Duration of diabetes [histogram, duration of diabetes (years): Mean = 7.9, Median = 6.0 years, Std. Dev = 6.96, N = 278]
  • 42. 5a Using the frequency table: P95 of systolic blood pressure at start ≈ 189-190 mmHg (the cumulative percent is 93.9% at 189 and 96.0% at 190).
      syst. blood pressure at start (valid cases; Valid Percent = Percent)
      Value (mmHg): frequency (percent; cumulative percent)
      100: 1 (.4; .4)      110: 1 (.4; .7)      112: 2 (.7; 1.4)     115: 1 (.4; 1.8)
      116: 2 (.7; 2.5)     120: 21 (7.6; 10.1)  121: 2 (.7; 10.8)    122: 1 (.4; 11.2)
      124: 1 (.4; 11.5)    125: 6 (2.2; 13.7)   127: 1 (.4; 14.0)    130: 16 (5.8; 19.8)
      131: 1 (.4; 20.1)    132: 2 (.7; 20.9)    134: 1 (.4; 21.2)    135: 11 (4.0; 25.2)
      136: 1 (.4; 25.5)    137: 2 (.7; 26.3)    139: 1 (.4; 26.6)    140: 28 (10.1; 36.7)
      141: 2 (.7; 37.4)    144: 4 (1.4; 38.8)   145: 12 (4.3; 43.2)  147: 1 (.4; 43.5)
      148: 1 (.4; 43.9)    150: 31 (11.2; 55.0) 151: 1 (.4; 55.4)    151: 23 (8.3; 63.7)
      152: 1 (.4; 64.0)    153: 1 (.4; 64.4)    155: 2 (.7; 65.1)    158: 1 (.4; 65.5)
      160: 21 (7.6; 73.0)  161: 1 (.4; 73.4)    162: 1 (.4; 73.7)    163: 1 (.4; 74.1)
      164: 1 (.4; 74.5)    165: 5 (1.8; 76.3)   167: 1 (.4; 76.6)    168: 2 (.7; 77.3)
      170: 14 (5.0; 82.4)  171: 1 (.4; 82.7)    172: 2 (.7; 83.5)    175: 4 (1.4; 84.9)
      176: 1 (.4; 85.3)    177: 1 (.4; 85.6)    178: 1 (.4; 86.0)    179: 2 (.7; 86.7)
      180: 14 (5.0; 91.7)  182: 2 (.7; 92.4)    184: 1 (.4; 92.8)    185: 1 (.4; 93.2)
      187: 1 (.4; 93.5)    189: 1 (.4; 93.9)    190: 6 (2.2; 96.0)   194: 1 (.4; 96.4)
      195: 1 (.4; 96.8)    200: 2 (.7; 97.5)    205: 1 (.4; 97.8)    209: 1 (.4; 98.2)
      210: 3 (1.1; 99.3)   216: 1 (.4; 99.6)    220: 1 (.4; 100.0)
      Total: 278 (100.0)
  • 43. P95, P5 = mean ± Z score (for the probability at the specified percentile) × standard deviation (probability distribution of the normal curve: page 180). P95 of SBP: $P_{95} = \bar{x} + 1.645 \times SD = 151.2 + 1.645(22.0) = 187.4$ mmHg.
  • 44. 5b-1 P5 for duration of diabetes, from the frequency table (valid cases; Valid Percent = Percent). The cumulative percent first exceeds 5% at 1 year, so P5 = 1 year.
      Years: frequency (percent; cumulative percent)
      0: 12 (4.3; 4.3)     1: 35 (12.6; 16.9)   2: 22 (7.9; 24.8)    3: 21 (7.6; 32.4)
      4: 24 (8.6; 41.0)    5: 20 (7.2; 48.2)    6: 23 (8.3; 56.5)    7: 19 (6.8; 63.3)
      8: 6 (2.2; 65.5)     9: 6 (2.2; 67.6)     10: 6 (2.2; 69.8)    11: 13 (4.7; 74.5)
      12: 2 (.7; 75.2)     13: 7 (2.5; 77.7)    14: 6 (2.2; 79.9)    15: 5 (1.8; 81.7)
      16: 11 (4.0; 85.6)   17: 8 (2.9; 88.5)    18: 6 (2.2; 90.6)    19: 5 (1.8; 92.4)
      20: 3 (1.1; 93.5)    21: 5 (1.8; 95.3)    22: 2 (.7; 96.0)     23: 2 (.7; 96.8)
      25: 3 (1.1; 97.8)    26: 1 (.4; 98.2)     27: 1 (.4; 98.6)     28: 2 (.7; 99.3)
      31: 1 (.4; 99.6)     32: 1 (.4; 100.0)
      Total: 278 (100.0)
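  • 45. Or using the formula: P5 = mean - Z score (1.645) × SD = 7.9 - 1.645 × 6.96 = -3.6 years, an impossible (negative) duration, because the duration of diabetes is not normally distributed.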
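A Python sketch contrasting the two ways of obtaining percentiles used in slides 42-45: the normal-curve formula mean ± Z × SD, and reading the value off the cumulative frequencies. `normal_percentile` and `percentile_from_frequencies` are hypothetical helpers; the duration frequencies are copied from the slide 44 table, and Z is taken from `statistics.NormalDist` rather than a printed table. The normal formula gives a negative P5 (about -3.5 years with the exact Z; the slide rounds to -3.6), whereas the frequency table gives 1 year.

```python
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.95)   # one-sided Z for the 5th/95th percentile, about 1.645

def normal_percentile(mean, sd, z):
    """Percentile assuming normality: mean + Z * SD (use -Z for lower percentiles)."""
    return mean + z * sd

def percentile_from_frequencies(freq, p):
    """Smallest value whose cumulative frequency reaches p * N,
    i.e. reading the percentile off the 'Cumulative Percent' column."""
    n = sum(freq.values())
    cumulative = 0
    for value in sorted(freq):
        cumulative += freq[value]
        if cumulative >= p * n:
            return value

# Duration of diabetes (years): value -> frequency, N = 278
duration = {0: 12, 1: 35, 2: 22, 3: 21, 4: 24, 5: 20, 6: 23, 7: 19, 8: 6, 9: 6,
            10: 6, 11: 13, 12: 2, 13: 7, 14: 6, 15: 5, 16: 11, 17: 8, 18: 6,
            19: 5, 20: 3, 21: 5, 22: 2, 23: 2, 25: 3, 26: 1, 27: 1, 28: 2,
            31: 1, 32: 1}

print(normal_percentile(151.2, 22.0, z95))          # SBP P95, about 187.4 mmHg
print(normal_percentile(7.9, 6.96, -z95))           # duration P5, about -3.5: impossible
print(percentile_from_frequencies(duration, 0.05))  # empirical P5 = 1 year
```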
  • 46. Total population: n = 278, μ = 67.24 years, σ = 11.743 years
  • 47. Age in years: mean (SD) of 28 samples of 150 drawn from a total population of 278 [radar chart of the sample means and SDs]
      1: 67.6 (12.07)    2: 67.13 (11.81)   3: 67 (11.98)      4: 67.8 (11.63)
      5: 66.33 (11.44)   6: 67.44 (11.95)   7: 67.84 (12.42)   8: 66.59 (11.36)
      9: 67 (12)         10: 66.38 (11.9)   11: 68.06 (12.06)  12: 67.61 (11.02)
      13: 67.31 (11.33)  14: 66.44 (11.91)  15: 66.87 (11.26)  16: 66.8 (11.5)
      17: 66.73 (12.37)  18: 66.38 (11.77)  19: 67.03 (11.22)  20: 66.58 (12.13)
      21: 66.81 (11.55)  22: 66.58 (12.21)  23: 67.2 (11.61)   24: 66.48 (11.48)
      25: 67.53 (12.1)   26: 67.58 (10.6)   27: 67 (11.91)     28: 67.31 (11.59)
  • 48. Population and sample. o In scientific research we want to make a statement (conclusion) about the population. o Studying the whole population is usually impossible in terms of money, time and labor. o We take a random sample from the population and infer the needed conclusions from the sample data. o The task of statistics is to quantify the uncertainty (whether the sample really represents that population).
  • 49. The concept of sampling (diagram): from the study population you select a few sampling units (the sample); you collect information from these people to find answers to your research questions; you then make an estimate ("prediction") extrapolated to the study population (prevalence, outcomes, etc.).
  • 50. What would be the mean systolic blood pressure of older subjects (65+) in Al Hassa? The population mean (μ) is unknown [figure: individual sample values such as 175, 165, 180, 155]. From our sample we calculate an estimate of the population parameter.
  • 51. The good sample (the estimator) should be: o Unbiased: on average, the sample mean equals the population mean. o Precise: narrow dispersion about the mean, i.e. the dispersion in repeated samples is small. This is a dream.
  • 52. Sampling error. Four individuals A, B, C, D: A = 18 years, B = 20 years, C = 23 years, D = 25 years. Their mean age = (18 + 20 + 23 + 25)/4 = 86/4 = 21.5 years (population mean μ).
  • 53. Possible samples of two individuals (6 possibilities): A+B = (18 + 20)/2 = 19.0 years; A+C = (18 + 23)/2 = 20.5 years; A+D = (18 + 25)/2 = 21.5 years; B+C = (20 + 23)/2 = 21.5 years; B+D = (20 + 25)/2 = 22.5 years; C+D = (23 + 25)/2 = 24.0 years. Sampling error = sample mean - population mean, which ranges from -2.5 to +2.5 years. Possible samples of three individuals (4 possibilities): A+B+C = (18 + 20 + 23)/3 = 20.33 years; A+B+D = (18 + 20 + 25)/3 = 21.00 years; A+C+D = (18 + 23 + 25)/3 = 22.00 years; B+C+D = (20 + 23 + 25)/3 = 22.67 years; the error ranges from -1.17 to +1.17 years. If C = 32 (instead of 23) and D = 40 (instead of 25) years, the population mean becomes 27.5 years and the sampling error ranges from -8.50 to +8.50 years for samples of two and from -4.17 to +3.17 years for samples of three. The greater the variability of a given variable, the larger the sampling error for a given sample size.
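Enumerating every possible sample makes the sampling-error argument easy to verify. This Python sketch uses `itertools.combinations`; the helper name is illustrative. It reports the smallest and largest sampling error (sample mean - population mean) for each sample size.

```python
from itertools import combinations
from statistics import mean

def sampling_error_range(population, k):
    """Min and max of (sample mean - population mean) over all samples of size k."""
    mu = mean(population)
    errors = [mean(sample) - mu for sample in combinations(population, k)]
    return round(min(errors), 2), round(max(errors), 2)

ages = [18, 20, 23, 25]                    # A, B, C, D
print(sampling_error_range(ages, 2))       # (-2.5, 2.5)
print(sampling_error_range(ages, 3))       # (-1.17, 1.17)

# More variable population: C = 32 and D = 40 instead of 23 and 25
more_variable = [18, 20, 32, 40]
print(sampling_error_range(more_variable, 2))  # (-8.5, 8.5)
print(sampling_error_range(more_variable, 3))  # (-4.17, 3.17)
```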
  • 54. An infinite number of samples should represent the population they came from (a good estimator).
  • 55. 2: o The normal distribution o The standard error of the mean o Estimation: reference interval; confidence intervals for a mean or proportion, for a difference between means/proportions, and for RR and OR
  • 56. Normal distribution: many human traits, such as intelligence, personality and attitudes, as well as weight and height, are distributed among populations in a fairly normal way. ١٤٣٥/٠٢/٦
  • 57. The normal distribution: about 68% of values lie within μ ± 1 SD (σ) and about 95% within μ ± 2 SD; values beyond 2 SDs are possible outliers, and values beyond 3 SDs are definite outliers.
  • 58. One more: the Z score measures how many standard deviations a particular data point is above or below the mean, $Z = (x_i - \bar{x})/s$. o Unusual observations have a Z score above +2 or below -2. o Extreme observations have a Z score above +3 or below -3 and should be investigated as potential outliers.
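A Python sketch of the Z score and the 2-SD / 3-SD rules of thumb, plus the share of a normal distribution within 1, 2 and 3 SDs computed with `statistics.NormalDist`. The cholesterol value of 10.5 mmol/l is hypothetical; the mean 6.25 and SD 1.33 come from the cholesterol histogram (Figure 3). The helper names are illustrative.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def classify(z):
    """Flag an observation using the 2-SD / 3-SD rules of thumb."""
    if abs(z) > 3:
        return "extreme: investigate as potential outlier"
    if abs(z) > 2:
        return "unusual"
    return "usual"

# Share of a normal distribution lying within 1, 2 and 3 SDs of the mean
std_normal = NormalDist()
for k in (1, 2, 3):
    print(k, round(std_normal.cdf(k) - std_normal.cdf(-k), 3))  # 0.683, 0.954, 0.997

# Hypothetical cholesterol value of 10.5 mmol/l (mean 6.25, SD 1.33)
z = z_score(10.5, 6.25, 1.33)
print(round(z, 2), classify(z))
```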
  • 59. Areas under the standard normal curve
      Z        Between both points   Beyond both points   Beyond one point
               (around the mean)     (two tails)          (one tail)
      ±0.1     0.080                 0.920                0.4600
      ±0.2     0.159                 0.841                0.4205
      ±0.3     0.236                 0.764                0.3820
      ±0.4     0.311                 0.689                0.3445
      ±0.5     0.383                 0.617                0.3085
      ±0.6     0.451                 0.549                0.2745
      ±0.7     0.516                 0.484                0.2420
      ±0.8     0.576                 0.424                0.2120
      ±0.9     0.632                 0.368                0.1840
      ±1       0.683                 0.317                0.1585
      ±1.1     0.729                 0.271                0.1355
      ±1.2     0.770                 0.230                0.1150
      ±1.3     0.806                 0.194                0.0970
      ±1.4     0.838                 0.162                0.0810
      ±1.5     0.866                 0.134                0.0670
      ±1.6     0.890                 0.110                0.0550
      ±1.645   0.900                 0.100                0.0500
      ±1.7     0.911                 0.089                0.0445
      ±1.8     0.928                 0.072                0.0360
      ±1.9     0.943                 0.057                0.0290
      ±1.96    0.950                 0.050                0.0250
      ±2       0.954                 0.046                0.0230
      ±2.1     0.964                 0.036                0.0180
      ±2.2     0.972                 0.028                0.0140
      ±2.3     0.979                 0.021                0.0105
      ±2.4     0.984                 0.016                0.0082
      ±2.578   0.990                 0.010                0.0050
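The entries of the table can be regenerated from the standard normal cumulative distribution function. This Python sketch (standard-library `statistics.NormalDist`; `normal_areas` is an illustrative name) prints selected rows; for example, Z = 2.4 gives two tails of about 0.016 (one tail about 0.0082) and Z = 2.578 gives about 0.010 (one tail about 0.0050).

```python
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal cumulative distribution function

def normal_areas(z):
    """(area between -z and +z, area beyond both points, area beyond one point)."""
    one_tail = 1 - phi(z)
    return 1 - 2 * one_tail, 2 * one_tail, one_tail

for z in (1.0, 1.645, 1.96, 2.4, 2.578):
    central, two_tails, one_tail = normal_areas(z)
    print(f"{z:<6} {central:.3f} {two_tails:.3f} {one_tail:.4f}")
```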
  • 60. Calculating values from Z scores: $x_i = \bar{x} \pm Z \times SD$, i.e. value (percentile) = mean ± Z score × SD.
  • 61. Random sample for estimating a population mean: X1 = 128, X2 = 133, X3 = 129; μ = ? From the information in the sample, we estimate the unknown population mean ($\bar{x}$ is an estimator for μ). What could have happened if we had taken another random sample? What is the measure of the variation of sample means?
  • 62. The sampling distribution of a sample statistic. Let's assume that we want to survey a community of 400 people whose ages were recorded, with the following parameters: µ = 35 years, σ = 13 years. Let's assume, however, that we do not survey all 400; instead we randomly select 120 people, ask them their ages and calculate the mean age. Then we put them back into the community and randomly select another 120 residents (who may include members of the first sample). We do this over and over, each time calculating the mean age. The results will be like those in the following table.
  • 63. Distribution of 20 random sample means (n = 20 samples):
      Sample:  1     2     3     4     5     6     7     8     9     10
      Mean:    34.7  35.9  35.5  34.7  34.5  34.4  35.7  34.6  37.4  35.3
      Sample:  11    12    13    14    15    16    17    18    19    20
      Mean:    34.1  35.5  34.9  36.2  35.6  35.0  35.1  36.4  35.6  33.6
      SD of the means: 13.37
      [Dot plot of the 20 sample means on an axis from 33 to 37 years, centred on μ]
      All the results are clustered around the population value (35 years), with a few scores a bit further out and one extreme score of 37.4 years (random variation: 1/20 = 5%). Those 400 people have ages ranging from 2 to 69 years, while the sample means have a very narrow range of about 4 years, and 10 samples coincide with the population mean (35 years).
  • 64. Most of the sample means will cluster around the population parameter, with an occasional sample result falling relatively further to one side or the other of the distribution (this is called the sampling distribution of sample means). It has the following properties: o The mean of the sampling distribution equals the population mean: the average of the averages ($\mu_{\bar{x}}$) is the same as the population mean. o The standard deviation of the sample means is the standard error, $SE = \sigma/\sqrt{n}$ (σ = population SD). o The distribution of the sample means is normal if the population distribution is normal. o If the population distribution is not normal, the distribution of the sample means is approximately normal when n is large (central limit theorem).
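A small simulation illustrates these properties. The real ages of the 400 residents are not available, so this Python sketch generates a hypothetical population with mean about 35 and SD about 13; drawing each sample with replacement is a simplifying assumption (it ignores the finite-population effect of sampling 120 out of 400). The average of the sample means should land close to μ and their SD close to σ/√n.

```python
import random
from statistics import fmean, pstdev

random.seed(1)

# Hypothetical community of 400 ages, simulated with mean about 35 and SD about 13
# and clipped to 2-69 years (the slide's real data are not available).
population = [min(69, max(2, round(random.gauss(35, 13)))) for _ in range(400)]
mu = fmean(population)
sigma = pstdev(population)          # population SD (divides by N)

# Repeatedly draw samples of 120 and record each sample mean. Drawing WITH
# replacement (random.choices) is a simplifying assumption: it makes the
# population behave as if it were infinite.
n, repeats = 120, 2000
sample_means = [fmean(random.choices(population, k=n)) for _ in range(repeats)]

print("population mean:", round(mu, 2), "| mean of sample means:", round(fmean(sample_means), 2))
print("sigma/sqrt(n):", round(sigma / n ** 0.5, 2), "| SD of sample means:", round(pstdev(sample_means), 2))
```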
  • 65. Standard error of the mean [diagram: population parameters (mean, SD) vs. the means and SDs of repeated samples]. It reflects the degree to which the sample statistics deviate from the population parameters. The term "error" indicates that, due to sampling error, each sample mean is likely to deviate somewhat from the true population mean.
  • 66. Central limit theorem. The formula for the standard error is $SE = SD/\sqrt{n}$; it indicates that we are estimating the SE given the SD of a sample of size n. For a sample of 100 and an SD of 40, SE = 40/√100 = 4. For a sample of 1000 and an SD of 40, SE = 40/√1000 = 1.26. Two factors influence the SE: the sample size and the SD of the sample. Sample size is in the denominator, so the larger the sample, the smaller the SE. For a sample of 100 and an SD of 20, SE = 20/√100 = 2; for a sample of 100 and an SD of 40, SE = 40/√100 = 4. The more variability there is within a sample, the greater the SE.
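The SE examples can be checked with a short Python sketch; `standard_error` is an illustrative name. Because n sits under a square root, a tenfold increase in n (100 to 1000) reduces the SE by a factor of about 3.16, while halving the SD halves the SE.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / sqrt(n)

for sd, n in [(40, 100), (40, 1000), (20, 100)]:
    print(f"SD={sd}, n={n}: SE={standard_error(sd, n):.2f}")
# SD=40, n=100: SE=4.00
# SD=40, n=1000: SE=1.26
# SD=20, n=100: SE=2.00
```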
  • 67. Confidence interval (CI). A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
  • 68. We need to know the smallest and the largest values of μ (a range) that we think are likely, using sample statistics; the sample mean is our estimate of μ.
  • 69. c = level of confidence; Zc = Z critical values (under the normal curve): 90%: 1.645; 95%: 1.960; 99%: 2.576. $CI = \bar{x} \pm Z_c \frac{\sigma}{\sqrt{n}}$, i.e. CI = mean of the sample ± Z critical score × SEM, where SEM = SD/√n.
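A Python sketch of the normal-approximation interval, applied to the systolic blood pressure summary (mean 151.20, SD 21.997, n = 278); `mean_ci` is an illustrative name and the Z critical values come from `statistics.NormalDist`. The results agree with the SPSS bounds to within about 0.02 mmHg; the small differences are presumably because SPSS uses a t critical value rather than Z.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation CI: mean +/- Zc * SD / sqrt(n)."""
    z_c = NormalDist().inv_cdf(0.5 + level / 2)   # 1.645, 1.960, 2.576
    half_width = z_c * sd / sqrt(n)
    return mean - half_width, mean + half_width

# Systolic blood pressure at start: mean 151.20, SD 21.997, n = 278
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(151.20, 21.997, 278, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f} mmHg")
```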
  • 70. C.I. • The confidence interval provides a range that is highly likely (often 95% or 99%) to contain the true population parameter being estimated. • The narrower the interval, the more informative the result. • It is usually calculated using the estimate (sample mean) and its standard error (SEM).
  • 71. CI for μ: systolic blood pressure in 278 diabetic patients (SPSS Descriptives, syst. blood pressure at start)
                                   All patients (N = 278)   Random sample of 50 out of 278
      Mean (Std. Error)            151.20 (1.319)           155.06 (3.064)
      90% CI for mean              149.02 to 153.38         149.92 to 160.20
      5% trimmed mean              150.30                   154.72
      Median                       150.00                   151.20
      Variance                     483.880                  460.033
      Std. Deviation               21.997                   21.448
      Minimum / Maximum            100 / 220                115 / 205
      Range                        120                      90
      Interquartile range          30.00                    30.00
      Skewness (Std. Error)        .540 (.146)              .263 (.340)
      Kurtosis (Std. Error)        .152 (.291)              -.506 (.668)
      90% CI = 151.20 ± 1.65 (21.997/√278) = 149.02 to 153.38 mmHg
  • 72. The same SPSS Descriptives with a 95% confidence interval for the mean (the other statistics are unchanged): all patients (N = 278): mean 151.20 (SE 1.319), 95% CI 148.60 to 153.80 mmHg; random sample of 50 out of 278: mean 155.06 (SE 3.064), 95% CI 148.90 to 161.22 mmHg. 95% CI = 151.20 ± 1.96 (21.997/√278) = 148.60 to 153.80 mmHg.
  • 73. The same SPSS Descriptives with a 99% confidence interval for the mean (the other statistics are unchanged): all patients (N = 278): mean 151.20 (SE 1.319), 99% CI 147.78 to 154.62 mmHg; random sample of 50 out of 278: mean 155.06 (SE 3.064), 99% CI 146.84 to 163.28 mmHg. 99% CI = 151.20 ± 2.58 (21.997/√278) = 147.78 to 154.62 mmHg.
  • 74. 90% CI = 151.20 ± 1.65 (21.997/√278) = 149.02 to 153.38 mmHg; 95% CI = 151.20 ± 1.96 (21.997/√278) = 148.60 to 153.80 mmHg; 99% CI = 151.20 ± 2.58 (21.997/√278) = 147.78 to 154.62 mmHg. What does this mean? If the same population were sampled on numerous occasions and an interval estimate made on each occasion, the resulting intervals would contain the true population parameter in approximately 90%, 95% and 99% of cases, respectively.
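The repeated-sampling interpretation can be demonstrated by simulation. This Python sketch assumes a hypothetical normal population with known σ (values loosely modelled on the SBP data), draws many samples of 50, and counts how often the 95% interval contains the true μ; the proportion should be close to 0.95.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(2013)

# Hypothetical normal population with known parameters (loosely based on SBP)
mu, sigma, n = 151.2, 22.0, 50
z_c = NormalDist().inv_cdf(0.975)      # about 1.96 for a 95% interval
half_width = z_c * sigma / sqrt(n)     # sigma treated as known

repeats, hits = 5000, 0
for _ in range(repeats):
    x_bar = fmean(random.gauss(mu, sigma) for _ in range(n))
    if x_bar - half_width <= mu <= x_bar + half_width:
        hits += 1                      # this interval captured the true mean

coverage = hits / repeats
print(f"{coverage:.3f}")               # close to 0.95
```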
  • 75. The sampling distribution of a proportion: $\mu_p = \pi$, $SE(p) = \sqrt{p(1 - p)/n}$, with $p = K/n$; 95% CI: $p \pm 1.96 \, SE(p)$ (1.96 = Z critical score for 95%).
  • 76. Smokers among diabetics: sample = 400, smokers = 40, p = 40/400 = 0.1; $SE(p) = \sqrt{0.1 \times 0.9/400} = 0.015$; 95% CI = 0.1 ± 1.96 (0.015) = [0.07 to 0.13]. For percentages it is the same: SE = 1.5%, CI = [7% to 13%].
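A Python sketch of the proportion calculation (the simple normal-approximation interval used on the slide); `proportion_ci` is an illustrative name.

```python
from math import sqrt

def proportion_ci(k, n, z=1.96):
    """CI for a proportion p = k/n, with SE(p) = sqrt(p(1-p)/n)."""
    p = k / n
    se = sqrt(p * (1 - p) / n)
    return p, se, p - z * se, p + z * se

p, se, low, high = proportion_ci(40, 400)
print(f"p={p:.2f}, SE={se:.3f}, 95% CI {low:.2f} to {high:.2f}")
# p=0.10, SE=0.015, 95% CI 0.07 to 0.13
```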
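  • 77. 95% CI for the difference between two means (μ1 - μ2): systolic blood pressure by smoking
                  n     Mean SBP   SE (mean)
      Smoke: No   214   153.1      1.50
      Smoke: Yes  64    144.8      2.62
      Difference        8.3
      95% CI = $(\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$, with $SE(\bar{x}_1 - \bar{x}_2) = \sqrt{SE(\bar{x}_1)^2 + SE(\bar{x}_2)^2}$; CI = 2.4 to 14.2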
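The calculation for the difference in mean SBP as a Python sketch; `diff_means_ci` is a hypothetical helper that takes the two standard errors directly, as the slide does.

```python
from math import sqrt

def diff_means_ci(mean1, se1, mean2, se2, z=1.96):
    """CI for mean1 - mean2 with SE(diff) = sqrt(SE1^2 + SE2^2)."""
    diff = mean1 - mean2
    se_diff = sqrt(se1 ** 2 + se2 ** 2)
    return diff, se_diff, diff - z * se_diff, diff + z * se_diff

# SBP: non-smokers (mean 153.1, SE 1.50) minus smokers (mean 144.8, SE 2.62)
diff, se_diff, low, high = diff_means_ci(153.1, 1.50, 144.8, 2.62)
print(f"difference={diff:.1f}, SE={se_diff:.2f}, 95% CI {low:.1f} to {high:.1f}")
# difference=8.3, SE=3.02, 95% CI 2.4 to 14.2
```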
  • 78. 95% CI for the difference between two percentages: died (%) by smoking
                  n     Died (%)   SE
      Smoke: No   212   28.8       3.11
      Smoke: Yes  64    23.4       5.30
      Difference = 5.4%
      95% CI = $(P_{ns} - P_s) \pm 1.96 \times SE(P_{ns} - P_s)$, with $SE = \sqrt{\frac{P_1(100 - P_1)}{n_1} + \frac{P_2(100 - P_2)}{n_2}}$; 95% CI = -6.7% to 17.4%
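The same approach for percentages, as a Python sketch; `diff_percent_ci` is a hypothetical helper. With the rounded inputs shown on the slide it gives about -6.6% to 17.4%, so the slide's lower limit of -6.7% presumably reflects rounding elsewhere in the calculation.

```python
from math import sqrt

def diff_percent_ci(p1, n1, p2, n2, z=1.96):
    """CI for P1 - P2 (percentages), SE = sqrt(P1(100-P1)/n1 + P2(100-P2)/n2)."""
    diff = p1 - p2
    se = sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)
    return diff, se, diff - z * se, diff + z * se

# Died (%): non-smokers 28.8% of 212, smokers 23.4% of 64
diff, se, low, high = diff_percent_ci(28.8, 212, 23.4, 64)
print(f"difference={diff:.1f}%, SE={se:.2f}, 95% CI {low:.1f}% to {high:.1f}%")
```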
  • 79. 95% CI for RR and OR: use available software, e.g. http://www.medcalc.org/calc/odds_ratio.php, http://www.medcalc.org/calc/relative_risk.php, vl.academicdirect.org/applied_statistics/.../CIcalculator.xls
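For readers who prefer a formula to an online calculator, a common approach computes the interval on the log scale: ln(OR) ± 1.96 × √(1/a + 1/b + 1/c + 1/d) (Woolf's method) and ln(RR) ± 1.96 × √(1/a - 1/(a+b) + 1/c - 1/(c+d)). This Python sketch applies these standard formulas to the lung cancer examples; it is not necessarily the exact method implemented by the calculators listed above, and the function names are illustrative.

```python
from math import exp, log, sqrt

def or_ci(a, b, c, d, z=1.96):
    """OR with a log-based (Woolf) CI: exp(ln OR +/- z * sqrt(1/a + 1/b + 1/c + 1/d))."""
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)

def rr_ci(a, b, c, d, z=1.96):
    """RR with a log-based CI: SE(ln RR) = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d))."""
    rr = (a / (a + b)) / (c / (c + d))
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

print([round(v, 2) for v in or_ci(80, 30, 20, 70)])   # case-control: smoking and lung cancer
print([round(v, 2) for v in rr_ci(18, 582, 6, 1194)])  # cohort: smoking and lung cancer
```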
  • 80. Assignment II
  • 81. Inferential statistics: testing in research. o In scientific research we would like to test whether our research ideas are true. o Based on previous observations (studies), we know that the mean cholesterol of patients with diabetes is higher than that of people without the disease. o We take samples and check whether the results agree with our expectations. o That is, we test the situation using a statistical test.
  • 82. The Z-test for one sample. Serum cholesterol in the diabetes-free population: μ = 5 mmol/L, σ = ±1.5. Diabetic patients: is the mean cholesterol > 5? Considering σ = ±1.5, is there any difference between the diabetes-free population and the diabetic patients regarding serum cholesterol? Let's perform a Z-test.
  • 83. Research question (hypothesis). The research hypothesis would be: the mean cholesterol of diabetics is > 5 mmol/L. Null hypothesis H0: μ = 5 (the diabetics' mean equals the population mean). Alternative hypothesis H1: μ > 5 (one-sided) or H1: μ ≠ 5 (two-sided).
  • 84. Procedure: compare μ = 5 with the mean of the sample [histogram: total cholesterol level of diabetic patients, mmol/L; Mean = 6.25, Std. Dev = 1.33, N = 278]. If the sample mean is close to the population mean, we do not reject the null hypothesis. If the sample mean differs from the population mean, we REJECT the null.
  • 85. The α level and the P value. The P value is the probability of obtaining a sample mean at least as far from the population mean as the one observed if the null hypothesis were true (i.e. if there were no difference between the population and sample means). The α level is the maximum probability we accept of rejecting the null hypothesis falsely: α = 0.05.
  • 86. Alpha level: if P > 0.05 (α), accept the null: sample mean = population mean. If P ≤ 0.05 (α), reject the null: sample mean ≠ population mean.
  • 87. Calculation (σ = 1.5): SEM = σ/√n = 0.3 (i.e. n = 25). $Z = (\bar{x} - \mu)/SEM$. P(mean of the sample ≥ 6) = P(Z ≥ (6 - 5)/0.3) = P(Z ≥ 3.33) ≈ 0.0005. Under the normal curve the area of rejection is Z > 1.96; here P = 0.0005. If the cholesterol level of diabetic patients were the same as that of the (disease-free) population, a sample mean this high would occur only about 5 times in 10,000 repetitions of this test. P < 0.05, so we reject the null: diabetics have a higher mean cholesterol level than the normal population.
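A Python sketch of this one-sample Z test using `statistics.NormalDist`; the sample size n = 25 is inferred from SEM = 0.3 and σ = 1.5, and the helper name is illustrative. The exact one-sided tail for Z = 3.33 is about 0.0004, consistent with the tabulated value of roughly 0.0005 for Z ≈ 3.3.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Z = (sample mean - mu0) / (sigma / sqrt(n)); returns Z and the one-sided P(Z >= z)."""
    sem = sigma / sqrt(n)
    z = (sample_mean - mu0) / sem
    p_one_sided = 1 - NormalDist().cdf(z)
    return z, p_one_sided

# Cholesterol: mu0 = 5 mmol/L, sigma = 1.5, n = 25 (so SEM = 0.3), sample mean = 6
z, p = one_sample_z_test(6, 5, 1.5, 25)
print(round(z, 2), round(p, 4))   # 3.33 0.0004
```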
  • 88. In reality, it is unlikely that σ (the population SD) is known. In most cases σ will be unknown, and we will be able to apply neither the formula nor the table of the normal distribution (areas under the curve, Z scores). We resort to other statistical tests.
  • 89. Possible situations in testing
  • 90. Possible situations in hypothesis testing
      Decision           Reality: H0 is true                       Reality: H0 is not true
      Reject H0          Type I error (α, level of significance)   OK (1 - β)
      Do not reject H0   OK (1 - α)                                Type II error (β)
      Power = 1 - β: the probability of rejecting the null hypothesis when it is NOT true; usually at least 80% is required for any test.
  • 91. Errors of hypothesis testing and power: decisions and errors in hypothesis testing (study results vs. the true situation)
      Study result "difference exists" (reject H0), true difference exists (H1): correct decision (power, 1 - β).
      Study result "difference exists" (reject H0), no true difference (H0): type I error (α), false rejection: concluding there is a difference when there really is not.
      Study result "no difference" (do not reject H0), true difference exists (H1): type II error (β), false acceptance: concluding there is no difference when one is really present.
      Study result "no difference" (do not reject H0), no true difference (H0): correct decision.
  • 92. Passive smoking and lung cancer: the truth about the population vs. conclusions based on results from a study of a sample of the population.
      Reject the null hypothesis (rates in the study appear to be different) when passive smoking is really not related to lung cancer: type I error, incorrect rejection (concluding that passive smoking is related to lung cancer when it really is not).
      Accept the null hypothesis (rates in the study appear similar) when passive smoking is really related to lung cancer: type II error, incorrect acceptance (concluding that passive smoking is not related to lung cancer when it really is).
  • 93. The alpha-fetoprotein (AFP) test has both type I and type II error possibilities. This test screens the mother's blood during pregnancy for AFP and determines risk. Abnormally high or low levels may indicate Down syndrome. H0: patient is healthy; Ha: patient is unhealthy. Type I error (false positive or false rejection): the test wrongly indicates that the fetus has Down syndrome, which could lead to the pregnancy being terminated for no reason. Type II error (false negative or false acceptance): the test is negative, but the child is born with multiple anomalies.
  • 94. Hypothesis test [Figure: the distribution given that the null hypothesis is true.]
  • 95. Type I and Type II error [Figure labels: false rejection (Type I), false acceptance (Type II).]
  • 96. One sample [Figure: the distribution of X under the null and alternative hypotheses.]
  • 97. t-distribution
    In real-life situations we estimate the unknown population SD (σ) using the sample SD (s), and the results are standardized to the t-distribution:
    $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
    Z test for the normal distribution, used when the population SD is known:
    $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
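A small sketch contrasting the two statistics on slide 97: the only difference is whether the known σ or the sample SD s goes in the denominator. The sample values and σ = 10 are made-up numbers for illustration, and the helper names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence

def z_statistic(sample: Sequence[float], mu0: float, sigma: float) -> float:
    """Z = (x̄ - μ0) / (σ/√n): valid only when the population SD σ is known."""
    return (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))

def t_statistic(sample: Sequence[float], mu0: float) -> tuple[float, int]:
    """t = (x̄ - μ0) / (s/√n) with the sample SD s (n - 1 divisor); returns (t, df)."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n)), n - 1

sample = [72, 68, 75, 61, 80, 70]           # made-up ages, illustration only
z = z_statistic(sample, mu0=65, sigma=10)   # pretends sigma = 10 is known
t, df = t_statistic(sample, mu0=65)
print(f"Z = {z:.2f}, t = {t:.2f} with df = {df}")
```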
  • 98. t-distribution: df = number of observations (sample size) − 1. The t-distribution has heavier tails than the Z distribution.
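The heavier tails can be checked numerically. This sketch codes the standard Student t density with the Python standard library (the function name `t_pdf` is our own) and compares it with the standard normal density at x = 3; the ratio shrinks toward 1 as df grows.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

normal = NormalDist()
for df in (2, 10, 30, 1000):
    # Ratio > 1: the t curve puts more weight at x = 3 than the Z curve does
    print(df, round(t_pdf(3.0, df) / normal.pdf(3.0), 2))
```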
  • 99. Degrees of freedom (df): for sample statistics such as the variance and SD we divide by n − 1, because all the observations in a given sample are free to vary except one (the complementary effect).
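  • 100. Degrees of freedom, example: four observations must total 50. Three of them (12, 16 and 7) are free to vary; the fourth is restricted to 50 − (12 + 16 + 7) = 15. Hence df = n − 1 = 3.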
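The same arithmetic as a tiny Python sketch (the helper name is hypothetical): once the total is fixed, the last value is fully determined by the others.

```python
def restricted_value(total: float, free_values: list[float]) -> float:
    """With the total (equivalently the mean) fixed, the last observation
    is determined by the others, so it has no freedom to vary."""
    return total - sum(free_values)

free = [12, 16, 7]                   # the free values from the slide
last = restricted_value(50, free)    # 15
n = len(free) + 1
print("restricted value:", last, "df =", n - 1)   # restricted value: 15 df = 3
```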
  • 101. t-distribution
  • 102. t-test: steps to determine the statistical difference
    When? Descriptive statistics are given as mean ± standard deviation. The formula depends on the number of samples:
    One sample vs. population mean: $t = \frac{\bar{x} - \mu}{SD/\sqrt{n}}$
    Two independent samples: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}}$
    Two dependent (paired) samples, i.e. repeated measures or matched pairs: $t = \frac{\bar{d}}{SE(\bar{d})}$
    Steps:
    1- State the hypotheses to be tested. Null: mean1 = mean2. Alternative: mean1 ≠ mean2 (non-directional, two-tailed) or mean1 > mean2 / mean1 < mean2 (directional, one-tailed).
    2- Find the calculated t value using the formulae.
    3- Find the degrees of freedom: df = n − 1 (for two independent samples, df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2).
    4- Find the P value using the tables of the t-distribution.
    5- Conclude: if P < 0.05, reject the null hypothesis; if P > 0.05, the null hypothesis is not rejected.
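Step 4 relies on printed t tables. As a hedged alternative, the sketch below approximates the two-sided P value numerically by integrating the t density with Simpson's rule, using only the Python standard library. The function names are our own, and this is a numerical approximation, not the method any particular statistics package uses.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided P value, P(|T| >= |t|) = 1 - 2 * (area from 0 to |t|).

    The area is approximated with composite Simpson's rule; `steps` must be even.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2        # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)     # clamp tiny negative rounding errors

# Check against the printed table: t = 2.571 with df = 5 is the two-tail 0.05 value
print(round(t_two_sided_p(2.571, 5), 3))   # 0.05
```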
  • 103. t-test (Student's t-test), one sample: $t = \frac{\bar{x} - \mu}{SD/\sqrt{n}}$
    Using the diabetes data: is the mean age of diabetics above 65 years? Tested two-sided: H0: μ = 65; H1: μ ≠ 65.
    Statistics, age (years): N valid = 278, missing = 0; mean = 67.24; std. error of mean = 0.704; std. deviation = 11.743; variance = 137.902.
    $t = \frac{67.24 - 65}{11.743/\sqrt{278}} = 3.18$; from the t-distribution, P = 0.002.
    Reject the null: diabetics are significantly older than 65 years.
  • 104. P value (two-sided): One-Sample Test, test value = 65, age (years)
    t = 3.182 | df = 277 | Sig. (2-tailed) = .002 | mean difference = 2.24 | 95% confidence interval of the difference: .85 to 3.63
    Degrees of freedom = n − 1; the test assumes that the distribution of age is normal and that the population SD (σ) is unknown.
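The one-sample calculation can be reproduced from the summary statistics alone. This sketch assumes a two-sided 5% critical value of about 1.969 for df = 277 (the printed t table stops at df = 50 and ∞); small differences from the SPSS output (t = 3.182) come from the rounded mean.

```python
import math

# Summary statistics from the SPSS output above (age of diabetics)
n, sample_mean, sample_sd = 278, 67.24, 11.743
mu0 = 65                              # test value under H0

se = sample_sd / math.sqrt(n)         # standard error of the mean
t = (sample_mean - mu0) / se
df = n - 1

# Assumption: two-sided 5% critical value for df = 277 is about 1.969
# (the printed table gives 2.009 for df = 50 and 1.960 for df = infinity)
t_crit = 1.969
diff = sample_mean - mu0
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}, 95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
```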
  • 105. t-test for comparison of the means of two independent samples
    H0: smoking has no effect on systolic blood pressure: mean(S) = mean(NS), or mean(S) − mean(NS) = 0
    H1: smoking has an effect: mean(S) ≠ mean(NS), or mean(S) − mean(NS) ≠ 0
    Assumptions:
    • Independent observations (2 samples)
    • Normally distributed
    • Equal variances (for the pooled t-test)
  • 106. Three formulae
    General form (the expected difference is 0 if H0 is true): $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
    If the SDs are equal, use the pooled SD: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2/n_1 + S_p^2/n_2}}$, where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$
    If the SDs are not equal: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
    The decision between them is based on Levene's test.
  • 107. Variances are apparently equal
    Group Statistics, systolic blood pressure at start:
    SMOKING  | N   | Mean   | Std. Deviation | Std. Error Mean
    no       | 214 | 153.11 | 21.995         | 1.504
    smokers  | 64  | 144.82 | 20.934         | 2.617
    Independent Samples Test (Levene's test for equality of variances: F = .006, Sig. = .936):
                                | t     | df      | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper
    Equal variances assumed     | 2.674 | 276     | .008            | 8.29            | 3.100                 | 2.188        | 14.392
    Equal variances not assumed | 2.747 | 107.982 | .007            | 8.29            | 3.018                 | 2.308        | 14.272
    Two separate t-tests are reported. Levene's test is not significant (P = .936), which means the variances can be treated as equal, so the first row applies: P = .008 < 0.05, reject H0.
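A sketch of the pooled and unequal-variance formulae, checked against the group statistics above. The function names are hypothetical; the unequal-variance version uses the Welch-Satterthwaite degrees of freedom, which reproduces the df of 107.982 in the SPSS output.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t with pooled variance (equal variances assumed): (t, df, SE)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return (m1 - m2) / se, n1 + n2 - 2, se

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t (equal variances not assumed) with Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se, df, se

# Group statistics (mean, SD, n) for systolic BP at start, from the output above
non_smokers = (153.11, 21.995, 214)
smokers = (144.82, 20.934, 64)
print("pooled:", pooled_t(*non_smokers, *smokers))
print("welch: ", welch_t(*non_smokers, *smokers))
```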
  • 108. Paired t-test: used when we have paired data (two repeated measurements on the same subjects, e.g. before and after), provided that the differences between the paired observations are normally distributed.
  • 109. Paired samples (dependent)
    • Paired (dependent) 2-sample t-test: compares observations collected from the same group of individuals on 2 separate occasions (dependent observations or paired samples).
    • The paired t statistic is calculated as follows:
    - Calculate the difference between the 2 measurements taken on each individual.
    - Calculate the mean of the differences, $m_d$.
    - Calculate the SE of the observed differences, $SE_d$.
    - Under the null hypothesis of no difference (difference = 0), the paired t statistic takes the form $t = \frac{m_d - 0}{SE_d}$, i.e. mean difference / SE of the difference.
    - It follows a t-distribution with n − 1 degrees of freedom.
  • 110. Example: four students had the following scores in 2 subsequent tests. Is there a significant difference in their performance?
    Number | Name     | Test 1 | Test 2 | Difference
    1      | Mike     | 35%    | 67%    | −32
    2      | Melanie  | 50%    | 46%    | 4
    3      | Melissa  | 90%    | 86%    | 4
    4      | Mitchell | 78%    | 91%    | −13
    Mean difference = −9.25, SD of the differences = 17.15, SE of the differences = 8.58
    Calculated paired $t = \frac{m_d - 0}{SE_d} = \frac{-9.25}{8.58} = -1.078$, df = n − 1 = 3
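The paired calculation for the four students can be sketched in a few lines of Python; the decision uses the two-tailed 5% critical value for df = 3 (3.182) from the t table.

```python
import math
from statistics import mean, stdev

test1 = {"Mike": 35, "Melanie": 50, "Melissa": 90, "Mitchell": 78}
test2 = {"Mike": 67, "Melanie": 46, "Melissa": 86, "Mitchell": 91}

diffs = [test1[name] - test2[name] for name in test1]   # -32, 4, 4, -13
n = len(diffs)
md = mean(diffs)                      # mean difference
se_d = stdev(diffs) / math.sqrt(n)    # SE of the mean difference
t = (md - 0) / se_d                   # H0: true mean difference = 0
df = n - 1
# Two-tailed 5% critical value for df = 3, taken from the t table: 3.182
print(f"mean diff = {md}, SE = {se_d:.2f}, t = {t:.3f}, df = {df}, "
      f"reject H0: {abs(t) > 3.182}")
```

Here |t| is well below 3.182, so the null hypothesis of no difference is not rejected.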
  • 111. Critical values of the t-distribution (level of significance: one-tail test, with the two-tail equivalent in brackets)
    df | 0.10 (0.20) | 0.05 (0.10) | 0.025 (0.05) | 0.01 (0.02) | 0.005 (0.01)
    1  | 3.078 | 6.314 | 12.706 | 31.821 | 63.657
    2  | 1.886 | 2.920 | 4.303  | 6.965  | 9.925
    3  | 1.638 | 2.353 | 3.182  | 4.541  | 5.841
    4  | 1.533 | 2.132 | 2.776  | 3.747  | 4.604
    5  | 1.476 | 2.015 | 2.571  | 3.365  | 4.032
    6  | 1.440 | 1.943 | 2.447  | 3.143  | 3.707
    7  | 1.415 | 1.895 | 2.365  | 2.998  | 3.499
    8  | 1.397 | 1.860 | 2.306  | 2.896  | 3.355
    9  | 1.383 | 1.833 | 2.262  | 2.821  | 3.250
    10 | 1.372 | 1.812 | 2.228  | 2.764  | 3.169
    11 | 1.363 | 1.796 | 2.201  | 2.718  | 3.106
    12 | 1.356 | 1.782 | 2.179  | 2.681  | 3.055
    13 | 1.350 | 1.771 | 2.160  | 2.650  | 3.012
    14 | 1.345 | 1.761 | 2.145  | 2.624  | 2.977
    15 | 1.341 | 1.753 | 2.131  | 2.602  | 2.947
    16 | 1.337 | 1.746 | 2.120  | 2.583  | 2.921
    17 | 1.333 | 1.740 | 2.110  | 2.567  | 2.898
    18 | 1.330 | 1.734 | 2.101  | 2.552  | 2.878
    19 | 1.328 | 1.729 | 2.093  | 2.539  | 2.861
    20 | 1.325 | 1.725 | 2.086  | 2.528  | 2.845
    21 | 1.323 | 1.721 | 2.080  | 2.518  | 2.831
    35 | 1.306 | 1.690 | 2.030  | 2.438  | 2.724
    50 | 1.299 | 1.676 | 2.009  | 2.403  | 2.678
    ∞  | 1.282 | 1.645 | 1.960  | 2.326  | 2.576
    The calculated |t| = 1.078 with df = 3 is smaller than 1.638 (two-tail 0.20), so P > 0.20 and the null hypothesis is accepted.
  • 112. Conclusion: the observed difference could be encountered by chance in about 36 out of 100 cases (actual P ≈ 0.36), i.e. we accept the null hypothesis of no difference between the first and second tests.
  • 113. Paired Samples Statistics (Pair 1)
    Variable                              | Mean   | N   | Std. Deviation | Std. Error Mean
    Systolic blood pressure at start      | 151.20 | 278 | 21.997         | 1.319
    Systolic blood pressure after 2 years | 153.83 | 278 | 29.076         | 1.744
    Paired Samples Test, Pair 1 (at start − after 2 years), paired differences:
    Mean = −2.63 | Std. Deviation = 17.920 | Std. Error Mean = 1.075 | 95% confidence interval of the difference: −4.74 to −0.51 | t = −2.443 | df = 277 | Sig. (2-tailed) = .015
  • 114. Tests of significance for interval/ratio data: parametric tests, assuming a normal distribution
    • Known population variance (σ): one-sample Z-test (one sample vs. population), $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$; rejection limit |Z| > 1.96.
    • Unknown population variance: t-test; reject if P ≤ 0.05. By number of samples:
      - One sample vs. population: one-sample t-test
      - Two samples, dependent: paired t-test
      - Two samples, independent: independent t-test
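The decision tree above can be written as a small function. `choose_parametric_test` is a hypothetical helper that simply mirrors the slide; it does not check normality or equality of variances, and like the slide it only offers the Z-test for the one-sample case.

```python
def choose_parametric_test(sigma_known: bool, n_samples: int, paired: bool = False) -> str:
    """Mirror of the slide's decision tree for interval/ratio data that are
    assumed to be normally distributed."""
    if n_samples == 1:
        return "one-sample Z-test" if sigma_known else "one-sample t-test"
    if n_samples == 2:
        # The slide places both two-sample tests under "unknown population variance"
        return "paired t-test" if paired else "independent t-test"
    raise ValueError("the slide's tree covers only one or two samples")

print(choose_parametric_test(sigma_known=False, n_samples=2, paired=True))
```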
  • 115. The chi-square test (χ²): used for hypothesis testing with categorical variables. There are many types, depending on the design, the distribution of the variables, and the objectives of testing.
  • 116. χ²: example. Vaccination against influenza decreases the risk of getting the disease. Study: compare the effectiveness of 5 vaccines with respect to the probability of getting influenza (the comparison is with respect to a nominal variable: getting influenza, yes or no).
  • 117. Effectiveness of five vaccines: data cross-tabulated 2 × 5; response variable: influenza (counts, and % within vaccine)
    Vaccine | Influenza: no | Influenza: yes | Total | % no | % yes | % total
    1       | 237           | 43             | 280   | 84.6 | 15.4  | 100
    2       | 198           | 52             | 250   | 79.2 | 20.8  | 100
    3       | 245           | 25             | 270   | 90.7 | 9.3   | 100
    4       | 212           | 48             | 260   | 81.5 | 18.5  | 100
    5       | 233           | 57             | 290   | 80.3 | 19.7  | 100
    Total   | 1125          | 225            | 1350  | 83.3 | 16.7  | 100
    The null hypothesis states that the probability of getting influenza is independent of the vaccine; the alternative states that a dependency exists.
  • 118. Effectiveness of five vaccines: if H0 is true, the probability of getting influenza should be the same in every group and equal to the probability in the total population: 225/1350 = 0.167 (16.7%). Vaccine 1 was used in 280 people; if H0 is true, we expect 16.7% of them (≈47) to get influenza. However, 43 were observed; the test asks whether such deviations are larger than chance alone would produce.
  • 119. Expected frequencies. For any cell: expected frequency = row total × column total / grand total (e.g. vaccine 1, influenza yes: 280 × 225/1350 = 46.7; vaccine 4, influenza no: 260 × 1125/1350 = 216.7).
    Vaccine | Influenza no: observed | expected | Influenza yes: observed | expected | Total
    1       | 237                    | 233.3    | 43                      | 46.7     | 280
    2       | 198                    | 208.3    | 52                      | 41.7     | 250
    3       | 245                    | 225.0    | 25                      | 45.0     | 270
    4       | 212                    | 216.7    | 48                      | 43.3     | 260
    5       | 233                    | 241.7    | 57                      | 48.3     | 290
    Total   | 1125                   |          | 225                     |          | 1350
  • 120. Pearson chi-square test
    Calculate the expected frequencies (assuming H0 is true) for all ten cells, then calculate chi-square, where $O_f$ = observed frequency and $E_f$ = expected frequency:
    $\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f}$
    Reject H0 if χ² is large, using the chi-square distribution after determining the degrees of freedom: df = (r − 1)(c − 1).
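The expected-frequency rule and the Pearson statistic can be sketched for the vaccine table; `pearson_chi_square` is a hypothetical helper written for illustration. It reproduces the calculated χ² of 16.555 with df = 4 reported on slide 122.

```python
# Observed counts: rows = vaccines 1-5, columns = (no influenza, influenza)
observed = [
    [237, 43],
    [198, 52],
    [245, 25],
    [212, 48],
    [233, 57],
]

def pearson_chi_square(table):
    """Return (chi2, df, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # E = row total * column total / grand total, for every cell
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

chi2, df, expected = pearson_chi_square(observed)
print(f"E(vaccine 1, influenza) = {expected[0][1]:.1f}")   # 46.7
print(f"chi2 = {chi2:.3f}, df = {df}")                     # chi2 = 16.555, df = 4
```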
  • 121. Chi-square distribution
  • 122. Critical values for chi-square (columns: level of significance)
    df | 0.99    | 0.90   | 0.70   | 0.50   | 0.30   | 0.20   | 0.10   | 0.05   | 0.01   | 0.001
    1  | 0.00016 | 0.0158 | 0.148  | 0.455  | 1.074  | 1.642  | 2.706  | 3.841  | 6.635  | 10.827
    2  | 0.0201  | 0.211  | 0.713  | 1.386  | 2.408  | 3.219  | 4.605  | 5.991  | 9.210  | 13.815
    3  | 0.115   | 0.584  | 1.424  | 2.366  | 3.665  | 4.642  | 6.251  | 7.815  | 11.341 | 16.268
    4  | 0.297   | 1.064  | 2.195  | 3.357  | 4.878  | 5.989  | 7.779  | 9.488  | 13.277 | 18.465
    5  | 0.554   | 1.610  | 3.000  | 4.351  | 6.064  | 7.289  | 9.236  | 11.070 | 15.086 | 20.517
    .. | ..      | ..     | ..     | ..     | ..     | ..     | ..     | ..     | ..     | ..
    30 | 14.953  | 20.599 | 25.508 | 29.336 | 33.530 | 36.250 | 40.256 | 43.773 | 50.892 | 59.703
    df = (2 − 1)(5 − 1) = 4; χ² critical (0.05) = 9.488; calculated χ² = 16.555 > 9.488, P = 0.002.
    There is a relation (dependence) between the type of vaccine and influenza prevention.
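Instead of bracketing P with the table, the upper-tail probability can be computed exactly when df is even, because the chi-square survival function then has a closed form. This sketch (a hypothetical helper, valid for even df only) gives P ≈ 0.0024 for χ² = 16.555 with df = 4, which the slide reports as 0.002.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square > x) for an EVEN number of degrees of freedom, using the
    closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only works for even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)**k / k! incrementally
        total += term
    return math.exp(-half) * total

print(round(chi2_sf_even_df(9.488, 4), 3))    # 0.05: the tabulated critical value
print(round(chi2_sf_even_df(16.555, 4), 4))   # the vaccine example, about 0.0024
```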
  • 123. SMOKING * SEX crosstabulation (count, % within SMOKING)
    SMOKING | Male        | Female      | Total
    no      | 90 (42.1%)  | 124 (57.9%) | 214 (100.0%)
    smokers | 55 (85.9%)  | 9 (14.1%)   | 64 (100.0%)
    Total   | 145 (52.2%) | 133 (47.8%) | 278 (100.0%)
    Chi-Square Tests:
    Test                          | Value   | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided)
    Pearson Chi-Square            | 38.017b | 1  | .000                  |                      |
    Continuity Correction(a)      | 36.279  | 1  | .000                  |                      |
    Likelihood Ratio              | 41.649  | 1  | .000                  |                      |
    Fisher's Exact Test           |         |    |                       | .000                 | .000
    Linear-by-Linear Association  | 37.880  | 1  | .000                  |                      |
    N of Valid Cases              | 278     |    |                       |                      |
    a. Computed only for a 2x2 table. b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 30.62.
    At least 80% of cells must have an expected frequency (Ef) of at least 5.
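For a 2 × 2 table the Pearson statistic and the continuity (Yates) correction can be sketched directly; with df = 1, the upper-tail probability equals erfc(√(χ²/2)). Applied to the SMOKING × SEX counts, this should reproduce the Pearson value of 38.017 and the continuity-corrected 36.279. The function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (df = 1).
    With yates=True, 0.5 is subtracted from each |O - E| (continuity correction)."""
    table = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            diff = abs(table[i][j] - e)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / e
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with df = 1
    return chi2, p

# Rows: SMOKING (no, smokers); columns: SEX (male, female)
chi2, p = chi_square_2x2(90, 124, 55, 9)
chi2_c, p_c = chi_square_2x2(90, 124, 55, 9, yates=True)
print(f"Pearson = {chi2:.3f}, continuity corrected = {chi2_c:.3f}, P = {p:.1e}")
```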
  • 124. We can't use the Pearson chi-square test if the expected frequencies are too small (< 5); in this case we use Fisher's exact test.
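A minimal sketch of Fisher's exact test for a 2 × 2 table, using the hypergeometric distribution with fixed margins (standard library only; the function name is hypothetical). The status × SEX example that follows is a 3 × 2 table, which needs the r × c generalization and is not covered by this sketch. SciPy's `scipy.stats.fisher_exact` can serve as a cross-check for 2 × 2 tables.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for [[a, b], [c, d]].

    With the margins fixed, cell a follows a hypergeometric distribution.
    The two-sided P value sums the probabilities of all tables that are
    no more likely than the observed one."""
    r1, c1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)

    def prob(x):
        return comb(r1, x) * comb(n - r1, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    tol = 1e-7 * p_obs                      # guard against floating-point ties
    p_two = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + tol)
    p_greater = sum(prob(x) for x in range(a, hi + 1))   # one-sided: a larger
    return min(p_two, 1.0), p_greater

# Illustration with the classic "lady tasting tea" table
p_two, p_one = fisher_exact_2x2(3, 1, 1, 3)
print(f"two-sided P = {p_two:.4f}, one-sided P = {p_one:.4f}")
```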
  • 125. status * SEX crosstabulation (count)
    status               | Male | Female | Total
    alive                | 24   | 15     | 39
    died from CVD        | 4    | 1      | 5
    other cause of death | 2    | 2      | 4
    Total                | 30   | 18     | 48
    Expected frequencies, e.g. Ef = 4 × 30/48 = 2.5 (< 5) and Ef = 5 × 18/48 = 1.875 (< 5). Fisher's exact test provides the correction.
  • 126. Chi-Square Tests
    Test                         | Value | df | Asymp. Sig. (2-sided)
    Pearson Chi-Square           | .935a | 2  | .626
    Likelihood Ratio             | .991  | 2  | .609
    Linear-by-Linear Association | .004  | 1  | .951
    N of Valid Cases             | 48    |    |
    a. 4 cells (66.7%) have expected count less than 5. The minimum expected count is 1.50.
    Chi-square is not valid here.
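  • 128. McNemar test
    Paired data in a cross-tabulation: 54 eczematous persons, each using ointment A on one arm and ointment B on the other (randomized).
                  | Ointment B + | Ointment B No | Total
    Ointment A +  | 5            | 23            | 28
    Ointment A No | 10           | 16            | 26
    Total         | 15           | 39            | 54
    The McNemar test takes only the discordant pairs (23 and 10) into account:
    $\chi^2 = \frac{(23 - 10)^2}{23 + 10} = \frac{169}{33} = 5.12$, df = 1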
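The McNemar statistic uses only the discordant counts. This sketch follows the slide's formula (no continuity correction) and obtains the df = 1 P value from erfc; the function name is hypothetical.

```python
import math

def mcnemar(b: int, c: int):
    """McNemar chi-square (df = 1) from the two discordant cell counts b and c,
    without continuity correction, as in the slide's formula."""
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, df = 1
    return chi2, p

chi2, p = mcnemar(23, 10)   # discordant pairs from the ointment example
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
```

The statistic exceeds the df = 1 critical value of 3.841 from the chi-square table, which is consistent with a P value below 0.05.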
  • 129. Questions
  • 130. Thank you