Medical statistics Basic concept and applications [Square one]

Medical Statistics
2013
Dr Tarek Tawfik Amin

Introduction
-

Questions
Why statistics?
The process
The resources

How?
• Book: Statistics at Square One, 11th ed. (Campbell and Swinscow)
• SPSS Practical sessions-PASW guide.
• Practical sessions using SPSS v. 17.0

"Statistics": an overview
Population

Parameters
Data

Analysis
Interpretation

Information

Sample

Statistics

Statistical analysis

Reference range
Researches

Data

Variables

Qualitative
Categorical

Quantitative
Numerical

Depends on the sample (s) and objectives of analysis

Interval/Ratio
Nominal

Ordinal

Tables

Discrete
Continuous

Descriptive

Graphs

Inferential

Measures

I-Descriptive Statistics
Goals
Summarizing
Overview
Data checking

Data view: first 15 records of the diabetes data set

PATNR  AGE  SEX  SMOKE  HEIGHT  WEIGHT  SBP1  SBP2  INSUL  CHOL  HBA1C  DIABDU  DEAD
1      57   0    0      177     98      140   154   0      6.30  7.62   5       #NULL!
2      74   1    0      172     69      150   145   1      5.10  8.30   11      0
3      38   1    0      155     70      120   126   0      6.50  11.00  2       #NULL!
4      73   1    0      165     72      180   157   0      5.80  7.00   21      0
5      53   1    2      174     109     140   119   1      6.80  10.60  7       0
6      74   1    0      171     83      151   145   0      6.25  7.62   7       0
7      81   0    2      175     60      140   113   0      6.50  6.40   6       0
8      86   1    0      164     59      140   158   0      5.20  5.30   4       0
9      78   0    1      171     83      151   148   0      5.60  5.90   1       1
10     78   1    0      171     83      151   159   1      5.00  8.00   23      1
11     91   0    0      171     83      151   140   0      4.30  9.70   4       1
12     77   0    2      176     87      170   198   0      6.40  6.60   7       2
13     77   1    0      171     83      151   152   0      5.20  4.90   26      1
14     84   0    0      171     62      160   148   0      7.00  7.80   8       1
15     72   1    0      154     63      145   148   0      6.20  7.80   0       1

I-Tables
Tables can summarize counts and frequencies (categorical variables) and measures (numerical variables).

Frequency table: SEX

          Frequency   Percent   Valid Percent   Cumulative Percent
male      145         52.2      52.2            52.2
female    133         47.8      47.8            100.0
Total     278         100.0     100.0

Contingency table: smoking history * SEX Crosstabulation (Count)

                     SEX
smoking history      male    female   Total
never                26      110      136
stopped smoking      64      14       78
yes                  55      9        64
Total                145     133      278

For comparison (2 or more variables)

Table 3 Daily servings of calcium and vitamin D rich foods in relation to body mass index classification of the included adults.

                                                 Subjects classification
Food items (servings/day) *                      Obese (N=91)          Non-obese (N=125)     P value a
Milk                                             0.52 (0.71±0.3)       0.65 (0.88±0.7)       0.031
Milk beverage                                    0.45 (0.59±0.4)       0.35 (0.53±0.4)       0.279
Milk in cereals                                  0.20 (0.33±0.2)       0.50 (0.58±0.4)       0.001
Milk in coffee or tea                            0.15 (0.25±0.6)       0.20 (0.23±0.6)       0.790
-Total milk                                      0.90 (1.03±0.3)       1.20 (1.34±0.7)       0.001
Yoghurt                                          0.10 (0.12±0.6)       0.20 (0.14±0.5)       0.790
Cheese                                           0.20 (0.24±0.9)       0.20 (0.29±0.8)       0.661
Ice cream                                        0.15 (0.14±0.6)       0.06 (0.09±0.3)       0.422
-Total dairy                                     0.25 (0.45±0.6)       0.30 (0.43±0.7)       0.826
Tuna (canned)                                    0.05 (0.03±0.1)       0.03 (0.04±0.3)       0.761
Fish                                             0.15 (0.19±0.7)       0.10 (0.18±0.5)       0.902
Half cooked fish                                 0.06 (0.11±0.5)       0.25 (0.27±0.6)       0.029
Shrimp/oyster                                    0.05 (0.08±0.1)       0.05 (0.06±0.1)       0.149
Eggs                                             0.85 (0.81±1.1)       0.80 (0.76±0.7)       0.797
Liver (including chicken livers)                 0.02 (0.04±0.4)       0.05 (0.06±0.3)       0.834
Others!                                          0.20 (0.23±0.3)       0.40 (0.55±0.5)       0.549
-Dietary vitamin D (IU/day): Median (mean ±SD)   111.6 (118.1±73.5)    123.7 (132.2±67.4)    0.034
Low dietary intake c (< 200 IU/day): No. (%)     56 (62.2)             47 (37.6)             0.003b
-Dietary calcium (mg/day): Median (mean ±SD)     660.0 (698.8±261.9)   692.0 (717.9±245.9)   0.223
Low calcium intake d (<1000 mg/day): No. (%)     51 (56.7)             49 (39.2)             0.011b

Assignment I
Table 1 Basic characteristics for the patients examined (N=278).

Baseline characteristics 1996                                            Total (N=278)
1- Men (%)                                                               52.2
2- Insulin users (%)                                                     25.5
3- Smokers (%)                                                           23.0
4- Ex-smokers (%)                                                        28.1
5- Non-smokers (%)                                                       48.9
6- Age in years (mean ±SD)                                               67.24 ±11.74
7- Systolic blood pressure at starting point, mmHg (mean ±SD)            151.20 ±22.00
8- Systolic blood pressure at two years, mmHg (mean ±SD)                 153.83 ±29.1
9- Duration of diabetes (median/Quartiles 1-3)                           6.0 (2.75-12.25)
10- Missed values                                                        0.0

II-Graphs
Goals
Impression
Comparison
Data checking
Clustering
Trend

II- Graphs
1- Types of variables
2- Number of variables
3- Objectives

Selection of graphs
Next

Categorical

Numerical

Figure 1: Outcomes of the included diabetic patients (1996) [chart: alive, died from CVD, other cause of death, missing]
Figure 2: Smoking status of the included diabetic patients [bar chart: percent by smoking history (never, stopped smoking, yes)]

For numerical variables
Figure 3: Total cholesterol level in diabetic patients 1996, in mmol/l [histogram of total cholesterol: Mean = 6.25, Std. Dev = 1.33, N = 278]

Figure 4: Systolic blood pressure at starting point among diabetic patients 1996 (mmHg) [boxplot of systolic blood pressure at start by sex, with outlying cases labelled]

Figure 6: Total cholesterol level in relation to gender and smoking status among diabetic patients 1996 [error bars: 95% CI of total cholesterol (mmol/l) by sex and smoking history (never, stopped smoking, yes)]

Figure 7: Duration of diabetes among the included patients 1996 (in years)
Checking for normality
[histogram of duration of diabetes with a normal distribution curve: Mean = 7.9, Median = 6.0, Mode = 1, Std. Dev = 6.96, N = 278]
The distribution is positively skewed (Mode < Median < Mean), with outliers in the right tail.

III-Measures (numerical variables)

Central Tendency
How the data aggregate around a central point

Mean
Median
Mode
Percentiles

Dispersion
How the data vary

Range (max-min)
Interquartile range
Variance
Standard deviation
Coefficient of variation

Central Tendency
Mean = sum of the observations / their number = (x1+x2+x3)/number
Affected by extreme values

Mode = the most frequently occurring value in a set of observations
Median = the middle value that divides the ordered data set 50/50
Not affected by extreme values

Age of sample: 3, 7, 37
Median=7
Mean=(3+7+37)/3=15.7

Age of sample: 3, 7, 11
Median=7
Mean=(3+7+11)/3=7
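The two age samples above can be checked with a short sketch. It uses Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median

# Ages from the slide: the same sample with and without an extreme value.
with_extreme = [3, 7, 37]
without_extreme = [3, 7, 11]

for ages in (with_extreme, without_extreme):
    # The mean moves with the extreme value; the median stays at 7.
    print(ages, "mean =", round(mean(ages), 1), "median =", median(ages))
```

Replacing 11 by 37 more than doubles the mean while leaving the median unchanged, which is why the median is preferred for skewed data.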

Dispersion
Ordered data: 1, 1, 6, 8, 10, 16, 17, 23, 43, 53
Range=53-1=52
Affected by extreme values

25% of the data lie below the 25th percentile (1st quartile) = 6
50% of the data lie below the 50th percentile (median) = 13
75% of the data lie below the 75th percentile (3rd quartile) = 23

Interquartile range = 3rd quartile - 1st quartile = 23-6 = 17
The IQR is not affected by extreme values
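A minimal sketch of the range and interquartile range for the ten values above. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data, which reproduces the slide's 6 and 23; statistical packages use several other quartile conventions and may give slightly different values. The function name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumes at least two observations.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the middle
    upper = ordered[-half:]   # values above the middle
    return median(lower), median(ordered), median(upper)

data = [1, 1, 6, 8, 10, 16, 17, 23, 43, 53]
q1, q2, q3 = quartiles(data)
data_range = max(data) - min(data)
iqr = q3 - q1
print("range =", data_range, "Q1 =", q1, "median =", q2, "Q3 =", q3, "IQR =", iqr)
```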

Standard deviation and variance
Sample of 3, their age in years: 3, 7, 17
Mean age=(3+7+17)/3=9
Differences from the mean: 3-9=-6, 7-9=-2, 17-9=+8

The sum of the differences between the mean and the individual values = 0
The mean deviation = 0
To overcome the 0: sum of the squared differences/(number-1) = Variance
Variance = [(3-9)^2 + (7-9)^2 + (17-9)^2]/(3-1) = 104/2 = 52
The amount of dispersion around the mean = 52 years^2 (wrong scale)
Hence we need to convert back to the usual (natural) scale, using the standard deviation
Standard deviation = √Variance = √52 = ±7.2 years

The sample disperses around the mean (=9 years) by 7.2 years in both directions
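The same three ages can be run through Python's `statistics` module as a check; `variance` and `stdev` both use the n-1 denominator described above. Variable names are illustrative.

```python
from statistics import mean, stdev, variance

ages = [3, 7, 17]
m = mean(ages)

# Deviations from the mean always sum to zero, so they cannot measure spread.
deviations = [x - m for x in ages]

# Squaring removes the signs; dividing by n-1 gives the sample variance.
sample_variance = variance(ages)
sample_sd = stdev(ages)

print("mean =", m, "sum of deviations =", sum(deviations))
print("variance =", sample_variance, "SD =", round(sample_sd, 1))
```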

Description of a binary (dichotomous) variable
o A binary variable: has only two outcomes (diseased or not diseased).
o The proportion of the population that is diseased (at a certain point of time) is called prevalence.
o The occurrence of new cases is called incidence.

Dichotomous variables
Prevalence = all cases (new or old) / population at risk
Incidence = new cases / total population at risk

Probability and Odds
o Odds = chance
o In a population of 1000, 200 have a certain disease.
o When we randomly take one person out, the probability that this person is diseased = 200/1000 = 0.2 (this is the probability).
o The chance (the odds) that this person is diseased = probability of having the disease / probability of not having the disease.
o Odds = P/(1-P) = 0.2/0.8 = 1/4; the odds are 1 to 4.
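As a small illustration of the conversion from probability to odds, the sketch below uses exact fractions so the result reads as "1 to 4". The helper name `odds` is illustrative.

```python
from fractions import Fraction

def odds(probability):
    """Convert a probability P into odds P / (1 - P)."""
    return probability / (1 - probability)

p_disease = Fraction(200, 1000)   # 200 diseased out of 1000
disease_odds = odds(p_disease)
print("probability =", float(p_disease), "odds =", disease_odds)
```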

The following table depicts the outcomes of an isoniazid/placebo trial among children with HIV (death within 6 months)

Interventions   Dead (within 6 months)   Alive   Total   What is the risk of dying?
Placebo         21                       110     131     Risk=21/131=0.160
Isoniazid       11                       121     132     Risk=11/132=0.083

Absolute risk difference (ARD) = risk in placebo - risk in isoniazid = 0.077
Net relative risk (NRR) = risk in placebo / risk in isoniazid = 1.928
Relative risk reduction (RRR) = (risk in placebo - risk in isoniazid) / risk in placebo = 0.48
Number needed to treat (NNT) = 1/ARD = 1/0.077 = 13
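The trial figures above can be recomputed in a few lines. This is a sketch, not a validated tool: the function name and dictionary keys are illustrative, and NNT is rounded up to a whole patient. Working from unrounded risks gives a relative risk of about 1.92, slightly below the slide's 1.928, which was obtained from the rounded risks 0.160 and 0.083.

```python
import math

def risk_measures(events_control, n_control, events_treated, n_treated):
    """Risk, ARD, relative risk, RRR and NNT for a two-arm trial."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    ard = risk_control - risk_treated
    return {
        "risk_control": risk_control,
        "risk_treated": risk_treated,
        "ARD": ard,
        "RR": risk_control / risk_treated,
        "RRR": ard / risk_control,
        "NNT": math.ceil(1 / ard),  # rounded up to a whole patient
    }

# Placebo: 21 deaths out of 131; isoniazid: 11 deaths out of 132.
result = risk_measures(21, 131, 11, 132)
for name, value in result.items():
    print(name, "=", round(value, 3))
```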

Odds ratio (OR)
o An odds ratio (OR) is a measure of association between an exposure and an outcome.
o The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.
o Odds ratios are most commonly used in case-control studies; however, they can also be used in cross-sectional and cohort study designs (with some modifications and/or assumptions).

Basic structure of case-control design
[diagram: Starting point (present time): a sample of diseased subjects (cases) and a sample of disease-free subjects (controls) are drawn from the population. Trace back (past time): cases are classified as exposed to factor (a) or unexposed to factor (b); controls are classified as exposed to factor (c) or unexposed to factor (d).]
The odds ("chance") of exposure is calculated and compared between both groups.

Calculation

Case control study   Diseased                  None (not diseased)              Total
Exposed              Cases + exposed (a)       Exposed + not diseased (b)       a+b
Non-exposed          Cases + not exposed (c)   Not exposed + not diseased (d)   c+d

Odds ratio = (a/c) ÷ (b/d) = ad/bc
= odds of exposure among the diseased / odds of exposure among the non-diseased

OR=1 Exposure does not affect odds of outcome
OR>1 Exposure associated with higher odds of outcome
OR<1 Exposure associated with lower odds of outcome

Odds ratio

Case control study   Lung cancer   No lung cancer   Total
Smoking              a=80          b=30             110
None                 c=20          d=70             90

80x70=5600
30x20=600
OR = 5600/600 = 9.3
Or (80/20) ÷ (30/70) = 9.3

Basic structure of cohort study
[diagram: Starting point (present time): a disease-free sample is drawn from the population and split into those exposed to factor and those unexposed to factor. Follow (future time): the exposed either develop disease (a) or remain disease-free (b); the unexposed either develop disease (c) or remain disease-free (d).]
The relative risk is calculated for exposure by comparing the incidence of disease in each group.

Relative risk (RR)

Mammography   Breast cancer   No breast cancer   Total
Positive      a=10            b=90               100
Negative      c=20            d=100,080          100,100

In cohort design
RR = a/(a+b) ÷ c/(c+d)
= (10/100) ÷ (20/100,100) = 0.1/0.0002 = 500

Cohort study: the relative risk (RR)

              Lung cancer   No lung cancer   Total
Smokers       18            582              600
Non-smokers   6             1194             1200

Risk for smokers=18/600=0.03
Risk for non-smokers=6/1200=0.005
RR=0.03/0.005=6
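A minimal sketch of the relative risk calculation for the cohort table above, with cell labels following the a, b, c, d layout used in these slides. The function name is illustrative.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a cohort 2x2 table.

    a, b: exposed with / without disease
    c, d: unexposed with / without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Smokers: 18 lung cancers out of 600; non-smokers: 6 out of 1200.
rr_smoking = relative_risk(18, 582, 6, 1194)
print("RR =", round(rr_smoking, 2))
```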

Case-control study: the odds ratio (OR)

              Lung cancer   No lung cancer   Total
Smokers       80            30               110
Non-smokers   20            70               90

Odds for smokers=80/30=2.67
Odds for non-smokers=20/70=0.29
OR=(80*70)/(30*20)=9.33
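The cross-product form of the odds ratio can be sketched the same way, using the a, b, c, d cell labels from the calculation slide. The function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc from a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    return (a * d) / (b * c)

# Smokers: 80 cases, 30 controls; non-smokers: 20 cases, 70 controls.
or_smoking = odds_ratio(80, 30, 20, 70)
print("OR =", round(or_smoking, 2))
```

The same value results from dividing the odds for smokers (80/30) by the odds for non-smokers (20/70).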

Assignment I
Table 1 Basic characteristics for the patients examined (N=278).

Baseline characteristics 1996                                            Total (N=278)
1- Men (%)                                                               52.2
2- Insulin users (%)                                                     25.5
3- Smokers (%)                                                           23.0
4- Ex-smokers (%)                                                        28.1
5- Non-smokers (%)                                                       48.9
6- Age in years (mean ±SD)                                               67.24 ±11.74
7- Systolic blood pressure at starting point, mmHg (mean ±SD)            151.20 ±22.00
8- Systolic blood pressure at two years, mmHg (mean ±SD)                 153.83 ±29.1
9- Duration of diabetes (median/Quartiles 1st-3rd)                       6.0 (2.75-12.25)
10- Missed values                                                        --

2a Smoking history (all subjects) [bar chart, percent: never 49, stopped smoking 28, yes 23]
2b Smoking history by sex [bar chart, percent: male: never 18, stopped smoking 44, yes 38; female: never 83, stopped smoking 11, yes 7]
3a Age using bar chart (mean used as summary) [bar chart: mean age (years) by sex]
3b Boxplot of age by sex [boxplot: age (years) by sex; male N=145, female N=133]
This graph gives a check for data distribution and for outliers.
4a Height of the included subjects [histogram of height (cm): Median = 170.55 cm, Mean = 170.5, Std. Dev = 8.89, N = 278]
4b Duration of diabetes [histogram: Median = 6.0 years, Mean = 7.9, Std. Dev = 6.96, N = 278]

5a- P95 of systolic blood pressure at start, using the frequency table: P95 ≈ 189-190 mmHg

syst. blood pressure at start
Valid

100
110
112
115
116
120
121
122
124
125
127
130
131
132
134
135
136
137
139
140
141
144
145
147
148
150
151
151
152
153
155
158
160
161
162
163
164
165
167
168
170
171
172
175
176
177
178
179
180
182
184
185
187
189
190
194
195
200
205
209
210
216
220
Total

Frequency
1
1
2
1
2
21
2
1
1
6
1
16
1
2
1
11
1
2
1
28
2
4
12
1
1
31
1
23
1
1
2
1
21
1
1
1
1
5
1
2
14
1
2
4
1
1
1
2
14
2
1
1
1
1
6
1
1
2
1
1
3
1
1
278

Percent
.4
.4
.7
.4
.7
7.6
.7
.4
.4
2.2
.4
5.8
.4
.7
.4
4.0
.4
.7
.4
10.1
.7
1.4
4.3
.4
.4
11.2
.4
8.3
.4
.4
.7
.4
7.6
.4
.4
.4
.4
1.8
.4
.7
5.0
.4
.7
1.4
.4
.4
.4
.7
5.0
.7
.4
.4
.4
.4
2.2
.4
.4
.7
.4
.4
1.1
.4
.4
100.0

Valid Percent
.4
.4
.7
.4
.7
7.6
.7
.4
.4
2.2
.4
5.8
.4
.7
.4
4.0
.4
.7
.4
10.1
.7
1.4
4.3
.4
.4
11.2
.4
8.3
.4
.4
.7
.4
7.6
.4
.4
.4
.4
1.8
.4
.7
5.0
.4
.7
1.4
.4
.4
.4
.7
5.0
.7
.4
.4
.4
.4
2.2
.4
.4
.7
.4
.4
1.1
.4
.4
100.0

Cumulative
Percent
.4
.7
1.4
1.8
2.5
10.1
10.8
11.2
11.5
13.7
14.0
19.8
20.1
20.9
21.2
25.2
25.5
26.3
26.6
36.7
37.4
38.8
43.2
43.5
43.9
55.0
55.4
63.7
64.0
64.4
65.1
65.5
73.0
73.4
73.7
74.1
74.5
76.3
76.6
77.3
82.4
82.7
83.5
84.9
85.3
85.6
86.0
86.7
91.7
92.4
92.8
93.2
93.5
93.9
96.0
96.4
96.8
97.5
97.8
98.2
99.3
99.6
100.0

P95, P5 = Mean ± Z score (probability) at the specified percentile × (Standard deviation)
Probability distribution of the normal curve: page 180

P95 SBP1 = 151.2+1.645(22.0) = 187.4 mmHg
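The formula above can be reproduced with `statistics.NormalDist` from the Python standard library, which looks up the Z score instead of reading it from a printed table. This sketch assumes systolic blood pressure is approximately normal with the mean and SD reported in Table 1; variable names are illustrative.

```python
from statistics import NormalDist

# Systolic blood pressure at start: mean 151.2 mmHg, SD 22.0 mmHg.
sbp = NormalDist(mu=151.2, sigma=22.0)

z95 = NormalDist().inv_cdf(0.95)   # Z score that leaves 5% in the upper tail
p95 = sbp.inv_cdf(0.95)            # equals mean + z95 * SD
p5 = sbp.inv_cdf(0.05)             # equals mean - z95 * SD

print("Z =", round(z95, 3), "P95 =", round(p95, 1), "P5 =", round(p5, 1))
```

The result is close to, but not identical with, the value read from the frequency table, because the observed distribution is not exactly normal.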

5b-1
P5 for duration of diabetes


Valid

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
25
26
27
28
31
32
Total

Frequency
12
35
22
21
24
20
23
19
6
6
6
13
2
7
6
5
11
8
6
5
3
5
2
2
3
1
1
2
1
1
278

Percent
4.3
12.6
7.9
7.6
8.6
7.2
8.3
6.8
2.2
2.2
2.2
4.7
.7
2.5
2.2
1.8
4.0
2.9
2.2
1.8
1.1
1.8
.7
.7
1.1
.4
.4
.7
.4
.4
100.0

Valid Percent
4.3
12.6
7.9
7.6
8.6
7.2
8.3
6.8
2.2
2.2
2.2
4.7
.7
2.5
2.2
1.8
4.0
2.9
2.2
1.8
1.1
1.8
.7
.7
1.1
.4
.4
.7
.4
.4
100.0

Cumulative
Percent
4.3
16.9
24.8
32.4
41.0
48.2
56.5
63.3
65.5
67.6
69.8
74.5
75.2
77.7
79.9
81.7
85.6
88.5
90.6
92.4
93.5
95.3
96.0
96.8
97.8
98.2
98.6
99.3
99.6
100.0

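Or using the formula:
Mean - Z score (1.645) × SD = 7.9 - 1.645(6.96) = -3.6 years
(a negative duration is impossible: the formula assumes a normal distribution, which the duration of diabetes does not follow)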

Total population n=278, μ=67.24 years, σ=11.743
28 samples of 150 from a total population of 278

Age in years
Sample no.   Mean     SD
1            67.6     12.07
2            67.13    11.81
3            67       11.98
4            67.8     11.63
5            66.33    11.44
6            67.44    11.95
7            67.84    12.42
8            66.59    11.36
9            67       12
10           66.38    11.9
11           68.06    12.06
12           67.61    11.02
13           67.31    11.33
14           66.44    11.91
15           66.87    11.26
16           66.8     11.5
17           66.73    12.37
18           66.38    11.77
19           67.03    11.22
20           66.58    12.13
21           66.81    11.55
22           66.58    12.21
23           67.2     11.61
24           66.48    11.48
25           67.53    12.1
26           67.58    10.6
27           67       11.91
28           67.31    11.59

[radar chart: Mean and SD of age plotted for sample no. 1 to 28]

Population and Sample
o In scientific research we want to make a statement
(conclusion) about the population.
o Studying the whole population is impossible in terms
of money/time/labor.
o We therefore draw a random sample from the population and infer the needed conclusions from the sample data.
o The task of statistics is to quantify the uncertainty (how well the sample really represents that population).

The concept of sampling
Study population: you select a few sampling units from the study population.
Sample: you collect information from these people to find answers to your research questions.
You make an estimate ("prediction") extrapolated to the study population (prevalence, outcomes etc.).

What would be the mean systolic blood pressure of older subjects (65+) in Al Hassa?
Population mean (μ) = unknown
[figure: individual values 175, 165, 180, 155]
From our sample we calculate an estimate of the population parameter.

The good sample (the estimator) should be:
Unbiased: the mean of the sample = the population mean
Precise (narrow dispersion about the mean): the dispersion in repeated samples is small
This is a dream

Sampling error
Four individuals A, B, C, D:
A = 18 years
B = 20 years
C = 23 years
D = 25 years
Their mean age = (18+20+23+25)/4 = 86/4 = 21.5 years (population mean μ).

Possible samples of two individuals (6 possibilities):
A+B = (18+20)/2 = 38/2 = 19.0 years
A+C = (18+23)/2 = 20.5 years
A+D = (18+25)/2 = 21.5 years
B+C = (20+23)/2 = 21.5 years
B+D = (20+25)/2 = 22.5 years
C+D = (23+25)/2 = 24.0 years
Sampling error = population mean - sample mean = ranges from -2.5 to +2.5 years.

Possible samples of three individuals (4 possibilities):
A+B+C = (18+20+23)/3 = 20.33 years
A+B+D = (18+20+25)/3 = 21.00 years
A+C+D = (18+23+25)/3 = 22.00 years
B+C+D = (20+23+25)/3 = 22.67 years
Error = ranges from -1.17 to +1.17 years.

If C=32 (instead of 23) years and D=40 (instead of 25) years (μ = 27.5 years): sampling of 2 gives a sampling error of -8.5 to +8.5 years, and sampling of 3 gives -3.17 to +4.17 years.
The greater the variability of a given variable, the larger the sampling error for a given sample size.

An infinite number of samples should represent the population they came from (good estimator).
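The enumeration above can be automated. The sketch below lists every possible sample of a given size from the four-person population and reports the smallest and largest sampling error (population mean minus sample mean). The function name is illustrative; the second population uses the more variable ages C=32 and D=40.

```python
from itertools import combinations
from statistics import mean

def sampling_error_range(ages, sample_size):
    """Smallest and largest (population mean - sample mean) over all samples."""
    mu = mean(ages)
    errors = [mu - mean(sample) for sample in combinations(ages, sample_size)]
    return min(errors), max(errors)

original = [18, 20, 23, 25]        # A, B, C, D
more_variable = [18, 20, 32, 40]   # C and D replaced by 32 and 40

for ages in (original, more_variable):
    for size in (2, 3):
        low, high = sampling_error_range(ages, size)
        print(ages, "n =", size, "error from", round(low, 2), "to", round(high, 2))
```

Larger samples narrow the range of errors, and a more variable population widens it.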

2
o The normal distribution
o The standard error of the mean
o Estimation:
Reference interval
Confidence intervals for: mean, proportion, difference between means/proportions, RR and OR

Normal Distribution:
Many human traits, such as intelligence, personality and attitudes, as well as weight and height, are distributed among populations in a fairly normal way.


The normal distribution
68% of values lie within μ ±1 SD (σ)
95% of values lie within μ ±2 SD (σ)
>2 SDs: possible outliers
>3 SDs: definite outliers

One more
The Z score measures how many standard deviations a particular data point is above or below the mean.
o Unusual observations would have a Z score over +2 or under -2.
o Extreme observations would have Z scores over +3 or under -3 and should be investigated as potential outliers.

$Z = \frac{x_i - \bar{x}}{s}$

Areas under the standard normal curve
Z

±0.1
±0.2
±0.3
±0.4
±0.5
±0.6
±0.7
±0.8
±0.9
±1
±1.1
±1.2
±1.3
±1.4
±1.5
±1.6
±1.645
±1.7
±1.8
±1.9
±1.96
±2
±2.1
±2.2
±2.3
±2.4
±2.576

Area under curve between both points (around the mean)
0.080
0.159
0.236
0.311
0.383
0.451
0.516
0.576
0.632
0.683
0.729
0.770
0.806
0.838
0.866
0.890
0.900
0.911
0.928
0.943
0.950
0.954
0.964
0.972
0.979
0.984
0.99

Beyond both points (two tails)

Beyond one point (one tail)

0.920
0.841
0.764
0.689
0.617
0.549
0.484
0.424
0.368
0.317
0.271
0.230
0.194
0.162
0.134
0.110
0.100
0.089
0.072
0.057
0.050
0.046
0.036
0.028
0.021
0.010
0.004

0.4600
0.4205
0.3820
0.3445
0.3085
0.2745
0.2420
0.2120
0.1840
0.1585
0.1355
0.1150
0.0970
0.0810
0.0670
0.0550
0.0500
0.0445
0.0360
0.0290
0.0250
0.0230
0.0180
0.0140
0.0105
0.0100
0.0020

Calculating values from Z-scores

Xi = Mean ± Z × (standard deviation)
Value (percentile) = Mean ± Z score × (SD)

Random sample for estimating a population mean
μ = ?
X1=128
X2=133
X3=129

From the information in the sample, we will estimate the unknown population mean (the sample mean is an estimator for μ).
What could have happened if we had another random sample?
What is the measure of variation of sample means?

The Sampling Distribution of a Sample Statistic

≈ Let's assume that we want to survey a community of 400 whose ages were recorded, with the following parameters:
µ = 35 years
σ = 13 years
≈ Let's assume, however, that we do not survey all 400; instead we randomly select 120 people, ask them about their ages and calculate the mean age.
≈ Then we put them back into the community and randomly select another 120 residents (who may include members of the first sample).
≈ We do this over and over, and each time we calculate the mean age.
≈ The results will be like those in the following table.

Sample Number
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
SD of the means

Sample mean
34.7
35.9
35.5
34.7
34.5
34.4
35.7
34.6
37.4
35.3
34.1
35.5
34.9
36.2
35.6
35.0
35.1
36.4
35.6
33.6
13.37

Distribution of 20 random sample means (n=20)
[dot plot: the 20 sample means on an age axis from 33 to 37 years, clustered around μ = 35]

All the results are clustered around the population value (35 years), with a few scores a bit further out and one extreme score of 37.4 years (random variation = 1/20 = 5%).

Those 400 people have an age range from 2 to 69 years, while the means of the samples have a very narrow range of values of about 4 years, and 10 samples coincide with the population mean (35 years).

Most of the samples will cluster around the population parameter, with an occasional sample result falling relatively further to one side or the other of the distribution (this is called the sampling distribution of sample means).

It has the following properties:
The mean of the sampling distribution is equal to the population mean: the average of the averages will be the same as the population mean.
The standard deviation of the sample means = the standard error, SE = σ/√n (σ = population SD).
The distribution of the sample means is Normal if the population distribution is Normal.
If the population distribution is not Normal, the distribution of the sample means is almost Normal when n is large (Central Limit Theorem).

Standard error of the mean
[diagram: population parameters (Mean, S.D) compared with sample statistics (Mean, S.D)]
The degree to which the sample statistics deviate (differ) from the population parameters.

The term "error" indicates that, due to sampling error, each sample mean is likely to deviate somewhat from the true population mean.

Central Limit Theorem
The formula for SE = SD/√n.
The formula indicates that we are estimating the SE given the S.D of a sample of size n.
For a sample of 100 and S.D of 40, the SE = 40/√100 = 4.
For a sample of 1000 and S.D of 40, the SE = 40/√1000 = 1.26.
Two factors influence the SE: sample size and S.D of the sample.
Sample size has a greater impact, as it is used in the denominator.
For a sample of 100 and S.D of 20, the SE = 20/√100 = 2.
For a sample of 100 and S.D of 40, the SE = 40/√100 = 4.
The more variability there is within a sample, the greater the SE.

Confidence Interval (CI)
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

We need to know the smallest and the largest μ (range) we think is likely, using sample statistics.
The mean of the sample is the estimate of μ.

c = level of confidence   Zc = Z critical values (under the normal curve)
90%                       1.645
95%                       1.960
99%                       2.576

$\bar{x} \pm Z_c \left( \frac{\sigma}{\sqrt{n}} \right)$
C.I = Mean of the sample ± Z critical score × (SEM)
SEM = SD/√n

C.I
• The confidence interval provides a range that
is highly likely (often 95% or 99%) to contain
the true population parameter that is being
estimated.
• The narrower the interval the more informative
is the result.
• It is usually calculated using the estimate
(sample mean) and its standard error (SEM).

CI for μ
Systolic blood pressure in 278 diabetic patients

Descriptives: syst. blood pressure at start
                               All patients (N=278)      Random sample of 50 out of 278
                               Statistic   Std. Error    Statistic   Std. Error
Mean                           151.20      1.319         155.06      3.064
90% CI for mean, lower bound   149.02                    149.92
90% CI for mean, upper bound   153.38                    160.20
95% CI for mean, lower bound   148.60                    148.90
95% CI for mean, upper bound   153.80                    161.22
99% CI for mean, lower bound   147.78                    146.84
99% CI for mean, upper bound   154.62                    163.28
5% Trimmed Mean                150.30                    154.72
Median                         150.00                    151.20
Variance                       483.880                   460.033
Std. Deviation                 21.997                    21.448
Minimum                        100                       115
Maximum                        220                       205
Range                          120                       90
Interquartile Range            30.00                     30.00
Skewness                       .540        .146          .263        .340
Kurtosis                       .152        .291          -.506       .668

90% C.I = 151.20 ± 1.65(21.997/√278) = 149.02-153.38 mmHg
95% C.I = 151.20 ± 1.96(21.997/√278) = 148.60-153.80 mmHg
99% C.I = 151.20 ± 2.58(21.997/√278) = 147.78-154.62 mmHg
What does this mean? It means that if the same population is sampled on numerous occasions and interval estimates are made on each occasion, the resulting intervals would bracket the true population parameter in approximately 90, 95 and 99% of the cases, respectively.
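The three intervals can be recomputed with a short sketch based on the normal (Z) critical values. It takes n = 278, the sample size reported for this data set, which reproduces the standard error of 1.319 shown in the SPSS output. SPSS itself uses the t-distribution, so its bounds differ from these in the second decimal place. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    sem = sd / sqrt(n)                              # standard error of the mean
    return sample_mean - z * sem, sample_mean + z * sem

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(151.20, 21.997, 278, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f} mmHg")
```

Raising the confidence level widens the interval: more certainty about capturing μ costs precision.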

The sampling distribution of a proportion

$\mu_p = \pi$
$p = k/n$
$SE(p) = \sqrt{\frac{p(1-p)}{n}}$
$CI_p = p \pm 1.96 \times SE(p)$ (Z critical score for 95%)

Smokers among diabetics
Sample=400
Smokers=40
p=40/400=0.1
SE(p) = √(0.1×0.9/400) = 0.015
95% CI for p = 0.1±1.96(0.015) = [0.07-0.13]
For percentages it is the same: SE=1.5%, C.I=[7-13]%
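The smoker example can be checked with a minimal sketch of the normal-approximation interval for a proportion. The function name and return layout are illustrative.

```python
from math import sqrt

def proportion_ci(k, n, z=1.96):
    """Proportion, its standard error and the normal-approximation CI."""
    p = k / n
    se = sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# 40 smokers in a sample of 400 diabetics.
p, se, (low, high) = proportion_ci(40, 400)
print("p =", p, "SE =", round(se, 3), "95% CI =", round(low, 2), "to", round(high, 2))
```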

95% CI for the difference between two means (μ1-μ2)

Smoke        n     Mean SBP   SE (mean)
No           214   153.1      1.50
Yes          64    144.8      2.62
Difference         8.3

$(\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$
$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{(SE(\bar{x}_1))^2 + (SE(\bar{x}_2))^2}$

C.I = 2.4 to 14.2
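A short sketch of the calculation behind this interval: the two standard errors are combined by adding their squares, and the difference is then bracketed by 1.96 combined standard errors. The function name is illustrative.

```python
from math import sqrt

def difference_ci(mean1, se1, mean2, se2, z=1.96):
    """CI for the difference between two independent means."""
    diff = mean1 - mean2
    se_diff = sqrt(se1 ** 2 + se2 ** 2)  # SE of the difference
    return diff, se_diff, (diff - z * se_diff, diff + z * se_diff)

# Mean SBP: non-smokers 153.1 (SE 1.50), smokers 144.8 (SE 2.62).
diff, se_diff, (low, high) = difference_ci(153.1, 1.50, 144.8, 2.62)
print("difference =", round(diff, 1), "SE =", round(se_diff, 2))
print("95% CI =", round(low, 1), "to", round(high, 1))
```

Because the interval does not include 0, the difference in mean systolic blood pressure is unlikely to be due to sampling variation alone.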

95% CI for the difference between two percentages

Smoke (n)    died %   SE
No (212)     28.8     3.11
Yes (64)     23.4     5.30

Difference = 5.4%
$(P_{ns} - P_s) \pm 1.96 \times SE(P_{ns} - P_s)$
$SE(P_{ns} - P_s) = \sqrt{\frac{P_1 (100 - P_1)}{n_1} + \frac{P_2 (100 - P_2)}{n_2}}$

95% C.I = -6.7% to 17.4%

95% CI for RR and OR
Use available software

http://www.medcalc.org/calc/odds_ratio.php
http://www.medcalc.org/calc/relative_risk.php
vl.academicdirect.org/applied_statistics/.../CIcalculator.xls

Inferential Statistics
Testing in research
o In scientific research we would like to test if our
research ideas are true.
o Based on previous observations (studies) we know
that the mean cholesterol of patients with diabetes is
higher than those without the disease.
o We will take samples and check whether the results
will agree with our expectations.
o Meaning we are going to test the situation using a
statistical test.

The Z-test for one sample
Serum cholesterol in the diabetes-free population: μ=5 mmol/L, σ=±1.5
Diabetic patients: mean cholesterol > 5? Considering σ=±1.5

Is there any difference between the diabetes-free population and the diabetic patients regarding serum cholesterol? Let's perform a Z test.

Research question (hypothesis)
The research hypothesis would be:
The mean cholesterol of diabetics is > 5 mmol/L

Null hypothesis
H0: μ = 5 (the mean of the diabetic patients equals the population mean)

Alternative hypothesis
H1: μ >5 (one sided)
Or
H1: μ≠5 (two sided)

Procedure
[histogram: cholesterol level of diabetic patients in mmol/L (total cholesterol): Mean = 6.25, Std. Dev = 1.33, N = 278; μ = 5 and the mean of the sample are marked]

If the sample mean is close to the population mean: the null hypothesis is retained
If the sample mean differs from the population mean: we REJECT the null

The α level (P value)
P value: the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true
(the null hypothesis: population mean = sample mean, i.e. there is no difference between the population and the sample mean)
Or
α level: the maximum probability we accept of rejecting the null hypothesis falsely
α = 0.05

P > 0.05 (α)
Accept the null
Sample mean = population mean

P ≤ 0.05 (α)
Reject the null
Sample mean ≠ population mean

Alpha level

Calculation (σ=1.5)
SEM = σ/√n = 0.3
Z = (mean of sample - μ)/SEM
P(mean of the sample ≥ 6) = P(Z ≥ (6-5)/0.3) = 0.0005
Under the normal curve, the area of rejection is Z > 1.96

P = 0.0005:
A sample mean this high would arise from the disease-free population about 5 in 10,000 times
(if we repeated this test 10,000 times, the two values could be the same in about 5 of them)
P < 0.05, so we reject the null
The diabetics have a larger mean cholesterol level than the normal population
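The Z calculation can be sketched with `statistics.NormalDist`, which returns the tail area directly instead of requiring the printed table. The inputs are the slide's values (sample mean 6, μ = 5, SEM = 0.3); the function name is illustrative. The exact one-sided tail area is about 0.0004, which the slide rounds to 0.0005.

```python
from statistics import NormalDist

def one_sample_z(sample_mean, mu, sem):
    """Z statistic and one-sided P value, P(Z >= z), for a known-sigma test."""
    z = (sample_mean - mu) / sem
    p_one_sided = 1 - NormalDist().cdf(z)
    return z, p_one_sided

z, p = one_sample_z(6, 5, 0.3)
print("Z =", round(z, 2), "one-sided P =", round(p, 4))
print("reject H0" if p <= 0.05 else "do not reject H0")
```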

In reality
It is unlikely that the σ (population SD) is
known.
In most of the cases, σ will be unknown and
we will be able to apply neither the formula
nor the table of normal distribution (areas
under the curve=Z score).
We resort to other statistical tests.

Possible situations in hypothesis testing

                     Reality
Decision             H0 is true           H0 is not true
Reject H0            Type I error (α)     OK (1-β)
Do not reject H0     OK (1-α)             Type II error (β)

α = level of significance
Power = 1-β
It is the probability of rejecting the null hypothesis if it is NOT TRUE
Usually 80% is the least required for any test

Errors of Hypothesis Testing and Power
Decisions and errors in hypothesis testing

Conclusion from hypothesis testing (study results) compared with the true situation:
o Study result: difference exists (reject H0)
  True situation, difference exists (H1): correct decision (power or 1-β)
  True situation, no difference (H0): Type I error or α. False rejection: H0 is rejected when it is true; we conclude there is a difference when there really is not.
o Study result: no difference (do not reject H0)
  True situation, difference exists (H1): Type II or β error. False acceptance: we conclude there is no difference when it is really present.
  True situation, no difference (H0): correct decision

Passive smoking and lung cancer

Conclusions, based on results from a study of a sample of the population, compared with the truth about the population:
o Reject the null hypothesis (rates in the study appear to be different)
  Truth, passive smoking is related to lung cancer: correct decision
  Truth, not related to lung cancer: Type I error (incorrect rejection). Passive smoking is concluded to be related to lung cancer when it really is not.
o Accept the null hypothesis (rates in the study appear similar)
  Truth, passive smoking is related to lung cancer: Type II error (incorrect acceptance). Passive smoking is concluded to be not related to lung cancer when it really is.
  Truth, not related to lung cancer: correct decision

The Alpha-Fetoprotein (AFP) test has both Type I and Type II error possibilities.
This test screens the mother's blood during pregnancy for AFP and determines risk.
Abnormally high or low levels may indicate Down syndrome.
H0: patient is healthy
Ha: patient is unhealthy

Error Type I (false positive or false rejection): the test wrongly indicates that the patient has Down syndrome, which means that the pregnancy must be aborted for no reason.
Error Type II (false negative or false acceptance): the test is negative and the child will be born with multiple anomalies.

Hypothesis test: Type I and Type II error (one sample)
[figure: the distribution of X under the null and alternative hypotheses; one curve is the distribution given the null hypothesis is true, with the regions of false rejection and false acceptance marked]

t-distribution
In real life situations we will estimate the unknown population SD using the sample SD.
Results are standardized to the t-distribution:

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

Z test for the normal distribution (the population SD is known):

$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

t-distribution
df = No. of observations (sample size) - 1
Heavier tails than the Z distribution

(Degree of freedom (df

For all sample statistics: variance, SD, we used
n-1
All the observations in any given sample are free
.except one= Complementary effect

Degree of freedom
total =50
12

restricted

16
7

df = n-1

15
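
The restriction can be checked numerically. The sketch below uses the four values from the slide (12, 16, 7, 15; total 50). Which three values are treated as "free" is an assumption made here for illustration. It also shows that the deviations from the mean sum to zero, which is why the sample variance divides by n-1.

```python
import statistics

total = 50
free_values = [12, 16, 7]          # assumed to be the freely chosen values

# Once the total is fixed, the last value has no freedom left.
restricted = total - sum(free_values)
values = free_values + [restricted]

n = len(values)
df = n - 1                          # degrees of freedom

mean = total / n
deviations = [v - mean for v in values]

# The deviations always sum to zero, so only n-1 of them carry information.
sample_variance = sum(d ** 2 for d in deviations) / df

print("restricted value:", restricted)
print("df:", df)
print("sum of deviations:", sum(deviations))
print("sample variance:", sample_variance)
print("statistics.variance agrees:",
      abs(sample_variance - statistics.variance(values)) < 1e-9)
```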

t-test: steps to determine the statistical difference
When? The descriptive statistics are the mean ± standard deviation.

Number of samples

One sample vs. population mean:

$$ t = \frac{\bar{x} - \mu}{SD / \sqrt{n}} $$

Two independent samples:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}} $$

Two dependent (paired) samples: repeated measures, matched pairs:

$$ t_{dependent} = \frac{\bar{d}}{SE(\bar{d})} $$

Steps:
1- State the hypothesis to be tested. Null: mean = mean. Alternative: mean ≠ mean (non-directional, two-tailed); a unidirectional (one-tailed) alternative states mean > mean or mean < mean.
2- Find the calculated t value using the formulae.
3- Find the degrees of freedom: df = n-1 (two independent samples: df = (n1-1)+(n2-1) = n1+n2-2).
4- Find the P value using the tables of the t-distribution.
5- Conclude: if P < 0.05, the null is rejected. If P > 0.05, the null is accepted.

t-test (Student's t-test): one sample

$$ t = \frac{\bar{x} - \mu}{SD / \sqrt{n}} $$

Using the diabetes data: is the mean age of diabetics > 65 years?
H0: μ = 65
H1: μ ≠ 65
t (one sample) = (67.24 - 65) / (SD / √n) = 2.24 / 0.704 = 3.18
From the t distribution, P = 0.002
Reject the null
Diabetics are significantly older than 65 years

Statistics: age (years)
N valid: 278
N missing: 0
Mean: 67.24
Std. Error of Mean: 0.704
Std. Deviation: 11.743
Variance: 137.902

One-Sample Test (Test Value = 65): age (years)
t: 3.182
df (degrees of freedom): 277
Sig. (2-tailed), i.e. the two-sided P value: 0.002
Mean Difference: 2.24
95% Confidence Interval of the Difference: 0.85 (lower) to 3.63 (upper)

Assumptions: the distribution of age is normal; the population SD (σ) is unknown.
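
The one-sample result can be reproduced from the summary statistics alone. The sketch below uses only the Python standard library; because it has no built-in t-distribution, the two-sided P value is approximated by numerically integrating the t density with Simpson's rule. The helper names `t_pdf` and `two_sided_p` are hypothetical, and small differences from the SPSS output are expected because the mean and SD are rounded.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided P value: 1 - 2 * (area under the density from 0 to |t|).

    The area is approximated with Simpson's rule (steps must be even).
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


# Summary statistics from the SPSS output for age (years).
n, mean, sd, mu0 = 278, 67.24, 11.743, 65

se = sd / math.sqrt(n)      # standard error of the mean
t = (mean - mu0) / se       # one-sample t statistic
df = n - 1
p = two_sided_p(t, df)

print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}, two-sided P = {p:.4f}")
```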

t-test for comparison of means of two independent samples
H0: smoking has no effect on systolic blood pressure
Mean S = Mean NS, or Mean S - Mean NS = 0
H1: smoking has an effect
Mean S ≠ Mean NS, or Mean S - Mean NS ≠ 0
Assumptions:
•Independent observations (2 samples)
•Normally distributed
•Equal variances (for the pooled t-test)

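Three formulae

General form (standardized; 0 is the expected difference if H0 is true):

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} $$

If SDs are equal (the denominator is the SD of the difference, built from the pooled SD):

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}}}, \qquad S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)} $$

If SDs are not equal:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} $$
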
Decision based on Levene's test

Group Statistics: syst. blood pressure at start
SMOKING    N     Mean     Std. Deviation   Std. Error Mean
no         214   153.11   21.995           1.504
smokers    64    144.82   20.934           2.617

Independent Samples Test: syst. blood pressure at start
Levene's Test for Equality of Variances: F = .006, Sig. = .936
Not significant, which means the variances are apparently equal.

t-test for Equality of Means (two separate t-tests)
                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       2.674   276       .008              8.29              3.100                   2.188          14.392
Equal variances not assumed   2.747   107.982   .007              8.29              3.018                   2.308          14.272

P value < 0.05, reject H0
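
Both rows of the SPSS output can be recomputed from the group statistics. The sketch below applies the pooled formula and the unequal-variance formula to the means, SDs and sample sizes reported above. The degrees of freedom for the unequal-variance row use the Welch-Satterthwaite approximation, which is not written out on the slides and is included here as an assumption about how the 107.982 was obtained.

```python
import math

# Group statistics for systolic blood pressure at start.
n1, m1, s1 = 214, 153.11, 21.995   # non-smokers
n2, m2, s2 = 64, 144.82, 20.934    # smokers

diff = m1 - m2

# Equal variances assumed: pooled variance, then the SE of the difference.
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1))
se_pooled = math.sqrt(sp2 / n1 + sp2 / n2)
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2

# Equal variances not assumed: each group keeps its own variance.
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
se_unequal = math.sqrt(v1 + v2)
t_unequal = diff / se_unequal
# Welch-Satterthwaite approximation for the degrees of freedom.
df_unequal = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"pooled : t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.3f}")
print(f"unequal: t = {t_unequal:.3f}, df = {df_unequal:.3f}, SE = {se_unequal:.3f}")
```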

Paired t-test
Use when we have paired data: two repeated measurements on the same subjects, or before-and-after measurements.
Assumption: the differences of the paired observations are Normally distributed.

Paired samples (dependent): paired/dependent 2-sample t-test

To compare observations collected from the same group of individuals on 2 separate occasions (dependent observations or paired samples).
The paired t statistic is calculated by:

- Calculate the difference between the 2 measurements taken on each individual.
- Calculate the mean of the differences (md).
- Calculate the SE of the observed differences (SEd).
- Under the null hypothesis of no difference (difference = 0), the paired t statistic takes the form t = mean difference / SE of the difference:

$$ t = \frac{m_d - 0}{SE_d} $$

- It follows a t-distribution with degrees of freedom = (n-1).

Example

Four students had the following scores in 2 subsequent tests.
Is there a significant difference in their performance?

Number   Name       Test 1   Test 2   Dif
1        Mike       35%      67%      -32
2        Melanie    50%      46%      4
3        Melissa    90%      86%      4
4        Mitchell   78%      91%      -13

Mean Dif = -9.25, SD Dif = 17.154, SE Dif = 8.58
Calculated paired t = -9.25/8.58 = -1.078
df = n-1 = 3

Critical values of the t-distribution

Level of significance for one-tail test:   0.10     0.05     0.025    0.01     0.005
Level of significance for two-tail test:   0.20     0.10     0.05     0.02     0.01

df
1                                          3.078    6.314    12.706   31.821   63.657
2                                          1.886    2.920    4.303    6.965    9.925
3                                          1.638    2.353    3.182    4.541    5.841
4                                          1.533    2.132    2.776    3.747    4.604
5                                          1.476    2.015    2.571    3.365    4.032
6                                          1.440    1.943    2.447    3.143    3.707
7                                          1.415    1.895    2.365    2.998    3.499
8                                          1.397    1.860    2.306    2.896    3.355
9                                          1.383    1.833    2.262    2.821    3.250
10                                         1.372    1.812    2.228    2.764    3.169
11                                         1.363    1.796    2.201    2.718    3.106
12                                         1.356    1.782    2.179    2.681    3.055
13                                         1.350    1.771    2.160    2.650    3.012
14                                         1.345    1.761    2.145    2.624    2.977
15                                         1.341    1.753    2.131    2.602    2.947
16                                         1.337    1.746    2.120    2.583    2.921
17                                         1.333    1.740    2.110    2.567    2.898
18                                         1.330    1.734    2.101    2.552    2.878
19                                         1.328    1.729    2.093    2.539    2.861
20                                         1.325    1.725    2.086    2.528    2.845
21                                         1.323    1.721    2.080    2.518    2.831
35                                         1.306    1.690    2.030    2.438    2.724
50                                         1.299    1.676    2.009    2.403    2.678
∞                                          1.282    1.645    1.960    2.326    2.576

With df = 3, the calculated |t| = 1.078 is below 1.638, so the two-tailed P value is > 0.20 and the null is accepted!

Conclusion
The observed difference can be encountered in about 36 out of 100 cases (actual P value = 0.362), i.e. we accept the null hypothesis of no difference between the first and second test.
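
The worked example can be checked step by step. The sketch below recomputes the differences, their mean, SD and SE, and the paired t statistic for the four students, then compares |t| with the two-tailed critical values for df = 3 copied from the t table. The dictionary name `critical_df3` is a hypothetical helper for this illustration.

```python
import math
import statistics

# Scores (%) of the four students on the two tests.
test1 = [35, 50, 90, 78]
test2 = [67, 46, 86, 91]

# Step 1: difference between the two measurements for each individual.
diffs = [a - b for a, b in zip(test1, test2)]

# Step 2: mean of the differences.
n = len(diffs)
mean_d = statistics.mean(diffs)

# Step 3: SE of the differences (sample SD uses n-1).
sd_d = statistics.stdev(diffs)
se_d = sd_d / math.sqrt(n)

# Step 4: paired t statistic under H0 (true mean difference = 0).
t = (mean_d - 0) / se_d
df = n - 1

# Two-tailed critical values for df = 3, copied from the t table.
critical_df3 = {0.20: 1.638, 0.10: 2.353, 0.05: 3.182, 0.02: 4.541, 0.01: 5.841}
reject_at_05 = abs(t) > critical_df3[0.05]

print("differences:", diffs)
print(f"mean = {mean_d:.2f}, SD = {sd_d:.3f}, SE = {se_d:.2f}")
print(f"t = {t:.3f}, df = {df}")
print("reject H0 at the 0.05 level:", reject_at_05)
```

Since |t| is also below the 0.20 critical value (1.638), the two-tailed P value is larger than 0.20, in line with the conclusion above.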

Paired Samples Statistics (Pair 1)
                                     Mean     N     Std. Deviation   Std. Error Mean
syst. blood pressure at start        151.20   278   21.997           1.319
syst. blood pressure after 2 years   153.83   278   29.076           1.744

Paired Samples Test (Pair 1: syst. blood pressure at start - syst. blood pressure after 2 years)
Paired Differences: Mean = -2.63, Std. Deviation = 17.920, Std. Error Mean = 1.075
95% Confidence Interval of the Difference: -4.74 (lower) to -.51 (upper)
t = -2.443, df = 277, Sig. (2-tailed) = .015

Tests of significance
Interval/ratio data
Parametric, assuming normal distribution

One sample vs. population
- Known population variance (σ): one-sample Z-test, rejection limit |Z| > 1.96

$$ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

- Unknown population variance: one-sample t-test, reject if P ≤ 0.05

Two samples
- Dependent: paired t-test
- Independent: independent t-test

The Chi-Square test (χ²)

Used for hypothesis testing for categorical variables.
Many types, depending on the design, the distribution of variables and the objectives of testing.

χ² example:
Vaccination against influenza decreases the risk of getting the disease.
Study:
Compare the effectiveness of 5 vaccines with respect to the probability of getting influenza.
(The comparison is with respect to a nominal variable: getting influenza, yes or no.)

Effectiveness of Five Vaccines
Data cross-tabulated 2X5; response variable: influenza

Frequency
Vaccine   Influenza No   Influenza Yes   Total
1         237            43              280
2         198            52              250
3         245            25              270
4         212            48              260
5         233            57              290
Total     1125           225             1350

% within Vaccines
Vaccine   Influenza No   Influenza Yes   Total
1         84.6           15.4            100
2         79.2           20.8            100
3         90.7           9.3             100
4         81.5           18.5            100
5         80.3           19.7            100
Total     83.3           16.7            100

The "Influenza Yes" percentages give the probability of getting influenza.

The null hypothesis states that the probability of getting influenza is independent of the vaccine.
The alternative states that a dependency exists.

Effectiveness of Five Vaccines
If H0 is true:
The probability of getting influenza should be the same in every group and equal to the probability in the total population: 225/1350 = 0.167 (16.7%).
Vaccine 1 was used in 280 people; if H0 is true, we expect 16.7% of them (≈47) to get influenza.
However, this is not what was observed (43 cases).

Expected frequencies
For any cell: Expected Frequency = Row total * column total / grand total

Vaccine   Influenza No (Observed / Expected)   Influenza Yes (Observed / Expected)   Total
1         237 / 233.3                          43 / 46.7                             280
2         198 / 208.3                          52 / 41.7                             250
3         245 / 225.0                          25 / 45.0                             270
4         212 / 216.7                          48 / 43.3                             260
5         233 / 241.7                          57 / 48.3                             290
Total     1125                                 225                                   1350

Each vaccine total (280, 250, 270, 260, 290) is a row total, 1125 and 225 are the column totals, and 1350 is the grand total.
Examples: expected "Yes" for vaccine 1 = 280 X 225/1350 = 46.7; expected "No" for vaccine 4 = 1125/1350 * 260 = 216.7.

Pearson Chi-square test
Calculate the expected frequencies (assuming H0 is true) for all the ten cells.
Calculate Chi-square, where Of = observed frequency and Ef = expected frequency:

$$ \chi^2 = \sum \frac{(O_f - E_f)^2}{E_f} $$

Reject H0 if χ² is large.
Use the Chi-square distribution after determining the degrees of freedom (df):
df = (r-1) * (c-1)

Critical values for Chi-square

      Level of Significance
df    0.99      0.90     0.70     0.50     0.30     0.20     0.10     0.05     0.01     0.001
1     0.00016   0.0158   0.148    0.455    1.074    1.642    2.706    3.841    6.635    10.827
2     0.0201    0.211    0.713    1.386    2.408    3.219    4.605    5.991    9.210    13.815
3     0.115     0.584    1.424    2.366    3.665    4.642    6.251    7.815    11.341   16.268
4     0.297     1.064    2.195    3.357    4.878    5.989    7.779    9.488    13.277   18.465
5     0.554     1.610    3.000    4.351    6.064    7.289    9.236    11.070   15.086   20.517
.
.
30    14.953    20.599   25.508   29.336   33.530   36.250   40.256   43.773   50.892   59.703

df = (2-1)(5-1) = 4
χ² critical (df = 4, level 0.05) = 9.488
Calculated χ² = 16.555
P = 0.002

There is a relation (dependence) between type of vaccine and influenza prevention
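
The whole calculation for the vaccine table can be sketched in a few lines. The code below derives the expected frequencies from the row, column and grand totals, sums (O - E)²/E over the ten cells, and obtains the P value from the closed-form upper tail of the chi-square distribution, which for df = 4 is exp(-x/2)(1 + x/2). This shortcut only holds for df = 4 and is used here to stay within the standard library.

```python
import math

# Observed counts per vaccine: (influenza no, influenza yes).
rows = [(237, 43), (198, 52), (245, 25), (212, 48), (233, 57)]

row_totals = [sum(r) for r in rows]
col_totals = [sum(r[j] for r in rows) for j in range(2)]
grand_total = sum(row_totals)

chi2 = 0.0
expected = []
for r, rt in zip(rows, row_totals):
    exp_row = []
    for observed, ct in zip(r, col_totals):
        e = rt * ct / grand_total          # expected frequency under H0
        exp_row.append(e)
        chi2 += (observed - e) ** 2 / e    # contribution of this cell
    expected.append(exp_row)

df = (len(rows) - 1) * (len(col_totals) - 1)

# Upper-tail probability of the chi-square distribution, valid for df = 4 only.
p = math.exp(-chi2 / 2) * (1 + chi2 / 2)

critical_05 = 9.488   # from the table of critical values, df = 4
print(f"chi-square = {chi2:.3f}, df = {df}, P = {p:.4f}")
print("reject H0 at 0.05:", chi2 > critical_05)
```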

SMOKING * SEX Crosstabulation
SMOKING                        SEX: male   SEX: female   Total
no        Count                90          124           214
          % within SMOKING     42.1%       57.9%         100.0%
smokers   Count                55          9             64
          % within SMOKING     85.9%       14.1%         100.0%
Total     Count                145         133           278
          % within SMOKING     52.2%       47.8%         100.0%

Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             38.017b   1    .000
Continuity Correction a        36.279    1    .000
Likelihood Ratio               41.649    1    .000
Fisher's Exact Test                                                   .000                   .000
Linear-by-Linear Association   37.880    1    .000
N of Valid Cases               278

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 30.62.
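
For a 2x2 table the Pearson statistic and the continuity-corrected statistic have shortcut formulas. The sketch below applies them to the smoking by sex counts and also recomputes the minimum expected count. The P value uses the fact that a chi-square variable with df = 1 is a squared standard normal, so its upper tail equals erfc(sqrt(x/2)).

```python
import math

# 2x2 table of counts: rows = smoking (no, smokers), columns = sex (male, female).
a, b = 90, 124
c, d = 55, 9

n = a + b + c + d
r1, r2 = a + b, c + d          # row totals
c1, c2 = a + c, b + d          # column totals
denom = r1 * r2 * c1 * c2

# Pearson chi-square, 2x2 shortcut formula.
pearson = n * (a * d - b * c) ** 2 / denom

# Continuity (Yates) correction, computed only for 2x2 tables.
yates = n * (abs(a * d - b * c) - n / 2) ** 2 / denom

# Smallest expected count = smallest row total * smallest column total / n.
min_expected = min(r1, r2) * min(c1, c2) / n

# Upper tail of the chi-square distribution with df = 1.
p = math.erfc(math.sqrt(pearson / 2))

print(f"Pearson = {pearson:.3f}, continuity corrected = {yates:.3f}")
print(f"minimum expected count = {min_expected:.2f}, P = {p:.2e}")
```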

At least 80% of cells must have Ef > 5

We can't use the Pearson Chi-square if the expected frequency is < 5.
In this case we use Fisher's Exact test.

status * SEX Crosstabulation (Count)
status                 SEX: male   SEX: female   Total
alive                  24          15            39
died from CVD          4           1             5
other cause of death   2           2             4
Total                  30          18            48

Expected f = 4*30/48 = 2.5 (< 5)
Ef = 5*18/48 = 1.875 (< 5)
Fisher's Exact test provides a correction.

Chi-Square Tests
                               Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square             .935a   2    .626
Likelihood Ratio               .991    2    .609
Linear-by-Linear Association   .004    1    .951
N of Valid Cases               48

a. 4 cells (66.7%) have expected count less than 5. The minimum expected count is 1.50.

Chi-square is not valid
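
The validity check can be automated. The sketch below computes the expected count of every cell in the status by sex table, counts how many fall below 5, and flags the table when more than 20% of cells do. The variable name `rule_ok` and the "at most 20% of cells below 5" reading of the 80% rule are assumptions made for this illustration.

```python
# Observed counts: rows = status (alive, died from CVD, other cause of death),
# columns = sex (male, female).
counts = [(24, 15), (4, 1), (2, 2)]

row_totals = [sum(r) for r in counts]
col_totals = [sum(r[j] for r in counts) for j in range(2)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

cells = [e for row in expected for e in row]
n_small = sum(1 for e in cells if e < 5)
share_small = n_small / len(cells)
min_expected = min(cells)

# Pearson chi-square, shown only to match the SPSS output.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(counts, expected)
           for o, e in zip(obs_row, exp_row))

# Assumed reading of the rule: at most 20% of cells may have expected count < 5.
rule_ok = share_small <= 0.20

print(f"{n_small} of {len(cells)} cells ({share_small:.1%}) have expected count < 5")
print(f"minimum expected count = {min_expected:.2f}, chi-square = {chi2:.3f}")
print("Pearson chi-square valid:", rule_ok)
```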


McNemar test

Paired data in a cross tabulation
54 eczematous persons on both arms use ointment A or B (randomized)

                 Ointment B
                 No    +     Total
Ointment A  +    10    16    26
            No   5     23    28
Total            15    39    54

The McNemar test only takes the discordant pairs into account:

$$ \chi^2 = \frac{(23 - 10)^2}{23 + 10} $$

df = 1
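
The McNemar statistic from the slide can be evaluated directly. The sketch below takes the two discordant counts used in the slide's formula (10 and 23) and, as an addition not shown on the slide, derives the P value from the chi-square distribution with df = 1 via erfc(sqrt(x/2)). No continuity correction is applied, matching the formula as written.

```python
import math

# Discordant pairs, as used in the slide's formula.
b, c = 10, 23

# McNemar chi-square without continuity correction.
chi2 = (c - b) ** 2 / (b + c)
df = 1

# Upper tail of the chi-square distribution with df = 1.
p = math.erfc(math.sqrt(chi2 / 2))

critical_05 = 3.841   # chi-square critical value for df = 1 at the 0.05 level
print(f"McNemar chi-square = {chi2:.3f}, df = {df}, P = {p:.4f}")
print("reject H0 at 0.05:", chi2 > critical_05)
```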
