Medical Statistics
2013
Dr Tarek Tawfik Amin
Introduction
-

Questions
Why statistics?
The process
The resources
How?
• Book: Statistics at Square One, 11th ed. (Campbell and Swinscow)
• SPSS practical sessions: PASW guide.
• Practical sessions using SPSS v. 17.0
"Statistics": an overview
Population → Parameters
Sample → Statistics
Data → Analysis → Interpretation → Information
Statistical analysis: reference range, researches

Statistical analysis
Data → Variables
  Qualitative (categorical): Nominal, Ordinal
  Quantitative (numerical): Discrete, Continuous (Interval/Ratio)
Statistical analysis depends on the sample(s) and objectives of analysis:
  Descriptive: Tables, Graphs, Measures
  Inferential
I-Descriptive Statistics
Goals
Summarizing
Overview
Data checking
Data view (diab), first 15 records:

PATNR  AGE  SEX  SMOKE  HEIGHT  WEIGHT  SBP1  SBP2  INSULIN  CHOL  HBA1C  DIABDU  DEAD
1      57   0    0      177     98      140   154   0        6.30  7.62   5       #NULL!
2      74   1    0      172     69      150   145   1        5.10  8.30   11      0
3      38   1    0      155     70      120   126   0        6.50  11.00  2       #NULL!
4      73   1    0      165     72      180   157   0        5.80  7.00   21      0
5      53   1    2      174     109     140   119   1        6.80  10.60  7       0
6      74   1    0      171     83      151   145   0        6.25  7.62   7       0
7      81   0    2      175     60      140   113   0        6.50  6.40   6       0
8      86   1    0      164     59      140   158   0        5.20  5.30   4       0
9      78   0    1      171     83      151   148   0        5.60  5.90   1       1
10     78   1    0      171     83      151   159   1        5.00  8.00   23      1
11     91   0    0      171     83      151   140   0        4.30  9.70   4       1
12     77   0    2      176     87      170   198   0        6.40  6.60   7       2
13     77   1    0      171     83      151   152   0        5.20  4.90   26      1
14     84   0    0      171     62      160   148   0        7.00  7.80   8       1
15     72   1    0      154     63      145   148   0        6.20  7.80   0       1
I-Tables
Tables can summarize counts, frequency (categorical), measures (numerical)

Frequency table: SEX

         Frequency  Percent  Valid Percent  Cumulative Percent
male     145        52.2     52.2           52.2
female   133        47.8     47.8           100.0
Total    278        100.0    100.0

Contingency table: smoking history * SEX Crosstabulation (Count)

smoking history    male  female  Total
never              26    110     136
stopped smoking    64    14      78
yes                55    9       64
Total              145   133     278

For comparison (2 or more variables)
Table 3. Daily servings of calcium and vitamin D rich foods in relation to body mass index classification of the included adults.
Values are median (mean ±SD) unless stated otherwise.

Food items (servings/day)*                      Obese (N=91)          Non-obese (N=125)     P value
Milk                                            0.52 (0.71±0.3)       0.65 (0.88±0.7)       0.031
Milk beverage                                   0.45 (0.59±0.4)       0.35 (0.53±0.4)       0.279
Milk in cereals                                 0.20 (0.33±0.2)       0.50 (0.58±0.4)       0.001
Milk in coffee or tea                           0.15 (0.25±0.6)       0.20 (0.23±0.6)       0.790
-Total milk                                     0.90 (1.03±0.3)       1.20 (1.34±0.7)       0.001
Yoghurt                                         0.10 (0.12±0.6)       0.20 (0.14±0.5)       0.790
Cheese                                          0.20 (0.24±0.9)       0.20 (0.29±0.8)       0.661
Ice cream                                       0.15 (0.14±0.6)       0.06 (0.09±0.3)       0.422
-Total dairy                                    0.25 (0.45±0.6)       0.30 (0.43±0.7)       0.826
Tuna (canned)                                   0.05 (0.03±0.1)       0.03 (0.04±0.3)       0.761
Fish                                            0.15 (0.19±0.7)       0.10 (0.18±0.5)       0.902
Half cooked fish                                0.06 (0.11±0.5)       0.25 (0.27±0.6)       0.029
Shrimp/oyster                                   0.05 (0.08±0.1)       0.05 (0.06±0.1)       0.149
Eggs                                            0.85 (0.81±1.1)       0.80 (0.76±0.7)       0.797
Liver (including chicken livers)                0.02 (0.04±0.4)       0.05 (0.06±0.3)       0.834
Others!                                         0.20 (0.23±0.3)       0.40 (0.55±0.5)       0.549
-Dietary vitamin D (IU/day)                     111.6 (118.1±73.5)    123.7 (132.2±67.4)    0.034
Low dietary intake c (<200 IU/day): No. (%)     56 (62.2)             47 (37.6)             0.003b
-Dietary calcium (mg/day)                       660.0 (698.8±261.9)   692.0 (717.9±245.9)   0.223
Low calcium intake d (<1000 mg/day): No. (%)    51 (56.7)             49 (39.2)             0.011b
Assignment I
Table 1. Basic characteristics for the patients examined (N=278).

Baseline characteristics 1996                                       Total (N=278)
1- Men (%)                                                          52.2
2- Insulin users (%)                                                25.5
3- Smokers (%)                                                      23.0
4- Ex-smokers (%)                                                   28.1
5- Non-smokers (%)                                                  48.9
6- Age in years (mean ±SD)                                          67.24 ±11.74
7- Systolic blood pressure at starting point, mmHg (mean ±SD)       151.20 ±22.00
8- Systolic blood pressure at two years, mmHg (mean ±SD)            153.83 ±29.1
9- Duration of diabetes (median/quartiles 1-3)                      6.0 (2.75-12.25)
10- Missed values                                                   0.0
II-Graphs
Goals
Impression
Comparison
Data checking
Clustering
Trend
II- Graphs
Selection of graphs depends on:
1- Types of variables (categorical, numerical)
2- Number of variables
3- Objectives

Figure 1: Outcomes of the included diabetic patients (1996) [chart of outcomes: alive, died from CVD, other cause of death, missing]
Figure 2: Smoking status of the included diabetic patients [bar chart, percent by smoking history: never, stopped smoking, yes]
For numerical variables
Figure 3: Total cholesterol level in diabetic patients 1996, in mmol/l [histogram of total cholesterol: Mean = 6.25, Std. Dev = 1.33, N = 278]
Figure 4: Systolic blood pressure at starting point among diabetic patients 1996 (mmHg) [boxplots of syst. blood pressure at start by SEX, with outlying cases labelled]
Figure 6: Total cholesterol level in relation to gender and smoking status among diabetic patients 1996 [95% CI of total cholesterol (mmol/l) by SEX and smoking history; male: never N=26, stopped smoking N=64, yes N=55; female: never N=110, stopped smoking N=14, yes N=9]
Figure 7: Duration of diabetes among the included patients 1996 (in years)
Checking for normality [histogram of duration of diabetes with the normal distribution curve: Mode = 1, Median = 6.0, Mean = 7.9, Std. Dev = 6.96, N = 278; the plot marks the mode, median, mean and outliers]
III-Measures (numerical variables)

Central Tendency: how the data aggregate around a central point
  Mean
  Median
  Mode
  Percentiles

Dispersion: how the data varies
  Range (max-min)
  Interquartile range
  Variance
  Standard deviation
  Variation coefficient
Central Tendency
Mean = summation of observations / their number = (x1+x2+x3)/number
  Affected by extremes of values
Mode = the most frequently occurring value in a set of observations
Median = the middle value that divides the ordered data set into 50/50
  Not affected by extremes of values

Age of sample: 3, 7, 37
  Median = 7
  Mean = (3+7+37)/3 = 15.7
Age of sample: 3, 7, 11
  Median = 7
  Mean = (3+7+11)/3 = 7
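The effect of one extreme value on the mean, but not on the median, can be checked with a minimal Python sketch. The variable names are illustrative only; the two age samples are the ones from the slide.

```python
from statistics import mean, median

# The two samples from the slide: same median, very different means.
ages_with_extreme = [3, 7, 37]
ages_without_extreme = [3, 7, 11]

for ages in (ages_with_extreme, ages_without_extreme):
    # The mean moves with the extreme value, the median does not.
    print(ages, "mean =", round(mean(ages), 1), "median =", median(ages))
```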
Dispersion
Ordered data: 1, 1, 6, 8, 10, 16, 17, 23, 43, 53
Range = 53-1 = 52
  Affected by extremes of values
1st quartile (25th percentile) = 6: 25% of the data
Median (50th percentile) = 13: 50% of the data
3rd quartile (75th percentile) = 23: 75% of the data
Interquartile range = 3rd quartile - 1st quartile = 23-6 = 17
  IQR is not affected by extremes of values
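The range and interquartile range above can be reproduced with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data, which matches the slide's values; other quartile conventions (including the default of `statistics.quantiles`) give slightly different results. The helper name `quartiles_by_halves` is hypothetical.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3), taking Q1 and Q3 as the medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd number of values the middle value belongs to neither half.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

data = [1, 1, 6, 8, 10, 16, 17, 23, 43, 53]
q1, q2, q3 = quartiles_by_halves(data)
data_range = max(data) - min(data)
iqr = q3 - q1
print("range =", data_range, "Q1 =", q1, "median =", q2, "Q3 =", q3, "IQR =", iqr)
```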
Standard deviation and variance
Sample of 3, their ages in years: 3, 7, 17
Mean age = (3+7+17)/3 = 9
Deviations from the mean: 3-9 = -6, 7-9 = -2, 17-9 = +8
The sum of the differences between the mean and the individual values = 0
The mean deviation = 0
To overcome the 0: sum of the squared differences / (number-1) = Variance
Variance = [(3-9)² + (7-9)² + (17-9)²] / (3-1) = 104/2 = 52
The amount of dispersion around the mean = 52 years² (wrong scale)
Hence we need to convert back to the usual (natural) scale, using the standard deviation
Standard deviation = √Variance = √52 = ±7.2 years
The sample disperses around the mean (= 9 years) by 7.2 years in both directions
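The same steps can be followed in a minimal Python sketch that uses the standard library's sample variance (divisor n-1). The variable names are illustrative.

```python
from statistics import mean, stdev, variance

ages = [3, 7, 17]
mean_age = mean(ages)

# Deviations from the mean always sum to zero, so they cannot measure spread.
deviations = [age - mean_age for age in ages]

# Sample variance divides the sum of squared deviations by n-1.
sample_variance = variance(ages)
sample_sd = stdev(ages)

print("deviations:", deviations, "sum =", sum(deviations))
print("variance =", sample_variance, "years^2; SD =", round(sample_sd, 1), "years")
```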
Description of a binary (dichotomous) variable
o A binary variable has only two outcomes (diseased or not diseased).
o The proportion of the population that is diseased (at a certain point in time) is called prevalence.
o The occurrence of new cases is called incidence.
Dichotomous variables
Prevalence = all cases (new or old) / population at risk
Incidence = new cases / total population at risk
Probability and Odds
o Odds = chance
o In a population of 1000, 200 have a certain disease.
o When we randomly take one person out, the probability that this person is diseased = 200/1000 = 0.2 (this is the probability).
o The chance (the odds) that this person is diseased = probability of having the disease / probability of not having the disease.
o Odds = P/(1-P) = 0.2/0.8 = 1/4; the odds are 1 to 4.
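A small Python sketch of the conversion between probability and odds, using the 200-in-1000 example. The function names are hypothetical helpers, not part of any library.

```python
def odds_from_probability(p):
    """Odds = P / (1 - P); undefined when P = 1."""
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1)")
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: P = odds / (1 + odds)."""
    return odds / (1 + odds)

p_disease = 200 / 1000
odds_disease = odds_from_probability(p_disease)
print("probability =", p_disease, "odds =", round(odds_disease, 2), "(1 to 4)")
```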
The following table depicts the outcomes of an isoniazid/placebo trial among children with HIV (death within 6 months)

Interventions   Dead (within 6 months)   Alive   Total   Risk of dying
Placebo         21                       110     131     21/131 = 0.160
Isoniazid       11                       121     132     11/132 = 0.083

What is the risk of dying?
Absolute risk difference (ARD) = risk in placebo - risk in isoniazid = 0.077
Net relative risk (NRR) = risk in placebo / risk in isoniazid ≈ 1.92
Relative risk reduction (RRR) = (risk in placebo - risk in isoniazid) / risk in placebo = 0.48
Number needed to treat (NNT) = 1/ARD = 1/0.077 = 13
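The four measures can be computed together in a short Python sketch. The function name and dictionary keys are illustrative; the counts are those of the trial table, and rounding the NNT up to a whole patient is an assumption of this sketch.

```python
import math

def risk_measures(events_control, n_control, events_treated, n_treated):
    """Risk difference, relative risk, relative risk reduction and NNT for a two-arm trial."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    ard = risk_control - risk_treated
    return {
        "risk_control": risk_control,
        "risk_treated": risk_treated,
        "ARD": ard,
        "RR": risk_control / risk_treated,
        "RRR": ard / risk_control,
        "NNT": math.ceil(1 / ard),  # rounded up to a whole patient
    }

# Placebo: 21 deaths out of 131; isoniazid: 11 deaths out of 132.
result = risk_measures(21, 131, 11, 132)
for name, value in result.items():
    print(name, "=", round(value, 3))
```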
Odds ratio (OR)
o An odds ratio (OR) is a measure of association between an exposure and an outcome.
o The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.
o Odds ratios are most commonly used in case-control studies; however, they can also be used in cross-sectional and cohort study designs (with some modifications and/or assumptions).
Basic structure of case-control design
Starting point (present time): from the population, sample the diseased (cases) and the disease-free (controls).
Trace back (past time):
  Diseased (cases): exposed to factor (a), unexposed to factor (b)
  Disease-free (controls): exposed to factor (c), unexposed to factor (d)
The odds ("chance") of exposure is calculated for both groups.
Calculation

Case control study   Diseased                  Not diseased                     Total
Exposed              Cases + exposed (a)       Exposed + not diseased (b)       a+b
Non-exposed          Cases + not exposed (c)   Not exposed + not diseased (d)   c+d

Odds ratio = (a/c) ÷ (b/d) = ad/bc
= odds of exposure among the diseased / odds of exposure among the non-diseased

OR=1 Exposure does not affect odds of outcome
OR>1 Exposure associated with higher odds of outcome
OR<1 Exposure associated with lower odds of outcome
Odds ratio

Case control study   Lung cancer   No lung cancer   Total
Smoking              a = 80        b = 30           110
None                 c = 20        d = 70           90

80 x 70 = 5600
30 x 20 = 600
OR = 5600/600 = 9.3
Or (80/20) ÷ (30/70) = 9.3
Basic structure of cohort study
Starting point (present time): from the population, take a disease-free sample, divided into those exposed to the factor and those unexposed to the factor.
Follow (future time):
  Exposed to factor: develop disease (a), disease-free (b)
  Unexposed to factor: develop disease (c), disease-free (d)
Comparing the incidence of disease in each group: the relative risk is calculated for exposure.
Relative risk (RR)

Mammography   Breast cancer   No breast cancer   Total
Positive      a = 10          b = 90             100
Negative      c = 20          d = 100,080        100,100

In cohort design:
RR = a/(a+b) ÷ c/(c+d)
= (10/100) ÷ (20/100,100) = 0.1/0.0002 = 500
Cohort study: the relative risk (RR)

              Lung cancer   No lung cancer   Total
Smokers       18            582              600
Non-smokers   6             1194             1200

Risk for smokers = 18/600 = 0.03
Risk for non-smokers = 6/1200 = 0.005
RR = 0.03/0.005 = 6
Case control study: the odds ratio (OR)

              Lung cancer   No lung cancer   Total
Smokers       80            30               110
Non-smokers   20            70               90

Odds for smokers = 80/30 = 2.67
Odds for non-smokers = 20/70 = 0.29
OR = (80 x 70)/(30 x 20) = 9.33
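The two smoking tables can be recomputed with a minimal Python sketch. It assumes the usual 2x2 layout (a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without); the function names are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)], for cohort data."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = (a * d) / (b * c), for case-control data."""
    return (a * d) / (b * c)

# Cohort study: 18 of 600 smokers and 6 of 1200 non-smokers develop lung cancer.
rr_smoking = relative_risk(18, 582, 6, 1194)

# Case-control study: 80 and 30 smokers, 20 and 70 non-smokers.
or_smoking = odds_ratio(80, 30, 20, 70)

print("RR =", round(rr_smoking, 2), "OR =", round(or_smoking, 2))
```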
Assignment I
Table 1. Basic characteristics for the patients examined (N=278).

Baseline characteristics 1996                                       Total (N=278)
1- Men (%)                                                          52.2
2- Insulin users (%)                                                25.5
3- Smokers (%)                                                      23.0
4- Ex-smokers (%)                                                   28.1
5- Non-smokers (%)                                                  48.9
6- Age in years (mean ±SD)                                          67.24 ±11.74
7- Systolic blood pressure at starting point, mmHg (mean ±SD)       151.20 ±22.00
8- Systolic blood pressure at two years, mmHg (mean ±SD)            153.83 ±29.1
9- Duration of diabetes (median/quartiles 1st-3rd)                  6.0 (2.75-12.25)
10- Missed values                                                   --
2a: Smoking history (all subjects) [bar chart, percent: never 49, stopped smoking 28, yes 23]
2b: Smoking history by sex [clustered bar chart, percent; male: never 18, stopped smoking 44, yes 38; female: never 83, stopped smoking 11, yes 7]
3a: Age using bar chart (mean used as summary) [mean age (years) by SEX]
3b: Boxplot of age by sex [age (years) by SEX; male N=145, female N=133]. This graph checks the data distribution and checks for outliers.
4a: Height of the included subjects [histogram of height (cm): Median = 170.55 cm, Mean = 170.5, Std. Dev = 8.89, N = 278]
4b: Duration of diabetes [histogram of duration of diabetes: Median = 6.0 years, Mean = 7.9, Std. Dev = 6.96, N = 278]
5a- syst. blood pressure at start. Using the frequency table: P95 ≈ 189-190

Valid   Frequency   Percent   Valid Percent   Cumulative Percent
100     1           .4        .4              .4
110     1           .4        .4              .7
112     2           .7        .7              1.4
115     1           .4        .4              1.8
116     2           .7        .7              2.5
120     21          7.6       7.6             10.1
121     2           .7        .7              10.8
122     1           .4        .4              11.2
124     1           .4        .4              11.5
125     6           2.2       2.2             13.7
127     1           .4        .4              14.0
130     16          5.8       5.8             19.8
131     1           .4        .4              20.1
132     2           .7        .7              20.9
134     1           .4        .4              21.2
135     11          4.0       4.0             25.2
136     1           .4        .4              25.5
137     2           .7        .7              26.3
139     1           .4        .4              26.6
140     28          10.1      10.1            36.7
141     2           .7        .7              37.4
144     4           1.4       1.4             38.8
145     12          4.3       4.3             43.2
147     1           .4        .4              43.5
148     1           .4        .4              43.9
150     31          11.2      11.2            55.0
151     1           .4        .4              55.4
151     23          8.3       8.3             63.7
152     1           .4        .4              64.0
153     1           .4        .4              64.4
155     2           .7        .7              65.1
158     1           .4        .4              65.5
160     21          7.6       7.6             73.0
161     1           .4        .4              73.4
162     1           .4        .4              73.7
163     1           .4        .4              74.1
164     1           .4        .4              74.5
165     5           1.8       1.8             76.3
167     1           .4        .4              76.6
168     2           .7        .7              77.3
170     14          5.0       5.0             82.4
171     1           .4        .4              82.7
172     2           .7        .7              83.5
175     4           1.4       1.4             84.9
176     1           .4        .4              85.3
177     1           .4        .4              85.6
178     1           .4        .4              86.0
179     2           .7        .7              86.7
180     14          5.0       5.0             91.7
182     2           .7        .7              92.4
184     1           .4        .4              92.8
185     1           .4        .4              93.2
187     1           .4        .4              93.5
189     1           .4        .4              93.9
190     6           2.2       2.2             96.0
194     1           .4        .4              96.4
195     1           .4        .4              96.8
200     2           .7        .7              97.5
205     1           .4        .4              97.8
209     1           .4        .4              98.2
210     3           1.1       1.1             99.3
216     1           .4        .4              99.6
220     1           .4        .4              100.0
Total   278         100.0     100.0
P95, P5 = Mean ± Z score (at the specified percentile) × Standard deviation
Probability distribution of the normal curve: page 180
P95 SBP1 = 151.2 + 1.645(22.0) = 187.4 mmHg
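The percentile formula can be sketched in Python with the standard library's `statistics.NormalDist`, which supplies the Z score instead of a printed table. The calculation assumes the variable is approximately normal; the mean and SD are the SBP values quoted above.

```python
from statistics import NormalDist

mean_sbp = 151.2  # mmHg, from the descriptive table
sd_sbp = 22.0     # mmHg

# Z score that leaves 95% of a standard normal distribution below it.
z95 = NormalDist().inv_cdf(0.95)

p95 = mean_sbp + z95 * sd_sbp
p5 = mean_sbp - z95 * sd_sbp
print("Z =", round(z95, 3), "P95 =", round(p95, 1), "mmHg", "P5 =", round(p5, 1), "mmHg")
```

The normal-theory P95 (about 187 mmHg) is close to the value read from the frequency table (189-190 mmHg).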
5b-1 P5 for duration of diabetes

duration of diabetes

Valid   Frequency   Percent   Valid Percent   Cumulative Percent
0       12          4.3       4.3             4.3
1       35          12.6      12.6            16.9
2       22          7.9       7.9             24.8
3       21          7.6       7.6             32.4
4       24          8.6       8.6             41.0
5       20          7.2       7.2             48.2
6       23          8.3       8.3             56.5
7       19          6.8       6.8             63.3
8       6           2.2       2.2             65.5
9       6           2.2       2.2             67.6
10      6           2.2       2.2             69.8
11      13          4.7       4.7             74.5
12      2           .7        .7              75.2
13      7           2.5       2.5             77.7
14      6           2.2       2.2             79.9
15      5           1.8       1.8             81.7
16      11          4.0       4.0             85.6
17      8           2.9       2.9             88.5
18      6           2.2       2.2             90.6
19      5           1.8       1.8             92.4
20      3           1.1       1.1             93.5
21      5           1.8       1.8             95.3
22      2           .7        .7              96.0
23      2           .7        .7              96.8
25      3           1.1       1.1             97.8
26      1           .4        .4              98.2
27      1           .4        .4              98.6
28      2           .7        .7              99.3
31      1           .4        .4              99.6
32      1           .4        .4              100.0
Total   278         100.0     100.0

Or using the formula:
Mean - Z score (1.645) × SD = -3.6 years
Total population n=278, μ=67.24 years, σ=11.743
28 samples of 150 from a total population of 278: age in years

Sample no.   Mean    SD
1            67.6    12.07
2            67.13   11.81
3            67      11.98
4            67.8    11.63
5            66.33   11.44
6            67.44   11.95
7            67.84   12.42
8            66.59   11.36
9            67      12
10           66.38   11.9
11           68.06   12.06
12           67.61   11.02
13           67.31   11.33
14           66.44   11.91
15           66.87   11.26
16           66.8    11.5
17           66.73   12.37
18           66.38   11.77
19           67.03   11.22
20           66.58   12.13
21           66.81   11.55
22           66.58   12.21
23           67.2    11.61
24           66.48   11.48
25           67.53   12.1
26           67.58   10.6
27           67      11.91
28           67.31   11.59

[Radar chart of Mean and SD by sample no., scale 0 to 80]
Population and Sample
o In scientific research we want to make a statement (conclusion) about the population.
o Studying the whole population is usually impossible in terms of money/time/labor.
o We take a random sample from the population and infer the needed conclusions from the sample data.
o The task of statistics is to quantify the uncertainty (how well the sample really represents that population).
The concept of sampling
Study population → sampling units → sample
o You select a few sampling units from the study population.
o You collect information from these people to find answers to your research questions.
o You make an estimate ("prediction") extrapolated to the study population (prevalence, outcomes, etc.).
What would be the mean systolic blood pressure of older subjects (65+) in Al Hassa?
Population mean (μ) = unknown
[Sample values shown: 175, 165, 180, 155]
From our sample we calculate an estimate of the population parameter
The good sample (the estimator) should be:
Unbiased: the mean of the sample = the population mean
Precise (narrow dispersion about the mean): the dispersion in repeated samples is small
This is a dream
Sampling error
Four individuals A, B, C, D
A = 18 years
B = 20 years
C = 23 years
D = 25 years
Their mean age = (18+20+23+25)/4 = 86/4 = 21.5 years (population mean μ).
Sampling error = population mean - sample mean

Sampling two individuals (6 possible samples):
A+B = (18+20)/2 = 19.0 years
A+C = (18+23)/2 = 20.5 years
A+D = (18+25)/2 = 21.5 years
B+C = (20+23)/2 = 21.5 years
B+D = (20+25)/2 = 22.5 years
C+D = (23+25)/2 = 24.0 years
Sampling error ranges from -2.5 to +2.5 years.

Sampling three individuals (4 possible samples):
A+B+C = (18+20+23)/3 = 20.33 years
A+B+D = (18+20+25)/3 = 21.00 years
A+C+D = (18+23+25)/3 = 22.00 years
B+C+D = (20+23+25)/3 = 22.67 years
Sampling error ranges from -1.17 to +1.17 years.

If C = 32 (instead of 23) years and D = 40 (instead of 25) years, the population mean is 27.5 years: sampling of 2 gives a sampling error of -8.5 to +8.5 years, and sampling of 3 gives -3.17 to +4.17 years.
The greater the variability of a given variable, the larger the sampling error for a given sample size.
Infinite samples should represent the population they came from (good estimator)
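The enumeration of all possible samples can be sketched in Python with `itertools.combinations`. The helper name `sampling_errors` is hypothetical, and the error is defined as population mean minus sample mean, as above.

```python
from itertools import combinations
from statistics import mean

def sampling_errors(ages, sample_size):
    """Population mean minus sample mean, for every possible sample of the given size."""
    mu = mean(ages)
    return [mu - mean(sample) for sample in combinations(ages, sample_size)]

ages = [18, 20, 23, 25]          # individuals A, B, C, D
more_variable = [18, 20, 32, 40]  # C and D replaced by more extreme ages

errors_2 = sampling_errors(ages, 2)
errors_3 = sampling_errors(ages, 3)
wide_errors_2 = sampling_errors(more_variable, 2)
wide_errors_3 = sampling_errors(more_variable, 3)

for label, errors in [("n=2", errors_2), ("n=3", errors_3),
                      ("n=2, more variable", wide_errors_2),
                      ("n=3, more variable", wide_errors_3)]:
    print(label, "error range:", round(min(errors), 2), "to", round(max(errors), 2))
```

Larger samples narrow the range of errors, and a more variable population widens it for the same sample size.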
2
o The normal distribution
o The standard error of the mean
o Estimation:
  Reference interval
  Confidence intervals for: mean, proportion, difference between means/proportions, RR and OR
Normal Distribution:
Many human traits, such as intelligence, personality, and attitudes, and also weight and height, are distributed among populations in a fairly normal way.
The normal distribution
68% of values lie within μ ± 1 SD (σ)
95% of values lie within μ ± 2 SD (σ)
> 2 SDs: possible outliers
> 3 SDs: definite outliers
One more
The Z score measures how many standard deviations a particular data point is above or below the mean.
o Unusual observations would have a Z score over 2 or under -2.
o Extreme observations would have Z scores over 3 or under -3 and should be investigated as potential outliers.

Z = (X1 − x̄) / s
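A minimal Python sketch of the Z score and the two cut-offs. The function names and the labels "typical", "unusual" and "extreme" are illustrative; the mean and SD are the SBP values from the earlier slides (151.2 and 22.0 mmHg).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def flag_observation(z):
    """Apply the slide's rule of thumb: |Z| > 2 unusual, |Z| > 3 extreme."""
    if abs(z) > 3:
        return "extreme"
    if abs(z) > 2:
        return "unusual"
    return "typical"

mean_sbp, sd_sbp = 151.2, 22.0
for sbp in (150, 200, 220):
    z = z_score(sbp, mean_sbp, sd_sbp)
    print(sbp, "mmHg: Z =", round(z, 2), flag_observation(z))
```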
Areas under the standard normal curve.

Z        Area under curve between     Beyond both points   Beyond one point
         both points (around mean)    (two tails)          (one tail)
±0.1     0.080                        0.920                0.4600
±0.2     0.159                        0.841                0.4205
±0.3     0.236                        0.764                0.3820
±0.4     0.311                        0.689                0.3445
±0.5     0.383                        0.617                0.3085
±0.6     0.451                        0.549                0.2745
±0.7     0.516                        0.484                0.2420
±0.8     0.576                        0.424                0.2120
±0.9     0.632                        0.368                0.1840
±1       0.683                        0.317                0.1585
±1.1     0.729                        0.271                0.1355
±1.2     0.770                        0.230                0.1150
±1.3     0.806                        0.194                0.0970
±1.4     0.838                        0.162                0.0810
±1.5     0.866                        0.134                0.0670
±1.6     0.890                        0.110                0.0550
±1.645   0.900                        0.100                0.0500
±1.7     0.911                        0.089                0.0445
±1.8     0.928                        0.072                0.0360
±1.9     0.943                        0.057                0.0290
±1.96    0.950                        0.050                0.0250
±2       0.954                        0.046                0.0230
±2.1     0.964                        0.036                0.0180
±2.2     0.972                        0.028                0.0140
±2.3     0.979                        0.021                0.0105
±2.4     0.984                        0.016                0.0080
±2.576   0.990                        0.010                0.0050
Calculating values from Z-scores
Xi = Mean ± Z (standard deviation)
Value (percentile) = Mean ± Z score × SD
Random sample for estimating a population mean
μ = ?
X1 = 128, X2 = 133, X3 = 129
From the information in the sample, we will estimate the unknown population mean (x̄ is an estimator for μ)
What could have happened if we had another random sample?
What is the measure of variation of sample means?
The Sampling Distribution of a Sample Statistic
≈ Let's assume that we want to survey a community of 400 whose ages were recorded, with the following parameters:
µ = 35 years
σ = 13 years
≈ Let's assume, however, that we do not survey all 400; instead we randomly select 120 people, ask them about their ages and calculate the mean age.
≈ Then we put them back into the community and randomly select another 120 residents (may include members of the first sample).
≈ We did this over and over and each time we calculated the mean age.
≈ The results will be like those in the following table.
Sample Number   Sample mean
1               34.7
2               35.9
3               35.5
4               34.7
5               34.5
6               34.4
7               35.7
8               34.6
9               37.4
10              35.3
11              34.1
12              35.5
13              34.9
14              36.2
15              35.6
16              35.0
17              35.1
18              36.4
19              35.6
20              33.6
SD of the means 13.37

Distribution of 20 random sample means (n=20)
[Dot plot of the 20 sample means on an axis from 33 to 37 years, around μ]
All the results are clustered around the population value (35 years), with a few scores a bit further out and one extreme score of 37.4 years (random variation = 1/20 = 5%).
Those 400 people have an age range from 2 to 69 years, while the means of the samples have a very narrow range of about 4 years, and 10 samples coincide with the population mean (35 years).
Most of the samples will cluster around the population parameter, with an occasional sample result falling relatively further to one side or the other of the distribution (this is called the sampling distribution of sample means).

It has the following properties:
The mean of the sampling distribution is equal to the population mean: the average of the averages (µx̄) will be the same as the population mean.
The standard deviation of the sample means = the standard error, SE = σ/√n (σ = population SD).
The distribution of the sample means is Normal if the population distribution is Normal.
If the population distribution is not Normal, the distribution of the sample means is almost Normal when n is large (Central Limit Theorem).
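The first two properties can be checked by simulation. The sketch below is illustrative only: it generates a hypothetical community of 400 ages (synthetic values drawn around 35 ± 13 years, not the lecture's data), and it draws each sample of 120 with replacement so that the plain σ/√n formula applies. The seed and the number of repeated samples are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical community of 400 ages (synthetic, roughly mean 35 and SD 13).
population = [rng.gauss(35, 13) for _ in range(400)]
mu = mean(population)
sigma = pstdev(population)

n = 120
# Draw many samples of size n (with replacement) and keep each sample mean.
sample_means = [mean(rng.choices(population, k=n)) for _ in range(2000)]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
expected_se = sigma / sqrt(n)

print("population mean =", round(mu, 2), "mean of sample means =", round(mean_of_means, 2))
print("SD of sample means =", round(sd_of_means, 2), "sigma/sqrt(n) =", round(expected_se, 2))
```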
Standard error of the mean
Population parameters (Mean, S.D) → sample statistics (Mean, S.D)
The degree to which the sample statistics deviate from the population parameters.
The term error indicates the fact that, due to sampling error, each sample mean is likely to deviate somewhat from the true population mean.
Central Limit Theorem
The formula for SE = SD/√n.
The formula indicates that we are estimating the SE given the S.D of a sample of size n.
For a sample of 100 and S.D of 40, the SE = 40/√100 = 4.
For a sample of 1000 and S.D of 40, the SE = 40/√1000 = 1.26.
Two factors influence the SE: sample size and S.D of the sample.
Sample size has the greater impact, as it is used in the denominator.
For a sample of 100 and S.D of 20, the SE = 20/√100 = 2.
For a sample of 100 and S.D of 40, the SE = 40/√100 = 4.
The more variability there is within a sample, the greater the SE.
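The worked values above follow directly from the formula, as this small Python sketch shows. The function name `standard_error` is a hypothetical helper.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / sqrt(n)

# (SD, n) pairs taken from the slide.
for sd, n in [(40, 100), (40, 1000), (20, 100)]:
    print("SD =", sd, "n =", n, "SE =", round(standard_error(sd, n), 2))
```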
Confidence Interval (CI)
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
We need to know the smallest and the largest μ (range) we think is likely, using sample statistics.
The mean of the sample is the estimate of μ.

c = level of confidence   Zc = Z critical value (under the normal curve)
90%                       1.645
95%                       1.960
99%                       2.576

CI = x̄ ± Zc × σ/√n
C.I = Mean of the sample ± Z critical score × SEM
SEM = SD/√n
C.I
• The confidence interval provides a range that
is highly likely (often 95% or 99%) to contain
the true population parameter that is being
estimated.
• The narrower the interval the more informative
is the result.
• It is usually calculated using the estimate
(sample mean) and its standard error (SEM).
CI for μ
Systolic blood pressure in 278 diabetic patients
Descriptives: syst. blood pressure at start

Statistic               All patients (N=278)   Random sample of 50 out of 278
Mean                    151.20                 155.06
Std. Error of mean      1.319                  3.064
90% CI for mean         149.02 to 153.38       149.92 to 160.20
95% CI for mean         148.60 to 153.80       148.90 to 161.22
99% CI for mean         147.78 to 154.62       146.84 to 163.28
5% Trimmed Mean         150.30                 154.72
Median                  150.00                 151.20
Variance                483.880                460.033
Std. Deviation          21.997                 21.448
Minimum                 100                    115
Maximum                 220                    205
Range                   120                    90
Interquartile Range     30.00                  30.00
Skewness (Std. Error)   .540 (.146)            .263 (.340)
Kurtosis (Std. Error)   .152 (.291)            -.506 (.668)
90% C.I = 151.20 ± 1.65 (21.997/√278) = 149.02-153.38 mmHg
95% C.I = 151.20 ± 1.96 (21.997/√278) = 148.60-153.80 mmHg
99% C.I = 151.20 ± 2.58 (21.997/√278) = 147.78-154.62 mmHg
What does this mean? It means that if the same population is sampled on numerous occasions and interval estimates are made on each occasion, the resulting intervals would bracket the true population parameter in approximately 90, 95 and 99% of the cases.
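The three intervals can be recomputed with a short Python sketch based on the normal (Z) critical values. The function name `mean_ci` is hypothetical; the inputs are the mean, SD and N from the Descriptives output. SPSS itself uses the t-distribution, so its bounds differ slightly in the second decimal.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean, sd, n, confidence):
    """Normal-theory confidence interval for a mean: mean ± Z × SD/√n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

for level in (0.90, 0.95, 0.99):
    lower, upper = mean_ci(151.20, 21.997, 278, level)
    print(f"{level:.0%} CI: {lower:.1f} to {upper:.1f} mmHg")
```

The higher the level of confidence, the wider the interval.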
The sampling distribution of a proportion
p = K/n
µp = π
SE(p) = √( p(1 − p)/n )
CI p = p ± 1.96 (SE), where 1.96 is the Z critical score for 95%

Smokers among diabetics
Sample = 400
Smokers = 40
p = 40/400 = 0.1
SE(p) = √(0.1 × 0.9/400) = 0.015
95% CI p = 0.1 ± 1.96(0.015) = [0.07-0.13]
For % it is the same: SE = 1.5%, C.I = [7-13]
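A minimal Python sketch of the same calculation. The function name `proportion_ci` is a hypothetical helper, and the interval uses the normal approximation given in the formula above.

```python
from math import sqrt

def proportion_ci(k, n, z=1.96):
    """Proportion, its standard error and the normal-approximation CI."""
    p = k / n
    se = sqrt(p * (1 - p) / n)
    return p, se, p - z * se, p + z * se

# 40 smokers in a sample of 400 diabetics.
p, se, lower, upper = proportion_ci(40, 400)
print("p =", p, "SE =", round(se, 3), "95% CI =", round(lower, 2), "to", round(upper, 2))
```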
95% CI for the difference between two means (μ1-μ2)

Smoke        n     Mean SBP   SE (mean)
No           214   153.1      1.50
Yes          64    144.8      2.62
Difference         8.3

(x̄1 − x̄2) ± 1.96 × SE(x̄1 − x̄2)
SE(x̄1 − x̄2) = √( SE(x̄1)² + SE(x̄2)² )
95% C.I = 2.4 to 14.2
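The interval can be reproduced with a small Python sketch that combines the two standard errors as in the formula above. The function name `difference_ci` is hypothetical.

```python
from math import sqrt

def difference_ci(mean1, se1, mean2, se2, z=1.96):
    """CI for a difference between two independent means from their standard errors."""
    diff = mean1 - mean2
    se_diff = sqrt(se1 ** 2 + se2 ** 2)
    return diff, se_diff, diff - z * se_diff, diff + z * se_diff

# Non-smokers: mean SBP 153.1 (SE 1.50); smokers: mean SBP 144.8 (SE 2.62).
diff, se_diff, lower, upper = difference_ci(153.1, 1.50, 144.8, 2.62)
print("difference =", round(diff, 1), "SE =", round(se_diff, 2),
      "95% CI =", round(lower, 1), "to", round(upper, 1))
```

The interval does not include 0, which suggests a real difference in mean SBP between the two groups.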
95% CI for percentage

Smoke (n)    died %   SE
No (212)     28.8     3.11
Yes (64)     23.4     5.30

Difference = 5.4%
(Pns − Ps) ± 1.96 × SE(Pns − Ps)
SE = √( P1 × (100 − P1)/n1 + P2 × (100 − P2)/n2 )
95% C.I = -6.7% to 17.4%
95% CI for RR and OR
Use available software
http://www.medcalc.org/calc/odds_ratio.php
http://www.medcalc.org/calc/relative_risk.php
vl.academicdirect.org/applied_statistics/.../CIcalculator.xls
Assignment II
Inferential Statistics
Testing in research
o In scientific research we would like to test if our
research ideas are true.
o Based on previous observations (studies) we know
that the mean cholesterol of patients with diabetes is
higher than those without the disease.
o We will take samples and check whether the results
will agree with our expectations.
o Meaning we are going to test the situation using a
statistical test.
The Z-test for one sample
Serum cholesterol in the diabetes-free population: μ = 5 mmol/L, σ = ±1.5
Diabetic patients: mean cholesterol > 5?
Considering σ = ±1.5, is there any difference between the diabetes-free population and the diabetic patients regarding serum cholesterol? Let's perform the Z test.
Research question (hypothesis)
The research hypothesis would be: the mean cholesterol of diabetics is > 5 mmol/L
Null hypothesis
H0: μ = 5 (the mean of the diabetics equals the population mean)
Alternative hypothesis
H1: μ > 5 (one sided)
Or
H1: μ ≠ 5 (two sided)
Procedure
Compare the mean of the sample with μ = 5
[Histogram: cholesterol level of diabetic patients in mmol/L (total cholesterol); Mean = 6.25, Std. Dev = 1.33, N = 278]
If the sample mean is close to the population mean, we do not reject the null hypothesis
If the sample mean differs from the population mean, we REJECT the null
The α level (P value)
The P value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true (population mean = sample mean; there is no difference between the population and the sample mean).
Or
The α level is the maximum probability we accept of rejecting the null hypothesis falsely.
α = 0.05
P > 0.05 (α): do not reject the null; sample mean = population mean
P ≤ 0.05 (α): reject the null; sample mean ≠ population mean
Alpha level
Calculation (σ = 1.5)
SEM = σ/√n = 0.3
Z = (sample mean − μ)/SEM
P(mean of the sample ≥ 6) = P(Z ≥ (6−5)/0.3) ≈ 0.0005
Under the normal curve, the area of rejection is Z > 1.96
P = 0.0005: if the null hypothesis were true, a sample mean this high would be obtained only about 5 times if we repeated this test 10,000 times.
P < 0.05, so we reject the null
The diabetics have a larger mean cholesterol level than the normal population
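The calculation can be sketched in Python with `statistics.NormalDist` in place of the printed table. The function name `one_sample_z` is hypothetical; the inputs (sample mean 6, μ = 5, SEM = 0.3) are the values used above, and the P value is one-sided.

```python
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sem):
    """Z statistic and one-sided P value, P(Z >= z), for a one-sample Z test."""
    z = (sample_mean - mu0) / sem
    p_one_sided = 1 - NormalDist().cdf(z)
    return z, p_one_sided

z, p_value = one_sample_z(6.0, 5.0, 0.3)
alpha = 0.05
print("Z =", round(z, 2), "P =", round(p_value, 4))
print("reject H0" if p_value <= alpha else "do not reject H0")
```

The exact tail area is a little below the rounded 0.0005 quoted above; the conclusion is the same.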
In reality
It is unlikely that the σ (population SD) is
known.
In most of the cases, σ will be unknown and
we will be able to apply neither the formula
nor the table of normal distribution (areas
under the curve=Z score).
We resort to other statistical tests.
Possible situations in hypothesis testing

Decision            Reality: H0 is true   Reality: H0 is not true
Reject H0           Type I error (α)      OK (1-β)
Do not reject H0    OK (1-α)              Type II error (β)

α = level of significance
Power = 1-β
It is the probability of rejecting the null hypothesis if it is NOT TRUE
Usually 80% is the least required for any test
Errors of Hypothesis Testing and Power
Decisions and errors in hypothesis testing
Conclusion from hypothesis testing (study results) against the true situation:

Study result: difference exists (reject H0)
  True situation, difference exists (H1): correct decision (power or 1-β)
  True situation, no difference (H0): Type I error or α. False rejection: H0 is rejected when it is true (there is a difference when there is really not).
Study result: no difference (do not reject H0)
  True situation, difference exists (H1): Type II or β error. False acceptance: there is no difference when it is really present.
  True situation, no difference (H0): correct decision
Passive smoking and lung cancer
Conclusions, based on results from a study of a sample of the population, against the truth about the population:

Reject the null hypothesis (rates in the study appear to be different)
  Truth: passive smoking is not related to lung cancer. Type I error (incorrect rejection): passive smoking is related to lung cancer when it really is not.
Accept the null hypothesis (rates in the study appear similar)
  Truth: passive smoking is related to lung cancer. Type II error (incorrect acceptance): passive smoking is not related to lung cancer when it really is.
The Alpha-Fetoprotein (AFP) test has both Type I and Type II error possibilities.
This test screens the mother's blood during pregnancy for AFP and determines risk.
Abnormally high or low levels may indicate Down syndrome.
H0: patient is healthy
Ha: patient is unhealthy
Type I error (false positive or false rejection): the test wrongly indicates that the patient has Down syndrome, which means that the pregnancy may be aborted for no reason.
Type II error (false negative or false acceptance): the test is negative and the child will be born with multiple anomalies.
Hypothesis test: Type I and Type II error (one sample)
[Figure: the distribution of X under the null and alternative hypotheses; one curve is the distribution given the null hypothesis is true; the areas of false rejection and false acceptance are marked]
t-distribution
In real life situations we will estimate the unknown population SD using the sample SD.
Results are standardized to the t-distribution:
t = (x̄ − µ) / (s/√n)
Z test for normal distribution (the population SD is known):
Z = (x̄ − µ) / (σ/√n)
t-distribution
df = no. of observations (sample size) - 1
Heavier tails than the Z distribution
Degree of freedom (df)
For all sample statistics (variance, SD) we used n-1
All the observations in any given sample are free except one (complementary effect)
Degree of freedom: if four values must add up to total = 50 (12, 16, 7, 15), three are free and the last one is restricted
df = n-1
t-distribution
t-test: steps to determine the statistical difference
When? Descriptive statistics: mean ± standard deviation

Number of samples:

One sample vs. population mean:
t = \frac{\bar{x} - \mu}{SD / \sqrt{n}}

Two independent samples:
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}}

Two dependent (paired) samples, repeated measures or matched pairs:
t_{dependent} = \frac{\bar{d}}{SE(\bar{d})}

Steps:
1- State the hypothesis to be tested:
Null: mean = mean
Alternative: mean ≠ mean (non-directional, two-tailed), or a directional alternative (one-tailed)
2- Find the calculated t value using the formulae.
3- Find the degrees of freedom: n − 1 in all cases; for two independent samples df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2.
4- Find the P value using the tables of the t-distribution.
5- Conclude: if P < 0.05, the null hypothesis is rejected. If P > 0.05, the null hypothesis is accepted (not rejected).
t-test (Student's t-test), one sample
t = \frac{\bar{x} - \mu}{SD / \sqrt{n}}

Using diabetes data: is the mean age of diabetics > 65 years?
H0: μ = 65
H1: μ ≠ 65
t (one sample) = (67.24 − 65) / (SD / √n) = 2.24 / 0.704 = 3.18
t distribution (df = 277): P = 0.002
Reject the null hypothesis.
The mean age of diabetics is significantly higher than 65 years.

Statistics: age (years)
N valid | 278
N missing | 0
Mean | 67.24
Std. Error of Mean | .704
Std. Deviation | 11.743
Variance | 137.902

One-Sample Test (Test Value = 65): age (years)
t | df (degrees of freedom) | Sig. (2-tailed) (P value, two-sided) | Mean Difference | 95% Confidence Interval of the Difference: Lower | Upper
3.182 | 277 | .002 | 2.24 | .85 | 3.63

Assuming that the distribution of age is normal.
Population SD (σ) is unknown.
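The one-sample calculation above can be rechecked from the summary statistics in the SPSS output. The following is a minimal sketch in plain Python, not the SPSS procedure itself. The function name `one_sample_t` is illustrative, and the critical value 1.960 is the large-sample (df = ∞) two-sided 5% value, used here as an approximation for df = 277.

```python
import math

def one_sample_t(mean, mu0, sd, n):
    """Return (t, df) for a one-sample t-test computed from summary statistics."""
    se = sd / math.sqrt(n)          # standard error of the mean
    return (mean - mu0) / se, n - 1

# Summary statistics taken from the SPSS output above (age in years).
t, df = one_sample_t(mean=67.24, mu0=65, sd=11.743, n=278)

# Large-sample two-sided 5% critical value (df = infinity row of a t table);
# an approximation for df = 277, where the exact value is slightly larger.
CRITICAL_05 = 1.960
reject_h0 = abs(t) > CRITICAL_05

print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

Because the inputs are rounded summary values, the result agrees with the SPSS figure of 3.182 only to about two decimal places.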
t-test for comparison of means of two independent samples
H0: Smoking has no effect on systolic blood pressure
Mean S = Mean NS, or Mean S − Mean NS = 0
H1: Smoking has an effect
Mean S ≠ Mean NS, or Mean S − Mean NS ≠ 0
Assumptions:
•Independent observations (2 samples)
•Normally distributed
•Equal variances (for the pooled t-test)
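Three formulae

General form (the numerator is the observed difference minus the expected difference if H0 is true, which is 0; the denominator is the SD of the difference, so t is the standardized difference):
t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}

If SDs are equal (pooled SD):
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}}}
S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}

If SDs are not equal:
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}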
Decision based on Levene's test
Variances are apparently equal

Group Statistics: syst. blood pressure at start
SMOKING | N | Mean | Std. Deviation | Std. Error Mean
no | 214 | 153.11 | 21.995 | 1.504
smokers | 64 | 144.82 | 20.934 | 2.617

Independent Samples Test: syst. blood pressure at start
Levene's Test for Equality of Variances: F = .006, Sig. = .936 (not significant, which means equal variances)

t-test for Equality of Means (two separate t-tests)
 | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% Confidence Interval of the Difference: Lower | Upper
Equal variances assumed | 2.674 | 276 | .008 | 8.29 | 3.100 | 2.188 | 14.392
Equal variances not assumed | 2.747 | 107.982 | .007 | 8.29 | 3.018 | 2.308 | 14.272

P value < 0.05, reject H0
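Both rows of the independent-samples output can be reproduced from the group statistics alone. The sketch below assumes the "equal variances not assumed" row uses the Welch-Satterthwaite approximation for the degrees of freedom; the function names `pooled_t` and `welch_t` are illustrative.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t-test assuming equal variances; returns (t, df)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return (m1 - m2) / se, n1 + n2 - 2

def welch_t(m1, s1, n1, m2, s2, n2):
    """t-test without the equal-variance assumption; returns (t, df)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se, df

# Group statistics from the output above: non-smokers first, then smokers.
groups = (153.11, 21.995, 214, 144.82, 20.934, 64)
t_pooled, df_pooled = pooled_t(*groups)
t_welch, df_welch = welch_t(*groups)

print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.1f}")
```

Small differences in the last decimal place are expected, since the inputs are rounded values from the printed table.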
Paired t-test
• If we have paired data (two repeated measurements on the same subjects, or before and after).
• If the differences of the paired observations are Normally distributed.
Paired samples (dependent)

(Paired/dependent 2-sample t-test)

To compare observations collected from the same group of individuals on 2 separate occasions (dependent observations or paired samples).
The paired t statistic is calculated by:

- Calculate the difference between the 2 measurements taken on each individual.
- Calculate the mean of the differences, m_d.
- Calculate the SE of the observed differences, SE_d.
- Under the null hypothesis of no difference (difference = 0), the paired t statistic takes the form:
  t = mean difference / SE of the difference = \frac{m_d - 0}{SE_d}
- It has a t-distribution with degrees of freedom = (n − 1).
Example

Four students had the following scores in 2 subsequent tests.
Is there a significant difference in their performance?

Number | Name | Test 1 | Test 2 | Dif
1 | Mike | 35% | 67% | -32
2 | Melanie | 50% | 46% | 4
3 | Melissa | 90% | 86% | 4
4 | Mitchell | 78% | 91% | -13

Mean Dif = -9.25, SD Dif = 17.154, SE Dif = 8.58
Calculated paired t = -9.25/8.58 = -1.078
df = n − 1 = 3

t = \frac{m_d - 0}{SE_d}

Critical values of the t-distribution
Level of significance for one-tail test: 0.10 | 0.05 | 0.025 | 0.01 | 0.005
Level of significance for two-tail test: 0.20 | 0.10 | 0.05 | 0.02 | 0.01

df | 0.20 | 0.10 | 0.05 | 0.02 | 0.01 (two-tail P value)
1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657
2 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925
3 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841
4 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604
5 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032
6 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707
7 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499
8 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355
9 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250
10 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169
11 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106
12 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055
13 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012
14 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977
15 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947
16 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921
17 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898
18 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878
19 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861
20 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845
21 | 1.323 | 1.721 | 2.080 | 2.518 | 2.831
35 | 1.306 | 1.690 | 2.030 | 2.438 | 2.724
50 | 1.299 | 1.676 | 2.009 | 2.403 | 2.678
∞ | 1.282 | 1.645 | 1.960 | 2.326 | 2.576
The P value is > 0.20 (|t| = 1.078 is below 1.638, the df = 3 value for a two-tail P of 0.20), so the null is accepted!
Conclusion
The observed difference can be encountered in 36 out of 100 cases (actual P value = 0.362),
i.e. we accept the null hypothesis of no difference between the first and 2nd test.
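The four-student example can be worked through step by step in code. This is a minimal sketch using only the Python standard library. The helper `two_sided_p_df3` is illustrative: it relies on the closed-form CDF of the t-distribution that holds for exactly 3 degrees of freedom, so it is not a general-purpose P value routine.

```python
import math
import statistics

test1 = [35, 50, 90, 78]   # Mike, Melanie, Melissa, Mitchell
test2 = [67, 46, 86, 91]

diffs = [a - b for a, b in zip(test1, test2)]   # Test 1 minus Test 2
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)                  # sample SD (n - 1 in the denominator)
se_d = sd_d / math.sqrt(n)
t = mean_d / se_d
df = n - 1

def two_sided_p_df3(t_value):
    """Two-sided P value for a t statistic with exactly 3 degrees of freedom.

    Uses the closed-form CDF of the t-distribution for df = 3 only.
    """
    x = abs(t_value) / math.sqrt(3)
    cdf = 0.5 + (x / (1 + x * x) + math.atan(x)) / math.pi
    return 2 * (1 - cdf)

p = two_sided_p_df3(t)
print(f"mean d = {mean_d}, SD = {sd_d:.3f}, SE = {se_d:.2f}, "
      f"t = {t:.3f}, df = {df}, P = {p:.2f}")
```

As a sanity check on the helper, `two_sided_p_df3(3.182)` comes out at about 0.05, matching the df = 3 row of the t table.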
Paired Samples Statistics
Pair 1 | Mean | N | Std. Deviation | Std. Error Mean
syst. blood pressure at start | 151.20 | 278 | 21.997 | 1.319
syst. blood pressure after 2 years | 153.83 | 278 | 29.076 | 1.744

Paired Samples Test
Pair 1: syst. blood pressure at start - syst. blood pressure after 2 years
Paired Differences: Mean | Std. Deviation | Std. Error Mean | 95% Confidence Interval of the Difference: Lower | Upper | t | df | Sig. (2-tailed)
-2.63 | 17.920 | 1.075 | -4.74 | -.51 | -2.443 | 277 | .015
Test of significance
Interval/ratio data
Parametric, assuming normal distribution

Known population variance (σ²): one sample vs. population
One-sample Z-test
Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
Z test, rejection limit: |Z| > 1.96

Unknown population variance: t-test
Reject if P ≤ 0.05

Number of samples:
• One sample vs. population: one-sample t-test
• Two samples, dependent: paired t-test
• Two samples, independent: independent t-test
The Chi-Square test (χ²)

Used for hypothesis testing for categorical variables.
Many types, depending on design, distribution of variables and objectives of testing.
χ²

Example:
Vaccination against influenza decreases the risk of getting the disease.
Study:
Compare the effectiveness of 5 vaccines with respect to the probability of getting influenza.
(Comparison will be with respect to a nominal variable: getting influenza, yes or no.)
Effectiveness of Five Vaccines
Data cross-tabulated 2×5; response variable: influenza

Frequency
Vaccine | Influenza No | Influenza Yes | Total
1 | 237 | 43 | 280
2 | 198 | 52 | 250
3 | 245 | 25 | 270
4 | 212 | 48 | 260
5 | 233 | 57 | 290
Total | 1125 | 225 | 1350

% within vaccine
Vaccine | Influenza No | Influenza Yes | Total
1 | 84.6 | 15.4 | 100
2 | 79.2 | 20.8 | 100
3 | 90.7 | 9.3 | 100
4 | 81.5 | 18.5 | 100
5 | 80.3 | 19.7 | 100
Total | 83.3 | 16.7 | 100

The "Influenza Yes" column gives the probability of getting influenza.

The null hypothesis states that the probability of getting influenza is independent of the vaccine.
The alternative states that a dependency exists.
Effectiveness of Five Vaccines
If H0 is true:
The probability of getting influenza should be the same in every group = the probability in the total population,
equal to 225/1350 = 0.167 (16.7%).
Vaccine 1 was used in 280 persons; if H0 is true, we expect 16.7% (≈47) of them to get influenza.
However, this is not what was observed (43 cases).
Expected frequencies
For any cell: Expected frequency = row total × column total / grand total

Vaccine | Influenza No | Influenza Yes | Total (row total)
Observed-1 | 237 | 43 | 280
Expected | 233.3 | 46.7 |
Observed-2 | 198 | 52 | 250
Expected | 208.3 | 41.7 |
Observed-3 | 245 | 25 | 270
Expected | 225.0 | 45.0 |
Observed-4 | 212 | 48 | 260
Expected | 216.7 | 43.3 |
Observed-5 | 233 | 57 | 290
Expected | 241.7 | 48.3 |
Total (column total) | 1125 | 225 | 1350 (grand total)

Examples: expected "Influenza Yes" for vaccine 1 = 280 × 225/1350; expected "Influenza No" for vaccine 4 = 1125/1350 × 260.
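The expected-frequency rule can be applied to every cell at once. The sketch below recomputes the expected counts from the observed vaccine data; the variable names are illustrative.

```python
# Observed counts per vaccine: (influenza no, influenza yes)
observed = [(237, 43), (198, 52), (245, 25), (212, 48), (233, 57)]

row_totals = [sum(row) for row in observed]          # one total per vaccine
col_totals = [sum(col) for col in zip(*observed)]    # "no" total, "yes" total
grand_total = sum(row_totals)

# Expected frequency = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for vaccine, row in enumerate(expected, start=1):
    print(vaccine, [round(e, 1) for e in row])
```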
Pearson Chi-square test
Calculate the expected frequencies (assuming H0 is true) for all the ten cells.
Calculate Chi-square: O_f = observed frequency, E_f = expected frequency
\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f}

Reject H0 if χ² is large.
Use the Chi-square distribution after determining the degrees of freedom (df):
df = (r − 1) × (c − 1)
Chi-square distribution
Critical values for Chi-square
df | 0.99 | 0.90 | 0.70 | 0.50 | 0.30 | 0.20 | 0.10 | 0.05 | 0.01 | 0.001 (level of significance)
1 | 0.00016 | 0.0158 | 0.148 | 0.455 | 1.074 | 1.642 | 2.706 | 3.841 | 6.635 | 10.827
2 | 0.0201 | 0.211 | 0.713 | 1.386 | 2.408 | 3.219 | 4.605 | 5.991 | 9.210 | 13.815
3 | 0.115 | 0.584 | 1.424 | 2.366 | 3.665 | 4.642 | 6.251 | 7.815 | 11.341 | 16.268
4 | 0.297 | 1.064 | 2.195 | 3.357 | 4.878 | 5.989 | 7.779 | 9.488 | 13.277 | 18.465
5 | 0.554 | 1.610 | 3.000 | 4.351 | 6.064 | 7.289 | 9.236 | 11.070 | 15.086 | 20.517
...
30 | 14.953 | 20.599 | 25.508 | 29.336 | 33.530 | 36.250 | 40.256 | 43.773 | 50.892 | 59.703

χ² critical = 9.488 (df = 4, level of significance 0.05)
Calculated χ² = 16.555
df = (2 − 1)(5 − 1) = 4
P = 0.002

There is a relation (dependence) between type of vaccine and influenza prevention.
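The calculated χ² and its P value for the vaccine data can be reproduced as follows. This is a minimal sketch using only the standard library. The helper `chi2_sf_df4` is illustrative and uses the closed-form upper-tail probability that holds for exactly 4 degrees of freedom.

```python
import math

# Observed counts per vaccine: (influenza no, influenza yes)
observed = [(237, 43), (198, 52), (245, 25), (212, 48), (233, 57)]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all ten cells
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(col_totals) - 1)

def chi2_sf_df4(x):
    """Upper-tail probability of the chi-square distribution for df = 4 only."""
    return math.exp(-x / 2) * (1 + x / 2)

p = chi2_sf_df4(chi2)
print(f"chi-square = {chi2:.3f}, df = {df}, P = {p:.3f}")
```

The same helper evaluated at the critical value 9.488 returns about 0.05, which agrees with the df = 4 row of the table.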
SMOKING * SEX Crosstabulation
SMOKING | | male | female | Total
no | Count | 90 | 124 | 214
no | % within SMOKING | 42.1% | 57.9% | 100.0%
smokers | Count | 55 | 9 | 64
smokers | % within SMOKING | 85.9% | 14.1% | 100.0%
Total | Count | 145 | 133 | 278
Total | % within SMOKING | 52.2% | 47.8% | 100.0%

Chi-Square Tests
 | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided)
Pearson Chi-Square | 38.017 (b) | 1 | .000 | |
Continuity Correction (a) | 36.279 | 1 | .000 | |
Likelihood Ratio | 41.649 | 1 | .000 | |
Fisher's Exact Test | | | | .000 | .000
Linear-by-Linear Association | 37.880 | 1 | .000 | |
N of Valid Cases | 278 | | | |

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 30.62.
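The Pearson value, the continuity-corrected value and the minimum expected count in this 2x2 output can be rechecked by hand. The sketch below assumes the "Continuity Correction" row is the usual Yates correction (subtracting 0.5 from each absolute difference); the helper `chi2_sf_df1` is illustrative and valid for df = 1 only.

```python
import math

# Rows: non-smokers, smokers; columns: male, female
table = [[90, 124], [55, 9]]
row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

pearson = 0.0
yates = 0.0
min_expected = float("inf")
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n
        min_expected = min(min_expected, e)
        diff = abs(table[i][j] - e)
        pearson += diff ** 2 / e
        yates += (diff - 0.5) ** 2 / e     # continuity correction for 2x2 tables

def chi2_sf_df1(x):
    """Upper-tail probability of the chi-square distribution for df = 1 only."""
    return math.erfc(math.sqrt(x / 2))

print(f"Pearson = {pearson:.3f}, corrected = {yates:.3f}, "
      f"min expected = {min_expected:.2f}, P = {chi2_sf_df1(pearson):.1e}")
```

The P value is far below 0.001, which SPSS prints as .000.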

At least 80% of cells must have E_f > 5.
We can't use the Pearson Chi-square if the expected frequency is < 5;
in this case we use Fisher's Exact test.

status * SEX Crosstabulation (Count)
status | male | female | Total
alive | 24 | 15 | 39
died from CVD | 4 | 1 | 5
other cause of death | 2 | 2 | 4
Total | 30 | 18 | 48

Expected f = 4 × 30/48 = 2.5 (< 5)
Expected f = 5 × 18/48 = 1.875 (< 5)

Fisher's Exact test provides a correction.
Chi-Square Tests
 | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | .935 (a) | 2 | .626
Likelihood Ratio | .991 | 2 | .609
Linear-by-Linear Association | .004 | 1 | .951
N of Valid Cases | 48 | |

a. 4 cells (66.7%) have expected count less than 5. The minimum expected count is 1.50.

Chi-square is not valid
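The footnote to this output (4 cells, 66.7%, with expected count below 5) can be verified directly from the status by sex counts. This is a minimal sketch; the 80% threshold is the rule of thumb quoted in the slides, and the variable names are illustrative.

```python
# status x SEX counts: rows alive, died from CVD, other cause; columns male, female
table = [[24, 15], [4, 1], [2, 2]]
row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

# Expected count for every cell, row by row
expected = [r * c / n for r in row_totals for c in col_totals]
small = [e for e in expected if e < 5]
share_small = len(small) / len(expected)

# Rule of thumb from the slides: at least 80% of cells need an expected count above 5
chi_square_ok = share_small <= 0.20

print(f"{len(small)} of {len(expected)} cells below 5 "
      f"({share_small:.1%}), minimum = {min(expected):.2f}, valid: {chi_square_ok}")
```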
McNemar test

Paired data in a cross tabulation
(54 eczematous persons use ointment A or B on both arms, randomized)

 | Ointment B: No | Ointment B: + | Total
Ointment A: + | 10 | 16 | 26
Ointment A: No | 5 | 23 | 28
Total | 15 | 39 | 54

The McNemar test only takes the discordant pairs into account:
\chi^2 = \frac{(23 - 10)^2}{23 + 10}
df = 1
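The McNemar statistic from the formula above can be evaluated in a few lines. This sketch uses the uncorrected form shown in the slide, with 23 and 10 taken as the discordant pairs, and obtains the P value from the df = 1 chi-square upper tail via `math.erfc`.

```python
import math

# Discordant pairs used in the formula above
b, c = 23, 10

chi2 = (b - c) ** 2 / (b + c)          # McNemar statistic without continuity correction
df = 1
p = math.erfc(math.sqrt(chi2 / 2))     # upper tail of chi-square with df = 1

print(f"chi-square = {chi2:.2f}, df = {df}, P = {p:.3f}")
```

Since the statistic exceeds the df = 1 critical value of 3.841, the difference between the two ointments would be judged significant at the 0.05 level under these assumptions.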
Questions
Thank you

High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original PhotosCall Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
 
Glomerular Filtration and determinants of glomerular filtration .pptx
Glomerular Filtration and  determinants of glomerular filtration .pptxGlomerular Filtration and  determinants of glomerular filtration .pptx
Glomerular Filtration and determinants of glomerular filtration .pptx
 
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service MumbaiLow Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
 
Noida Sector 135 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few C...
Noida Sector 135 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few C...Noida Sector 135 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few C...
Noida Sector 135 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few C...
 
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
 
Glomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptxGlomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptx
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
 
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort ServiceCollege Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
 
Call Girls Viman Nagar 7001305949 All Area Service COD available Any Time
Call Girls Viman Nagar 7001305949 All Area Service COD available Any TimeCall Girls Viman Nagar 7001305949 All Area Service COD available Any Time
Call Girls Viman Nagar 7001305949 All Area Service COD available Any Time
 

Medical statistics Basic concept and applications [Square one]

  • 3. How? • Book: Statistics at Square One, 11th ed. (Campbell and Swinscow) • SPSS practical sessions: PASW guide • Practical sessions using SPSS v. 17.0
  • 4. “Statistics”: an overview. Population (parameters) and sample (statistics); data, statistical analysis, interpretation, information; reference range; researches.
  • 5. Statistical analysis. Data consist of variables: qualitative (categorical: nominal, ordinal) or quantitative (numerical: interval/ratio, discrete or continuous). Statistical analysis is descriptive (tables, graphs, measures) or inferential; the choice depends on the sample(s) and the objectives of the analysis.
  • 7. diab data (first 15 records)
      PATNR  AGE  SEX  SMOKE  HEIGHT  WEIGHT  SBP1  SBP2  INSUL  CHOL  HBA1C  DIABDU  DEAD
      1      57   0    0      177     98      140   154   0      6.30  7.62   5       #NULL!
      2      74   1    0      172     69      150   145   1      5.10  8.30   11      0
      3      38   1    0      155     70      120   126   0      6.50  11.00  2       #NULL!
      4      73   1    0      165     72      180   157   0      5.80  7.00   21      0
      5      53   1    2      174     109     140   119   1      6.80  10.60  7       0
      6      74   1    0      171     83      151   145   0      6.25  7.62   7       0
      7      81   0    2      175     60      140   113   0      6.50  6.40   6       0
      8      86   1    0      164     59      140   158   0      5.20  5.30   4       0
      9      78   0    1      171     83      151   148   0      5.60  5.90   1       1
      10     78   1    0      171     83      151   159   1      5.00  8.00   23      1
      11     91   0    0      171     83      151   140   0      4.30  9.70   4       1
      12     77   0    2      176     87      170   198   0      6.40  6.60   7       2
      13     77   1    0      171     83      151   152   0      5.20  4.90   26      1
      14     84   0    0      171     62      160   148   0      7.00  7.80   8       1
      15     72   1    0      154     63      145   148   0      6.20  7.80   0       1
  • 8. I-Tables. Tables can summarize counts, frequency (categorical), and measures (numerical).
      Frequency table (SEX):
                 Frequency  Percent  Valid Percent  Cumulative Percent
        male     145        52.2     52.2           52.2
        female   133        47.8     47.8           100.0
        Total    278        100.0    100.0
      Contingency table, for comparison of 2 or more variables (smoking history * SEX crosstabulation, count):
                 never  stopped smoking  yes  Total
        male     26     64               55   145
        female   110    14               9    133
        Total    136    78               64   278
  • 9. Table 3. Daily servings of calcium and vitamin D rich foods in relation to body mass index classification of the included adults. Values are median (mean ±SD) unless stated otherwise.
      Food items (servings/day) *                           Obese (N=91)          Non-obese (N=125)     P value a
      Milk                                                  0.52 (0.71±0.3)       0.65 (0.88±0.7)       0.031
      Milk beverage                                         0.45 (0.59±0.4)       0.35 (0.53±0.4)       0.279
      Milk in cereals                                       0.20 (0.33±0.2)       0.50 (0.58±0.4)       0.001
      Milk in coffee or tea                                 0.15 (0.25±0.6)       0.20 (0.23±0.6)       0.790
      - Total milk                                          0.90 (1.03±0.3)       1.20 (1.34±0.7)       0.001
      Yoghurt                                               0.10 (0.12±0.6)       0.20 (0.14±0.5)       0.790
      Cheese                                                0.20 (0.24±0.9)       0.20 (0.29±0.8)       0.661
      Ice cream                                             0.15 (0.14±0.6)       0.06 (0.09±0.3)       0.422
      - Total dairy                                         0.25 (0.45±0.6)       0.30 (0.43±0.7)       0.826
      Tuna (canned)                                         0.05 (0.03±0.1)       0.03 (0.04±0.3)       0.761
      Fish                                                  0.15 (0.19±0.7)       0.10 (0.18±0.5)       0.902
      Half cooked fish                                      0.06 (0.11±0.5)       0.25 (0.27±0.6)       0.029
      Shrimp/oyster                                         0.05 (0.08±0.1)       0.05 (0.06±0.1)       0.149
      Eggs                                                  0.85 (0.81±1.1)       0.80 (0.76±0.7)       0.797
      Liver (including chicken livers)                      0.02 (0.04±0.4)       0.05 (0.06±0.3)       0.834
      Others!                                               0.20 (0.23±0.3)       0.40 (0.55±0.5)       0.549
      - Dietary vitamin D (IU/day): median (mean ±SD)       111.6 (118.1±73.5)    123.7 (132.2±67.4)    0.034
      Low dietary intake c (< 200 IU/day): No. (%)          56 (62.2)             47 (37.6)             0.003 b
      - Dietary calcium (mg/day): median (mean ±SD)         660.0 (698.8±261.9)   692.0 (717.9±245.9)   0.223
      Low calcium intake d (< 1000 mg/day): No. (%)         51 (56.7)             49 (39.2)             0.011 b
  • 10. Assignment I. Table 1. Basic characteristics for the patients examined (N=278).
      Baseline characteristics 1996                                         Total (N=278)
      1- Men (%)                                                            52.2
      2- Insulin users (%)                                                  25.5
      3- Smokers (%)                                                        23.0
      4- Ex-smokers (%)                                                     28.1
      5- Non-smokers (%)                                                    48.9
      6- Age in years (mean ±SD)                                            67.24 ±11.74
      7- Systolic blood pressure at starting point, mmHg (mean ±SD)         151.20 ±22.00
      8- Systolic blood pressure at two years, mmHg (mean ±SD)              153.83 ±29.1
      9- Duration of diabetes (median/quartiles 1-3)                        6.0 (2.75-12.25)
      10- Missed values                                                     0.0
  • 12. II- Graphs. Selection of graphs depends on: 1- types of variables, 2- number of variables, 3- objectives. For categorical variables: [Figure 1: Outcomes of the included diabetic patients (1996); categories: alive, died from CVD, other cause of death, missing] [Figure 2: Smoking status of the included diabetic patients; bar chart of percent by smoking history (never, stopped smoking, yes)]
  • 13. For numerical variables. [Figure 3: Total cholesterol level in diabetic patients 1996 (mmol/l); histogram of total cholesterol; Mean = 6.25, Std. Dev = 1.33, N = 278]
  • 14. [Figure 4: Systolic blood pressure at starting point among diabetic patients 1996 (mmHg); boxplot of syst. blood pressure at start by SEX; N = 145 male, 133 female; outlying cases labelled 28, 247, 99, 68, 67]
  • 15. [Figure 6: Total cholesterol level in relation to gender and smoking status among diabetic patients 1996; 95% CI of total cholesterol (mmol/l) by SEX and smoking history; N = 26 (never), 64 (stopped smoking), 55 (yes) for males and 110, 14, 9 for females]
  • 16. Checking for normality. [Figure 7: Duration of diabetes among the included patients 1996 (in years); histogram with normal distribution curve; Mean = 7.9, Median = 6.0, Mode = 1, Std. Dev = 6.96, N = 278; the positions of the mode, median, mean and outliers are marked]
  • 17. III- Measures (numerical variables). Central tendency (how the data aggregate around a central point): mean, median, mode, percentiles. Dispersion (how the data vary): range (max − min), interquartile range, variance, standard deviation, coefficient of variation.
  • 18. Central Tendency. Mean = summation of observations / their number, e.g. (x1 + x2 + x3)/number; affected by extremes of values. Mode = the most frequently occurring value in a set of observations. Median = the middle value that divides the ordered data set 50/50; not affected by extremes of values.
  • 20. Dispersion. Data: 1 1 6 8 10 16 17 23 43 53. Range = 53 − 1 = 52; affected by extremes of values. 25% of the data lie below the 25th percentile (1st quartile) = 6; 50% lie below the 50th percentile (median) = 13; 75% lie below the 75th percentile (3rd quartile) = 23. Interquartile range (IQR) = 3rd − 1st quartile = 23 − 6 = 17; the IQR is not affected by extremes of values.
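The dispersion measures on this slide can be reproduced with a short sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is the rule that gives the slide's values of 6 and 23; other quartile conventions (including the ones in Python's `statistics.quantiles`) give slightly different numbers.

```python
from statistics import median

# The ten observations shown on the slide, in ascending order.
data = sorted([1, 1, 6, 8, 10, 16, 17, 23, 43, 53])

value_range = data[-1] - data[0]   # max - min
med = median(data)                 # 50th percentile

# Quartiles as medians of the lower and upper halves (assumed convention).
half = len(data) // 2
q1 = median(data[:half])
q3 = median(data[-half:])
iqr = q3 - q1

print(value_range, med, q1, q3, iqr)
```

Replacing the largest value (53) with a far larger one changes the range but leaves the median and the IQR untouched, which is the point the slide makes about extremes.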
  • 21. Standard deviation and variance. Sample of 3, their ages in years: 3, 7, 17. Mean age = (3 + 7 + 17)/3 = 9. Differences from the mean: −6, −2, +8. The sum of the differences between the mean and the individual values = 0, so the mean deviation = 0. To overcome the 0, sum the squared differences and divide by (number − 1): Variance = [(3 − 9)² + (7 − 9)² + (17 − 9)²]/(3 − 1) = 104/2 = 52. The amount of dispersion around the mean = 52 years² (wrong scale). Hence we need to convert back to the usual (natural) scale using the standard deviation: SD = √Variance = ±7.2 years.
  • 22. The sample disperses around the mean (= 9 years) by 7.2 years in both directions.
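As a check on the arithmetic, here is a minimal sketch using Python's standard `statistics` module, whose `variance` and `stdev` functions use the same n − 1 denominator as the slide.

```python
from statistics import mean, stdev, variance

ages = [3, 7, 17]                    # the three ages from the slide

m = mean(ages)                       # 9
deviations = [a - m for a in ages]   # -6, -2, +8, which sum to zero
var = variance(ages)                 # sample variance (n - 1 denominator)
sd = stdev(ages)                     # square root of the variance

print(m, deviations, var, round(sd, 1))
```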
  • 23. Description of a binary (dichotomous) variable o A binary variable has only two outcomes (e.g. diseased or not diseased). o The proportion of the population that is diseased (at a certain point of time) is called prevalence. o The number of new cases occurring is called the incidence.
  • 24. Dichotomous variables. Prevalence = all cases (new or old) / population at risk. Incidence = new cases / total population at risk.
  • 25. Probability and Odds o Odds = chance. o In a population of 1000, 200 have a certain disease. o When we randomly take one person out, the probability that this person is diseased = 200/1000 = 0.2 (this is the probability). o The chance (the odds) that this person is diseased = probability of having the disease / probability of not having the disease. o Odds = P/(1 − P) = 0.2/0.8 = 1/4; the odds are 1 to 4.
  • 26. The following table depicts the outcomes of an isoniazid/placebo trial among children with HIV (death within 6 months).
      Intervention   Dead (within 6 months)   Alive   Total
      Placebo        21                       110     131
      Isoniazid      11                       121     132
      What is the risk of dying? Risk (placebo) = 21/131 = 0.160; risk (isoniazid) = 11/132 = 0.083.
      Absolute risk difference (ARD) = risk in placebo − risk in isoniazid = 0.077
      Net relative risk (NRR) = risk in placebo / risk in isoniazid = 1.928
      Relative risk reduction (RRR) = (risk in placebo − risk in isoniazid) / risk in placebo = 0.48
      Number needed to treat (NNT) = 1/ARD = 1/0.077 = 13
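The four trial measures can be computed together. The sketch below uses a hypothetical helper, `risk_measures`, written for this illustration; it works from unrounded risks, so the relative risk comes out near 1.92 rather than the 1.928 obtained from the rounded risks 0.160 and 0.083.

```python
def risk_measures(events_control, n_control, events_treated, n_treated):
    """Risk-based effect measures for a two-arm trial (illustrative helper)."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    ard = risk_control - risk_treated            # absolute risk difference
    return {
        "risk_control": risk_control,
        "risk_treated": risk_treated,
        "ard": ard,
        "relative_risk": risk_control / risk_treated,
        "rrr": ard / risk_control,               # relative risk reduction
        "nnt": 1 / ard,                          # number needed to treat
    }

# Placebo: 21 deaths out of 131; isoniazid: 11 deaths out of 132.
trial = risk_measures(21, 131, 11, 132)
for name, value in trial.items():
    print(f"{name}: {value:.3f}")
```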
  • 27. Odds ratio (OR) o An odds ratio (OR) is a measure of association between an exposure and an outcome. o The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure. o Odds ratios are most commonly used in case-control studies; however, they can also be used in cross-sectional and cohort study designs (with some modifications and/or assumptions).
  • 28. Basic structure of case-control design (diagram). Population → sample. Diseased (cases): exposed to factor (a), unexposed to factor (b). Disease-free (controls): exposed to factor (c), unexposed to factor (d). The starting point is the present time, and exposure is traced back to past time. The odds (“chance”) of exposure is calculated between both groups.
  • 29. Calculation (case-control study)
                     Diseased                   None (not diseased)              Total
      Exposed        Cases + exposed (a)        Exposed + not diseased (b)       a+b
      Non-exposed    Cases + not exposed (c)    Not exposed + not diseased (d)   c+d
      Odds ratio = (a/c) ÷ (b/d) = ad/bc, i.e. the odds of exposure among the diseased divided by the odds of exposure among the non-diseased.
      OR = 1: exposure does not affect the odds of the outcome. OR > 1: exposure is associated with higher odds of the outcome. OR < 1: exposure is associated with lower odds of the outcome.
  • 30. Odds ratio (case-control study)
                  Lung cancer   No lung cancer   Total
      Smoking     a = 80        b = 30           110
      None        c = 20        d = 70           90
      OR = (80 × 70)/(30 × 20) = 5600/600 = 9.3, or (80/20) ÷ (30/70) = 9.3
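The cross-product calculation is easy to script. This is a small sketch with a hypothetical `odds_ratio` helper that follows the a, b, c, d cell labels of the table above; it also confirms that the two forms of the formula on the slide agree.

```python
from math import isclose

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    return (a * d) / (b * c)

or_smoking = odds_ratio(80, 30, 20, 70)   # lung cancer example
alternative = (80 / 20) / (30 / 70)       # (a/c) / (b/d), the second form

print(round(or_smoking, 2), isclose(or_smoking, alternative))
```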
  • 31. Basic structure of cohort study (diagram). Population → sample of disease-free subjects at the starting point (present time). Exposed to factor: develop disease (a) or remain disease-free (b). Unexposed to factor: develop disease (c) or remain disease-free (d). Both groups are followed to a future time, comparing the incidence of disease in each group. The relative risk is calculated for exposure.
  • 32. Relative risk (RR)
      Mammography   Breast cancer   No breast cancer   Total
      Positive      a = 10          b = 90             100
      Negative      c = 20          d = 998980         100,100
      In cohort design: RR = [a/(a+b)] ÷ [c/(c+d)] = (10/100) ÷ (20/100,100) = 0.1/0.0002 = 500
  • 33. Cohort study: the relative risk (RR)
                    Lung cancer   No lung cancer   Total
      Smokers       18            582              600
      Non-smokers   6             1194             1200
      Risk for smokers = 18/600 = 0.03. Risk for non-smokers = 6/1200 = 0.005. RR = 0.03/0.005 = 6
  • 34. Case-control study: the odds ratio (OR)
                    Lung cancer   No lung cancer   Total
      Smokers       80            30               110
      Non-smokers   20            70               90
      Odds for smokers = 80/30 = 2.67. Odds for non-smokers = 20/70 = 0.29. OR = (80 × 70)/(30 × 20) = 9.33
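For the cohort example, the relative risk is the ratio of the two incidences. The helper name `relative_risk` below is an assumption made for this sketch, not part of any library.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a cohort 2x2 table.

    a: exposed with disease, b: exposed without disease,
    c: unexposed with disease, d: unexposed without disease.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Smokers: 18 of 600 develop lung cancer; non-smokers: 6 of 1200.
rr_smoking = relative_risk(18, 582, 6, 1194)
print(round(rr_smoking, 2))
```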
  • 35. Assignment I. Table 1. Basic characteristics for the patients examined (N=278).
      Baseline characteristics 1996                                         Total (N=278)
      1- Men (%)                                                            52.2
      2- Insulin users (%)                                                  25.5
      3- Smokers (%)                                                        23.0
      4- Ex-smokers (%)                                                     28.1
      5- Non-smokers (%)                                                    48.9
      6- Age in years (mean ±SD)                                            67.24 ±11.74
      7- Systolic blood pressure at starting point, mmHg (mean ±SD)         151.20 ±22.00
      8- Systolic blood pressure at two years, mmHg (mean ±SD)              153.83 ±29.1
      9- Duration of diabetes (median/quartiles 1st-3rd)                    6.0 (2.75-12.25)
      10- Missed values                                                     --
  • 36. 2a [Figure: Smoking history (all subjects); bar chart of percent by smoking history: never 49, stopped smoking 28, yes 23]
  • 37. 2b [Figure: Smoking history by sex; bar chart of percent by smoking history and SEX: never 18 (male), 83 (female); stopped smoking 44 (male), 11 (female); yes 38 (male), 7 (female)]
  • 38. 3a [Figure: Age using bar chart (mean used as summary); mean age (years) by SEX (male, female)]
  • 39. 3b [Figure: Boxplot of age (years) by SEX; N = 145 male, 133 female; one outlying case labelled 195] This graph gives a check of the data distribution and of outliers.
  • 40. 4a [Figure: Height of the included subjects; histogram of height (cm); Median = 170.55 cm, Mean = 170.5, Std. Dev = 8.89, N = 278]
  • 41. 4b [Figure: Duration of diabetes; histogram; Median = 6.0 years, Mean = 7.9, Std. Dev = 6.96, N = 278]
  • 42. 5a Using the frequency table of syst. blood pressure at start: P95 ≈ 189-190 (no missing values, so Valid Percent equals Percent)
      Value   Frequency   Percent   Cumulative Percent
      100     1           0.4       0.4
      110     1           0.4       0.7
      112     2           0.7       1.4
      115     1           0.4       1.8
      116     2           0.7       2.5
      120     21          7.6       10.1
      121     2           0.7       10.8
      122     1           0.4       11.2
      124     1           0.4       11.5
      125     6           2.2       13.7
      127     1           0.4       14.0
      130     16          5.8       19.8
      131     1           0.4       20.1
      132     2           0.7       20.9
      134     1           0.4       21.2
      135     11          4.0       25.2
      136     1           0.4       25.5
      137     2           0.7       26.3
      139     1           0.4       26.6
      140     28          10.1      36.7
      141     2           0.7       37.4
      144     4           1.4       38.8
      145     12          4.3       43.2
      147     1           0.4       43.5
      148     1           0.4       43.9
      150     31          11.2      55.0
      151     1           0.4       55.4
      151     23          8.3       63.7
      152     1           0.4       64.0
      153     1           0.4       64.4
      155     2           0.7       65.1
      158     1           0.4       65.5
      160     21          7.6       73.0
      161     1           0.4       73.4
      162     1           0.4       73.7
      163     1           0.4       74.1
      164     1           0.4       74.5
      165     5           1.8       76.3
      167     1           0.4       76.6
      168     2           0.7       77.3
      170     14          5.0       82.4
      171     1           0.4       82.7
      172     2           0.7       83.5
      175     4           1.4       84.9
      176     1           0.4       85.3
      177     1           0.4       85.6
      178     1           0.4       86.0
      179     2           0.7       86.7
      180     14          5.0       91.7
      182     2           0.7       92.4
      184     1           0.4       92.8
      185     1           0.4       93.2
      187     1           0.4       93.5
      189     1           0.4       93.9
      190     6           2.2       96.0
      194     1           0.4       96.4
      195     1           0.4       96.8
      200     2           0.7       97.5
      205     1           0.4       97.8
      209     1           0.4       98.2
      210     3           1.1       99.3
      216     1           0.4       99.6
      220     1           0.4       100.0
      Total   278         100.0
  • 43. (P95, P5) = mean ± Z score (probability at the specified percentile) × standard deviation. Probability distribution of the normal curve: page 180 -/-52. P95 SBP1 = 151.2 + 1.645(22.0) = 187.4 mmHg
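The formula-based percentile can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch that assumes systolic blood pressure is approximately normal with the slide's mean (151.2) and SD (22.0); the P5 value is computed the same way with the sign reversed.

```python
from statistics import NormalDist

# Normal model with the mean and SD quoted on the slide (assumed to hold).
sbp = NormalDist(mu=151.2, sigma=22.0)

z95 = NormalDist().inv_cdf(0.95)   # Z score for the 95th percentile, about 1.645
p95 = sbp.inv_cdf(0.95)            # mean + z95 * SD
p5 = sbp.inv_cdf(0.05)             # mean - z95 * SD

print(round(z95, 3), round(p95, 1), round(p5, 1))
```

The model-based P95 (about 187 mmHg) is close to the 189-190 read from the frequency table, which suggests that the normal approximation is reasonable for this variable.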
  • 44. 5b-1 P5 for duration of diabetes, using the frequency table (no missing values, so Valid Percent equals Percent)
      Duration   Frequency   Percent   Cumulative Percent
      0          12          4.3       4.3
      1          35          12.6      16.9
      2          22          7.9       24.8
      3          21          7.6       32.4
      4          24          8.6       41.0
      5          20          7.2       48.2
      6          23          8.3       56.5
      7          19          6.8       63.3
      8          6           2.2       65.5
      9          6           2.2       67.6
      10         6           2.2       69.8
      11         13          4.7       74.5
      12         2           0.7       75.2
      13         7           2.5       77.7
      14         6           2.2       79.9
      15         5           1.8       81.7
      16         11          4.0       85.6
      17         8           2.9       88.5
      18         6           2.2       90.6
      19         5           1.8       92.4
      20         3           1.1       93.5
      21         5           1.8       95.3
      22         2           0.7       96.0
      23         2           0.7       96.8
      25         3           1.1       97.8
      26         1           0.4       98.2
      27         1           0.4       98.6
      28         2           0.7       99.3
      31         1           0.4       99.6
      32         1           0.4       100.0
      Total      278         100.0
  • 45. Or using the formula: mean − Z score (1.645) × SD = −3.6 years (an impossible negative duration, because the distribution is skewed rather than normal).
  • 46. Total population: n = 278, μ = 67.24 years, σ = 11.743
  • 48. Population and Sample o In scientific research we want to make a statement (conclusion) about the population. o Studying the whole population is impossible in terms of money/time/labor. o We take a random sample from the population and infer the needed conclusions from the sample data. o The task of statistics is to quantify the uncertainty (whether the sample really represents that population).
  • 49. The concept of sampling. Study population: you select a few sampling units from the study population (the sample). You collect information from these people to find answers to your research questions. You make an estimate (“prediction”) extrapolated to the study population (prevalence, outcomes etc.).
  • 50. What would be the mean systolic blood pressure of older subjects (65+) in Al Hassa? Population mean (μ) = unknown. Sample values: 175, 165, 180, 155. From our sample we calculate an estimate of the population parameter.
  • 51. The good sample (the estimator) should be: Unbiased: the mean of the sample = population mean. Precise (narrow dispersion about the mean): the dispersion in repeated samples is small. This is a dream.
  • 52. Sampling error. Four individuals A, B, C, D: A = 18 years, B = 20 years, C = 23 years, D = 25 years. Their mean age is (18 + 20 + 23 + 25)/4 = 86/4 = 21.5 years (population mean μ).
  • 53. Sampling two individuals (6 possible samples): A+B = (18 + 20)/2 = 19.0 years; A+C = (18 + 23)/2 = 20.5 years; A+D = (18 + 25)/2 = 21.5 years; B+C = (20 + 23)/2 = 21.5 years; B+D = (20 + 25)/2 = 22.5 years; C+D = (23 + 25)/2 = 24.0 years. Sampling error = population mean − sample mean, which ranges from −2.5 to +2.5 years. Sampling three individuals (4 possible samples): A+B+C = (18 + 20 + 23)/3 = 20.33 years; A+B+D = (18 + 20 + 25)/3 = 21.00 years; A+C+D = (18 + 23 + 25)/3 = 22.00 years; B+C+D = (20 + 23 + 25)/3 = 22.67 years. The error ranges from −1.17 to +1.17 years. If C = 32 (instead of 23) years and D = 40 (instead of 25) years, the population mean is 27.5 years: sampling of 2 gives a sampling error of −8.5 to +8.5 years, and sampling of 3 gives −3.17 to +4.17 years. The greater the variability of a given variable, the larger the sampling error for a given sample size.
  • 54. Infinite samples should represent the population they came from (good estimator).
  • 55. 2 o The normal distribution o The standard error of the mean o Estimation: reference interval; confidence intervals for a mean or proportion; difference between means/proportions; RR and OR
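The enumeration on this slide can be automated with `itertools.combinations`. The sketch below uses a hypothetical helper, `sampling_error_range`, and follows the slide's definition of sampling error (population mean minus sample mean).

```python
from itertools import combinations
from statistics import mean

def sampling_error_range(ages, k):
    """Smallest and largest sampling error over all samples of size k."""
    mu = mean(ages)  # population mean
    errors = [mu - mean(sample) for sample in combinations(ages, k)]
    return min(errors), max(errors)

ages = [18, 20, 23, 25]                        # individuals A, B, C, D
pair_range = sampling_error_range(ages, 2)     # 6 possible samples
triple_range = sampling_error_range(ages, 3)   # 4 possible samples

# A more variable population of the same size, for comparison.
wide_pair_range = sampling_error_range([18, 20, 32, 40], 2)

print(pair_range, triple_range, wide_pair_range)
```

Larger samples narrow the error range, while a more spread-out population widens it for the same sample size.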
  • 56. Normal Distribution: Many human traits, such as intelligence, personality, and attitudes, as well as weight and height, are distributed among populations in a fairly normal way. (1435/02/6)
  • 57.
  • 58. The normal distribution: ≈68% of values lie within μ ± 1 SD (σ); ≈95% lie within μ ± 2 SD (σ). Values beyond 2 SDs are possible outliers; values beyond 3 SDs are definite outliers.
  • 59. One more: the Z score, which measures how many standard deviations a particular data point is above or below the mean. o Unusual observations would have a Z score over 2 or under −2. o Extreme observations would have Z scores over 3 or under −3 and should be investigated as potential outliers. $Z = \frac{X_i - \bar{x}}{s}$
  • 60. Areas under the standard normal curve
      Z        Area between both points (around the mean)   Beyond both points (two tails)   Beyond one point (one tail)
      ±0.1     0.080                                        0.920                            0.4600
      ±0.2     0.159                                        0.841                            0.4205
      ±0.3     0.236                                        0.764                            0.3820
      ±0.4     0.311                                        0.689                            0.3445
      ±0.5     0.383                                        0.617                            0.3085
      ±0.6     0.451                                        0.549                            0.2745
      ±0.7     0.516                                        0.484                            0.2420
      ±0.8     0.576                                        0.424                            0.2120
      ±0.9     0.632                                        0.368                            0.1840
      ±1       0.683                                        0.317                            0.1585
      ±1.1     0.729                                        0.271                            0.1355
      ±1.2     0.770                                        0.230                            0.1150
      ±1.3     0.806                                        0.194                            0.0970
      ±1.4     0.838                                        0.162                            0.0810
      ±1.5     0.866                                        0.134                            0.0670
      ±1.6     0.890                                        0.110                            0.0550
      ±1.645   0.900                                        0.100                            0.0500
      ±1.7     0.911                                        0.089                            0.0445
      ±1.8     0.928                                        0.072                            0.0360
      ±1.9     0.943                                        0.057                            0.0290
      ±1.96    0.950                                        0.050                            0.0250
      ±2       0.954                                        0.046                            0.0230
      ±2.1     0.964                                        0.036                            0.0180
      ±2.2     0.972                                        0.028                            0.0140
      ±2.3     0.979                                        0.021                            0.0105
      ±2.4     0.984                                        0.016                            0.0080
      ±2.576   0.990                                        0.010                            0.0050
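The three columns of this table are linked: the two-tail area is 1 minus the central area, and the one-tail area is half of the two-tail area. The sketch below, with a hypothetical `areas` helper, recomputes them from the standard normal CDF in Python's `statistics` module.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def areas(z):
    """Central, two-tail and one-tail areas for the points -z and +z."""
    between = std_normal.cdf(z) - std_normal.cdf(-z)
    two_tails = 1 - between
    one_tail = two_tails / 2
    return between, two_tails, one_tail

for z in (1.0, 1.645, 1.96, 2.576):
    between, two_tails, one_tail = areas(z)
    print(f"±{z}: {between:.3f}  {two_tails:.3f}  {one_tail:.4f}")
```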
  • 61. Calculating values from Z-scores: Xi = mean ± Z × (standard deviation), i.e. value (percentile) = mean ± Z score × SD.
  • 62. Random sample for estimating a population mean. μ = ? Sample values: X1 = 128, X2 = 133, X3 = 129. From the information in the sample, we will estimate the unknown population mean (x̄ is an estimator for μ). What could have happened if we had another random sample? What is the measure of variation of sample means?
  • 63. The Sampling Distribution of a Sample Statistic. ≈ Let’s assume that we want to survey a community of 400 whose ages were recorded, with the following parameters: µ = 35 years, σ = 13 years. ≈ Let’s assume, however, that we do not survey all 400; instead we randomly select 120 people, ask them about their ages and calculate the mean age. ≈ Then, we put them back into the community and randomly select another 120 residents (which may include members of the first sample). We do this over and over and each time we calculate the mean age. The results will be like those in the following table.
  • 64. Distribution of 20 random sample means (n=20). Sample means (samples 1 to 20): 34.7, 35.9, 35.5, 34.7, 34.5, 34.4, 35.7, 34.6, 37.4, 35.3, 34.1, 35.5, 34.9, 36.2, 35.6, 35.0, 35.1, 36.4, 35.6, 33.6. SD of the means: 13.37. [Dot plot of the 20 sample means on an axis from 33 to 37, centred on μ] All the results are clustered around the population value (35 years), with a few scores a bit further out and one extreme score of 37.4 years (random variation = 1/20 = 5%). Those 400 people have ages ranging from 2 to 69 years, while the means of the samples have a very narrow range of about 4 years, and 10 samples coincide with the population mean (35 years).
  • 65. Most of the samples will cluster around the population parameters, with an occasional sample result falling relatively further to one side or the other of the distribution (this is called the sampling distribution of sample means). It has the following properties: The mean of the sampling distribution is equal to the population mean: the average of the averages (µx̄) will be the same as the population mean. The standard deviation of the sample means = the standard error, SE = σ/√n (σ = population SD). The distribution of the sample means is Normal if the population distribution is Normal. If the population distribution is not Normal, the distribution of the sample means is almost Normal when n is large (Central Limit Theorem).
  • 66. Standard error of the mean. Population parameters (mean, S.D.) versus sample statistics (sample mean, S.D.): the standard error is the degree to which the sample statistics deviate from the population parameters. The term “error” indicates the fact that, due to sampling error, each sample mean is likely to deviate somewhat from the true population mean.
  • 67.
  • 68. Central Limit Theorem. The formula for SE = SD/√n. The formula indicates that we are estimating the SE given the S.D. of a sample of size n. For a sample of 100 and S.D. of 40, the SE = 40/√100 = 4. For a sample of 1000 and S.D. of 40, the SE = 40/√1000 = 1.26. Two factors influence the SE: sample size and S.D. of the sample. Sample size has greater impact as it is used as a denominator. For a sample of 100 and S.D. of 20, the SE = 20/√100 = 2. For a sample of 100 and S.D. of 40, the SE = 40/√100 = 4. The more variability there is within a sample, the greater the SE.
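The four worked examples on this slide follow directly from SE = SD/√n. A minimal sketch, with `standard_error` as an illustrative helper name:

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n with the given SD."""
    return sd / sqrt(n)

se_100_40 = standard_error(40, 100)    # 4
se_1000_40 = standard_error(40, 1000)  # about 1.26
se_100_20 = standard_error(20, 100)    # 2

print(se_100_40, round(se_1000_40, 2), se_100_20)
```

Note that the sample size enters through its square root: a tenfold increase in n (100 to 1000) shrinks the SE by a factor of about 3.2, whereas halving the SD halves the SE.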
  • 69.
  • 70. Confidence Interval (CI). A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
  • 71. We need to know the smallest and the largest μ (range) we think is likely, using sample statistics. The mean of the sample is the estimate of μ.
  • 72.
  • 73. c = level of confidence; Z_c = Z critical values (under the normal curve): 90%: 1.645; 95%: 1.960; 99%: 2.576. $\bar{x} \pm Z_c \frac{\sigma}{\sqrt{n}}$, i.e. C.I. = mean of the sample ± Z critical score × SEM, where SEM = SD/√n.
  • 74. C.I • The confidence interval provides a range that is highly likely (often 95% or 99%) to contain the true population parameter that is being estimated. • The narrower the interval the more informative is the result. • It is usually calculated using the estimate (sample mean) and its standard error (SEM).
  • 75. CI for μ: systolic blood pressure in 278 diabetic patients (SPSS Descriptives: syst. blood pressure at start)
      Statistic (Std. Error)       All patients       Random sample of 50 out of 278
      Mean                         151.20 (1.319)     155.06 (3.064)
      90% CI for mean, lower       149.02             149.92
      90% CI for mean, upper       153.38             160.20
      5% trimmed mean              150.30             154.72
      Median                       150.00             151.20
      Variance                     483.880            460.033
      Std. deviation               21.997             21.448
      Minimum                      100                115
      Maximum                      220                205
      Range                        120                90
      Interquartile range          30.00              30.00
      Skewness                     .540 (.146)        .263 (.340)
      Kurtosis                     .152 (.291)        -.506 (.668)
      90% C.I. = 151.20 ± 1.65 × (21.997/√278) = 149.02 to 153.38 mmHg
  • 76. 95% CI for μ (SPSS Descriptives: syst. blood pressure at start)
      Statistic (Std. Error)       All patients       Random sample of 50 out of 278
      Mean                         151.20 (1.319)     155.06 (3.064)
      95% CI for mean, lower       148.60             148.90
      95% CI for mean, upper       153.80             161.22
      5% trimmed mean              150.30             154.72
      Median                       150.00             151.20
      Variance                     483.880            460.033
      Std. deviation               21.997             21.448
      Minimum                      100                115
      Maximum                      220                205
      Range                        120                90
      Interquartile range          30.00              30.00
      Skewness                     .540 (.146)        .263 (.340)
      Kurtosis                     .152 (.291)        -.506 (.668)
      95% C.I. = 151.20 ± 1.96 × (21.997/√278) = 148.60 to 153.80 mmHg
  • 77. 99% CI for μ (SPSS Descriptives: syst. blood pressure at start)
      Statistic (Std. Error)       All patients       Random sample of 50 out of 278
      Mean                         151.20 (1.319)     155.06 (3.064)
      99% CI for mean, lower       147.78             146.84
      99% CI for mean, upper       154.62             163.28
      5% trimmed mean              150.30             154.72
      Median                       150.00             151.20
      Variance                     483.880            460.033
      Std. deviation               21.997             21.448
      Minimum                      100                115
      Maximum                      220                205
      Range                        120                90
      Interquartile range          30.00              30.00
      Skewness                     .540 (.146)        .263 (.340)
      Kurtosis                     .152 (.291)        -.506 (.668)
      99% C.I. = 151.20 ± 2.58 × (21.997/√278) = 147.78 to 154.62 mmHg
  • 78. 90% C.I. = 151.20 ± 1.65 × (21.997/√278) = 149.02 to 153.38 mmHg. 95% C.I. = 151.20 ± 1.96 × (21.997/√278) = 148.60 to 153.80 mmHg. 99% C.I. = 151.20 ± 2.58 × (21.997/√278) = 147.78 to 154.62 mmHg. What does this mean? It means that if the same population is sampled on numerous occasions and interval estimates are made on each occasion, the resulting intervals would bracket the true population parameter in approximately 90, 95 and 99% of the cases, respectively.
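The three intervals can be reproduced with a short sketch. It assumes n = 278, the sample size that matches the reported standard error of 1.319, and uses exact normal critical values from `statistics.NormalDist`, so the limits differ from the slide's figures in the second decimal place (the slide rounds Z to 1.65, 1.96 and 2.58). The helper name `mean_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    margin = z * sd / sqrt(n)                  # Z times the standard error
    return sample_mean - margin, sample_mean + margin

ci90 = mean_ci(151.20, 21.997, 278, 0.90)
ci95 = mean_ci(151.20, 21.997, 278, 0.95)
ci99 = mean_ci(151.20, 21.997, 278, 0.99)

for level, (low, high) in zip((90, 95, 99), (ci90, ci95, ci99)):
    print(f"{level}% CI: {low:.2f} to {high:.2f} mmHg")
```

Raising the confidence level widens the interval, as the three results show.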
  • 79. The sampling distribution of a proportion: $\mu_p = \pi$; $p = K/n$; $SE(p) = \sqrt{\frac{p(1-p)}{n}}$; 95% CI: $p \pm 1.96 \times SE(p)$, where 1.96 is the Z critical score for 95%.
  • 80. Smokers among diabetics. Sample = 400, smokers = 40, p = 40/400 = 0.1. SE(p) = √(0.1 × 0.9/400) = 0.015. 95% CI for p = 0.1 ± 1.96(0.015) = [0.07, 0.13]. For percentages it is the same: SE = 1.5%, C.I. = [7%, 13%].
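A minimal sketch of the same calculation, using a hypothetical `proportion_ci` helper and the normal approximation shown on the slide:

```python
from math import sqrt

def proportion_ci(k, n, z=1.96):
    """Proportion, its standard error and a normal-approximation CI."""
    p = k / n
    se = sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# 40 smokers in a sample of 400 diabetics.
p, se, (low, high) = proportion_ci(40, 400)
print(p, round(se, 3), round(low, 2), round(high, 2))
```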
  • 81. 95% CI for the difference between two means (μ1 − μ2)
      Smoke        n     Mean SBP   SE (mean)
      No           214   153.1      1.50
      Yes          64    144.8      2.62
      Difference         8.3
      $(\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$, with $SE(\bar{x}_1 - \bar{x}_2) = \sqrt{SE(\bar{x}_1)^2 + SE(\bar{x}_2)^2}$
      95% C.I. = 2.4 to 14.2
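The interval for the difference follows from combining the two standard errors. The sketch below uses a hypothetical `difference_ci` helper and treats the two groups as independent, as the formula on the slide does.

```python
from math import sqrt

def difference_ci(mean1, se1, mean2, se2, z=1.96):
    """CI for the difference between two independent means."""
    diff = mean1 - mean2
    se_diff = sqrt(se1 ** 2 + se2 ** 2)  # SE of the difference
    return diff - z * se_diff, diff + z * se_diff

# Non-smokers: mean SBP 153.1 (SE 1.50); smokers: 144.8 (SE 2.62).
low, high = difference_ci(153.1, 1.50, 144.8, 2.62)
print(round(low, 1), round(high, 1))
```

Because the interval excludes 0, the difference in mean systolic blood pressure between the groups is statistically significant at the 5% level.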
  • 82. 95% CI for percentage
      Smoke (n)      Died (%)   SE
      No (212)       28.8       3.11
      Yes (64)       23.4       5.30
      Difference     5.4%
      $(P_{ns} - P_s) \pm 1.96 \times SE(P_{ns} - P_s)$, with $SE = \sqrt{\frac{P_1(100 - P_1)}{n_1} + \frac{P_2(100 - P_2)}{n_2}}$
      95% C.I. = −6.7% to 17.4%
  • 83. 95% CI for RR and OR. Use available software: http://www.medcalc.org/calc/odds_ratio.php http://www.medcalc.org/calc/relative_risk.php vl.academicdirect.org/applied_statistics/.../CIcalculator.xls
  • 85. Inferential Statistics Testing in research o In scientific research we would like to test if our research ideas are true. o Based on previous observations (studies) we know that the mean cholesterol of patients with diabetes is higher than those without the disease. o We will take samples and check whether the results will agree with our expectations. o Meaning we are going to test the situation using a statistical test.
  • 86. The Z-test for one sample: Serum cholesterol in the diabetes-free population: μ = 5 mmol/L, σ = 1.5. Diabetic patients: mean cholesterol > 5? Considering σ = 1.5, is there any difference between the diabetes-free population and the diabetic patients regarding serum cholesterol? Let's perform a Z test.
  • 87. Research question (hypothesis): The research hypothesis would be: the mean cholesterol of diabetics is > 5 mmol/L. Null hypothesis H0: μ = 5 (the population the sample is drawn from has the same mean as the diabetes-free population). Alternative hypothesis H1: μ > 5 (one sided) or H1: μ ≠ 5 (two sided).
  • 88. Procedure: compare the mean of the sample with μ = 5. If the sample mean is close to the population mean, we do not reject the null hypothesis. If the sample mean differs from the population mean, we REJECT the null. [Histogram: total cholesterol level of diabetic patients in mmol/L; Mean = 6.25, Std. Dev = 1.33, N = 278]
  • 89. The α level and the P value: The P value is the probability of obtaining a difference at least as large as the one observed if the null hypothesis is true, i.e. if there is no difference between the population mean and the mean of the population the sample comes from. The α level is the maximum probability we accept of rejecting the null hypothesis falsely: α = 0.05.
  • 90. Alpha level: P > 0.05 (α): accept (do not reject) the null; sample mean = population mean. P ≤ 0.05 (α): reject the null; sample mean ≠ population mean.
  • 91. Calculation (σ = 1.5): SEM = σ/√n = 0.3. Z = (sample mean − μ)/SEM. P(sample mean ≥ 6) = P(Z ≥ (6 − 5)/0.3) = 0.0005. Under the normal curve the area of rejection is Z > 1.96; here P = 0.0005. If the diabetic patients had the same mean cholesterol as the disease-free population (5 mmol/L), a sample mean this high would be seen only about 5 times if we repeated this test 10,000 times. P < 0.05, so we reject the null: the diabetics have a larger mean cholesterol level than the normal population.
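The Z-test calculation can be sketched in Python using only the standard library. The sample size n = 25 is an assumption: it is the value implied by σ = 1.5 and SEM = 0.3, and is not stated on the slide. The function name `z_test_upper` is illustrative.

```python
import math

def z_test_upper(sample_mean, mu, sigma, n):
    """One-sample, upper-tailed Z test with known population SD."""
    sem = sigma / math.sqrt(n)              # standard error of the mean
    z = (sample_mean - mu) / sem
    p = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z) under the standard normal
    return sem, z, p

# mu = 5 and sigma = 1.5 from the slide; n = 25 is assumed (gives SEM = 0.3)
sem, z, p = z_test_upper(sample_mean=6.0, mu=5.0, sigma=1.5, n=25)
print(f"SEM = {sem:.2f}, Z = {z:.2f}, one-sided P = {p:.5f}")
```

The computed tail probability is slightly below the 0.0005 quoted on the slide, which appears to be a rounded table value.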
  • 92. In reality It is unlikely that the σ (population SD) is known. In most of the cases, σ will be unknown and we will be able to apply neither the formula nor the table of normal distribution (areas under the curve=Z score). We resort to other statistical tests.
  • 94. Possible situations in hypothesis testing
      Decision            Reality: H0 is true      Reality: H0 is not true
      Reject H0           Type I error (α)         OK (1 − β)
      Do not reject H0    OK (1 − α)               Type II error (β)
      α is the level of significance. β = 1 − power. Power is the probability to reject the null hypothesis if it is NOT TRUE. Usually 80% is the least required for any test.
  • 95. Errors of Hypothesis Testing and Power: decisions and errors in hypothesis testing
      Study results (conclusion)           True situation: difference exists (H1)   True situation: no difference (H0)
      Difference exists (reject H0)        Correct decision (power or 1 − β)        Type I or α error
      No difference (do not reject H0)     Type II or β error                       Correct decision
      Type I error (α): false rejection; H0 is rejected when it is true (concluding there is a difference when there really is not).
      Type II error (β): false acceptance; H0 is not rejected when it is false (concluding there is no difference when it is really present).
  • 96. Passive smoking and lung cancer. Truth about the population: passive smoking is either related to lung cancer or not related to lung cancer. Conclusions, based on results from a study of a sample of the population: reject the null hypothesis (rates in the study appear to be different) or accept the null hypothesis (rates in the study appear similar). Type I error (incorrect rejection): concluding that passive smoking is related to lung cancer when it really is not. Type II error (incorrect acceptance): concluding that passive smoking is not related to lung cancer when it really is.
  • 97. The Alpha-Fetoprotein (AFP) test has both Type I and Type II error possibilities. This test screens the mother's blood during pregnancy for AFP and determines risk. Abnormally high or low levels may indicate Down syndrome. H0: patient is healthy. Ha: patient is unhealthy. Type I error (false positive or false rejection): the test wrongly indicates that the patient has Down syndrome, which could lead to the pregnancy being aborted for no reason. Type II error (false negative or false acceptance): the test is negative and the child will be born with multiple anomalies.
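To make the link between β and power concrete, here is a small sketch for an upper-tailed Z test. All inputs are assumptions for illustration: μ0 = 5 and SEM = 0.3 are borrowed from the cholesterol example, the true mean of 6 is hypothetical, and 1.645 is the one-sided critical Z value for α = 0.05.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_upper_tailed_z(mu0, mu1, sem, z_alpha=1.645):
    """Probability of rejecting H0: mu = mu0 when the true mean is mu1 (> mu0)."""
    return normal_cdf((mu1 - mu0) / sem - z_alpha)

# Hypothetical scenario: true mean 6 mmol/L, null value 5 mmol/L, SEM 0.3
power = power_upper_tailed_z(mu0=5.0, mu1=6.0, sem=0.3)
beta = 1 - power  # Type II error probability
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Under these assumed values the power is well above the 80% minimum mentioned above; a smaller true difference or a larger SEM would lower it.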
  • 98. Hypothesis test: this is the distribution given that the null hypothesis is true.
  • 99. Type I and Type II error: false rejection (Type I) and false acceptance (Type II).
  • 100. One sample: the distribution of X under the null and alternative hypotheses.
  • 101. t-distribution: In real-life situations we estimate the unknown population SD using the sample SD, and results are standardized to the t-distribution: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$. Z test for the normal distribution (the population SD is known): $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
  • 102. t-distribution: df = number of observations (sample size) − 1. Heavier tails than the Z distribution.
  • 103. Degree of freedom (df): For all sample statistics (variance, SD) we used n − 1. All the observations in any given sample are free except one (complementary effect).
  • 104. Degree of freedom: four values (12, 16, 7, 15) with a fixed total of 50. Three of them can vary freely; once they are chosen, the remaining one is restricted by the total. df = n − 1.
  • 106. t-test: steps to determine the statistical difference
      When? When the descriptive statistics are mean ± standard deviation. The formula depends on the number of samples:
      One sample vs. population mean: $t = \frac{\bar{x} - \mu}{SD/\sqrt{n}}$
      Two independent samples: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}}$
      Two dependent samples (paired: repeated measures, matched pairs): $t = \frac{\bar{d}}{SE(\bar{d})}$
      Steps:
      1- State the hypothesis to be tested: Null: mean = mean. Alternative: mean ≠ mean (non-directional, two-tailed), or mean > mean or mean < mean (unidirectional, one-tailed).
      2- Find the calculated t value using the formulae.
      3- Find the degree of freedom: n − 1 (for two independent samples, df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2).
      4- Find the P value using the tables of the t-distribution.
      5- Conclude: if P < 0.05 the null is rejected; if P > 0.05 the null is accepted.
  • 107. t-test (Student's t-test), one sample: $t = \frac{\bar{x} - \mu}{SD/\sqrt{n}}$
      Using diabetes data: is the mean age of diabetics > 65 years? H0: μ = 65; H1: μ ≠ 65.
      t (one sample) = (67.24 − 65)/(SD/√n) = 2.24/0.704 = 3.18; from the t distribution, P = 0.002. Reject the null: diabetics are significantly older than 65 years.
      Statistics, age (years): N Valid 278, Missing 0; Mean 67.24; Std. Error of Mean .704; Std. Deviation 11.743; Variance 137.902
  • 108. One-Sample Test (Test Value = 65), age (years): t = 3.182; df = 277 (degree of freedom); Sig. (2-tailed) = .002 (P value, two sided); Mean Difference = 2.24; 95% Confidence Interval of the Difference: Lower .85, Upper 3.63. Assuming that the distribution of age is normal; population SD (σ) is unknown.
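The t statistic and degrees of freedom in this output can be reproduced from the summary statistics alone. The sketch below stops at t and df; the P value would still come from a t table or from statistical software. `one_sample_t` is an illustrative name.

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """Return (SE of the mean, t statistic, degrees of freedom)."""
    se = sd / math.sqrt(n)
    t = (sample_mean - mu0) / se
    return se, t, n - 1

# Age of diabetics: mean 67.24, SD 11.743, n = 278, test value 65
se, t, df = one_sample_t(67.24, 65, 11.743, 278)
print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}")
```

The small gap between this t and the SPSS value of 3.182 comes from using the rounded mean 67.24.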
  • 109. t-test for comparison of means of two independent samples. H0: smoking has no effect on systolic blood pressure: Mean S = Mean NS, or Mean S − Mean NS = 0. H1: smoking has an effect: Mean S ≠ Mean NS, or Mean S − Mean NS ≠ 0. Assumptions: • Independent observations (2 samples) • Normally distributed • Equal variances (for the pooled t-test)
  • 110. Three formulae
      Standardized: $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$, where 0 is the expected difference if H0 is true and the denominator is the SD of the difference.
      If SDs are equal: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2/n_1 + S_p^2/n_2}}$, with the pooled variance $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$ (pooled SD = $S_p$).
      If SDs are not equal: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
      The decision between the last two is based on Levene's test.
  • 111. Variances are apparently equal
      Group Statistics, syst. blood pressure at start
      SMOKING    N     Mean     Std. Deviation   Std. Error Mean
      no         214   153.11   21.995           1.504
      smokers    64    144.82   20.934           2.617
      Independent Samples Test, syst. blood pressure at start
      Levene's Test for Equality of Variances: F = .006, Sig. = .936 (not significant, which means equal variances)
      t-test for Equality of Means (two separate t-tests):
                                    t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
      Equal variances assumed       2.674   276       .008              8.29              3.100                   2.188 to 14.392
      Equal variances not assumed   2.747   107.982   .007              8.29              3.018                   2.308 to 14.272
      P value < 0.05, reject H0
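Both rows of the SPSS table can be re-derived from the group statistics. The sketch below is an illustration rather than SPSS's own code: it implements the pooled formula and the unequal-variance formula, with the Welch-Satterthwaite approximation for the degrees of freedom in the second case. The function names are made up.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Equal-variance (pooled) two-sample t: returns (t, df, SE of difference)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return (m1 - m2) / se, n1 + n2 - 2, se

def unequal_variance_t(m1, s1, n1, m2, s2, n2):
    """Unequal-variance two-sample t with Welch-Satterthwaite df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df, se

# Systolic blood pressure: non-smokers vs smokers (group statistics from the slide)
groups = (153.11, 21.995, 214, 144.82, 20.934, 64)
for label, func in [("equal variances", pooled_t), ("unequal variances", unequal_variance_t)]:
    t, df, se = func(*groups)
    print(f"{label}: t = {t:.3f}, df = {df:.2f}, SE = {se:.3f}")
```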
  • 112. Paired t-test: used if we have paired data (two repeated measurements on the same subjects, or before and after) and if the differences of the paired observations are Normally distributed.
  • 113. Paired samples (dependent)
      • Paired/dependent 2-sample t-test: to compare observations collected from the same group of individuals on 2 separate occasions (dependent observations or paired samples).
      • The paired t statistic is calculated by:
        - Calculate the difference between the 2 measurements taken on each individual.
        - Calculate the mean of the differences, $m_d$.
        - Calculate the SE of the observed differences, $SE_d$.
        - Under the null hypothesis of no difference (difference = 0), the paired t statistic takes the form $t = \frac{m_d - 0}{SE_d}$, i.e. t = mean difference / SE of the difference.
        - It has a t distribution with degrees of freedom = (n − 1).
  • 114. Example: Four students had the following scores in 2 subsequent tests. Is there a significant difference in their performance?
      Number   Name       Test 1   Test 2   Dif
      1        Mike       35%      67%      -32
      2        Melanie    50%      46%      4
      3        Melissa    90%      86%      4
      4        Mitchell   78%      91%      -13
      Mean Dif = -9.25, SD Dif = 17.152, SE Dif = 8.58
      Calculated paired $t = \frac{m_d - 0}{SE_d}$ = -9.25/8.58 = -1.078, df = n − 1 = 3
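The worked example is small enough to verify directly. The following sketch uses Python's `statistics` module; the variable names are illustrative.

```python
import math
import statistics

test1 = [35, 50, 90, 78]  # Mike, Melanie, Melissa, Mitchell
test2 = [67, 46, 86, 91]

diffs = [a - b for a, b in zip(test1, test2)]  # Test 1 minus Test 2
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)              # sample SD (n - 1 in the denominator)
se_diff = sd_diff / math.sqrt(len(diffs))
t = mean_diff / se_diff
df = len(diffs) - 1

print(f"differences = {diffs}")
print(f"mean = {mean_diff:.2f}, SD = {sd_diff:.2f}, SE = {se_diff:.2f}, t = {t:.3f}, df = {df}")
```

The results agree with the slide up to rounding in the last decimal of the SD.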
  • 115. Critical values of the t distribution
      Level of significance for one-tail test:   0.10     0.05     0.025    0.01     0.005
      Level of significance for two-tail test:   0.20     0.10     0.05     0.02     0.01
      df
      1                                          3.078    6.314    12.706   31.821   63.657
      2                                          1.886    2.920    4.303    6.965    9.925
      3                                          1.638    2.353    3.182    4.541    5.841
      4                                          1.533    2.132    2.776    3.747    4.604
      5                                          1.476    2.015    2.571    3.365    4.032
      6                                          1.440    1.943    2.447    3.143    3.707
      7                                          1.415    1.895    2.365    2.998    3.499
      8                                          1.397    1.860    2.306    2.896    3.355
      9                                          1.383    1.833    2.262    2.821    3.250
      10                                         1.372    1.812    2.228    2.764    3.169
      11                                         1.363    1.796    2.201    2.718    3.106
      12                                         1.356    1.782    2.179    2.681    3.055
      13                                         1.350    1.771    2.160    2.650    3.012
      14                                         1.345    1.761    2.145    2.624    2.977
      15                                         1.341    1.753    2.131    2.602    2.947
      16                                         1.337    1.746    2.120    2.583    2.921
      17                                         1.333    1.740    2.110    2.567    2.898
      18                                         1.330    1.734    2.101    2.552    2.878
      19                                         1.328    1.729    2.093    2.539    2.861
      20                                         1.325    1.725    2.086    2.528    2.845
      21                                         1.323    1.721    2.080    2.518    2.831
      35                                         1.306    1.690    2.030    2.438    2.724
      50                                         1.299    1.676    2.009    2.403    2.678
      ∞                                          1.282    1.645    1.960    2.326    2.576
      For the example, |t| = 1.078 is smaller than 1.638 (df = 3), so the two-tailed P value is > 0.20: the null is accepted!
  • 116. Conclusion: If there were really no difference, a difference as large as the observed one would be encountered in about 36 out of 100 cases (actual P value = 0.362), i.e. we accept the null hypothesis of no difference between the first and second test.
  • 117. Paired Samples Statistics (Pair 1)
                                           Mean     N     Std. Deviation   Std. Error Mean
      syst. blood pressure at start        151.20   278   21.997           1.319
      syst. blood pressure after 2 years   153.83   278   29.076           1.744
      Paired Samples Test (Pair 1: syst. blood pressure at start - syst. blood pressure after 2 years)
      Paired Differences: Mean -2.63; Std. Deviation 17.920; Std. Error Mean 1.075; 95% Confidence Interval of the Difference: Lower -4.74, Upper -.51
      t = -2.443; df = 277; Sig. (2-tailed) = .015
  • 118. Tests of significance for interval/ratio data (parametric, assuming normal distribution)
      One sample vs. population, known population variance (σ): one-sample Z test, $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$; rejection limit |Z| > 1.96.
      One sample vs. population, unknown population variance: one-sample t-test; reject if P ≤ 0.05.
      Two samples, dependent: paired t-test.
      Two samples, independent: independent t-test.
  • 119. The Chi-Square test (χ²): used for hypothesis testing for categorical variables. There are many types, depending on the design, the distribution of variables and the objectives of testing.
  • 120. χ² example: Vaccination against influenza decreases the risk of getting the disease. Study: compare the effectiveness of 5 vaccines with respect to the probability of getting influenza. The comparison will be in respect to a nominal variable (getting influenza: yes or no).
  • 121. Effectiveness of Five Vaccines: data cross-tabulated 2×5; response variable: influenza
      Frequency
      Vaccine   Influenza No   Influenza Yes   Total
      1         237            43              280
      2         198            52              250
      3         245            25              270
      4         212            48              260
      5         233            57              290
      Total     1125           225             1350
      % within vaccine
      Vaccine   Influenza No   Influenza Yes   Total
      1         84.6           15.4            100
      2         79.2           20.8            100
      3         90.7           9.3             100
      4         81.5           18.5            100
      5         80.3           19.7            100
      Total     83.3           16.7            100
      The "Influenza Yes" percentage is the probability to get influenza. The null hypothesis states that the probability to get influenza is independent of the vaccine. The alternative states that a dependency exists.
  • 122. Effectiveness of Five Vaccines: If H0 is true, the probability to get influenza in every group should be the same, equal to the probability in the total population: 225/1350 = 0.167 (16.7%). Vaccine 1 was used in 280; if H0 is true, we expect 16.7% (≈47) to get influenza. However, this is not what was observed (43).
  • 123. Expected frequencies. For any cell: Expected frequency = Row total × Column total / Grand total
      Vaccine   Influenza No: Observed (Expected)   Influenza Yes: Observed (Expected)   Total
      1         237 (233.3)                         43 (46.7)                            280
      2         198 (208.3)                         52 (41.7)                            250
      3         245 (225.0)                         25 (45.0)                            270
      4         212 (216.7)                         48 (43.3)                            260
      5         233 (241.7)                         57 (48.3)                            290
      Total     1125                                225                                  1350
      Examples: expected Yes for vaccine 1 = 280 × 225/1350 = 46.7; expected No for vaccine 4 = 1125/1350 × 260 = 216.7. Here each vaccine total is a row total, each outcome total is a column total, and 1350 is the grand total.
  • 124. Pearson Chi-square test: Calculate the expected frequencies (assuming H0 is true) for all the ten cells. Calculate Chi square: $\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f}$, where $O_f$ = observed frequency and $E_f$ = expected frequency. Reject H0 if χ² is large: use the Chi-square distribution after determining the degree of freedom, df = (r − 1)(c − 1).
  • 126. Critical values for Chi-square
      df    Level of significance
            0.99      0.90     0.70     0.50     0.30     0.20     0.10     0.05     0.01     0.001
      1     0.00016   0.0158   0.148    0.455    1.074    1.642    2.706    3.841    6.635    10.827
      2     0.0201    0.211    0.713    1.386    2.408    3.219    4.605    5.991    9.210    13.815
      3     0.115     0.584    1.424    2.366    3.665    4.642    6.251    7.815    11.341   16.268
      4     0.297     1.064    2.195    3.357    4.878    5.989    7.779    9.488    13.277   18.465
      5     0.554     1.610    3.000    4.351    6.064    7.289    9.236    11.070   15.086   20.517
      .
      .
      30    14.953    20.599   25.508   29.336   33.530   36.250   40.256   43.773   50.892   59.703
      For the vaccine example: df = (2 − 1)(5 − 1) = 4; χ² critical = 9.488; calculated χ² = 16.555; P = 0.002. There is a relation (dependence) between type of vaccine and influenza prevention.
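The whole vaccine example (expected frequencies, χ², df and P value) can be sketched in plain Python. The function names are illustrative. The P value uses a closed-form expression for the chi-square upper tail that holds only for even degrees of freedom, which is enough here because df = 4; for general df a statistics package would be the usual choice.

```python
import math

# Rows: vaccines 1-5; columns: influenza no, influenza yes (observed counts)
observed = [[237, 43], [198, 52], [245, 25], [212, 48], [233, 57]]

def pearson_chi_square(table):
    """Return (chi-square statistic, df, expected frequencies)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

def chi_square_upper_tail_even_df(x, df):
    """P(chi-square >= x); this closed form is valid only when df is even."""
    if df % 2 != 0:
        raise ValueError("this sketch only handles even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

stat, df, expected = pearson_chi_square(observed)
p_value = chi_square_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.3f}, df = {df}, P = {p_value:.3f}")
```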
  • 127. SMOKING * SEX Crosstabulation (Count, % within SMOKING)
      SMOKING    male          female        Total
      no         90 (42.1%)    124 (57.9%)   214 (100.0%)
      smokers    55 (85.9%)    9 (14.1%)     64 (100.0%)
      Total      145 (52.2%)   133 (47.8%)   278 (100.0%)
      Chi-Square Tests
                                     Value     df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
      Pearson Chi-Square             38.017b   1    .000
      Continuity Correction a        36.279    1    .000
      Likelihood Ratio               41.649    1    .000
      Fisher's Exact Test                                                   .000                   .000
      Linear-by-Linear Association   37.880    1    .000
      N of Valid Cases               278
      a. Computed only for a 2x2 table
      b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 30.62.
      At least 80% of cells must have Ef > 5
  • 128. We can't use Pearson Chi-square if the expected frequency is < 5. In this case we use Fisher's exact test.
  • 129. status * SEX Crosstabulation (Count)
      status                 male   female   Total
      alive                  24     15       39
      died from CVD          4      1        5
      other cause of death   2      2        4
      Total                  30     18       48
      Expected f = 4 × 30/48 = 2.5 (< 5); Expected f = 5 × 18/48 = 1.875 (< 5). Fisher's exact test provides correction.
  • 130. Chi-Square Tests
                                     Value   df   Asymp. Sig. (2-sided)
      Pearson Chi-Square             .935a   2    .626
      Likelihood Ratio               .991    2    .609
      Linear-by-Linear Association   .004    1    .951
      N of Valid Cases               48
      a. 4 cells (66.7%) have expected count less than 5. The minimum expected count is 1.50.
      Chi-square is not valid
  • 131. Chi-Square Tests
                                     Value     df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
      Pearson Chi-Square             38.017b   1    .000
      Continuity Correction a        36.279    1    .000
      Likelihood Ratio               41.649    1    .000
      Fisher's Exact Test                                                   .000                   .000
      Linear-by-Linear Association   37.880    1    .000
      N of Valid Cases               278
      a. Computed only for a 2x2 table
      b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 30.62.
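The footnote about small expected counts can be checked with a short sketch that computes the expected frequency of every cell in the status by sex table and counts how many fall below 5. The helper name `expected_counts` is illustrative.

```python
# Rows: alive, died from CVD, other cause of death; columns: male, female
observed = [[24, 15], [4, 1], [2, 2]]

def expected_counts(table):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

cells = [e for row in expected_counts(observed) for e in row]
small = [e for e in cells if e < 5]
share_small = len(small) / len(cells)

print(f"expected counts: {cells}")
print(f"{len(small)} of {len(cells)} cells ({share_small:.1%}) are below 5; minimum = {min(cells)}")
```

With two thirds of the cells below 5, the requirement that at least 80% of cells have an expected count above 5 is not met, which is why the Pearson result is flagged as not valid.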
  • 132. McNemar test: paired data in a cross tabulation (54 eczematous persons use ointment A or B on both arms, randomized)
                       Ointment B: No   Ointment B: +   Total
      Ointment A: +    10               16              26
      Ointment A: No   5                23              28
      Total            15               39              54
      The McNemar test only takes the discordant pairs into account: $\chi^2 = \frac{(23 - 10)^2}{23 + 10}$, df = 1
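The McNemar statistic from the slide can be evaluated as follows. This sketch uses the uncorrected formula shown above (no continuity correction) and obtains the P value from the fact that a chi-square variable with 1 degree of freedom is a squared standard normal. The function name `mcnemar` is illustrative.

```python
import math

def mcnemar(b, c):
    """Uncorrected McNemar chi-square and P value from the two discordant counts."""
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with df = 1
    return chi2, p

# Discordant pairs from the slide: 23 and 10
chi2, p = mcnemar(23, 10)
print(f"chi-square = {chi2:.2f}, df = 1, P = {p:.3f}")
```

The statistic exceeds the critical value 3.841 for df = 1 at the 0.05 level, so the two ointments differ significantly under this test.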