INTRODUCTION TO
BIOSTATISTICS
DEFINITIONS
Statistics refers to numerical facts and to how data are:
• Collected
• Organized
• Summarized
• Presented
• Analyzed
• Interpreted
The application of this knowledge in the health sciences is known as Biostatistics.
Definitions
Population is a set of measurements of interest to the sample collector. Its members may or may not be living, e.g. the population of Karachi, or all TB patients in Karachi.
Sample is any subset of measurements selected from the population, e.g. all TB patients coming to CHK during November 2005.
Element – Every single unit in the sample is known as an element. It may also be defined as an entity on which measurements are obtained, e.g. every TB patient coming to CHK in November 2005.
Observation – The set of measurements obtained for each element, e.g. a recording of age, weight, Hb, cholesterol, etc.
Variable – Anything that changes its value from one place or thing to another, e.g. age, weight, height, blood sugar, Hb, electrolytes, etc.
Data – Facts and figures that are collected, summarised and analysed.
MEASURES OF CENTRAL
TENDENCY
• Mean
• Median
• Mode
MEAN
The MEAN (or arithmetic mean) is
also known as the AVERAGE. It is
calculated by totaling the results of
all the observations and dividing by
the total number of observations.
Note that the mean can only be
calculated for numerical data.
Example: average age of children in years

Child   1  2  3  4  5  6  7
Age     6  4  4  3  2  4  5

$$\bar{x} = \frac{\sum x}{n} = \frac{6 + 4 + 4 + 3 + 2 + 4 + 5}{7} = \frac{28}{7} = 4 \text{ years}$$
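The calculation of the mean can also be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the variable names are our own choices, and the data are the seven ages from the example.

```python
# Ages (in years) of the seven children from the example.
ages = [6, 4, 4, 3, 2, 4, 5]


def mean(values):
    """Arithmetic mean: total of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


average_age = mean(ages)
print(average_age)  # 4.0
```

The total of the ages is 28 and there are 7 children, so the result matches the hand calculation of 4 years.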
Advantages / Disadvantages of the Mean
• Advantages:
1) Familiar and intuitively clear to most people
2) Every data set has one and only one mean
3) Useful for performing statistical procedures
• Disadvantages:
1) May be affected by extreme values
2) Tedious to compute
Median
1. Measure of central tendency
2. Middlemost or most central item in
the set of ordered numbers
– If n is odd, the middle value of the sequence
– If n is even, the average of the 2 middle values
3. Not affected by extreme values
Calculating Median
$$\text{Median} = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ item in the data array}$$
where n is the number of items in the array.
Median of Odd Sample Size
Ages of the children (in order):

Number of Child   1  2  3  4  5  6  7
Age               2  3  4  4  4  5  6

$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{7 + 1}{2} = 4.0$$
$$\text{Median} = 4$$
Median of Even Sample Size
Ages of the children (in order):

Number of Child   1  2  3  4  5  6  7  8
Age               2  3  4  4  4  5  6  60

$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$$
$$\text{Median} = \frac{4 + 4}{2} = 4$$
Median is 4
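The two median examples can be reproduced with a short Python sketch. The function name `median` and the variable names are assumptions made for illustration. The code sorts the data first, then picks the middle item (odd n) or averages the two middle items (even n).

```python
def median(values):
    """Median of a list, found from the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the positioning point (n + 1) / 2 is a whole number.
        return ordered[mid]
    # Even n: the positioning point falls between two items; average them.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_ages = [2, 3, 4, 4, 4, 5, 6]
even_ages = [2, 3, 4, 4, 4, 5, 6, 60]

print(median(odd_ages))                 # 4
print(median(even_ages))                # 4.0
print(sum(even_ages) / len(even_ages))  # 11.0
```

The last line shows why the median is described as not affected by extreme values: adding the age 60 leaves the median at 4, while the mean of the same eight ages rises to 11.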
MODE
• The MODE is the most
frequently occurring value in a
set of observations.
• The mode is not very useful for
numerical data that are
continuous. It is most useful for
numerical data that have been
grouped into classes.
Mode Examples
• No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode
Raw Data: 2 3 4 4 4 5 6
• More Than 1 Mode
Raw Data: 21 28 28 41 43 43
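The three cases above can be checked with a small Python sketch. The function name `modes` is our own, and the convention that a data set with no repeated value has "no mode" is taken from the first example; other texts may define this case differently.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once, so none is "most frequent".
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))  # []
print(modes([2, 3, 4, 4, 4, 5, 6]))             # [4]
print(modes([21, 28, 28, 41, 43, 43]))          # [28, 43]
```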
Measures of Variation
Range is defined as the difference in value between the highest (maximum) and the lowest (minimum) observation, e.g. for the previous data the lowest value is 2 and the highest is 6.
Hence the range is 6 - 2 = 4
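As a minimal sketch, the range of the seven ages can be computed in Python as follows. The function name `value_range` is hypothetical, chosen to avoid clashing with Python's built-in `range`.

```python
ages = [2, 3, 4, 4, 4, 5, 6]


def value_range(values):
    """Range: highest observation minus lowest observation."""
    if not values:
        raise ValueError("range requires at least one observation")
    return max(values) - min(values)


print(value_range(ages))         # 4
print(value_range(ages + [60]))  # 58
```

The second call adds the extreme age of 60 used in the even-sample median example. Because the range depends only on the two end values, a single extreme observation changes it from 4 to 58.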
• Variance (of a sample) is defined as the sum of the squares of the deviations about the sample mean divided by one less than the total number of items (n - 1)
• Standard deviation is the square root of the variance
Standard Deviation
• The STANDARD DEVIATION is a
measure that describes how much
individual measurements differ, on the
average, from the mean.
• A large standard deviation shows that
there is a wide scatter of measured values
around the mean, while a small standard
deviation shows that the individual values
are concentrated around the mean with
little variation among them.
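The contrast between a small and a large standard deviation can be illustrated with two data sets that share the same mean. Both data sets below are hypothetical, invented only for this sketch. The code uses `statistics.stdev` from the Python standard library, which computes the sample standard deviation.

```python
import statistics

# Hypothetical ages, invented for illustration; both sets have a mean of 4.
tight = [3, 4, 4, 4, 4, 4, 5]
wide = [0, 1, 2, 4, 6, 7, 8]

sd_tight = statistics.stdev(tight)
sd_wide = statistics.stdev(wide)

print(statistics.mean(tight), round(sd_tight, 2))  # 4 0.58
print(statistics.mean(wide), round(sd_wide, 2))    # 4 3.11
```

The means are identical, but the second set is scattered much more widely around 4, and its standard deviation is correspondingly larger.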
Variance &
Standard Deviation
Computation of Variance &
Standard Deviation
Given a sample consisting of 7 children with the following age distribution, compute the variance and standard deviation.
Ages in complete years: 2 3 4 4 4 5 6
Solution
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Step 1: Compute the Sample Mean

Observations (x)
(1)
2
3
4
4
4
5
6
Total: 28

$$\bar{x} = \frac{\sum x}{n} = \frac{28}{7} = 4$$
Solution
Step 2: Compute the sum of $(x - \bar{x})^2$

Observation   Mean    Deviation     Squared deviation
(x)           (x̄)     (x - x̄)       (x - x̄)^2
(1)           (2)     (1) - (2)     [(1) - (2)]^2
2             4       -2            4
3             4       -1            1
4             4        0            0
4             4        0            0
4             4        0            0
5             4        1            1
6             4        2            4
Total                               10

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{10}{7 - 1} = 1.67$$
$$s = \sqrt{1.67} \approx 1.3$$
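The two steps of the solution can be followed in a short Python sketch. The variable names are our own. The result is cross-checked against `statistics.variance`, the standard-library function for the sample variance (divisor n - 1).

```python
import math
import statistics

ages = [2, 3, 4, 4, 4, 5, 6]
n = len(ages)

# Step 1: sample mean.
mean = sum(ages) / n

# Step 2: sum of squared deviations about the mean.
squared_deviations = [(x - mean) ** 2 for x in ages]
sum_sq = sum(squared_deviations)

# Sample variance divides by n - 1; the standard deviation is its square root.
variance = sum_sq / (n - 1)
sd = math.sqrt(variance)

print(mean, sum_sq)        # 4.0 10.0
print(round(variance, 2))  # 1.67
print(round(sd, 2))        # 1.29

# Cross-check against the standard library.
assert math.isclose(variance, statistics.variance(ages))
```

Rounded to one decimal place, the standard deviation is 1.3, which agrees with the hand calculation.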
OBJECTIVES
By the end of the session, participants will be able to:
• Define biostatistics, variable, population, and sample.
• Identify different types of variables.
• Describe the difference between the common meanings of words and the meanings of the same words when used in statistics.
• Compute measures of central tendency.
• Compute measures of dispersion.
• Describe the purpose of calculating the standard deviation.
Tutorial
A population was surveyed to find out the hemoglobin levels of 9 females of reproductive age. The hemoglobin levels recorded were as follows:
9, 9, 9, 6, 7, 9, 11, 12, 9
a) Compute the measures of central tendency
b) Compute the measures of dispersion
For the data set in the following exercise, compute (a) the mean, (b) the median, (c) the mode, (d) the range, (e) the variance, and (f) the standard deviation. Treat the data set as a sample. Select the measure of central tendency that you think would be most appropriate for describing the data. Give reasons to justify your choice.
The results of a study by Dosman et al. (A-9) allowed them to conclude that breathing cold air increases the bronchial reactivity to inhaled histamine in asthmatic patients. The study subjects were seven asthmatic patients aged 19 to 33 years. The baseline forced expiratory values (in liters per minute) for the subjects in their sample were as follows:
3.94 1.47 2.06 2.36 3.74 3.43 3.78
Source: J. A. Dosman, W. G. Hodgson, and D. W. Cockcroft, “Effect of Cold Air on the Bronchial Response to Inhaled Histamine in Patients with Asthma,” American Review of Respiratory Disease, 144 (1991), 45-50.
Home Assignment
Thank you
