Ch 3 DATA.doc

3
Source and presentation of data
[Q:
1. Define and classify data. (BSMMU, January 2009, January 2011)
2. Write short notes on: Data. (BSMMU, January 2010)]
A set of values recorded on one or more observational units is
called data.
Types of data
The statistical data can be divided into two broad categories:
1. Quantitative
2. Qualitative
1. Quantitative data are numerical, arising from counts or
measurements.
• Continuous - if the values of measurement can take any
number in a range, such as height (150 cm-180 cm),
weight (50 kg-60 kg) etc. Continuous data can be ordered
and measured.
• Discrete (or fixed) - if the values of measurement are
integers (whole numbers), such as the number of students in

Biostatistics-11
a class. Discrete data cannot be a fraction. For example:
total number of students in a class.
2. Qualitative data - arise when individuals fall into
separate classes and such classes have no numerical
relation with one another, such as sex (male/female), skin
colour (brown/black/white), eye colour (brown, blue) etc.
Difference between qualitative and quantitative data
Qualitative data                      Quantitative data
1. No magnitude.                      1. Have magnitude.
2. Persons with the same character    2. Arranged by both character
   are counted to form a group,          and frequency.
   e.g. attacked, died etc.
3. Discrete.                          3. Discrete or continuous.
4. Used mainly in pharmacology.       4. Used mainly in anatomy and
                                         physiology.
5. Results are expressed as ratio,    5. Statistical methods are employed
   proportion, percentage etc.           to analyse such data, e.g. mean,
                                         range, SD etc.
Outlier
• It is an unusually large or an unusually small value
compared to the others in a data set.
• An outlier might be the result of an error in measurement,
in which case it will distort the interpretation of the data,
having undue influence on many summary statistics, for
example, the mean.
• If an outlier is a genuine result, it is important because it
might indicate an extreme of behaviour of the process
under study. For this reason, all outliers must be examined
carefully before embarking on any formal analysis. Outliers
should not routinely be removed without further
justification.
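The influence of a single outlier on the mean can be illustrated with a minimal sketch. The readings below are hypothetical (they do not come from any study), and the median is shown only for contrast; it is not part of the definition above.

```python
from statistics import mean, median

# Hypothetical systolic blood pressure readings (mmHg); illustrative only.
readings = [118, 120, 122, 124, 126]
# The same readings with the last value mistyped as 1260 instead of 126.
with_outlier = [118, 120, 122, 124, 1260]

mean_clean = mean(readings)            # 122
mean_outlier = mean(with_outlier)      # 348.8
median_clean = median(readings)        # 122
median_outlier = median(with_outlier)  # 122

print(mean_clean, mean_outlier, median_clean, median_outlier)
```

One wrong value moves the mean from 122 to 348.8 while the median is unchanged, which is why an outlier should be checked before any summary statistic is reported.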
Other classifications of data

A. According to data collection procedure
a. Primary data: obtained by experiment.
b. Secondary data: obtained by review.
B. According to variables
a. Univariable
b. Bivariable
c. Multivariable
C. According to compilation
a. Raw data: data before compilation.
b. Derived data: calculated from the primary values of the
data.
Sources of data
The main sources for collection of medical statistics are:
1. Experiments
Experiments are performed in the laboratories or in the hospital wards.
2. Surveys
Surveys are carried out for epidemiological studies in the
field by trained teams to find the incidence or prevalence
of health or disease situations.
3. Records
Records are maintained as a routine in registers or books
over a long period of time for various purposes, such as for
vital statistics (births, marriages and deaths) and for
illnesses.
Organization of data
Organization of data is the sorting or classification of
collected data into characteristic groups or classes, as per age,
sex, social class etc., to make them concise, simple and
meaningful.
Analysis and interpretation of data can be done manually or
by computer. Different types of statistical methods, techniques
and tests are also utilized.
Presentation of Data

Data should be presented in such a way that they
• become concise without losing the details,
• arouse interest in the reader,
• become simple and meaningful to form impressions,
• need few words to explain,
• define the problem and suggest the solution too, and
• become helpful in further analysis.
For good presentation of data, full labelling, simplicity, and
honesty are essential requirements.
Methods of Presentation of Data
[Q:
1. Enumerate the methods of data presentation. (BSMMU,
January 2010, July 2010)
2. Discuss different methods of data presentation.
(BSMMU, January 2010, new curriculum)
3. Discuss the methods of data presentation. (BSMMU,
January 2009)]
There are three main methods of presenting frequencies of a
character or a variable.
1. Array (Arrangement)
• Ascending (lowest to highest): 1, 2, 4, 6, 8, 10, 11.
• Descending (highest to lowest): 11, 10, 8, 6, 4, 2, 1.
2. Tabulation
3. Drawing
Graphical (Drawing) methods are better suited than numerical
(Array, Tabulation) methods for identifying patterns in the data.
Numerical approaches are more precise and objective.
Since the numerical and graphical approaches complement each
other, it is wise to use both.
TABULATION

Tabulation is a device for presenting data from a mass of
statistical data. Preparation of a frequency distribution table is
the first requirement.
Frequency distribution table
It groups a large number of series or observations of the master
table and presents the data very concisely.
Frequency is the number of individuals in each group, or the
count of individuals having a particular quality.
Cumulative Frequency is the running total of the frequencies.
Example:
Frequency   Cumulative frequency
4           4
6           10 (4 + 6)
3           13 (4 + 6 + 3)
2           15 (4 + 6 + 3 + 2)
6           21 (4 + 6 + 3 + 2 + 6)
4           25 (4 + 6 + 3 + 2 + 6 + 4)
Cumulative frequency is used to determine the number of
observations that lie above (or below) a particular value.
The cumulative frequency is found from a frequency
distribution table by adding each frequency to the sum of its
predecessor.
The last value will always equal the total for all observations, as
all frequencies will have been added.
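The running total described above can be sketched in a few lines. The frequencies are those of the example table; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies from the example table above.
frequencies = [4, 6, 3, 2, 6, 4]

# Each cumulative frequency is the current frequency added to the
# sum of all frequencies before it.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [4, 10, 13, 15, 21, 25]
```

The last cumulative value, 25, equals the total number of observations, as the text states.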
Frequency distribution is the arrangement of data into class
intervals showing the frequency of each class. Frequency
distribution describes how the data are distributed around the
mean.
Frequency Distribution Table: Presentation of quantitative data of
height with tally markings
Heights,       Markings                              Frequency of
groups in cm                                         each group
160-161        ///// /////                           10
162-163        ///// ///// /////                     15
164-165        ///// ///// ///// //                  17
166-167        ///// ///// ///// ////                19
168-169        ///// ///// ///// /////               20
170-171        ///// ///// ///// ///// ///// /       26
172-173        ///// ///// ///// ///// ///// ////    29
174-175        ///// ///// ///// ///// ///// /////   30
176-177        ///// ///// ///// ///// //            22
178-179        ///// ///// //                        12
Total                                                200
Requirements for construction of a frequency distribution
table
a. Range or Lowest value and highest value
b. Class interval
c. Whole set of data
d. Tally mark
Class interval methods

1. Inclusive method - In this method the upper limit of a
class interval is included in that class only, and the class
width is determined by taking the difference between the
lower (or upper) limits of two adjacent classes, e.g. 20-29,
30-39, 40-49, 50-59 etc. represent an inclusive series.
2. Exclusive method - In this method the upper limit of one
class interval is the lower limit of the succeeding class interval,
and the class width is determined by taking the difference
between the upper and lower limits of the same class, e.g.
25-30, 30-35, 35-40, 40-45, 45-50 etc. The exclusive type of
class interval can also be expressed as:
0 and below 10 or 0 - 9.9
10 and below 20 or 10 - 19.9
20 and below 30 or 20 - 29.9 and so on.
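The difference between the two methods shows up when a value falls on a class limit. The sketch below uses the example series from the text; the function names and default parameters are hypothetical helpers, and the inclusive version assumes whole-number data.

```python
def inclusive_class(value, lower=20, width=10):
    """Class (low, high) in an inclusive series such as 20-29, 30-39.

    Assumes whole-number data; the upper limit belongs to the class.
    """
    low = lower + ((value - lower) // width) * width
    return low, low + width - 1


def exclusive_class(value, lower=25, width=5):
    """Class (low, high) in an exclusive series such as 25-30, 30-35.

    The class holds values with low <= value < high, so a value equal
    to an upper limit is placed in the succeeding class.
    """
    low = lower + ((value - lower) // width) * width
    return low, low + width


print(inclusive_class(29))  # (20, 29)
print(inclusive_class(30))  # (30, 39)
print(exclusive_class(30))  # (30, 35)
```

In the exclusive series the value 30 is counted in 30-35 and not in 25-30, which is what "25 and below 30" expresses.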
DRAWINGS
The frequencies of a characteristic can be presented by two
kinds of drawings- graphs and diagrams.
Presentation of quantitative, continuous or measured
data is through graphs. The common graphs in use are:
1. Histogram
2. Frequency polygon
3. Frequency curve
4. Line chart or graph
5. Scatter or dot diagram.
Presentation of qualitative, discrete or counted data is
through diagrams. The common diagrams in use are:
1. Bar diagram
2. Pie or sector diagram
3. Pictogram or picture diagram
4. Map diagrams or spot map.
GRAPH
1. Histogram

[Q: Write short notes on: i) Histogram (BSMMU, MD,
January 2011, July 2010)]
It is a graphical presentation of a frequency distribution in which
the variable characters of the different groups are indicated on the
horizontal line (x-axis), called the abscissa, while the frequency, i.e.
the number of observations, is marked on the vertical line (y-axis),
called the ordinate. The frequency of each group forms a column or
rectangle.
In the table below, the numbers of deaths from scarlet fever in a
certain study were as follows:
Age last birthday (years)   0-  1-  2-  3-  4-  5-  6-  7-  8-  9-  10-  15-19
Number of deaths            18  43  50  60  36  24  22  21  6   5   14   3
A histogram of these figures is shown in Fig. below.
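The last two age groups in this table span five years each, while the others span one year. A common convention for unequal class widths (an assumption here, not stated in the text) is to draw each rectangle with height equal to frequency per unit of class width, so that the area of the rectangle represents the frequency. A minimal sketch, reading "15-19" as 15 to under 20 years:

```python
# Age groups as (lower limit, upper limit) in years; the upper limit is
# exclusive. One-year groups for ages 0 to 9, then 10-14 and 15-19.
age_groups = [(a, a + 1) for a in range(10)] + [(10, 15), (15, 20)]
deaths = [18, 43, 50, 60, 36, 24, 22, 21, 6, 5, 14, 3]

# Height of each rectangle: deaths per year of age within the group.
heights = [d / (hi - lo) for (lo, hi), d in zip(age_groups, deaths)]

# Area of each rectangle (height x width) recovers the frequency.
areas = [h * (hi - lo) for (lo, hi), h in zip(age_groups, heights)]

for (lo, hi), h in zip(age_groups, heights):
    print(f"{lo:>2}-{hi:<2} height = {h:.1f}")
```

Under this convention the 10-14 group is drawn with height 2.8 rather than 14, so it is not visually exaggerated relative to the one-year groups.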

2. Frequency Polygon
[Q: Write short notes on: Frequency polygon (BSMMU, MD,
July 2010)]
It is an area diagram of a frequency distribution developed over
a histogram. Join the mid-points of the class intervals at the height
of the frequencies by straight lines. It gives a polygon, i.e. a figure
with many angles.
A frequency table and a relative frequency polygon for
response times in a study on weapons and aggression are
shown below.
Lower Limit   Upper Limit   Count
25            30            1
30            35            4
35            40            8
40            45            15
45            50            3
50            55            1
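The points of a relative frequency polygon can be derived from the frequency table as a minimal sketch: each class is represented by its mid-point, and each count is divided by the total. The variable names are illustrative.

```python
# (lower limit, upper limit) and count for each class in the table.
classes = [(25, 30), (30, 35), (35, 40), (40, 45), (45, 50), (50, 55)]
counts = [1, 4, 8, 15, 3, 1]

total = sum(counts)  # 32 observations

# Mid-point of each class interval (x-coordinate of the polygon).
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Relative frequency of each class (y-coordinate of the polygon).
relative = [c / total for c in counts]

# Joining these points by straight lines gives the polygon.
points = list(zip(midpoints, relative))
for x, y in points:
    print(f"{x:>5} {y:.3f}")
```

The highest point lies at the mid-point 42.5, where the relative frequency is 15/32.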

3. Frequency curve (or normal distribution or Gaussian
distribution)
When the number of observations is very large and the class
interval is reduced, the frequency polygon tends to lose its
angulations and gives rise to a smooth curve known as a
frequency curve. Such a curve is obtained in the normal distribution
of individuals in a large sample. Here the frequency distribution
is symmetrical around a single peak so that mean, median and
mode coincide. It is constructed from the smallest frequencies
at the extremes of classification to the highest frequency at the
peak in the middle.
Characteristics of a normal curve:
1. It is bell shaped.
2. Mean, median and mode coincide.
3. It is symmetrical.
4. It has two points of inflection.
[Figure: frequency distribution of examination scores. x-axis: Score in the examination; y-axis: No of obs. Score: N = 85, Mean = 488.447059, StdDv = 81.5223466, Max = 693, Min = 313]

4. Cumulative Frequency diagram or `Ogive'
Ogive is a graph of the cumulative frequency distribution. To
draw this, an ordinary frequency distribution table of
quantitative data has to be converted into a cumulative
frequency table.
The cumulative frequencies are plotted corresponding to the
group limits of the characteristic. On joining the points by a
smooth free-hand curve, the diagram made is called an Ogive.
The blood cholesterol levels of twenty-five sedentary workers
over 50 years of age were measured (to the nearest mg/dl) and
recorded as follows:
242, 228, 217, 209, 253, 239, 266, 242, 251, 240, 223, 219, 246,
260, 258, 225, 234, 230, 249, 245, 254, 243, 235, 231, 257.
A frequency distribution table and the cumulative frequency
will be as follows.
[The data ranges from 209 mg/dl to 266 mg/dl, so the data are
grouped in class intervals of 10 to produce the following table:]
Blood cholesterol   Tally      Frequency (f)   Cumulative
level (x)                                      frequency
200-<210            I          1               1
210-<220            II         2               3
220-<230            III        3               6
230-<240            IIIII      5               11
240-<250            IIIII II   7               18
250-<260            IIIII      5               23
260-<270            II         2               25
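The table above can be reproduced from the raw readings with a short sketch. Plotting each cumulative frequency against the upper limit of its class is an assumption made here (the text says only "group limits"); the variable names are illustrative.

```python
from itertools import accumulate

# Blood cholesterol readings (mg/dl) as listed in the text.
levels = [242, 228, 217, 209, 253, 239, 266, 242, 251, 240, 223, 219, 246,
          260, 258, 225, 234, 230, 249, 245, 254, 243, 235, 231, 257]

start, width, n_classes = 200, 10, 7  # classes 200-<210 ... 260-<270

# Count the readings falling in each class (lower limit included,
# upper limit excluded).
freq = [0] * n_classes
for x in levels:
    freq[(x - start) // width] += 1

cum = list(accumulate(freq))

for i, (f, c) in enumerate(zip(freq, cum)):
    low = start + i * width
    print(f"{low}-<{low + width}  {'I' * f:<8} {f:>2} {c:>3}")

# Points for the ogive: (upper class limit, cumulative frequency).
ogive_points = [(start + (i + 1) * width, c) for i, c in enumerate(cum)]
```

The frequencies come out as 1, 2, 3, 5, 7, 5, 2 and the cumulative frequencies as 1, 3, 6, 11, 18, 23, 25, in agreement with the table.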
5. Line chart
This is a frequency polygon presenting variation by line. It
shows the trend of an event occurring over a period of time
(rising, falling or showing fluctuations), such as cancer deaths,
infant mortality rate, birth rate, death rate etc. The class interval
may be a month, a year, 5 years or 10 years.

6. Scatter or Dot Diagram.
It is prepared after tabulation in which the frequencies of at least
two variables have been cross-classified. It is a graphic
presentation made to show the nature of correlation between
two variable characters in the same person(s) or group(s),
such as height and weight in men aged 20 years. Hence it is
also called a correlation diagram. The characters are read on the
base (height) and vertical (weight) axes, and the perpendiculars
drawn from these readings meet to give one scatter point.
Varying frequencies of the characters give a number of such
points or dots that show a scatter. [Ref. Mohajon]
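How scatter points are formed, and how the direction of the correlation can be summarised, is sketched below. The heights and weights are hypothetical values chosen for illustration only, and Pearson's correlation coefficient is one common summary; the text itself does not prescribe it.

```python
from math import sqrt

# Hypothetical heights (cm) and weights (kg) of five men; not real data.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 62, 70, 73]

# Each person contributes one scatter point (height, weight).
points = list(zip(heights_cm, weights_kg))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(heights_cm, weights_kg)
print(points)
print(round(r, 3))  # 0.985
```

A value of r close to +1 corresponds to dots rising from lower left to upper right on the diagram, i.e. a positive correlation.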

DIAGRAM
1. Bar Diagram
[Q: Write short notes on: Bar-diagram. (BSMMU, January
2011, January, 2011 )]
Length of the bars, drawn vertical or horizontal, indicates the
frequency of a character.
There are three types of bar diagrams for comparison of data:
a. Simple,
b. multiple and
c. proportional bar diagram
a. Simple bar diagram: a series of bars with a space between
any two bars equal to half the width of a bar; each
bar’s height represents the frequency of one variable.

b. Multiple bar diagram: Two or more bars are drawn side
by side in a group without leaving any gap. Each of the bars
in a group represents a different phenomenon.
Fig.: Multiple bar diagram of drug consumption by different
castes in India

c. Compound or proportional bar diagram: Each bar is
subdivided into two or more parts so that each part
represents a particular component of the total value of the
bar.
Fig.: Compound or proportional bar diagram of journey by
different transport medium by people in a country.
[Q:
• How does a histogram differ from a bar diagram? (BSMMU, July
2010)
• What is the distinction between a histogram and a bar
diagram? (BSMMU, July 2009)]
2. Pie or sector diagram

[Q: Write short notes on: (i) Pie diagram, (BSMMU, January,
2010)]
Fig.: pie chart of drug consumption in India
This is another way of presenting discrete data of qualitative
characters such as blood groups, age groups, sex groups,
causes of mortality or social groups in a population. The
frequencies of the groups are shown in a circle. The degrees of
angle denote the frequency and the area of the sector. The size of
each angle is calculated by multiplying the class percentage
by 3.6, i.e. 360/100, or directly by the formula
\[ \text{angle of sector} = \frac{\text{class frequency}}{\text{total observations}} \times 360^{\circ} \]
where
\[ \text{class percentage} = \frac{\text{class frequency}}{\text{total observations}} \times 100 \]
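The calculation of the sector angles can be sketched as follows. The blood group frequencies are hypothetical figures chosen for illustration, not data from the text.

```python
# Hypothetical blood group frequencies in 200 persons; illustrative only.
frequencies = {"A": 50, "B": 70, "AB": 20, "O": 60}

total = sum(frequencies.values())  # total observations = 200

# Class percentage = class frequency / total observations x 100.
percentages = {g: f * 100 / total for g, f in frequencies.items()}

# Angle of sector = class frequency / total observations x 360,
# which equals the class percentage multiplied by 3.6.
angles = {g: f * 360 / total for g, f in frequencies.items()}

for group in frequencies:
    print(group, percentages[group], angles[group])
```

With these figures the sectors measure 90, 126, 36 and 108 degrees, which together make up the full 360 degrees of the circle.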
3. Pictogram - Here the bars of the bar chart are replaced by
pictures, and the pictures are drawn in a horizontal line. Each
picture indicates a unit of 10, 20, 100 etc. of happenings, and a
fraction of a unit has to be ignored, though a half (.5) may be
denoted by a half picture.
4. Map diagram - shows the geographical distribution of the
frequencies of a characteristic, and the number of dots
denotes the frequency in units. Two different kinds of dots may be
marked on an area of the map to show attacks and deaths.
Fractions are ignored.
[Q: How component bar diagram differ from pie diagrams?
(BSMMU, January, 2010)]
Advantages of graphic presentation over tabular
presentation:
• Simple, appealing & attractive.
• Important tool in visual analysis.
• Enables easy comparison between related factors.
• More suitable for decision makers, as it saves time &
energy.
