1. CHAPTER 2:
METHODS OF DATA COLLECTION
AND PRESENTATION
Objectives
At the end of this chapter students will be able to:
Arrange raw data in an array and then classify the
data to construct a frequency table and a
cumulative frequency table.
Organize data using a frequency distribution.
Present data using suitable graphs or
diagrams.
2. 2.1 Methods of Data Collection
• Data: the raw material of statistics. It can be
obtained by:
Measuring
Counting
Observing
3. Sources of data
• Statistical data may be classified into two
categories depending on the source.
1. Primary data: Data collected by the investigator
or researcher personally for the purpose of a
specific inquiry or study.
2. Secondary data: Data that have already been
collected by others and are used by the investigator.
Sources of secondary data include:
Books
Journals
Reports, etc.
4. 2.2. Methods of Data Presentation
• The presentation of data is broadly classified into the
following two categories:
Tabular presentation
Diagrammatic and graphic presentation
The process of arranging data into classes or
categories according to their similarities is called
classification.
It eliminates inconsistency and brings out the
points of similarity and/or dissimilarity of the collected
items/data.
Classification is necessary because it is difficult
to draw inferences and conclusions from
a large set of raw data.
5. 2.2.1. Frequency distribution
• Frequency: the number of times a certain value or class
of values occurs.
• Frequency distribution (FD): the organization of raw
data in table form using classes and frequencies. There are
three types of FD, and there are specific procedures for
constructing each type.
• The three types are:
I. Categorical FD
II. Ungrouped FD
III. Grouped FD
6. Categorical FD
I. Categorical FD: Used for data that can be placed
in specific categories, such as nominal or ordinal level
data.
Example 2.1: Twenty-five patients were given a
blood test to determine their blood type. The data
are shown below: A B B AB O A O O B AB B B B O
A O O O AB AB A O O B A.
Solution: Since the data are categorical, we take
the four blood types as classes and construct a
FD as shown below.
7. Cont'd
Step 1: Make a table as shown below.
Step 2: Tally the data and place the result under the column Tally.
Step 3: Count the tallies and place the result under the column Frequency.
Step 4: Find the percentage of values in each class using the formula
% = (f/n) × 100%, where f = frequency and n = total number of observations.
Class   Tally         Frequency   Percent
A       /////         5           5/25 × 100 = 20%
B       ///// //      7           7/25 × 100 = 28%
AB      ////          4           4/25 × 100 = 16%
O       ///// ////    9           9/25 × 100 = 36%
Total                 25          100%
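The four steps above can be sketched in Python. This is only a minimal illustration, assuming the 25 blood types of Example 2.1 are entered as a list; the names `blood_types` and `categorical_fd` are made up for this sketch and are not part of the chapter.

```python
from collections import Counter

# Blood types of the 25 patients in Example 2.1
blood_types = ("A B B AB O A O O B AB B B B O "
               "A O O O AB AB A O O B A").split()

def categorical_fd(data, classes):
    """Return (class, frequency, percent) rows for categorical data."""
    counts = Counter(data)      # Steps 2-3: tally the data and count the tallies
    n = len(data)               # total number of observations
    # Step 4: percent = f / n * 100
    return [(c, counts[c], counts[c] / n * 100) for c in classes]

table = categorical_fd(blood_types, ["A", "B", "AB", "O"])
for cls, f, pct in table:
    print(f"{cls:<3} {f:>2} {pct:5.1f}%")
```

The frequencies add up to n = 25, which is a quick check that no observation was missed while tallying.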
8. II. Ungrouped Frequency
Distribution (UFD)
UFD: often constructed for a small set of data or
for data on a discrete variable.
Constructing an ungrouped frequency distribution:
First find the smallest and largest raw scores in
the collected data.
Arrange the data in order of magnitude and
count the frequency of each value.
To facilitate counting, one may include a column
of tallies.
9. Cont'd
Example 2.2: The
following data represent
the number of days of
sick leave taken by each
of 50 workers of a
company over the last 6
weeks.
i. Construct an ungrouped
frequency distribution.
ii. How many workers
had at least 1 day of sick
leave?
iii. How many workers
had between 2 and 6
days of sick leave?
2 0 0 5 8 3 4 1 0 0 7 1
7 1 5 4 0 4 0 1 8 9 7 0
1 7 2 5 5 4 3 3 0 0 2 5
1 3 0 2 4 5 0 5 7 5 1 1
0 2
10. Cont'd
• Solution:
• i. Since this data set contains only a relatively small number of
distinct values, it is convenient to represent it in a frequency table
that presents each distinct value along with its frequency of occurrence.
• ii. Since 12 of the 50 workers had no days of sick leave, the answer
is 50 – 12 = 38.
• iii. Taking "between 2 and 6" to exclude the end values, the answer is
the sum of the frequencies for the values 3, 4 and 5, that is,
4 + 5 + 8 = 17.
Class   Frequency   Cumulative frequency
0       12          12
1       8           20
2       5           25
3       4           29
4       5           34
5       8           42
7       5           47
8       2           49
9       1           50
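The table and the answers to parts ii and iii can be reproduced with a short Python sketch. It assumes the 50 observations of Example 2.2 are entered as a list; the variable names are illustrative only, and part iii is read as strictly between 2 and 6.

```python
from collections import Counter
from itertools import accumulate

# Days of sick leave for the 50 workers in Example 2.2
days = [2, 0, 0, 5, 8, 3, 4, 1, 0, 0, 7, 1,
        7, 1, 5, 4, 0, 4, 0, 1, 8, 9, 7, 0,
        1, 7, 2, 5, 5, 4, 3, 3, 0, 0, 2, 5,
        1, 3, 0, 2, 4, 5, 0, 5, 7, 5, 1, 1,
        0, 2]

counts = Counter(days)
values = sorted(counts)                  # distinct values in order of magnitude
freqs = [counts[v] for v in values]      # frequency of each distinct value
cum_freqs = list(accumulate(freqs))      # running total of the frequencies

for v, f, cf in zip(values, freqs, cum_freqs):
    print(f"{v:>5} {f:>9} {cf:>20}")

# Part ii: workers with at least 1 day of sick leave
at_least_one = sum(f for v, f in counts.items() if v >= 1)
# Part iii: workers with strictly between 2 and 6 days (3, 4 or 5 days)
strictly_between = sum(f for v, f in counts.items() if 2 < v < 6)
```

The value 6 never occurs in the data, so it does not appear as a class in the table.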
11. III. Grouped Frequency Distribution (GFD)
When the range of the data is large, the data must be
grouped into classes that are more than one unit in
width.
Definition of some basic terms
Grouped frequency distribution: a FD in which several
numbers are grouped into one class.
Class limits (CL): They separate one class from another.
The limits can actually appear in the data, and there are
gaps between the upper limit of one class and the
lower limit of the next class.
Unit of measure (U): The smallest possible difference
between successive values, e.g. 1, 0.1, 0.01, 0.001, etc.
12. Cont'd
Class boundaries: Separate one class in a grouped
frequency distribution from another. The boundaries have
one more decimal place than the raw data. There is
no gap between the upper boundary of one class and
the lower boundary of the succeeding class. The lower class
boundary is found by subtracting half of the unit of
measure from the lower class limit, and the upper class
boundary is found by adding half of the unit of measure to the
upper class limit.
Class width (W): The difference between the upper and
lower boundaries of any class. The class
width is also the difference between the lower limits (or
the upper limits) of two consecutive classes.
13. Cont'd
Class mark (midpoint): Found by adding the lower
and upper class limits (or boundaries) and dividing the sum
by two.
Cumulative frequency (CF): The number of observations
less than the upper class boundary, or greater than the
lower class boundary, of a class.
CF (less than type): the number of values less than
the upper class boundary of a given class.
CF (greater than type): the number of values
greater than the lower class boundary of a given class.
Relative frequency (Rf): The frequency divided by the
total frequency. Multiplied by 100, it gives the percent of
values falling in that class.
Rfi = fi/n = fi/∑fi
14. Cont'd
Relative cumulative frequency (RCf): The running total of
the relative frequencies, or equivalently the cumulative frequency
divided by the total frequency. Multiplied by 100, it gives the
percent of values that are less than the upper class boundary
(or, for the greater than type, greater than the lower class boundary).
RCfi = Cfi/n = Cfi/∑fi
15. Cont'd
STEPS IN CONSTRUCTING A GFD
1. Find the highest (H) and the lowest (L) values.
2. Compute the range: R = H – L
3. Select the number of classes desired (K), either by
I. Choosing arbitrarily between 5 and 15, or
II. Using Sturges' formula K = 1 + 3.322 log n (logarithm to base 10);
n = total frequency
4. Find the class width (W) by dividing the range by the
number of classes and rounding the result to the nearest
integer: W = R/K
5. Identify the unit of measure (U), usually 1, 0.1, 0.01, etc.
16. Cont'd
6. Pick a suitable starting point less than or equal to
the minimum value. The starting point is the lower
limit of the first class. Then continue to add the
class width to get the remaining lower class limits.
7. Find the upper class limits: UCLi = LCL(i+1) – U, that is, the
lower limit of the next class minus the unit of measure. Then
continue to add the class width to get the remaining upper class
limits.
8. Find the class boundaries: LCBi = LCLi – ½ U
and UCBi = UCLi + ½ U
9. Find the class marks:
CMi = (UCLi + LCLi)/2 or CMi = (UCBi + LCBi)/2.
17. Cont'd
10. Tally the data.
11. Find the frequencies.
12. Find the cumulative frequencies. Depending on
what you are trying to accomplish, it may be
necessary to find the cumulative frequencies.
13. If necessary, find Rf and RCf.
• Example 2.3: The blood glucose levels, in milligrams
per deciliter, for 60 patients are shown below.
Construct a grouped frequency distribution for the
data set.
18. Cont'd
55 70 85 90 93 86 103 74 92
63 101 83 82 100 97 97 109 84
84 75 92 68 114 84 101 81 91
82 115 86 69 59 56 84 77 90
77 97 80 101 61 74 87 80
58 81 78 88 86 59 82 83
59 78 116 72 62 105 65 78
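• Solution:
1) Highest value = 116,
Lowest value = 55
2) Range = 116 – 55 = 61
3) K = 1 + 3.322 log 60 =
1 + 3.322(1.78) = 6.9 ≈ 7
4) W = R/K = 61/7 = 8.7 ≈ 9
5) U = 1
6) LCL1 = 55, so the lower class limits are 55, 64, 73, 82, 91, 100, 109.
7) Upper class limits: UCL1 = 64 – 1 = 63, so the upper class
limits are 63, 72, 81, 90, 99, 108, 117.
8) Class boundaries: 54.5-63.5, 63.5-72.5, 72.5-81.5, 81.5-90.5, 90.5-99.5, 99.5-108.5, 108.5-117.5.
9) Class marks: 59, 68, 77, 86, 95, 104, 113.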
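The construction steps can be sketched in Python for the data of Example 2.3. This is a sketch under stated assumptions, not a general-purpose routine: the function name `grouped_fd` is hypothetical, the data list is read from the table above with the split values rejoined (101, 109, 115), the first lower class limit is taken as the minimum value, and the width is rounded to the nearest integer as in step 4. For other data sets, rounding down could leave the largest value uncovered, so the width may need to be rounded up.

```python
import math

# Blood glucose levels (mg/dL) of the 60 patients in Example 2.3
glucose = [
    55, 70, 85, 90, 93, 86, 103, 74, 92,
    63, 101, 83, 82, 100, 97, 97, 109, 84,
    84, 75, 92, 68, 114, 84, 101, 81, 91,
    82, 115, 86, 69, 59, 56, 84, 77, 90,
    77, 97, 80, 101, 61, 74, 87, 80,
    58, 81, 78, 88, 86, 59, 82, 83,
    59, 78, 116, 72, 62, 105, 65, 78,
]

def grouped_fd(data, unit=1):
    """Build a grouped FD following steps 1-11; returns one dict per class."""
    low, high = min(data), max(data)                  # step 1
    rng = high - low                                  # step 2
    k = round(1 + 3.322 * math.log10(len(data)))      # step 3 (Sturges' formula)
    width = round(rng / k)                            # step 4
    rows = []
    for i in range(k):
        lcl = low + i * width                         # step 6: lower class limit
        ucl = lcl + width - unit                      # step 7: next lower limit minus U
        rows.append({
            "limits": (lcl, ucl),
            "boundaries": (lcl - unit / 2, ucl + unit / 2),   # step 8
            "mark": (lcl + ucl) / 2,                           # step 9
            "freq": sum(lcl <= x <= ucl for x in data),        # steps 10-11
        })
    return rows

table = grouped_fd(glucose)
for row in table:
    print(row)
```

Summing the `freq` entries should give back the 60 observations, which confirms that the seven classes cover the whole range from 55 to 116.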
20. 2.2.2. Diagrammatic presentation of data:
Bar charts, Pie-chart, Cartograms
A convenient and popular way of describing
data is graphical presentation.
Data are usually easier to understand and interpret when
they are presented graphically than when they are given in words or in a
frequency table.
A graph can present data in a simple and clear way.
The three most commonly used diagrammatic
presentations for discrete as well as qualitative data
are:
Pie charts
Bar charts
Pictograms
21. Pie chart
A pie chart is a circle that is divided into sections or
wedges according to the percentage of frequencies in
each category of the distribution. The angle of each
sector is obtained using:
Angle of sector = (frequency of the category / total frequency) × 360°
Example 2.4: Using the immunization status of children
in a certain area given in Example 2.5, draw the pie
chart.
[Figure: pie chart of relative frequency in percent, with sectors for not immunized, partially immunized and fully immunized children]
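As a rough check on the sector sizes, the angles can be computed from the counts given in Example 2.5 (75, 57 and 78 children out of 210). The sketch below assumes the usual rule that each sector takes the same share of 360° as its category takes of the total; the function name `sector_angles` is illustrative.

```python
# Counts from Example 2.5 (immunization status of 210 children)
counts = {
    "Not immunized": 75,
    "Partially immunized": 57,
    "Fully immunized": 78,
}

def sector_angles(freqs):
    """Map each category to its sector angle in degrees."""
    total = sum(freqs.values())
    return {name: f / total * 360 for name, f in freqs.items()}

angles = sector_angles(counts)
for name, angle in angles.items():
    print(f"{name:<20} {angle:6.1f} degrees")
```

The angles come out at about 128.6°, 97.7° and 133.7°, and they add up to 360°.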
22. Bar Charts
Used to represent and compare the frequency
distributions of discrete variables and attributes or
categorical series.
Bars can be drawn either vertically or horizontally.
In presenting data using a bar diagram:
All bars must have equal width, and the distance
between bars must be equal.
The height or length of each bar indicates the size
(frequency) of the figure represented.
23. Cont'd
There are different types of bar charts.
The most common are:
Simple bar chart
Component or subdivided bar chart
Multiple bar chart
I. Simple bar chart
Used to display data on one variable.
The bars are thick lines (narrow rectangles)
of the same breadth. The magnitude
of a quantity is represented by the height
or length of the bar.
24. I. Simple bar chart
Example 2.5: Consider the immunization status
of children in a certain area:
Immunization status (class)   Number (frequency)   Relative frequency (%)
Not immunized                 75                   35.7%
Partially immunized           57                   27.1%
Fully immunized               78                   37.1%
Total                         210                  100%
Draw a simple bar chart of the immunization
status of children.
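A plain-text stand-in for the simple bar chart can be produced with a few lines of Python. This is only a sketch: the function name `text_bar_chart` and the choice of one `#` per three children are assumptions made for the illustration, and a drawn chart would use equal-width bars with equal gaps as described above.

```python
# Frequencies from Example 2.5
status = [
    ("Not immunized", 75),
    ("Partially immunized", 57),
    ("Fully immunized", 78),
]

def text_bar_chart(rows, scale=3):
    """Return one line per category; each '#' stands for `scale` children."""
    total = sum(f for _, f in rows)
    lines = []
    for name, f in rows:
        bar = "#" * (f // scale)        # bar length proportional to frequency
        pct = f / total * 100           # relative frequency in percent
        lines.append(f"{name:<20} {bar} {f} ({pct:.1f}%)")
    return lines

chart = text_bar_chart(status)
print("\n".join(chart))
```

Every bar has the same thickness (one text line), so only the length carries information, which is the defining property of a simple bar chart.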
26. Component Bar chart
II. Component Bar chart
When we want to show how a total
(or aggregate) is divided into its component
parts, we use a component bar chart.
The bars represent the total value of a variable,
with each total broken into its component
parts; different colors or designs are used
for identification.
Example 2.6: Consider the data on immunization
status of women by marital status.
27. Cont'd
Marital status   Immunized        Not immunized    Total
                 No.     %        No.     %
Single           58      24.7     177     75.3     235
Married          156     34.7     294     65.3     450
Divorced         10      35.7     18      64.3     28
Widowed          7       50.0     7       50.0     14
Total            231     31.8     496     68.2     727
Draw a component (subdivided) bar chart of the
immunization status of women by marital status.
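The component percentages that determine how each bar is subdivided can be recomputed from the counts. The sketch below uses only the counts of Example 2.6; the names `women` and `component_percentages` are illustrative.

```python
# (immunized, not immunized) counts from Example 2.6
women = {
    "Single":   (58, 177),
    "Married":  (156, 294),
    "Divorced": (10, 18),
    "Widowed":  (7, 7),
}

def component_percentages(groups):
    """Return {group: (pct_immunized, pct_not_immunized)}, rounded to 1 decimal."""
    result = {}
    for name, (yes, no) in groups.items():
        total = yes + no
        result[name] = (round(yes / total * 100, 1), round(no / total * 100, 1))
    return result

shares = component_percentages(women)

# Overall row: sum each column, then take percentages of the grand total
all_yes = sum(yes for yes, _ in women.values())
all_no = sum(no for _, no in women.values())
overall = component_percentages({"Total": (all_yes, all_no)})["Total"]

for name, (p_yes, p_no) in shares.items():
    print(f"{name:<9} {p_yes:5.1f}% {p_no:5.1f}%")
print(f"{'Total':<9} {overall[0]:5.1f}% {overall[1]:5.1f}%")
```

Within each marital status the two percentages add up to 100, so every bar in a percentage component chart has the same total height.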
29. III. Multiple Bar charts
These are used to display data on more than one
variable.
They are used for comparing different variables at
the same time.
• Example 2.7: Draw a multiple bar chart to
represent the immunization status of women
by marital status given in Example 2.6.
Solution:
31. 2.2.4 Graphical Presentation of data
The histogram, the frequency polygon
and the cumulative frequency graph (ogive)
are the most commonly applied
graphical representations for
continuous data.
32. Cont'd
• Procedures for constructing statistical
graphs:
Draw and label the x and y axes.
Choose a suitable scale for the frequencies or
cumulative frequencies and label it on the y-axis.
Represent the class boundaries (for the
histogram or ogive) or the midpoints (for the
frequency polygon) on the x-axis.
Plot the points.
Draw the bars or lines to connect the points.
33. Histogram
Histogram: a graph that displays the data
by using vertical bars of various heights to
represent frequencies.
Class boundaries are placed along the horizontal
axis.
Class marks and class limits are sometimes used as
the quantity on the x-axis.
• Example 2.8: Construct a histogram to represent
the blood glucose levels of the 60 patients given in
Example 2.3.
Solution:
35. Frequency polygon
If we join the midpoints of the tops of the
adjacent rectangles of the histogram with line
segments, a frequency polygon is obtained.
When the polygon is continued to the x-axis just
outside the range of the data, the total area
under the polygon is equal to the total area
under the histogram.
• Example 2.9: Construct a frequency polygon to
represent the following data.
36. Cont'd
Class limits   Frequency   Class mark   Class boundaries   R.F.   %R.F.   Less than C.F.   More than C.F.
15-24          3           19.5         14.5-24.5          0.06   6%      3                50
25-34          4           29.5         24.5-34.5          0.08   8%      7                47
35-44          10          39.5         34.5-44.5          0.20   20%     17               43
45-54          15          49.5         44.5-54.5          0.30   30%     32               33
55-64          12          59.5         54.5-64.5          0.24   24%     44               18
65-74          4           69.5         64.5-74.5          0.08   8%      48               6
75-84          2           79.5         74.5-84.5          0.04   4%      50               2
37. Cont'd
Solution:
Adding two class marks with fi = 0, namely 9.5
at the beginning and 89.5 at the end, the
following frequency polygon is plotted.
[Figure: frequency polygon; x-axis: class marks 9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5; y-axis: frequency, from 0 to 20]
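The points joined by the frequency polygon can be derived directly from the class limits and frequencies of Example 2.9. The following is a minimal sketch with illustrative variable names; it also recomputes the class marks, class boundaries and relative frequencies listed in the table.

```python
# Class limits and frequencies from Example 2.9
classes = [(15, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74), (75, 84)]
freqs = [3, 4, 10, 15, 12, 4, 2]
unit = 1                                   # unit of measure of the data
width = classes[1][0] - classes[0][0]      # difference of consecutive lower limits

marks = [(lo + hi) / 2 for lo, hi in classes]                        # class marks
boundaries = [(lo - unit / 2, hi + unit / 2) for lo, hi in classes]  # class boundaries
rel_freqs = [f / sum(freqs) for f in freqs]                          # relative frequencies

# Close the polygon with one zero-frequency class mark on each side
polygon = ([(marks[0] - width, 0)]
           + list(zip(marks, freqs))
           + [(marks[-1] + width, 0)])

for x, y in polygon:
    print(x, y)
```

The two added points, (9.5, 0) and (89.5, 0), bring the polygon down to the x-axis so that the area under it matches the area under the histogram.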
38. Ogive (cumulative frequency polygon)
An ogive (pronounced “oh-jive”) is a line that
depicts cumulative frequencies, just as the
cumulative frequency distribution lists
cumulative frequencies.
Note that the ogive uses class boundaries along
the horizontal scale, and the graph begins with the
lower boundary of the first class and ends with
the upper boundary of the last class.
An ogive is useful for determining the number of
values below some particular value.
39. Cont'd
There are two types of ogive, namely the less than
ogive and the more than ogive.
The difference is that the less than ogive uses the less
than cumulative frequency and the more than
ogive uses the more than cumulative frequency on the
y-axis.
Example 2.10: i) Draw a less than ogive for the data
on blood glucose levels of the 60 patients given in
Example 2.3.
41. Cont'd
[Figure: more than ogive; x-axis: class boundaries 14.5 to 84.5; y-axis: more than cumulative frequency, from 0 to 60]
Note: For both ogives, one class with frequency zero is
added, for the same reason as in the frequency polygon.
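The points plotted on the two ogives can be sketched as follows. The example assumes the class boundaries and frequencies tabulated for Example 2.9 (boundaries 14.5 to 84.5, n = 50); the variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries and frequencies from Example 2.9
boundaries = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
freqs = [3, 4, 10, 15, 12, 4, 2]
n = sum(freqs)

# Less than type: number of values below each boundary
# (0 below the lower boundary of the first class)
less_than = [0] + list(accumulate(freqs))
# More than type: number of values above each boundary
# (0 above the upper boundary of the last class)
more_than = [n - c for c in less_than]

less_than_points = list(zip(boundaries, less_than))
more_than_points = list(zip(boundaries, more_than))
rel_cum = [c / n for c in less_than]     # relative cumulative frequency (RCf)

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:>5} {lt:>3} {mt:>3}")
```

At every boundary the less than and more than cumulative frequencies add up to n, so the two ogives are mirror images of each other about the level n/2.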