stats_chap02_notes(L2,3,4) frequency distribution

Chapter 2
Frequency Distributions
and Graphs

Chapter 2 Overview
Introduction
• 2-1 Organizing Data
• 2-2 Histograms, Frequency Polygons, and Ogives
• 2-3 Other Types of Graphs
• 2-4 Paired Data and Scatter Plots
Bluman, Chapter 2

Chapter 2 Objectives
1. Organize data using frequency distributions.
2. Represent data in frequency distributions
graphically using histograms, frequency
polygons, and ogives.
3. Draw and interpret a stem and leaf plot.
4. Draw and interpret a scatter plot for a set of
paired data.

2-1 Organizing Data
• Data collected in original form is called raw data.
• A frequency distribution is the organization of raw data in table form, using classes and frequencies.
• Data that can be placed in categories is organized in categorical frequency distributions.

Categorical Frequency Distribution
Twenty-five army inductees were given a blood test to determine their blood type.
Raw Data:
A, B, B, AB, O
O, O, B, AB, B
B, B, O, A, O
A, O, O, O, AB
AB, A, O, B, A
Construct a frequency distribution for the data.
Categories Frequency
A 5
B 7
O 9
AB 4
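
The tally above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the raw data are typed in from the slide.

```python
from collections import Counter

# Raw data from the blood-type example (25 inductees), row by row.
raw = ("A,B,B,AB,O,O,O,B,AB,B,"
       "B,B,O,A,O,A,O,O,O,AB,AB,A,O,B,A").split(",")

# Counter tallies how often each category occurs.
counts = Counter(raw)

for blood_type in ("A", "B", "O", "AB"):
    print(f"{blood_type:<3}{counts[blood_type]}")
```

The four frequencies should add up to the number of observations, which is a quick check that no value was missed in the tally.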

• Ungrouped frequency distribution: shows the frequency of each separate data value rather than of groups of data values.
• Grouped frequency distribution: the data are arranged into groups called class intervals, and the number of data values falling in each class interval is recorded in a frequency distribution table.
• Relative frequency distribution: gives the proportion of the total number of observations that falls in each category or class.
• Cumulative frequency distribution: the cumulative frequency of a class is the sum of its own frequency and the frequencies of all classes before it. Keep a running total: add the first frequency to the second, add that sum to the third, and so on to the last class. The last cumulative frequency equals the total of all frequencies.

Grouped Frequency Distribution
• Grouped frequency distributions are used when the range of the data is large.
• The smallest and largest possible data values in a class are the lower and upper class limits. Class boundaries separate the classes.
• To find a class boundary, average the upper class limit of one class and the lower class limit of the next class.
• The class width can be calculated by subtracting any one of the following pairs:
   successive lower class limits (or boundaries)
   successive upper class limits (or boundaries)
   the upper and lower class boundaries of the same class
• The class midpoint Xm can be calculated by averaging the upper and lower class limits (or boundaries) of a class.
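
As a rough illustration of these definitions, the sketch below computes boundaries, width, and midpoints for three classes with limits 100 - 104, 105 - 109, and 110 - 114. The helper name class_boundaries is hypothetical, and the code assumes the gap between consecutive classes is the same throughout.

```python
def class_boundaries(limits):
    """Return (lower, upper) boundaries for consecutive (lower, upper) class limits.

    Assumes every pair of neighbouring classes is separated by the same gap.
    """
    # Gap between one class's upper limit and the next class's lower limit.
    gap = limits[1][0] - limits[0][1]
    # Each boundary lies halfway across that gap.
    return [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]


limits = [(100, 104), (105, 109), (110, 114)]

boundaries = class_boundaries(limits)
# Width: difference of successive lower class limits.
width = limits[1][0] - limits[0][0]
# Midpoint Xm: average of the lower and upper class limits.
midpoints = [(lo + hi) / 2 for lo, hi in limits]

print(boundaries)
print(width)
print(midpoints)
```

Subtracting the upper and lower boundaries of a single class gives the same width, whereas subtracting the upper and lower limits of a single class would come out one unit short here.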
• Exclusive method (overlapping class limits).
1. Also known as overlapping classification.
2. The upper class limit (UCL) of one interval is the same as the lower class limit (LCL) of the next interval, e.g. 10 - 20, 20 - 30.
3. It is usually applied to continuous variables.
4. An observation equal to a common class limit is excluded from the class in which that value is the UCL and counted in the class in which it is the LCL.
• Inclusive method (non-overlapping class limits).
1. Also known as non-overlapping classification.
2. No class limit is shared between two intervals, e.g. 10 - 19, 20 - 29.
3. It is usually applied to discrete variables.


Rules for Classes in Grouped
Frequency Distributions
1. There should be 5-20 classes.
2. The classes must be mutually exclusive.
3. The classes must be continuous.
4. The classes must be exhaustive.
5. The classes must be equal in width
(except in open-ended distributions).

Constructing a Grouped Frequency
Distribution
The following data represent the record
high temperatures for each of the 50 states.
Construct a grouped frequency distribution
for the data using 7 classes.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114

Constructing a Grouped Frequency Distribution
STEP 1 Determine the classes. The range is 134 − 100 = 34; dividing by 7 classes gives 34 ÷ 7 ≈ 4.9, which is rounded up to a class width of 5.
STEP 2 Tally the data.
STEP 3 Find the frequencies.
STEP 4 Find the cumulative frequencies by keeping a running total of the frequencies.

For convenience, we choose the lowest data value, 100, as the first lower class limit. Each subsequent lower class limit is found by adding the width to the previous lower class limit.
The first upper class limit is one less than the next lower class limit. Each subsequent upper class limit is found by adding the width to the previous upper class limit.
Class Limits
100 - 104
105 - 109
110 - 114
115 - 119
120 - 124
125 - 129
130 - 134

Constructing a Grouped Frequency Distribution
The class boundary is midway between an upper class limit and the subsequent lower class limit. For example, the boundary between 104 and 105 is 104.5.

Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100 - 104      99.5 - 104.5       2           2
105 - 109      104.5 - 109.5      8           2 + 8 = 10
110 - 114      109.5 - 114.5      18          10 + 18 = 28
115 - 119      114.5 - 119.5      13          28 + 13 = 41
120 - 124      119.5 - 124.5      7           41 + 7 = 48
125 - 129      124.5 - 129.5      1           48 + 1 = 49
130 - 134      129.5 - 134.5      1           49 + 1 = 50
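
The tallying and running total for the record high temperatures can be sketched in Python as follows. The class width of 5, the first lower limit of 100, and the 7 classes are taken from the example; the variable names are only illustrative.

```python
from itertools import accumulate

# Record high temperatures for the 50 states, as listed in the example.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

first_lower, width, num_classes = 100, 5, 7

# Tally: integer division maps each value to the index of its class.
freqs = [0] * num_classes
for t in temps:
    freqs[(t - first_lower) // width] += 1

# Cumulative frequencies are a running total of the frequencies.
cumulative = list(accumulate(freqs))

for i, (f, cf) in enumerate(zip(freqs, cumulative)):
    lower = first_lower + i * width
    upper = lower + width - 1
    print(f"{lower} - {upper}  {lower - 0.5} - {upper + 0.5}  {f:>2}  {cf:>2}")
```

The index formula works here only because the classes are equal in width and start at the lowest data value.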

Types of Cumulative Frequency
Less Than Cumulative Frequency
The less than cumulative frequency of a class is obtained by adding its frequency to the frequencies of all previous classes. The cumulation runs from the lowest class to the highest. In other words, it is the number of observations that are less than the upper boundary of the class.
Greater Than Cumulative Frequency
The greater than cumulative frequency is obtained by cumulating the frequencies from the highest class down to the lowest. It is also called the more than type cumulative frequency. In other words, it is the number of observations that are greater than or equal to the lower boundary of the class.

Let us look at an example to understand the two types. UCB and LCB are the upper and lower class boundaries.
Height    f   UCB     c.f. (less than)   LCB     c.f. (more than)
140-145   2   145.5   2                  139.5   15
145-150   5   150.5   7                  144.5   13
150-155   3   155.5   10                 149.5   8
155-160   4   160.5   14                 154.5   5
160-165   1   165.5   15                 159.5   1
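
Both columns of cumulative frequencies in the table can be derived from the frequency column alone. The following is a small sketch with illustrative variable names.

```python
from itertools import accumulate

# Frequencies of the five height classes, lowest class first.
freqs = [2, 5, 3, 4, 1]

# Less than type: running total from the lowest class upward.
less_than = list(accumulate(freqs))

# More than type: running total from the highest class downward,
# then flipped back so it lines up with the original class order.
more_than = list(accumulate(reversed(freqs)))[::-1]

print(less_than)
print(more_than)
```

The last less than value and the first more than value both equal the total number of observations.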

Relative Cumulative Frequency
Example: A car dealer wants to calculate the total sales for the past month and to know what proportion of the month's sales had been made by the end of weeks 1, 2, 3, and 4. Create a relative cumulative frequency table and present the information that the dealer needs.
Week   No. of Cars Sold
1      10
2      17
3      14
4      11

Solution:
First total up the sales for the entire month:
10 + 17 + 14 + 11 = 52 cars
Then find the relative frequencies for each week by dividing the
number of cars sold that week by the total:
•The relative frequency for the first week is: 10/52 = 0.19
•The relative frequency for the second week is: 17/52 = 0.33
•The relative frequency for the third week is: 14/52 = 0.27
•The relative frequency for the fourth week is: 11/52 = 0.21
To find the relative cumulative frequencies, start with the relative frequency for week 1 and, for each successive week, add that week's relative frequency to the running total.

Week   Cars Sold   Relative Frequency   Relative Cumulative Frequency
1      10          0.19                 0.19
2      17          0.33                 0.19 + 0.33 = 0.52
3      14          0.27                 0.52 + 0.27 = 0.79
4      11          0.21                 0.79 + 0.21 = 1
Note that the first relative cumulative frequency is always the
same as the first relative frequency, and the last relative
cumulative frequency is always equal to 1.
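
The car dealer's table can be checked with a short Python sketch. The names are illustrative; the code accumulates the unrounded relative frequencies and rounds only when printing, which for this data happens to agree with the rounded values in the table.

```python
from itertools import accumulate

# Cars sold in weeks 1 to 4.
sold = [10, 17, 14, 11]
total = sum(sold)

# Relative frequency: each week's sales divided by the monthly total.
relative = [n / total for n in sold]

# Relative cumulative frequency: running total of the relative frequencies.
relative_cumulative = list(accumulate(relative))

for week, (n, r, c) in enumerate(zip(sold, relative, relative_cumulative), start=1):
    print(f"{week}  {n:>2}  {r:.2f}  {c:.2f}")
```

With other data, adding already rounded relative frequencies can drift from 1 by a small rounding error, so it is usually safer to round last.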

Histograms, Frequency Polygons,
and Ogives (Grouped Data)
Most Common Graphs in Research
1. Histogram
2. Frequency Polygon
3. Cumulative Frequency Polygon (Ogive)

2-2 Histograms, Frequency
Polygons, and Ogives
The histogram is a graph that displays the data by using vertical bars of various heights to represent the frequencies of the classes.
The class boundaries are represented on the horizontal axis.

Histograms
A histogram is a graphical representation of data in which the data are grouped into continuous number ranges and each range corresponds to a vertical bar.
• It is a convenient way to represent a frequency distribution.
• It allows the frequencies of different classes to be compared.
• It can also be used to locate the mode.
How to Make a Histogram?
The process of making a histogram from the given data is described below:
• Convert the class limits into class boundaries and mark them on the x-axis.
• Form rectangles, taking the class intervals as the base (x-axis) and the frequency as the height (y-axis).
• Use frequency density (frequency ÷ class width) as the height when the class widths are unequal.

Example: Construct a histogram for the following frequency distribution table that describes the frequencies of weights of 26 students in a class.
Weights (in lbs)   Frequency (Number of students)
65 - 70            4
70 - 75            10
75 - 80            8
80 - 85            4
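
Before drawing the histogram on paper, a rough text version can help to see its shape. The sketch below is an illustration only: it prints one row of # characters per class, so the bars run sideways rather than vertically, and it also picks out the class with the tallest bar (the modal class).

```python
# Classes and frequencies from the weights example.
classes = [("65-70", 4), ("70-75", 10), ("75-80", 8), ("80-85", 4)]

# One text bar per class; the bar length equals the class frequency.
lines = [f"{label:<6}{'#' * freq} {freq}" for label, freq in classes]
print("\n".join(lines))

# The modal class is the class with the highest frequency.
modal_class = max(classes, key=lambda c: c[1])[0]
print("Modal class:", modal_class)
```

Because all four classes have the same width, the frequency can be used directly as the bar length; with unequal widths the frequency density would be used instead.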

Definition of Frequency Polygons
A frequency polygon is a line graph of a frequency distribution. It shows the shape and trend of the data by plotting the frequency of each class interval against its midpoint. It is usually drawn on top of a histogram but can also be drawn without one. While a histogram uses rectangular bars with no spaces between them, a frequency polygon joins the plotted points with line segments. It represents class frequencies, not cumulative frequencies; the graph of cumulative frequencies is the ogive.

Steps to Construct Frequency Polygons
Step 1: Calculate the midpoint of each class interval; these midpoints are the class marks.
Step 2: Mark the class marks on the x-axis.
Step 3: Since the height always depicts the frequency, plot each class frequency against its class mark.
Step 4: Join the points with line segments, as in a line graph.
The figure obtained by joining these line segments is the frequency polygon.
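
Steps 1 to 3 amount to computing one (class mark, frequency) point per class. The sketch below does this for the weights distribution from the histogram example. Extending the polygon to the x-axis with two zero-frequency points is a common convention but is an assumption here, since the steps above do not mention it.

```python
# (lower limit, upper limit, frequency) for each class of the weights example.
classes = [(65, 70, 4), (70, 75, 10), (75, 80, 8), (80, 85, 4)]

# Steps 1-3: the class mark is the midpoint of the class; pair it with the frequency.
points = [((lo + hi) / 2, f) for lo, hi, f in classes]

# Optional (assumed convention): close the polygon on the x-axis by adding
# a zero-frequency point one class width before and after the data.
width = classes[0][1] - classes[0][0]
closed = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

print(points)
print(closed)
```

Joining consecutive points in either list with straight line segments gives the polygon of Step 4.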


Ogives
• The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
More Than Cumulative Frequency Curve
In the more than cumulative frequency curve or ogive, we use the lower limit of each class to plot the curve. The cumulative frequencies are obtained by starting from the total and subtracting the first class frequency, then the second class frequency, and so on.
The steps to plot a more than curve or ogive are:
• Step 1: Mark the lower limits on the x-axis.
• Step 2: Mark the cumulative frequencies on the y-axis.
• Step 3: Plot the points (x, y) using the lower limits (x) and their corresponding cumulative frequencies (y).
• Step 4: Join the points by a smooth freehand curve.

Less Than Cumulative Frequency Curve
In the less than cumulative frequency curve or ogive, we use the upper limit of each class to plot the curve. The cumulative frequencies are obtained by adding the first class frequency to the second class frequency, then to the third class frequency, and so on. This downward cumulation gives the less than cumulative frequency curve. The steps to plot a less than cumulative frequency curve or ogive are:
• Step 1: Mark the upper limits on the x-axis.
• Step 2: Mark the cumulative frequencies on the y-axis.
• Step 3: Plot the points (x, y) using the upper limits (x) and their corresponding cumulative frequencies (y).
• Step 4: Join the points by a smooth freehand curve.

Example: Graph the two ogives for the following frequency distribution of the weekly wages of the given number of workers.
Weekly wages   No. of workers   C.F. (less than)   C.F. (more than)
0-20           4                4                  18 (total)
20-40          5                9 (4 + 5)          14 (18 - 4)
40-60          6                15 (9 + 6)         9 (14 - 5)
60-80          3                18 (15 + 3)        3 (9 - 6)

Less than curve or ogive:
Mark the upper limits of the class intervals on the x-axis and the less than type cumulative frequencies on the y-axis. To plot the less than type curve, the points (20, 4), (40, 9), (60, 15), and (80, 18) are plotted on the graph and joined freehand to obtain the less than ogive.
Greater than curve or ogive:
Mark the lower limits of the class intervals on the x-axis and the greater than type cumulative frequencies on the y-axis. To plot the greater than type curve, the points (0, 18), (20, 14), (40, 9), and (60, 3) are plotted on the graph and joined freehand to obtain the greater than type ogive.
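
The plotting points for both ogives follow mechanically from the wage table. The sketch below, with illustrative names, builds them from the class limits and frequencies.

```python
from itertools import accumulate

# (lower limit, upper limit, number of workers) for each wage class.
classes = [(0, 20, 4), (20, 40, 5), (40, 60, 6), (60, 80, 3)]
freqs = [f for _, _, f in classes]

# Less than type: running total of the frequencies.
less_cf = list(accumulate(freqs))
total = less_cf[-1]

# More than type: the total minus everything that lies below the class.
more_cf = [total - below for below in [0] + less_cf[:-1]]

# Less than ogive uses upper limits; more than ogive uses lower limits.
less_points = [(hi, cf) for (_, hi, _), cf in zip(classes, less_cf)]
more_points = [(lo, cf) for (lo, _, _), cf in zip(classes, more_cf)]

print(less_points)
print(more_points)
```

These are the same points listed in the worked example, so the two curves can be drawn directly from the two lists.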

Question 1:
The following are the scores that a class of 20 students received on their most recent Biology test. Plot a less than type ogive.
58, 79, 81, 99, 68, 92, 76, 84, 53, 57, 81, 91, 77, 50, 65, 57, 51, 72, 84, 89
Question 2:
In a city, the weekly observations made in a study on the cost of living index are given in the following table. Draw a frequency polygon for the data below with a histogram.

Cost of Living Index Number of weeks
140 - 150 2
150 - 160 8
160 - 170 14
170 - 180 20
180 - 190 10
190 - 200 6
Total 60

Question 3:
If the weights of a class of 45 students are grouped into the classes 35 - 45, 45 - 55, 55 - 65, and 65 - 75, what are the class marks of each class?
Question 4: Form a frequency distribution from the following data, taking 4 as the magnitude of the class interval.

Shapes of Distributions

2-3 Other Types of Graphs
Bar Graphs

2-4 Scatter Plots and Correlation
• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
• A scatter plot is used to determine if a relationship exists between the two variables.

Example 2-16: Wet Bike Accidents
A researcher is interested in determining if there is a
relationship between the number of wet bike accidents
and the number of wet bike fatalities. The data are for a
10-year period. Draw a scatter plot for the data.
Step 1: Draw and label the x and y axes.
Step 2: Plot each point on the graph.
No. of accidents, x 376 650 884 1162 1513 1650 2236 3002 4028 4010
No. of fatalities, y 5 20 20 28 26 34 35 56 68 55


Analyzing the Scatter Plot
1. A positive linear relationship exists when
the points fall approximately in an ascending
straight line from left to right and both the x
and y values increase at the same time.
2. A negative linear relationship exists when
the points fall approximately in a descending
straight line from left to right.
3. A nonlinear relationship exists when the
points fall in a curved line.
4. No relationship exists when the points show no discernible pattern.
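
Reading a scatter plot is a visual judgement, but it can be cross-checked numerically. The sketch below computes the Pearson correlation coefficient r for the wet bike data; r and its formula are not defined in these notes and are brought in here as an assumption, purely to illustrate what a positive linear relationship looks like in numbers.

```python
from math import sqrt

# Wet bike data from Example 2-16.
x = [376, 650, 884, 1162, 1513, 1650, 2236, 3002, 4028, 4010]  # accidents
y = [5, 20, 20, 28, 26, 34, 35, 56, 68, 55]                    # fatalities

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of products and squares of the deviations from the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# Pearson r lies between -1 and 1; values near +1 indicate points that
# fall close to an ascending straight line.
r = sxy / sqrt(sxx * syy)
print(round(r, 3))
```

A value close to +1 agrees with the ascending pattern of the plotted points, while a value close to 0 would correspond to no linear pattern.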

Analyzing the Scatter Plot
Figure panels: (a) positive linear relationship, (b) negative linear relationship, (c) nonlinear relationship, (d) no relationship.

Graphical Representation of the Frequency Distribution for Ungrouped Data
Pie Chart: A pie chart is a graph that displays data as a circle divided into sectors. Each sector shows one part of the data as a share of the whole.

Let us look at the following pie chart, which represents the ingredients used to prepare a butter cake.
Example: The whole pie represents a value of 100. It is divided into 10 slices or sectors. The various colors represent the ingredients used to prepare the cake. What is the exact quantity of each ingredient represented by its color in the pie chart?
Solution: The pie is divided into 10 equal slices or sectors. To find the value of one sector, we divide the value of the whole pie, 100, by the number of sectors: 100 ÷ 10 = 10. Counting the sectors of each color in the pie chart, we conclude that:

Quantity of Flour 30
Quantity of Sugar 20
Quantity of Egg 40
Quantity of Butter 10
Pie Chart Formula
The total value of the pie is always 100%, and a full circle spans an angle of 360°. Hence, the total of all the data corresponds to 360°. Based on these facts, two main formulas are used in pie charts:
To calculate the percentage of the given data, we use the formula: (Frequency ÷ Total Frequency) × 100
To convert the data into degrees, we use the formula: (Given Data ÷ Total Value of Data) × 360°

Sometimes the values of the components are expressed as percentages. In such cases, the angle of a sector is (Percentage ÷ 100) × 360°.

Example: Observe the following pie chart that represents the money spent by Ana at the funfair. The indicated color shows the amount spent on each category. The total value of the data is 20, and the amounts spent on each category are as follows:
• Ice Cream - 4
• Toffees - 4
• Popcorn - 2
• Rides - 10

To convert this into pie chart percentage, we apply the formula: (Frequency ÷ Total
Frequency) × 100
Let us convert the above data into a percentage:
Amount spent on rides: (10/20)× 100 = 50%
Amount spent on toffees: (4/20)× 100 = 20%
Amount spent on popcorn: (2/20)× 100 = 10%
Amount spent on ice-cream: (4/20)× 100 = 20%
Steps to Construct Pie Chart
We use the following steps, together with the formulas given above, to construct a pie chart.
• Step 1: Write all the data into a table and add up all the values to get a total.
• Step 2: To express each value as a percentage, divide it by the total and multiply by 100.
• Step 3: To find how many degrees each pie sector needs, take the full circle of 360° and use the formula: (Frequency ÷ Total Frequency) × 360°
• Step 4: Once all the angles have been calculated, draw a circle and mark off the sectors with the help of a protractor.
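
Steps 1 to 3 can be sketched in Python using Ana's spending from the earlier example. The function name pie_shares is hypothetical; it returns, for each category, the percentage and the sector angle in degrees.

```python
def pie_shares(data):
    """Map each category to (percentage of total, sector angle in degrees)."""
    total = sum(data.values())  # Step 1: add up all the values
    return {
        # Step 2: percentage; Step 3: angle out of the full 360 degrees.
        name: (value * 100 / total, value * 360 / total)
        for name, value in data.items()
    }


spending = {"Ice Cream": 4, "Toffees": 4, "Popcorn": 2, "Rides": 10}
shares = pie_shares(spending)

for name, (percent, degrees) in shares.items():
    print(f"{name:<10}{percent:5.1f}%  {degrees:6.1f} degrees")
```

The percentages should add up to 100 and the angles to 360, which is a useful check before reaching for the protractor in Step 4.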

Example: Draw a pie diagram to represent the following data, which shows the expenditure on paddy cultivation in 2 acres of land.
Also, 1. Find the percentage for the head on which the most money was spent.
2. What percentage of the money was spent on seeds?

Solution:
Expenditure on paddy cultivation in 2 acres.

1. The most money, ₹10000, was spent on wages. Converting into percentage, we have
Wages = [10000/36000] × 100% = 27.78%
2. ₹2000 was spent on seeds. Converting into percentage, we have
Seeds = [2000/36000] × 100% = 5.56%
Example: Construct a pie chart to visually display the
favorite fruits of the students in a class based on the given
data: Mango-45, Orange-30, Plum-15, Pineapple-30,
Melon-30.

What is Bar Graph?
A bar graph displays data with rectangular bars whose heights are proportional to the values they represent. The bars can be drawn vertically or horizontally. Bar graphs are also known as bar charts, and they give a pictorial representation of data grouped into categories. A bar graph is an excellent tool for representing data that are:
• independent of one another and
• not required to be in any specific order.
The bars give a visual display for comparing quantities in different categories. A bar graph has two axes, the horizontal x-axis and the vertical y-axis, along with a title, labels, and a scale.

Properties of Bar Graph
Some properties that distinguish a bar graph from other types of graphs are given below:
• All rectangular bars should have equal width and equal space between them.
• The rectangular bars can be drawn horizontally or vertically.
• The height of each rectangular bar is proportional to the value it represents.
• The rectangular bars must stand on a common base.

Types of Bar Graphs
Bar graphs are mainly classified into two types:
• Vertical Bar Graph
• Horizontal Bar Graph
The bars can be plotted horizontally or vertically, but the vertical bar graph is the most commonly used.
Vertical Bar Graphs
When the data are represented by rectangular bars drawn vertically, the graph is known as a vertical bar graph. The categories are written along the x-axis, the bars stand on the x-axis, and the y-axis shows the height of each bar, which represents the quantity for that category.
Horizontal Bar Graphs
When the data are represented by rectangular bars drawn horizontally, the graph is known as a horizontal bar graph. The categories are written along the y-axis, the bars extend from the y-axis, and the x-axis shows the length of each bar, which equals the value for that category.

Pictograph Definition
A pictograph is a representation of data using images or
symbols. Pictographs in maths are typically used in concepts
like data handling. They help in laying the foundation for data
interpretation based on pictorial information. Now after knowing
the pictograph definition, let us understand pictographs using a
scenario.
Pictograph Example
A survey was conducted for 40 children by a fast food
junction to understand the demand for different flavors of
pizza available in their outlet. The results were as follows:

Can you identify the most loved flavor by observing the above table?
If 1 full pizza represents 4 children, then what would a quarter slice represent?
The scenario discussed above represents information in a pictographic manner. Here, the symbol of a full pizza is used to represent data (i.e. the number of children). A little simple math tells us how many children voted for each flavor: multiply the number of symbols for the given flavor by the value of each symbol.
For example, the number of children who liked Pepperoni = 2 × 4 + (1/4) × 4 = 8 + 1 = 9
Key to a Pictograph
We use a key to denote the value of the symbol. In the above example, the key was 1 icon of the pizza representing 4 children.