2. Descriptive statistics is the term given to the analysis of data that
helps describe, show or summarize data in a meaningful way such
that, for example, patterns might emerge from the data. Descriptive
statistics do not, however, allow us to make conclusions beyond the
data we have analysed or reach conclusions regarding any
hypotheses we might have made. They are simply a way to describe
our data.
Descriptive statistics are very important because if we simply
presented our raw data it would be hard to visualize what the data
was showing, especially if there was a lot of it. Descriptive statistics
therefore enable us to present the data in a more meaningful way,
which allows simpler interpretation of the data. For example, if we had the
temperatures of 100 locations, we might be interested in the overall temperature. We
would also be interested in the distribution or spread of the temperatures.
Descriptive statistics allow us to do this: they show how to describe
data properly through statistics and graphs.
3. Strengths of descriptive statistics
Besides summarizing large volumes of data clearly,
descriptive statistics carry no uncertainty about the
values you get, other than measurement error and
similar sources.
Limitations of descriptive statistics
Descriptive statistics are limited in that they only
allow you to make summaries about the people or objects
that you have actually measured. You cannot use the data
you have collected to generalize to other people or objects
(i.e., using data from a sample to infer the
properties/parameters of a population). For example, if you
tested a drug to beat cancer and it worked in your patients,
you cannot claim that it would work in other cancer
patients by relying only on descriptive statistics (but
inferential statistics would give you this opportunity).
4. Steps of Quantitative analysis
Tabulation of Data
Graphical Representation
Comparison of Distribution
5. Tabulation of Data
Daily rainfall data of station-A (in mm)
59 52 55 59 52 59 63 55 55 63
55 59 48 59 55 51 59 59 51 55
55 50 52 59 52 55 59 52 59 59
Rainfall (mm)   Tally Bar       Frequency
48              I               1
50              I               1
51              II              2
52              IIII            5
55              IIII III        8
59              IIII IIII I     11
63              II              2
(Each IIII group stands for a bundle of five tally marks.)
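The tally above can be reproduced programmatically. The following is a minimal sketch in Python, assuming the 30 readings listed on this slide; the variable names `rainfall` and `frequency` are illustrative only.

```python
from collections import Counter

# Daily rainfall at station A (mm): the 30 readings from the slide.
rainfall = [
    59, 52, 55, 59, 52, 59, 63, 55, 55, 63,
    55, 59, 48, 59, 55, 51, 59, 59, 51, 55,
    55, 50, 52, 59, 52, 55, 59, 52, 59, 59,
]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(rainfall)

# Print value, tally marks and frequency in ascending order of rainfall.
for value in sorted(frequency):
    print(f"{value:>3}  {'|' * frequency[value]:<11}  {frequency[value]}")
```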
6. Frequency Distribution
Rainfall in mm (Class Interval)   No. of Days (Frequency)
30 – 35                           5
35 – 40                           6
40 – 45                           11
45 – 50                           18
50 – 55                           19
55 – 60                           15
60 – 65                           13
65 – 70                           1
70 – 75                           2
Total                             90
Frequency is the number of occurrences of a repeating event per unit time,
or the number of times an event occurred in an experiment or study.
7. Preparation of frequency distribution table:
Uses of Frequency Distribution
It is quite useful for data analysis.
It assists in estimating the frequencies of the population on
the basis of the sample.
It facilitates the computation of different statistical measures.
The following points should be kept in mind while
preparing a frequency distribution table:
For convenience in computation and comparison, the class
intervals should be kept equal unless there is some specific
reason not to;
The number of class intervals should not ordinarily exceed 20
and should not in general be less than 10;
The class limits as well as the class interval should be multiples of
5, as far as possible.
8. Types of frequency distributions.
1) Grouped frequency distribution
2) Ungrouped frequency distribution
3) Cumulative frequency distribution
4) Relative frequency distribution
5) Relative cumulative frequency distribution
9. Grouped Frequency Distribution
A grouped frequency distribution is an ordered listing of the
values of a variable X, arranged into groups in one column,
with the frequency of each group listed in a second column,
called the frequency column. In other words, it is an
arrangement of class intervals and corresponding
frequencies in a table.
Ungrouped Frequency Distribution
A frequency distribution with an interval width of 1 is
called an ungrouped frequency distribution. It is an
arrangement of the observed values in ascending order,
each with its frequency. Ungrouped frequency
distributions are those in which the data are not arranged in
groups. They are known as individual series.
10. Cumulative Frequency Distribution
One of the important types of frequency distribution is the cumulative frequency
distribution, in which the frequencies are shown in a cumulative manner.
The cumulative frequency for each class interval is
the frequency for that class interval added to the preceding cumulative total.
Cumulative frequency can also be defined as the sum of all frequencies
up to and including the current class.
Relative Frequency Distribution
This is a distribution in which relative frequencies are listed against each class
interval. The relative frequency of a class is obtained by dividing
its frequency by the total frequency. It is the proportion of the
total frequency that falls in that class interval.
Relative Cumulative Frequency Distribution
The cumulative frequency of a class divided by the total frequency is called
the relative cumulative frequency. It is also called percentage cumulative
frequency when it is expressed as a percentage. The table showing relative
cumulative frequencies is called the relative cumulative frequency
distribution or percentage cumulative frequency distribution.
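As a hedged illustration of these three definitions, the sketch below derives the cumulative, relative and relative cumulative frequencies for the 90-day rainfall frequency distribution shown earlier. The variable names are assumptions made for this example.

```python
from itertools import accumulate

# Class intervals (mm) and number of days from the rainfall frequency distribution.
classes = ["30-35", "35-40", "40-45", "45-50", "50-55",
           "55-60", "60-65", "65-70", "70-75"]
freq = [5, 6, 11, 18, 19, 15, 13, 1, 2]

total = sum(freq)                                   # total frequency (90 days)
cumulative = list(accumulate(freq))                 # running total of frequencies
relative = [f / total for f in freq]                # proportion of days in each class
relative_cumulative = [c / total for c in cumulative]

# Columns: class, frequency, cumulative, relative, relative cumulative (as %).
for row in zip(classes, freq, cumulative, relative, relative_cumulative):
    print("{:<6} {:>3} {:>4} {:>7.3f} {:>7.1%}".format(*row))
```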
12. A histogram is a graphical display of data using bars of different
heights. In a histogram, each bar groups numbers into ranges. Taller
bars show that more data falls in that range. A histogram displays the
shape and spread of continuous sample data.
[Figure: HISTOGRAM. x-axis: Rainfall in mm (class intervals 30 – 35 to 70 – 75); y-axis: No. of Days (0 to 20)]
13. A frequency polygon is closely related to a histogram: the class
frequencies are plotted as points at the class mid-points and joined by
straight lines. It is used to compare sets of data or to display a
cumulative frequency distribution. It uses a line graph to represent
quantitative data.
14. A frequency-curve is a smooth curve for which the total area is taken
to be unity. The frequency-curve for a distribution can be obtained by
drawing a smooth and free hand curve through the mid-points of the
upper sides of the rectangles forming the histogram.
[Figure: FREQUENCY CURVE. x-axis: Rainfall in mm (class intervals 30 – 35 to 70 – 75); y-axis: No. of Days (0 to 20)]
15. Cumulative Frequencies
Rainfall (mm)   No. of Days   Less than type   More than type
30 – 35         5             5                90
35 – 40         6             11               85
40 – 45         11            22               79
45 – 50         18            40               68
50 – 55         19            59               50
55 – 60         15            74               31
60 – 65         13            87               16
65 – 70         1             88               3
70 – 75         2             90               2
A curve that represents the cumulative frequency distribution of grouped
data on a graph is called a Cumulative Frequency Curve or an Ogive.
Representing cumulative frequency data on a graph is an efficient
way to understand the data.
Types: Less than type & more than type
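The two cumulative columns can be computed from the frequencies alone. A minimal sketch, assuming the nine class frequencies from the table on this slide (the names `less_than` and `more_than` are illustrative):

```python
from itertools import accumulate

# Days per class, from 30-35 mm up to 70-75 mm.
freq = [5, 6, 11, 18, 19, 15, 13, 1, 2]

# "Less than" type: running total starting from the first class.
less_than = list(accumulate(freq))

# "More than" type: running total starting from the last class,
# reversed back so it lines up with the class order.
more_than = list(accumulate(reversed(freq)))[::-1]

print(less_than)
print(more_than)
```

Plotting `less_than` against the upper class limits and `more_than` against the lower class limits gives the two ogives.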
16. Ogive
[Figure: Cumulative Frequency Curve. x-axis: Rainfall in mm (class intervals 30 – 35 to 70 – 75); y-axis: Cumulative frequencies (0 to 90); curves: Less than type, More than type]
18. Measure of Central Tendency
The frequency distribution of the raw data represents the variation
of a variable. The differences between distributions may
be highlighted in terms of their various characteristics. One
of these characteristics is the value around which each
distribution tends to have maximum concentration. This
value is the best representative of the whole distribution, and
the different ways in which it is measured are known as measures
of central tendency or measures of location.
19. Measure of Central Tendency
Mean: Arithmetic Mean (Simple & Weighted),
Geometric Mean & Harmonic Mean
Median: Quartile, Deciles & Percentiles
Mode
20. Arithmetic Mean (Ungrouped Data): $\bar{X} = \frac{\sum X}{N}$
Where $\bar{X}$ = Mean
$\sum X$ = Sum of all the observations
N = Number of Observations
Data:
59 52 55 59 52 59 63 55 55 63
55 59 48 59 55 51 59 59 51 55
55 50 52 59 52 55 59 52 59 59
So, to find out Mean ($\bar{X}$)
Sum of all observations ($\sum X$) = 1675
No. of Observations (N) = 30
$\bar{X} = \frac{1675}{30} = 55.83$
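The same calculation can be checked in a few lines. This is only a sketch, assuming the 30 readings listed on this slide; `rainfall` and `mean` are names chosen for the example.

```python
# The 30 rainfall readings (mm) from the slide.
rainfall = [
    59, 52, 55, 59, 52, 59, 63, 55, 55, 63,
    55, 59, 48, 59, 55, 51, 59, 59, 51, 55,
    55, 50, 52, 59, 52, 55, 59, 52, 59, 59,
]

total = sum(rainfall)      # sum of all observations
n = len(rainfall)          # number of observations
mean = total / n           # arithmetic mean

print(total, n, round(mean, 2))
```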
21. Arithmetic Mean (Grouped Data): $\bar{X} = \frac{\sum fX}{N}$
Direct Method
Rainfall in mm   No. of days (f)   Mid Value (x)   fx
45-50            2                 47.5            95
50-55            15                52.5            787.5
55-60            11                57.5            632.5
60-65            2                 62.5            125
                 N = 30                            $\sum fx$ = 1640
So, to find out Mean ($\bar{X}$)
Sum of frequency × mid value ($\sum fX$) = 1640
No. of Observations (N) = 30
$\bar{X} = \frac{1640}{30} = 54.67$
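A minimal sketch of the direct method for grouped data, assuming the four classes from this slide. The tuple layout `(lower, upper, frequency)` is a convention chosen for this example, not something the slides prescribe.

```python
# (lower limit, upper limit, number of days) for each rainfall class in mm.
classes = [(45, 50, 2), (50, 55, 15), (55, 60, 11), (60, 65, 2)]

n = sum(f for _, _, f in classes)                           # total frequency N
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)    # sum of f * mid value
grouped_mean = sum_fx / n

print(n, sum_fx, round(grouped_mean, 2))
```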
22. Arithmetic Mean (Ungrouped Data): $\bar{X} = A + \frac{\sum fd}{N}$
Shortcut Method
Where
$\bar{X}$ = Mean
A = Assumed Mean
d = Deviation of x from assumed mean A (x − A)
(f = 1 for each ungrouped observation)
Rainfall in mm (x), with assumed mean A = 50, and deviation of x from assumed mean (d):
x:  59  52  55  59  52  59  63  55  55  63
d:   9   2   5   9   2   9  13   5   5  13
x:  55  59  48  59  55  51  59  59  51  55
d:   5   9  -2   9   5   1   9   9   1   5
x:  55  50  52  59  52  55  59  52  59  59
d:   5   0   2   9   2   5   9   2   9   9
$\sum fd$ = 175
So, to find out Mean ($\bar{X}$)
Assumed Mean (A) = 50
Sum of all deviations ($\sum fd$) = 175
No. of Observations (N) = 30
$\bar{X} = 50 + \frac{175}{30} = 50 + 5.83 = 55.83$
23. Arithmetic Mean (Grouped Data): $\bar{X} = A + \frac{\sum fd}{N} \times i$
Shortcut Method
Temp °C       No. of Days (f)   Mid Point (m or X)   d = (m − A)/i   fd
-40 to -30    10                -35                  -3              -30
-30 to -20    28                -25                  -2              -56
-20 to -10    30                -15                  -1              -30
-10 to 0      42                -5                   0               0
0 to 10       65                +5                   +1              +65
10 to 20      180               +15                  +2              +360
20 to 30      10                +25                  +3              +30
              N = 365                                                $\sum fd$ = 339
For example, for the first class: $d = \frac{m - A}{i} = \frac{-35 - (-5)}{10} = -3$
So, to find out Mean ($\bar{X}$)
Assumed mean (A) = -5
Total number of days (N) = 365
Class Interval (i) = 10
$\bar{X} = -5 + \frac{339}{365} \times 10 = -5 + 9.29 = 4.29$
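One way to check a shortcut-method result is to compute the mean by the direct method as well, since both must agree. The sketch below does this for the temperature table on this slide; the variable names and the tuple layout are assumptions made for illustration.

```python
# (lower limit, upper limit, number of days) for each temperature class in °C.
classes = [(-40, -30, 10), (-30, -20, 28), (-20, -10, 30), (-10, 0, 42),
           (0, 10, 65), (10, 20, 180), (20, 30, 10)]
A = -5    # assumed mean: mid-point of the -10 to 0 class
i = 10    # class interval (width)

n = sum(f for _, _, f in classes)

# Step deviation d = (m - A) / i for each class mid-point m, weighted by f.
sum_fd = sum(f * (((lo + hi) / 2 - A) / i) for lo, hi, f in classes)

mean_shortcut = A + sum_fd / n * i
mean_direct = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

print(n, sum_fd, round(mean_shortcut, 2), round(mean_direct, 2))
```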
24. Calculate Mean using Shortcut Method
A = 30, i = 10
Ci      f     m = x   d = (m − A)/i           fd
10-20   5     15      (15 − 30)/10 = -1.5     -7.5
20-30   10    25      -0.5                    -5.0
30-40   5     35      +0.5                    +2.5
40-50   15    45      +1.5                    +22.5
        N = 35                                $\sum fd$ = 12.5
Mean = 30 + (12.5/35) × 10 = 30 + 3.57
= 33.57
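26. So, to find out the weighted mean ($\bar{X}_w$) for each student:
Formula: $\bar{X}_w = \frac{\sum WX}{\sum W}$
Student GEO-01: $\bar{X}_w = \frac{720}{10} = 72$
Student GEO-02: $\bar{X}_w = \frac{695}{10} = 69.5$
Student GEO-03: $\bar{X}_w = \frac{740}{10} = 74$
Student GEO-04: $\bar{X}_w = \frac{658}{10} = 65.8$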
27. Geometric Mean
The geometric mean is a type of average , usually used for growth rates,
like population growth or interest rates. While the arithmetic
mean adds items, the geometric mean multiplies items. Also, you can
only get the geometric mean for positive numbers.
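Formula: $GM = \left(\prod_{i=1}^{N} x_i\right)^{1/N} = \sqrt[N]{x_1 x_2 \cdots x_N}$
Find the geometric mean of the following numbers:
9 11 15 7 19 20 12 11 26
$GM = \sqrt[9]{9 \times 11 \times 15 \times 7 \times 19 \times 20 \times 12 \times 11 \times 26}$
$= \sqrt[9]{13556743200}$
$= 13.36$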
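The worked example can be verified as follows. This sketch uses only Python's standard `math` module; the log-based form is an equivalent formulation added here for illustration, since it avoids very large intermediate products.

```python
import math

values = [9, 11, 15, 7, 19, 20, 12, 11, 26]

product = math.prod(values)                      # product of all nine values
geometric_mean = product ** (1 / len(values))    # 9th root of the product

# Equivalent form: exponential of the mean of the logarithms.
gm_from_logs = math.exp(sum(math.log(v) for v in values) / len(values))

print(product, round(geometric_mean, 2))
```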
28. Harmonic Mean
The harmonic mean is a type of average. It is calculated by dividing the number
of observations by the sum of the reciprocals of the observations.
It is appropriate for situations when the average of rates is desired.
(Ungrouped Data)
Formula: $HM = \frac{N}{\sum \frac{1}{x}} = \frac{N}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_N}}$
X     1/X
9     0.11
11    0.09
15    0.07
7     0.14
19    0.05
20    0.05
12    0.08
11    0.09
26    0.04
$\sum \frac{1}{x} = 0.727$ (summed before rounding; 0.73 to two decimal places)
where N = 9 and $\sum \frac{1}{x} = 0.727$
$HM = \frac{9}{0.727} = 12.38$
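A short check of this calculation, assuming the nine values above. `statistics.harmonic_mean` from Python's standard library is used only as a cross-check on the manual formula.

```python
from statistics import harmonic_mean

values = [9, 11, 15, 7, 19, 20, 12, 11, 26]

# Sum of reciprocals, kept unrounded to avoid accumulating rounding error.
reciprocal_sum = sum(1 / v for v in values)

# Number of observations divided by the sum of reciprocals.
hm = len(values) / reciprocal_sum

print(round(reciprocal_sum, 3), round(hm, 2))
```

Rounding each reciprocal to two decimals before summing shifts the result slightly, which is why the sum is kept unrounded here.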
29. For example, if a vehicle travels a certain distance d
outbound at a speed x (e.g. 60 km/h)
and returns the same distance at a
speed y (e.g. 20 km/h), then its average
speed is the harmonic mean of x and y
(30 km/h), not the arithmetic mean (40
km/h).
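The result follows from average speed being total distance divided by total time. A short derivation using the symbols d, x and y from the example (the symbol $\bar{v}$ for the average speed is introduced here only for illustration):

```latex
\bar{v} = \frac{\text{total distance}}{\text{total time}}
        = \frac{2d}{\frac{d}{x} + \frac{d}{y}}
        = \frac{2}{\frac{1}{x} + \frac{1}{y}}
        = \frac{2xy}{x + y}
        = \frac{2 \times 60 \times 20}{60 + 20}
        = 30 \text{ km/h}
```

The distance d cancels, so the result depends only on the two speeds.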
31. Calculate mean from the data provided to you.
Prepare a frequency distribution from the
data provided to you.
From the frequency distribution table, calculate
the mean using the grouped data formula.
32. Median
Median is the middlemost
value of the observations after
arranging them in ascending
or descending order.
[The arithmetic mean of a
distribution in which there are some
extremely high or low values will
either overestimate or
underestimate the average position, and
hence it is not the best representative
value. In this situation the most
suitable measure is the median.]
Formula:
For Ungrouped Data
Median = value of the $\left(\frac{n+1}{2}\right)$th observation when n is odd,
or the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations when n is even
STATION       RAINFALL (MM)
AKHUAPADA     13
BONTH         48
ANANDAPUR     12
CHAMPUA       16
CHANDBALI     92
DAITARI       65
GHATGAON      21
JAJPUR        18
JHUMPURA      21
JOSHIPUR      24
KARANJIA      27
KEON-GARH     33
KEONJHAR      36
RAJ KANIKA    93
SWAMPATNA     10
THAKUR-DA     18
Ascending Order
1    SWAMPATNA     10
2    ANANDAPUR     12
3    AKHUAPADA     13
4    CHAMPUA       16
5    JAJPUR        18
6    THAKUR-DA     18
7    JHUMPURA      21
8    GHATGAON      21
9    JOSHIPUR      24
10   KARANJIA      27
11   KEON-GARH     33
12   KEONJHAR      36
13   BONTH         48
14   DAITARI       65
15   CHANDBALI     92
16   RAJ KANIKA    93
n = 16 is even, so the median is the mean of the 8th and 9th values:
Median Value = (21 + 24)/2 = 22.5
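For an even number of observations, the usual convention takes the mean of the two middle values. The sketch below applies that convention to the 16 station readings and cross-checks it against `statistics.median` from Python's standard library; the variable names are illustrative.

```python
from statistics import median

# Rainfall (mm) at the 16 stations, in the order listed in the table.
rainfall_mm = [13, 48, 12, 16, 92, 65, 21, 18, 21, 24, 27, 33, 36, 93, 10, 18]

ordered = sorted(rainfall_mm)   # arrange in ascending order
n = len(ordered)

if n % 2 == 1:
    # Odd n: the single middle value, the ((n + 1) / 2)th observation.
    middle = ordered[n // 2]
else:
    # Even n: mean of the (n / 2)th and (n / 2 + 1)th observations.
    middle = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(n, middle)
```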
34. Median
For Grouped Data
$\text{Median} = l + \frac{\frac{n}{2} - c}{f} \times h$
Where
l = lower limit of the class in which the median lies
n = total number of observations (sum of frequencies)
c = cumulative frequency of the class before the median class
f = frequency of the median class
h = class interval of the median class
Rainfall (mm)   No. of Stations   Cumulative Frequency
0-20            6                 6
20-40           6                 12
40-60           1                 13
60-80           1                 14
80-100          2                 16
Since n/2 = 8, the median class is 20-40:
l = 20, n = 16, c = 6, f = 6, h = 20
So, $20 + \frac{\frac{16}{2} - 6}{6} \times 20 = 20 + \frac{8 - 6}{6} \times 20 = 20 + 6.67$
= 26.67
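The grouped-median formula can be expressed as a small function. This is a sketch under the assumption that classes are supplied as `(lower, upper, frequency)` tuples sorted by lower limit; the function name `grouped_median` is hypothetical.

```python
def grouped_median(classes):
    """Median of grouped data; classes is a list of (lower, upper, frequency)."""
    n = sum(f for _, _, f in classes)
    c = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if c + f >= n / 2:
            # This is the median class: apply l + ((n/2 - c) / f) * h.
            return lower + (n / 2 - c) / f * (upper - lower)
        c += f
    raise ValueError("no classes supplied")


# Rainfall classes (mm) and number of stations from the table.
stations = [(0, 20, 6), (20, 40, 6), (40, 60, 1), (60, 80, 1), (80, 100, 2)]
print(round(grouped_median(stations), 2))
```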
35. Quartile
The values which divide the observations
into four equal parts are known as quartiles. There
are 3 quartiles, Q1, Q2 & Q3, which are given below:
1st Quartile, $Q_1 = l + \frac{\frac{n}{4} - c}{f} \times h$
2nd Quartile, $Q_2$ = Median
3rd Quartile, $Q_3 = l + \frac{\frac{3n}{4} - c}{f} \times h$
Here l, c, f and h refer to the class in which the quartile lies.
37. Deciles
Deciles are similar to quartiles. But while quartiles sort data into
four quarters, deciles sort data into ten equal parts. The 5th
decile is the median.
Decile, $D_j = l + \frac{\frac{j \times n}{10} - c}{f} \times h$, where j = 1, 2, ..., 9
Percentiles
There are 99 percentiles, which divide the observations into
a hundred equal parts. The rth percentile is given by:
Percentile, $P_r = l + \frac{\frac{r \times n}{100} - c}{f} \times h$, where r = 1, 2, 3, ..., 99
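Quartiles, deciles and percentiles share one formula and differ only in the fraction of n they target. The sketch below makes that explicit with a hypothetical helper, `grouped_quantile`, and applies it to the grouped rainfall-station data used for the grouped median (reused here as an assumption for illustration).

```python
def grouped_quantile(classes, fraction):
    """Quantile of grouped data for a fraction between 0 and 1.

    classes is a list of (lower, upper, frequency) sorted by lower limit.
    fraction = 1/4 gives Q1, j/10 gives the jth decile, r/100 the rth percentile.
    """
    n = sum(f for _, _, f in classes)
    target = fraction * n
    c = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if f > 0 and c + f >= target:
            return lower + (target - c) / f * (upper - lower)
        c += f
    raise ValueError("fraction must lie between 0 and 1")


stations = [(0, 20, 6), (20, 40, 6), (40, 60, 1), (60, 80, 1), (80, 100, 2)]
q1 = grouped_quantile(stations, 1 / 4)
q2 = grouped_quantile(stations, 2 / 4)    # same as the median
q3 = grouped_quantile(stations, 3 / 4)
d9 = grouped_quantile(stations, 9 / 10)   # 9th decile
print(round(q1, 2), round(q2, 2), round(q3, 2), round(d9, 2))
```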
39. Mode
The mode of a set of data values is the value that appears most often.
Example: 1, 2, 2, 3, 4, 7, 9
Mode = 2
The mode or modal value of the distribution is that value which has the
maximum frequency.
In the case of a frequency distribution, the mode is computed by the following
formula.
$\text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$
Where
l = lower boundary of the modal class
h = size of the modal class
fm = frequency of the modal class
f1 & f2 = frequencies of the classes preceding & following the modal class
40. Area of Land (acres)   No. of families
5 – 10                     20
10 – 15                    35
15 – 20                    150
20 – 25                    70
25 – 30                    44
30 – 35                    38
Here:
l = 15
h = (20 − 15) = 5
fm = 150
f1 = 35
f2 = 70
$\text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$
$= 15 + \frac{150 - 35}{2 \times 150 - 35 - 70} \times 5$
$= 15 + \frac{115}{195} \times 5 = 15 + 2.95$
= 17.95
41. Temp °F (X)   No. of Days (f)
0-10              10
10-20             28
20-30             30
30-40             42
40-50             65
50-60             50
60-70             10
70-80             55
80-90             45
90-100            30
l = 40
h = (50 − 40) = 10
fm = 65
f1 = 42
f2 = 50
$\text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h = 40 + \frac{65 - 42}{2 \times 65 - 42 - 50} \times 10 = 40 + 6.05$
Answer = 46.05
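Both mode examples can be reproduced with one function. This is a sketch only: `grouped_mode` is a hypothetical helper, and treating a missing neighbouring class as having frequency 0 is an assumption made here, not something stated in the slides.

```python
def grouped_mode(classes):
    """Mode of grouped data; classes is a list of (lower, upper, frequency)."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))            # first class with the highest frequency
    lower, upper, fm = classes[k]
    # Neighbouring frequencies; assumed 0 when the modal class is at an edge.
    f1 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lower + (fm - f1) / (2 * fm - f1 - f2) * (upper - lower)


# Area of land (acres) vs. number of families.
land = [(5, 10, 20), (10, 15, 35), (15, 20, 150),
        (20, 25, 70), (25, 30, 44), (30, 35, 38)]

# Temperature (°F) vs. number of days.
temp = [(0, 10, 10), (10, 20, 28), (20, 30, 30), (30, 40, 42), (40, 50, 65),
        (50, 60, 50), (60, 70, 10), (70, 80, 55), (80, 90, 45), (90, 100, 30)]

print(round(grouped_mode(land), 2), round(grouped_mode(temp), 2))
```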
42. SAMPLING
What is sampling?
Sampling is a statistical process whereby researchers
select a subset of a population to study. The crucial point here is to
choose a good sample (a small part or quantity intended to
show what the whole is like).
What is a population?
In sampling meaning, a population is a set of units that
we are interested in studying. These units should have at least
one common characteristic. The units could be people, cases
(organizations, institutions), and pieces of data (for example –
customer transactions).
What is a sample?
A sample is a part of the population that is subject to
research and used to represent the entire population as a
whole. What is crucial here is to study a sample that provides a
true picture of the whole group. Often, it’s not possible to
contact every member of the population. So, only a sample is
studied when conducting statistical or marketing research.
43. There are two basic types of sampling methods:
1. Probability sampling: 2. Non-probability sampling
1. Probability Sampling
In simple words, probability sampling (also known as random sampling or
chance sampling) utilizes random selection techniques and principles to
create a sample. Every member of the population has a known, non-zero
chance of being selected; in simple random sampling these chances are equal.
For example, if we draw one person at random from a population of 100 people,
each person has a chance of 1 out of 100 of being chosen for the sample.
Advantages of probability sampling:
A comparatively easier method of sampling
Lesser degree of judgment
High level of reliability of research findings
High accuracy of sampling error estimation
Can be done even by non-technical individuals
The absence of both systematic and sampling bias.
Disadvantages:
Monotonous work
Chances of selecting specific class of samples only
Higher complexity
Can be more expensive and time-consuming.
44. Types of Probability Sampling Methods
Simple Random Sampling
Stratified Random Sampling
Systematic Sampling
Cluster Random Sampling
45. Simple Random Sampling
This is the purest and the clearest probability sampling
design and strategy. It is also the most popular way of
selecting a sample because it tends to create samples that are
representative of the population.
Simple random sampling is a fully random technique of selecting
subjects. All you need to do as a researcher is ensure that all
the individuals of the population are on the list and after
that randomly select the needed number of subjects.
Because each unit is drawn without regard to its position in
the list, the sample is not dominated by units that happen to
come consecutively.
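A minimal sketch of simple random sampling with Python's standard `random` module. The population of 100 numbered IDs, the sample size of 10 and the seed are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 100 people.
population = list(range(1, 101))

# A fixed seed makes the draw repeatable; omit it for a fresh sample each run.
rng = random.Random(42)

# Draw 10 distinct members; every member is equally likely to be chosen.
sample = rng.sample(population, k=10)
print(sorted(sample))
```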
46. Stratified Random Sampling
A stratified random sample is a population sample that
involves the division of a population into smaller groups,
called ‘strata’. Then the researcher randomly selects the
final items proportionally from the different strata.
This means the stratified sampling method is very
appropriate when the population is
heterogeneous. Stratified sampling is a valuable type of
sampling method because it captures key population
characteristics in the sample.
In addition, stratified sampling design leads to increased
statistical efficiency. Each stratum (group) is highly
homogeneous internally, but the strata are heterogeneous
(different) from one another, which reduces the internal dispersion. Thus,
with the same size of the sample, greater accuracy can be
obtained.
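Proportional stratified sampling can be sketched as follows. The function `stratified_sample`, the two strata and their sizes are hypothetical; rounding each stratum's share is a simplification, so in general the parts may not add up exactly to the requested sample size.

```python
import random


def stratified_sample(strata, sample_size, seed=0):
    """Draw a proportional random sample from each stratum.

    strata maps a stratum name to the list of units in that stratum.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation: the stratum's share of the population.
        k = round(sample_size * len(units) / total)
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical population: 600 urban and 400 rural units.
strata = {
    "urban": [f"U{i}" for i in range(600)],
    "rural": [f"R{i}" for i in range(400)],
}
picked = stratified_sample(strata, sample_size=50)
print({name: len(units) for name, units in picked.items()})
```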
47. Systematic Sampling
This method is appropriate if we have a complete list of
sampling subjects arranged in some systematic order such as
geographical and alphabetical order.
The process of systematic sampling design generally includes
first selecting a starting point in the population and then
performing subsequent observations by using a constant
interval between samples taken.
This interval, known as the sampling interval, is calculated by
dividing the entire population size by the desired sample size.
For example, if you as a researcher want to create a systematic
sample of 1,000 workers at a corporation with a population of
10,000, you would choose every 10th individual from the list of
all workers.
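The worker example can be sketched in code. The random starting point within the first interval is a common convention assumed here; the function name `systematic_sample` and the numbered worker list are hypothetical.

```python
import random


def systematic_sample(units, sample_size, seed=0):
    """Pick every k-th unit from an ordered list after a random start."""
    interval = len(units) // sample_size              # sampling interval k
    start = random.Random(seed).randrange(interval)   # random start in the first interval
    return units[start::interval][:sample_size]


# Hypothetical list of 10,000 workers, identified by number.
workers = list(range(1, 10001))
sample = systematic_sample(workers, sample_size=1000)
print(len(sample), sample[:3])
```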
48. Cluster Random Sampling
This is one of the popular types of sampling methods, used when
the list of members is too large to sample from directly.
A typical example is when a researcher wants to choose
1000 individuals from the entire population of India. It
is impractical to get a complete list of every individual. So,
the researcher randomly selects areas (such as cities) and
randomly selects individuals from within those areas.
Cluster sampling design is used when natural groups occur
in a population. The entire population is subdivided into
clusters (groups), a random sample of clusters is selected,
and data are then gathered from the selected clusters.
Cluster sampling is a very typical method for research. It’s
used when you can’t get information about the whole
population, but you can get information about the clusters.
Cluster sampling requires heterogeneity within the clusters
and homogeneity between them. Each cluster must be a
small representation of the whole population.
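The two-stage procedure described above (pick areas at random, then pick individuals within them) can be sketched as below. The function `cluster_sample`, the 20 cities and the sample sizes are all hypothetical values for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster, seed=0):
    """Two-stage cluster sample.

    clusters maps a cluster name to the list of units in that cluster.
    """
    rng = random.Random(seed)
    # Stage 1: randomly select which clusters to study.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: randomly select units within each chosen cluster.
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}


# Hypothetical population: 20 cities with 500 residents each.
cities = {
    f"city_{c}": [f"city_{c}_person_{p}" for p in range(500)]
    for c in range(20)
}
picked = cluster_sample(cities, n_clusters=4, per_cluster=25)
print({name: len(units) for name, units in picked.items()})
```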
49. Types of Non-Probability Sampling Methods
There are many types of non-probability sampling
techniques and designs, but here we will list some of the
most popular.
Convenience Sampling
As the name suggests, this method involves collecting units
that are the easiest to access: your local school, the mall,
your nearest church, etc. It forms an accidental
sample. It is generally regarded as an unsystematic and
careless sampling method.
Respondents are those “who are very easily available for
interview”. For example, people intercepted on the street,
Facebook fans of a brand, etc.
This technique is known as one of the easiest, cheapest,
and least time-consuming types of sampling methods.
50. Quota Sampling
Quota sampling methodology aims to create a sample
where the groups (e.g. male vs. female workers)
are proportional to the population.
The population is divided into groups (also called
strata) and the samples are gathered from each
group to meet a quota.
For example, if your population is 40% female and
60% male, your sample should reflect those
percentages.
Quota sampling is typically done to ensure the
presence of a specific segment of the population.
Judgment Sampling
Judgmental sampling is a sampling methodology
where the researcher selects the units of the
sample based on their knowledge. This type of
sampling method is also known as purposive
sampling or authoritative sampling.
51. The difference between non-probability and
probability sampling is that the first one does not
include random selection.
What is non-probability sampling?
Non-probability sampling is a group of sampling
techniques where the samples are collected in a way
that does not give all the units in the population equal
chances of being selected. Non-probability sampling does
not involve random selection at all.
For example, one member of a population could have
a 10% chance of being picked. Another member could
have a 50% chance of being picked.
Most commonly, the units in a non-probability sample
are selected on the basis of their accessibility. They
can also be selected by the purposive personal
judgment of you as a researcher.
52. Advantages of non-probability sampling:
When a respondent refuses to participate, he or she may be replaced by
another individual who is willing to give information.
Less expensive.
Very cost and time effective.
Easy-to-use types of sampling methods.
Disadvantages of non-probability sampling:
The researcher interviews individuals who are easily accessible and
available. It means the possibility of gathering valuable data is
reduced.
Impossible to estimate how well the sample represents the
population.
Excessive dependence on judgment.
The researchers can’t calculate margins of error.
Bias arises when selecting sample units.
The correctness of data is less certain.
It focuses on simplicity instead of effectiveness.