This document discusses key concepts in research methodology and statistics. It defines statistics as dealing with the collection, analysis, and interpretation of quantitative and qualitative data. It then discusses various types of graphs used to visually represent data, such as bar graphs, pie charts, histograms, boxplots, and scatterplots. It also defines common measures of central tendency (mean, median, mode), dispersion (range, variance, standard deviation, IQR), and skewness.
2. Statistics deals with the collection, presentation,
analysis and interpretation of
quantitative and qualitative information.
5. The facts, observations and all other relevant
information collected from research
and investigations are known as data.
7. Graphs
Graphs are visual representations of data that make it
quicker and easier to understand. They present frequency
distributions so that the shape of the distribution is easy to see.
8. Bar Graphs
Bar graphs are a graphical representation of data in which the value of each category
is shown by a rectangular bar. A bar graph uses two axes, the x-axis and the y-axis, to plot the bars.
Types of Bar Graph
•Horizontal bar graph
•Vertical bar graph
•Double bar graph (Grouped bar graph)
•Multiple bar graph (Grouped bar graph)
•Stacked bar graph
•Bar line graph
9. Horizontal bar graph
When the y-axis represents the observations to be
compared and the x-axis represents their magnitude,
the bars run horizontally along the x-axis, each
extending to a length proportional to its observation.
10. Vertical Bar Graph
Vertical bar graphs are the opposite of horizontal bar
graphs, and they are used more often than
horizontal bar graphs. Here the x-axis represents the
observations to be compared and the y-axis represents
their magnitude, so the bars run vertically along the y-axis.
11. Grouped Bar Graph
Double Bar Graph
A double bar graph compares various observations or
categories using two parameters. The two
parameters must be measured in the same unit.
12. Multiple Bar Graph
A multiple bar graph compares various observations on the basis
of several parameters. You can include as many
parameters as you wish, but each parameter must
have the same unit of measurement.
13. Stacked Bar Graph
A stacked bar graph also represents various parameters in
a single graph. The difference is that in a stacked bar
graph all the parameters are represented in a single bar,
so each bar is divided into segments that add up to a
total.
14. Pie chart
A pie chart is a circular graph divided into slices. The
size of each slice is proportional to the share of the
total that its group represents. The
entire "pie" represents 100% of the total, while the "slices"
represent parts of the whole.
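The proportionality rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical set of group counts; the function name `pie_slices` and the data are invented for the example.

```python
def pie_slices(counts):
    """Map each group to (percent of total, slice angle in degrees)."""
    total = sum(counts.values())
    return {
        label: (100 * value / total, 360 * value / total)
        for label, value in counts.items()
    }


# Hypothetical counts, invented for illustration only.
slices = pie_slices({"Reading": 20, "Videos": 10, "Practice": 10})
for label, (percent, angle) in slices.items():
    print(f"{label}: {percent:.1f}% of the pie, {angle:.1f} degrees")
```

The percentages always add up to 100 and the angles to 360, which is why a pie chart suits parts of a single whole.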
15. Histogram
The x-axis represents the values of a variable, grouped into
class intervals, and the y-axis represents the frequency of each
interval, which determines the height of its rectangle.
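As a rough sketch of how the rectangle heights are obtained, the Python below counts how many values fall into each class interval. The data are one of the practice data sets used later in this document; the bin layout (start 20, width 20) and the helper name `histogram_counts` are assumptions made for the example.

```python
def histogram_counts(values, start, width, n_bins):
    """Count the values in each interval [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    return counts


scores = [56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27]
start, width = 20, 20  # assumed bin layout: 20-39, 40-59, 60-79, 80-99
counts = histogram_counts(scores, start, width, n_bins=4)
for k, count in enumerate(counts):
    low = start + k * width
    print(f"{low:>3}-{low + width - 1:<3} {'#' * count}")
```

Each row of `#` marks plays the role of one rectangle; a different choice of bin width would change the shape of the histogram.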
16. Boxplot
A boxplot is a graphical representation of dispersion and
extreme scores. It shows the minimum, maximum
and quartile scores in the form of a box with whiskers.
The box covers the range of scores falling into the
middle 50% of the distribution (interquartile range =
75th percentile - 25th percentile) and the whiskers are
lines extended to the minimum and maximum scores in the
distribution.
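The numbers a boxplot draws can be computed directly. The sketch below uses Python's `statistics.quantiles`; quartile conventions differ between textbooks and software, so the choice of `method="inclusive"` is an assumption, and `five_number_summary` is a name made up for this example.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # method="inclusive" is one of several quartile conventions (an assumption here).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


scores = [56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27]
low, q1, med, q3, high = five_number_summary(scores)
print("whiskers reach:", low, "and", high)
print("box spans:", q1, "to", q3, "with the median at", med)
print("interquartile range:", q3 - q1)
```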
17. Scatterplot
Scatterplots are also known as scattergrams and scatter
charts.
•The x-axis represents values of a continuous variable. By
custom, this is the independent (exposure) variable.
•The y-axis represents values of a continuous variable.
Traditionally, this is the dependent (outcome) variable.
•Symbols are plotted at the (X, Y) coordinates of your data.
18. Arithmetic Mean
The arithmetic mean, commonly referred to as the average, is the
sum of the observations divided by the number of observations.
Arithmetic Mean of Grouped Data:
Suppose we have observations X1, X2, ..., Xn
with corresponding frequencies f1,
f2, ..., fn. The arithmetic mean will be
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$$
Example 3: Calculate the average number of children
per family from the following data.
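A grouped mean weights each value by its frequency: the sum of f × X divided by the sum of f. The sketch below shows the calculation on a hypothetical frequency table (these figures are invented and are not the data of Example 3); `grouped_mean` is an illustrative name.

```python
def grouped_mean(values, frequencies):
    """Frequency-weighted mean: sum(f * x) / sum(f)."""
    total_frequency = sum(frequencies)
    weighted_sum = sum(x * f for x, f in zip(values, frequencies))
    return weighted_sum / total_frequency


# Hypothetical table: number of children per family and number of families.
children = [0, 1, 2, 3, 4]
families = [4, 10, 16, 6, 4]
average = grouped_mean(children, families)
print(f"Average number of children per family: {average:.2f}")
```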
19. Median
The median is the value of the variable that divides the
group into two equal parts, one part comprising all the values
greater than the median and the other all the values less than it.
For ungrouped data, arrange the observations in ascending or
descending order. When the number of observations is odd, the
median is the middle value; when it is even, the median is the
mean of the two middle values.
20. Determine the median for the following
data sets
1) 132, 139, 131, 138, 132, 139, 133, 137, 139
2) 25, 10, 16, 25, 12, 22, 20, 23, 13, 10
3) 56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27
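One way to check answers to this exercise is a short Python sketch. It sorts the observations, takes the middle value when the count is odd, and averages the two middle values when the count is even. The function name `median` is chosen for the example.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


data_sets = [
    [132, 139, 131, 138, 132, 139, 133, 137, 139],
    [25, 10, 16, 25, 12, 22, 20, 23, 13, 10],
    [56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27],
]
medians = [median(data) for data in data_sets]
print(medians)  # [137, 18.0, 53]
```

The second data set has ten observations, so its median is the mean of the fifth and sixth sorted values, 16 and 20.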
21. Mode
The mode is the value that occurs most
frequently. It is the value around which the observations
are clustered in a given distribution.
Determine the mode for the following data sets
1) 132, 139, 131, 138, 132, 139, 133, 137, 139
2) 3, 3, 3, 5, 5, 5, 3, 6, 4, 8, 5, 4, 2, 4, 3, 5
3) 56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27
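A data set can have one mode, several modes, or none. The Python sketch below counts occurrences and returns every value that shares the highest count. Treating a data set in which no value repeats as having no mode is a common convention and is an assumption here; `modes` is an illustrative name.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode (assumed convention)
    return sorted(v for v, c in counts.items() if c == highest)


data_sets = [
    [132, 139, 131, 138, 132, 139, 133, 137, 139],
    [3, 3, 3, 5, 5, 5, 3, 6, 4, 8, 5, 4, 2, 4, 3, 5],
    [56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27],
]
results = [modes(data) for data in data_sets]
print(results)  # [[139], [3, 5], []]
```

The second data set is bimodal: the values 3 and 5 each occur five times.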
22. Mode
For a frequency distribution, the mode is the value of the
variable that has the highest frequency. For a
continuous frequency distribution, the mode is
computed using the interpolation formula:
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
Where,
l = lower limit of the modal class,
f1 = frequency of the modal class,
f0 = frequency of the class preceding the modal class,
f2 = frequency of the class succeeding the modal class,
h = width of the modal class.
23. The modal class is the class corresponding to
the maximum frequency.
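The interpolation can be sketched in Python using the standard form Mode = l + (f1 - f0) / (2 f1 - f0 - f2) × h. The frequency table is hypothetical, `grouped_mode` is an illustrative name, and the sketch assumes equal class widths and a single modal class.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Interpolated mode: l + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    k = frequencies.index(max(frequencies))  # position of the modal class
    f1 = frequencies[k]
    f0 = frequencies[k - 1] if k > 0 else 0  # no preceding class: use 0
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Hypothetical frequency table with classes 0-10, 10-20, ..., 40-50.
lower_limits = [0, 10, 20, 30, 40]
frequencies = [3, 7, 12, 9, 4]
mode = grouped_mode(lower_limits, frequencies, width=10)
print(mode)  # 26.25
```

Here the modal class is 20-30, so l = 20, f1 = 12, f0 = 7 and f2 = 9, giving 20 + (5 / 8) × 10 = 26.25.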
27. Dispersion in Statistics
Dispersion in statistics is a way of describing how
spread out a set of data is.
Measures of Dispersion
•Absolute Measures of Dispersion (one data set)
•Relative Measures of Dispersion (two or more
datasets)
Absolute measures of dispersion are expressed in the
same unit as the quantity being measured (variance is
expressed in the square of that unit). Common measures
of dispersion are:
1. Range
2. Variance
3. Standard Deviation
4. IQR
28. Range:
The range is the difference between the
largest and smallest values in the data. It
is the simplest measure of dispersion.
Range = Highest value – Lowest value
Example: 1, 2, 3, 4, 5, 6, 7
Range = 7 – 1 = 6
29. Variance (σ²)
In simple terms, the variance is calculated by
summing the squared distance of each term
in the distribution from the mean, and then dividing this
by the total number of terms in the distribution.
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
$X_i$ = the i-th observation, i = 1, ..., N
N = number of observations
μ = mean
30. Standard Deviation
The standard deviation is the square
root of the variance. To find the standard deviation of any
data, you need to find the variance first. The standard
deviation is the most widely used measure of dispersion.
Formula:
$$\sigma = \sqrt{\sigma^2}$$
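The two steps, variance first and then its square root, can be sketched in Python. This example uses the population form (dividing by N) and the data 1 to 7 from the range example; `population_variance` is an illustrative name.

```python
import math


def population_variance(values):
    """Mean of the squared distances from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


data = [1, 2, 3, 4, 5, 6, 7]
variance = population_variance(data)
std_dev = math.sqrt(variance)  # standard deviation = square root of variance
print(variance, std_dev)  # 4.0 2.0
```

The mean is 4, the squared distances are 9, 4, 1, 0, 1, 4, 9, their sum is 28, and 28 / 7 = 4, so the standard deviation is 2.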
31. Quartile Deviation
The interquartile range is the difference
between the upper and lower quartiles. The quartile
deviation, also known as the semi-interquartile range, is half of it.
Formula:
Interquartile Range = Q3 – Q1
Quartile Deviation = (Q3 – Q1) / 2
32. Relative Measures of
Dispersion
Relative measures of dispersion are
values without units. A relative measure of dispersion is
used to compare the distributions of two or more
datasets.
33. Co-efficient of Range:
It is calculated as the ratio of the difference between
the largest and smallest terms of the distribution to
the sum of the largest and smallest terms of the
distribution.
Formula:
(L – S) / (L + S)
where L = largest value
S = smallest value
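A short Python sketch shows why the coefficient of range, (L - S) / (L + S), is useful for comparison: rescaling the data changes the range but not the coefficient. The second data set is hypothetical (the first one multiplied by 100, as if recorded in a smaller unit), and the function names are illustrative.

```python
def value_range(values):
    """Absolute measure: largest value minus smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure: (L - S) / (L + S), which has no unit."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


data = [1, 2, 3, 4, 5, 6, 7]
rescaled = [x * 100 for x in data]  # hypothetical: same data in a smaller unit

print(value_range(data), coefficient_of_range(data))          # 6 0.75
print(value_range(rescaled), coefficient_of_range(rescaled))  # 600 0.75
```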
34. Co-efficient of Variation:
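The coefficient of variation is used to compare two
data sets with respect to homogeneity or consistency.
The data set with the smaller coefficient is the more consistent.
Formula:
$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100$$
σ = standard deviation
$\bar{X}$ = mean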
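The comparison can be sketched in Python with the standard `statistics` module, taking the coefficient of variation as the standard deviation divided by the mean, times 100. The sketch assumes the population standard deviation (`statistics.pstdev`); the second data set is hypothetical and `coefficient_of_variation` is an illustrative name.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return 100 * statistics.pstdev(values) / statistics.fmean(values)


# Both data sets are illustrative; the second one is hypothetical.
set_a = [1, 2, 3, 4, 5, 6, 7]
set_b = [47, 53, 47, 53]
cv_a = coefficient_of_variation(set_a)
cv_b = coefficient_of_variation(set_b)
print(f"C.V. of A = {cv_a:.1f}%, C.V. of B = {cv_b:.1f}%")

more_consistent = "A" if cv_a < cv_b else "B"
print("More consistent data set:", more_consistent)
```

Set B has the larger standard deviation (3 against 2) but the smaller coefficient of variation, because its spread is small relative to its mean of 50.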
35. Co-efficient of Standard
Deviation:
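The co-efficient of Standard Deviation is the ratio of
the standard deviation to the mean of the distribution of
terms.
Formula:
$$\text{Co-efficient of Standard Deviation} = \frac{\sigma}{\bar{X}}$$
where, for sample data,
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
Deviation = $X - \bar{X}$
σ = standard deviation
$\bar{X}$ = mean
N = total number of observations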
36. Co-efficient of Quartile
Deviation:
The co-efficient of Quartile Deviation is the ratio of
the difference between the upper quartile and the lower
quartile to the sum of the upper quartile and lower
quartile.
Formula:
( Q3 – Q1) / ( Q3 + Q1)
Q3 = Upper Quartile
Q1 = Lower Quartile
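The ratio (Q3 - Q1) / (Q3 + Q1) can be sketched in Python. The quartiles come from `statistics.quantiles`; because quartile conventions differ, `method="inclusive"` is an assumption, and the function name is illustrative. The data are one of the practice data sets from the median exercise.

```python
import statistics


def coefficient_of_quartile_deviation(values):
    """(Q3 - Q1) / (Q3 + Q1), a unit-free measure of spread."""
    # Quartile conventions differ; method="inclusive" is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


scores = [56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27]
coefficient = coefficient_of_quartile_deviation(scores)
print(round(coefficient, 3))
```

With this convention Q1 = 39.5 and Q3 = 73.5, so the coefficient is 34 / 113.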
38. Skewness
Skewness is a measure of the asymmetry of a
distribution. It measures the deviation of the
distribution of a random variable from a symmetric
distribution, such as the normal distribution.
Types of Skewness
1. Positive Skewness (right skewed)
2. Negative Skewness (left skewed)
39. Skewness can be measured using several methods; however,
Pearson mode skewness and Pearson median skewness are
the two most frequently used methods.
The formula for Pearson mode skewness:
$$\text{Pearson mode skewness} = \frac{\bar{X} - Mo}{s}$$
The formula for Pearson median skewness:
$$\text{Pearson median skewness} = \frac{3(\bar{X} - Md)}{s}$$
Where:
$\bar{X}$ = Mean value
Mo = Mode value
Md = Median value
s = Standard deviation of the sample data
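Both coefficients can be sketched in Python with the standard `statistics` module, using the usual definitions (mean - mode) / s and 3 × (mean - median) / s. The data are the first practice data set from the median and mode exercises; the function names are illustrative, and the mode version assumes the data have a single mode.

```python
import statistics


def pearson_mode_skewness(values):
    """(mean - mode) / s; assumes the data have a single mode."""
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return (statistics.fmean(values) - statistics.mode(values)) / s


def pearson_median_skewness(values):
    """3 * (mean - median) / s."""
    s = statistics.stdev(values)
    return 3 * (statistics.fmean(values) - statistics.median(values)) / s


data = [132, 139, 131, 138, 132, 139, 133, 137, 139]
mode_skew = pearson_mode_skewness(data)
median_skew = pearson_median_skewness(data)
print(round(mode_skew, 2), round(median_skew, 2))
```

Both values come out negative because the mean (about 135.6) lies below the median (137) and the mode (139), which indicates a left-skewed distribution.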