2. Dispersion in Statistics
Dispersion in statistics is a way of describing how
spread out a set of data is.
Measures of Dispersion
•Absolute Measures of Dispersion (one data set)
•Relative Measures of Dispersion (two or more
datasets)
Absolute measures of dispersion have the same unit as
the quantity being measured. Common measures of
dispersion include:
1.Range
2.Variance
3.Standard Deviation
4.IQR
3. Range:
Range is the difference between the largest and
smallest values in the data. The range is the simplest
measure of dispersion.
Example: 1, 2, 3, 4, 5, 6, 7
Range = Highest value – Lowest value = 7 – 1 = 6
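The range computation above can be sketched in Python, using the example data from the slide:

```python
# Range = largest value minus smallest value
data = [1, 2, 3, 4, 5, 6, 7]
data_range = max(data) - min(data)
print(data_range)  # 6
```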
4. Variance (σ²)
In simple terms, the variance is calculated by
obtaining the sum of the squared distances of each term
in the distribution from the mean, and then dividing this
by the total number of terms in the distribution.
σ² = ∑ (Xᵢ − μ)² / N
Xᵢ = observation i, i = 1…N
N = number of observations
μ = mean
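The calculation described above can be sketched directly in Python:

```python
# Population variance: mean squared distance of each term from the mean
data = [1, 2, 3, 4, 5, 6, 7]
mu = sum(data) / len(data)                               # mean = 4.0
variance = sum((x - mu) ** 2 for x in data) / len(data)
print(variance)  # 4.0
```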
5. Standard Deviation
Standard Deviation can be represented as the square
root of Variance. To find the standard deviation of any
data, you need to find the variance first. Standard
Deviation is considered the best measure of dispersion.
Formula:
Standard Deviation = √Variance = √σ²
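Continuing the variance example, the standard deviation is just its square root:

```python
import math

# Standard deviation = square root of the (population) variance
data = [1, 2, 3, 4, 5, 6, 7]
mu = sum(data) / len(data)
variance = sum((x - mu) ** 2 for x in data) / len(data)  # 4.0
sd = math.sqrt(variance)
print(sd)  # 2.0
```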
6. Quartile Deviation
Quartile Deviation is the measure of the difference
between the upper and lower quartiles. This measure of
dispersion is also known as the interquartile range.
Formula:
Interquartile Range: Q3 – Q1.
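A quick sketch of the interquartile range in Python. Note that there are several quantile conventions; the `"inclusive"` method used here is one choice, so other tools may report slightly different quartile values:

```python
import statistics

# Interquartile range = Q3 - Q1 (data made up for illustration)
data = [2, 4, 6, 8, 10, 12, 14, 16]
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(iqr)  # 7.0 (Q3 = 12.5, Q1 = 5.5)
```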
7. Relative Measures of
Dispersion
Relative Measures of Dispersion in Statistics are
unitless values. A relative measure of dispersion is
used to compare the distribution of two or more
datasets.
8. Co-efficient of Range:
It is calculated as the ratio of the difference between
the largest and smallest terms of the distribution, to
the sum of the largest and smallest terms of the
distribution.
Formula:
(L – S) / (L + S)
where L = largest value
S= smallest value
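The coefficient of range is simple to compute; the data here is made up for illustration:

```python
# Coefficient of range = (L - S) / (L + S), a unitless ratio
data = [10, 20, 30, 40, 50]
L, S = max(data), min(data)
coeff_range = (L - S) / (L + S)   # (50 - 10) / (50 + 10) = 2/3
```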
9. Co-efficient of Variation:
The coefficient of variation is used to compare two
datasets with respect to homogeneity or consistency.
Formula:
C.V = (σ / x̄) × 100
σ = standard deviation
x̄ = mean
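A minimal sketch of the coefficient of variation in Python (the data is hypothetical, and the population standard deviation is assumed here):

```python
import statistics

# Coefficient of variation: standard deviation as a percentage of the mean
data = [10, 12, 14, 16, 18]        # made-up data for illustration
mean = statistics.mean(data)       # 14
sd = statistics.pstdev(data)       # population standard deviation
cv = sd / mean * 100               # ~20.2%
```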
10. Co-efficient of Standard
Deviation:
The co-efficient of Standard Deviation is the ratio of
the standard deviation to the mean of the distribution of
terms.
Formula:
Coefficient of SD = σ / x̄
where σ = √( ∑(X − x̄)² / (N − 1) ) is the sample standard deviation
x̄ = mean
N = total number of observations
11. Co-efficient of Quartile
Deviation:
The co-efficient of Quartile Deviation is the ratio of
the difference between the upper quartile and the lower
quartile to the sum of the upper quartile and lower
quartile.
Formula:
( Q3 – Q1) / ( Q3 + Q1)
Q3 = Upper Quartile
Q1 = Lower Quartile
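Reusing the quartiles from the earlier IQR sketch (the `"inclusive"` quantile method is again an assumption):

```python
import statistics

# Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
data = [2, 4, 6, 8, 10, 12, 14, 16]
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
coeff_qd = (q3 - q1) / (q3 + q1)   # (12.5 - 5.5) / (12.5 + 5.5)
```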
13. Skewness
Skewness is a measure of asymmetry or distortion of a
symmetric distribution. It measures the deviation of the
given distribution of a random variable from a symmetric
distribution, such as the normal distribution.
Types of Skewness
1. Positive Skewness (right
skewed)
2. Negative Skewness (left
skewed)
14. Skewness can be measured using several methods; however,
Pearson mode skewness and Pearson median skewness are
the two frequently used methods.
The formula for Pearson mode skewness: Sk₁ = (X̄ − Mo) / s
The formula for Pearson median skewness: Sk₂ = 3(X̄ − Md) / s
Where:
X̄ = Mean value
Mo = Mode value; Md = Median value
s = Standard deviation of the sample data
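Pearson median skewness can be sketched in Python on a made-up right-skewed sample:

```python
import statistics

# Pearson median skewness: 3 * (mean - median) / s
data = [1, 2, 2, 3, 3, 3, 4, 10]   # made-up, right-skewed sample
mean = statistics.mean(data)       # 3.5
median = statistics.median(data)   # 3.0
s = statistics.stdev(data)         # sample standard deviation
sk2 = 3 * (mean - median) / s      # positive => right (positive) skew
```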
19. Correlation
•Covariance only shows the direction of the linear
relationship between two variables (i.e., positive,
negative, or no covariance). It cannot measure the
strength of the relationship between the two
variables.
•To measure both the strength and
direction of the linear relationship
between two variables, we use a
statistical measure called
correlation.
20. •When the two variables move in the same
direction, the value of the Correlation
Coefficient (r) will be Positive.
•When the two variables move in opposite
directions, the value of the Correlation
Coefficient (r) will be Negative.
•When there is no linear relationship between
the variables, the value of the Correlation
Coefficient (r) will be Zero.
22. T Test
A t test is a statistical test that is used to compare
the means of two groups.
It is often used in hypothesis testing.
•The null hypothesis (H0) is that the true difference
between the group means is zero.
•The alternate hypothesis (Ha) is that the true
difference is different from zero.
23. The t test assumes your data:
1.are independent
2.are (approximately) normally distributed
3.have a similar amount of variance within each group
being compared (a.k.a. homogeneity of variance)
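As a sketch of the machinery, here is the pooled-variance two-sample t statistic (using the equal-variance assumption from the slide; the data is made up, and a full test would compare t against the t distribution for a p-value):

```python
import statistics

def two_sample_t(a, b):
    # Pooled-variance t statistic for two independent samples
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)    # pooled variance
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5                    # standard error
    return (statistics.mean(a) - statistics.mean(b)) / se

group1 = [5.1, 4.9, 5.4, 5.0, 5.2]   # hypothetical measurements
group2 = [4.2, 4.4, 4.1, 4.5, 4.3]
t = two_sample_t(group1, group2)     # large |t| => means likely differ
```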
25. chi-square test
A Pearson’s chi-square test is a statistical test for
categorical data.
There are two types of Pearson’s chi-square tests:
•The chi-square goodness of fit test is used to
test whether the frequency distribution of a
categorical variable is different from your
expectations.
•The chi-square test of independence is used to
test whether two categorical variables are related
to each other.
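The goodness-of-fit statistic itself is a short computation; the counts below are made up, and a full test would compare the statistic against a chi-square critical value with k − 1 degrees of freedom:

```python
# Chi-square goodness-of-fit statistic: sum of (observed - expected)^2 / expected
observed = [18, 22, 20, 25, 15]                             # made-up category counts
expected = [sum(observed) / len(observed)] * len(observed)  # uniform expectation: 20 each
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # ≈ 2.9
```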
29. Regression models
❑ describe the relationship between variables
by fitting a line to the observed data.
❑ Linear regression models use a straight line,
while logistic and nonlinear regression models
use a curved line.
❑ Regression allows you to estimate how
a dependent variable changes as the
independent variable(s) change.
30. Simple linear regression
It is used to estimate the relationship
between two quantitative variables.
1.How strong the relationship is between
two variables (e.g., the relationship
between rainfall and soil erosion).
2.The value of the dependent variable at
a certain value of the independent
variable (e.g., the amount of soil erosion at
a certain level of rainfall).
31. Assumptions of simple linear
regression
1.Homogeneity of variance: the size of the error in
our prediction doesn’t change significantly across
the values of the independent variable.
2.Independence of observations: the observations
in the dataset were collected using statistically
valid sampling methods, and there are no hidden
relationships among observations.
3.Normality: The data follows a normal distribution.
4.The relationship between the independent and
dependent variable is linear: the line of best fit
through the data points is a straight line (rather
than a curve or some sort of grouping factor).
32. Simple linear regression formula: y = B0 + B1x + e
•y is the predicted value of the dependent variable (y)
for any given value of the independent variable (x).
•B0 is the intercept, the predicted value of y when
the x is 0.
•B1 is the regression coefficient – how much we
expect y to change as x increases.
•x is the independent variable ( the variable we expect
is influencing y).
•e is the error of the estimate, or how much variation
there is in our estimate of the regression coefficient.
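The least-squares estimates of B0 and B1 can be sketched directly from their textbook formulas (the data is made up, roughly following y = 2x):

```python
# Least-squares estimates of B0 (intercept) and B1 (slope) for y = B0 + B1*x + e
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # made-up data, roughly y = 2x
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
# B1 = covariance of x and y divided by variance of x
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar              # line passes through (xbar, ybar)
```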
35. The multiple regression model is
based on the following
assumptions:
•There is a linear relationship between the
dependent variable and the independent
variables
•The independent variables are not too
highly correlated with each other
•yi observations are selected independently
and randomly from the population
•Residuals should be normally distributed
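As a pure-Python sketch of multiple regression, the `ols` helper below (hypothetical, written for this example) solves the normal equations (XᵀX)b = Xᵀy with Gauss-Jordan elimination and no pivoting, which is adequate for this small made-up dataset:

```python
def ols(X, y):
    # Ordinary least squares for y = b0 + b1*x1 + b2*x2 + ...
    rows = [[1.0] + list(r) for r in X]      # prepend intercept column
    k = len(rows[0])
    # Build the normal equations A b = c, where A = X'X and c = X'y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination (no pivoting; fine for this small example)
    for i in range(k):
        p = A[i][i]
        A[i] = [v / p for v in A[i]]
        c[i] /= p
        for j in range(k):
            if j != i:
                f = A[j][i]
                A[j] = [a - f * v for a, v in zip(A[j], A[i])]
                c[j] -= f * c[i]
    return c

# Made-up data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * a + 3 * b for a, b in X]
b = ols(X, y)   # recovers approximately [1.0, 2.0, 3.0]
```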