2. What is Dispersion in Statistics?
• Dispersion is the state of being dispersed, or spread out. Statistical
dispersion is the extent to which numerical data are likely to
vary about an average value. Dispersion helps in understanding the
distribution of the data.
Measures of Dispersion
• In statistics, measures of dispersion help to interpret the
variability of data, i.e. to know how homogeneous or
heterogeneous the data are.
3. Characteristics of Measures of Dispersion
• A measure of dispersion should be rigidly defined
• It should be easy to calculate and understand
• It should not be affected much by fluctuations of observations
• It should be based on all observations
4. Classification of Measures of Dispersion
• Measures of dispersion are categorized as:
(i) Absolute measures of dispersion
• An absolute measure of dispersion is expressed in the same unit as the original data set.
Absolute measures express the variation either as a spread between observations or as an
average of deviations of the observations, e.g., range, quartile deviation, mean deviation.
(ii) Relative measures of dispersion
• Relative measures of dispersion are used to compare the distributions of two or
more data sets. These measures are unit-free ratios, so values can be compared across units. They are the
coefficient of range, the coefficient of mean deviation, the coefficient of quartile
deviation, the coefficient of variation, and the coefficient of standard deviation.
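As a small illustration of why a relative measure is unit-free, the sketch below computes the coefficient of range using the common textbook form (max - min) / (max + min). The function name and the height data are hypothetical, and the sketch assumes the maximum and minimum do not sum to zero.

```python
def coefficient_of_range(values):
    """Relative (unit-free) version of the range: (max - min) / (max + min)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


# Hypothetical heights, first in centimetres and then in metres.
heights_cm = [150, 160, 170, 180]
heights_m = [h / 100 for h in heights_cm]

# The absolute range changes with the unit (30 cm vs 0.3 m),
# but the coefficient of range is practically identical in both units.
print(coefficient_of_range(heights_cm))
print(coefficient_of_range(heights_m))
```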
6. Range
• The range is the most common and most easily understood
measure of dispersion. It is the difference between the two
extreme observations of the data set.
If X_max and X_min are the two extreme observations, then
Range = X_max - X_min
It is simply the difference between the maximum value (highest
value) and the minimum value (lowest value) in a data
set.
Example: 1, 3, 5, 6, 7 => Range = 7 - 1 = 6
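The same example can be checked in a few lines of Python. This is only a minimal sketch; the helper name `value_range` is made up for illustration.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


data = [1, 3, 5, 6, 7]
print(value_range(data))  # 6
```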
11. Merits of Range
It is the simplest of the measures of dispersion
Easy to calculate
Easy to understand
Independent of change of origin
Demerits of Range
It is based on only the two extreme observations and is therefore strongly affected by
fluctuations
The range is not a reliable measure of dispersion
Dependent on change of scale
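The two properties "independent of change of origin" and "dependent on change of scale" can be seen by shifting and rescaling the earlier example data. The sketch below uses an illustrative helper `value_range`; the shift of 10 and the factor of 3 are arbitrary choices.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


data = [1, 3, 5, 6, 7]
shifted = [x + 10 for x in data]  # change of origin: add a constant
scaled = [x * 3 for x in data]    # change of scale: multiply by a constant

# Adding a constant leaves the range unchanged; multiplying by 3 triples it.
print(value_range(data), value_range(shifted), value_range(scaled))  # 6 6 18
```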
12. Mean Deviation
• The average of a set of numbers is known as the mean. The arithmetic
mean of the absolute deviations of the observations from a measure of
central tendency is known as the mean deviation (also called mean
absolute deviation).
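A minimal sketch of this definition, assuming the deviations may be taken from either the mean or the median. The helper name `mean_deviation` and the data are illustrative only.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)


data = [1, 3, 5, 6, 7]
md_about_mean = mean_deviation(data, mean(data))      # center = 4.4
md_about_median = mean_deviation(data, median(data))  # center = 5

# For this data the deviation about the median is the smaller of the two.
print(md_about_mean, md_about_median)
```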
19. Merits of Mean Deviation
• Based on all observations
• It takes its minimum value when the deviations are taken
from the median
• Independent of change of origin
Demerits of Mean Deviation
• Not easily understandable
• Its calculation is not easy and is time-consuming
• Dependent on the change of scale
• Ignoring the negative signs of the deviations is artificial and makes the measure
unsuitable for further mathematical treatment
20. Standard Deviation
• The standard deviation is the positive square root of the
arithmetic mean of the squares of the deviations of the given
values from their arithmetic mean. It is denoted by the Greek
letter sigma, σ. It is also referred to as the root mean square
deviation.
• The standard deviation uses the squares of the residuals (the deviations from the mean)
• Steps:
Find the sum of the squares of the residuals
Find the mean of those squares by dividing the sum by the number of observations
Then take the square root of that mean
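The steps can be sketched in Python as follows. This follows the definition given here, which divides by the number of observations (the population form); the sample form divides by n - 1 instead. The function name and the data are illustrative.

```python
import math


def population_std(values):
    """Root mean square deviation from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    # Residuals are the deviations of each value from the mean.
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    mean_of_squares = sum_of_squares / n
    return math.sqrt(mean_of_squares)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared residuals sum to 32
print(population_std(data))  # 2.0
```

The standard library function `statistics.pstdev` computes the same population quantity and can serve as a cross-check.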
28. Merits of Standard Deviation
• It is a rigidly defined measure of dispersion
• It is based on all observations
• It is capable of being treated mathematically
• It is less affected by fluctuations of sampling than other measures of dispersion and hence is
widely used in sampling theory and tests of significance
Demerits of Standard Deviation
• It is difficult to understand
• It is difficult to calculate
• It gives more weight to extreme values because the deviations are
squared
• As it is an absolute measure of variability, it cannot be used directly to
compare data sets measured in different units
29. Variance
• Variance is the arithmetic mean of the squares of the
deviations of all the items of the distribution from the
arithmetic mean.
• In other words, variance is the square of the standard
deviation
Variance = σ²
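The relation Variance = σ² can be checked numerically. The sketch below computes the population variance directly from the definition and compares it with the square of `statistics.pstdev`; the helper name and the data are illustrative.

```python
import math
from statistics import pstdev


def population_variance(values):
    """Arithmetic mean of the squared deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(data)
sigma = pstdev(data)  # population standard deviation

# The variance and the squared standard deviation agree up to rounding.
print(variance, sigma ** 2)
```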
30. The Coefficient of Variation
• The coefficient of variation is a measure of relative variability: the standard deviation divided by the mean
• It is used to measure the changes that have taken place in a
population over time
• It is used to compare the variability of two populations that are
expressed in different units of measurement
• It is expressed as a percentage
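A minimal sketch of such a comparison, assuming the usual form CV = (standard deviation / mean) × 100 with the population standard deviation. The weights and heights are made-up numbers.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (assumes mean != 0)."""
    return pstdev(values) / mean(values) * 100


# Hypothetical measurements in different units.
weights_kg = [60, 65, 70, 75, 80]
heights_cm = [150, 160, 170, 180, 190]

cv_weight = coefficient_of_variation(weights_kg)
cv_height = coefficient_of_variation(heights_cm)

# Heights have the larger standard deviation in absolute terms,
# but weights vary more relative to their mean.
print(round(cv_weight, 2), round(cv_height, 2))
```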
32. A test of significance is a formal procedure for comparing
observed data with a claim (also called a hypothesis), the
truth of which is being assessed.
Four types of tests of significance in statistics:
1. Student's T-Test or T-Test
2. F-test or Variance Ratio Test
3. Fisher's Z-Test or Z-Test
4. χ² Test (Chi-Square Test).
33. What is Chi-Square?
• A chi-square (χ²) statistic is a measure of the difference
between the observed and expected frequencies of the
outcomes of a set of events or variables.
• The data used in calculating a chi-square statistic must be
random, raw, mutually exclusive, drawn from independent
variables, and drawn from a large enough sample.
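The statistic is commonly computed as the sum of (observed - expected)² / expected over all categories. The sketch below applies that formula to a hypothetical die rolled 60 times; it computes only the statistic, not a p-value, which would need the chi-square distribution.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for faces 1..6 of a die rolled 60 times.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6  # a fair die would give 10 per face on average

chi2 = chi_square_statistic(observed, expected)
print(chi2)
```

A larger value means the observed counts sit further from the expected ones; identical observed and expected counts give 0.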
35. Merits of Chi-Square
• Chi-square is useful for analyzing differences between observed and expected frequencies in categorical variables,
especially those nominal in nature.
• χ² can be used to test whether two variables are related to or independent
of one another.
• It can be used to test the goodness of fit between an observed distribution
and a theoretical distribution of frequencies.
• Can test association between variables
• Identifies differences between observed and expected values
36. Demerits of Chi-Square
Can't use percentages
Data must be numerical counts (frequencies)
The number of observations should be at least 20
The test becomes unreliable if any of the expected values are below
5
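The last two conditions can be checked before running a test. The sketch below computes expected counts for a two-way table as (row total × column total) / grand total, then applies the thresholds of 20 observations and 5 per expected cell as rules of thumb. The function names and the survey table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def meets_rule_of_thumb(table, min_total=20, min_expected=5):
    """True if the table has enough observations and no small expected cell."""
    total = sum(sum(row) for row in table)
    smallest = min(min(row) for row in expected_counts(table))
    return total >= min_total and smallest >= min_expected


survey = [[20, 30], [30, 20]]  # hypothetical 2x2 table of counts
too_small = [[1, 2], [3, 4]]   # only 10 observations in total

print(expected_counts(survey))         # every expected cell is 25.0
print(meets_rule_of_thumb(survey))     # True
print(meets_rule_of_thumb(too_small))  # False
```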
37. What is Data Analysis?
• Data analysis is defined as a process of cleaning,
transforming, and modeling data to discover useful information
for business decision-making.
• The purpose of data analysis is to extract useful information
from data and to make decisions based on that
analysis.
• Data analysis is the practice of working with data to collect
38. Data analysis process
• Identify the question you'd like to answer. What problem are you trying to solve? What do
you need to measure, and how will you measure it?
• Collect the raw data sets you'll need to help you answer the identified question. Data
might come from internal sources or from secondary sources.
• Clean the data to prepare it for analysis.
• Analyze the data. By manipulating the data using various data analysis techniques
and tools, you can begin to find trends, correlations, outliers, and variations that begin
to tell a story.
• Interpret the results of your analysis to see how well the data answered your original
question. What recommendations can you make based on the data? What are the
39. What is Frequency Distribution?
• A frequency distribution is a representation, in either
graphical or tabular format, that displays the number of
observations within a given interval.
• The interval size depends on the data being analyzed and
the goals of the analyst. The intervals must be mutually
exclusive and exhaustive.
40. • A frequency distribution in statistics is a representation that displays
the number of observations within a given interval.
• The representation can be graphical or
tabular, so that it is easier to understand.
• Frequency distributions are particularly useful for normal
distributions, where they show how the observations are divided
among standard deviations from the mean.
• In finance, traders use frequency distributions to take note of price
action and identify trends.
41. • As a statistical tool, a frequency distribution provides a visual
representation of the distribution of observations within a
particular test.
• Analysts often use frequency distributions to visualize or
illustrate the data collected in a sample.
• Both histograms and bar charts provide a visual display using
columns, with the y-axis representing the frequency count
and the x-axis representing the variable being measured.
42. • When the data are approximately normally distributed, a histogram
shows the majority of occurrences falling
in the middle columns.
• Frequency distributions can be a key aspect of charting normal
distributions, which show how observation probabilities are divided
among standard deviations.
• Frequency distributions can be presented as a frequency table,
a histogram, or a bar chart. Below is an example of a frequency
distribution as a table.
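A frequency table can be built with a short Python sketch like the one below. The exam scores and the class width of 10 are hypothetical. Each interval is half-open, so the intervals are mutually exclusive, and only non-empty intervals are listed.

```python
from collections import Counter


def frequency_table(values, width):
    """Count observations per interval [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return [(start, start + width, counts[start]) for start in sorted(counts)]


scores = [52, 67, 71, 75, 78, 81, 84, 88, 93, 58]  # hypothetical exam scores

print("Interval   Frequency")
for start, end, count in frequency_table(scores, 10):
    print(f"{start}-{end - 1}      {count}")
```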
43. What is a Binomial Distribution?
• The binomial distribution is a discrete probability distribution that
represents the probabilities of binomial random variables in a binomial
experiment.
• The binomial distribution is defined as a probability distribution related
to a binomial experiment, where the binomial random variable specifies
how many successes or failures occurred within that sample space.
• The binomial distribution is a probability distribution that applies to
binomial experiments. It describes the number of successes in a specific
number of tries.
44. • The binomial is a type of distribution in which each trial has two possible
outcomes (the prefix "bi" means two, or twice).
• For example, a coin toss has only two possible outcomes, heads
or tails, and taking a test could have two possible outcomes, pass or
fail.
• In a binomial distribution, each trial results in either (S)uccess or (F)ailure.
45. Characteristics of a Binomial distribution
1: The number of observations n is fixed.
2: Each observation is independent.
3: Each observation represents one of two outcomes
("success" or "failure").
4: The probability of "success" p is the same for each
observation.
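Under these four conditions, the probability of exactly k successes in n observations is C(n, k) × p^k × (1 - p)^(n - k). The sketch below evaluates that formula for a hypothetical fair coin tossed 10 times; the function name is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.5  # 10 tosses of a fair coin, "success" = heads
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))  # 0.24609375
print(sum(probabilities))     # probabilities over k = 0..10 add up to 1
```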