2. Unit – 1
Measures of Central Tendency
&
Measures of Dispersion
Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
3. Statistics: Meaning and Definition
“ Statistics is the science of estimates and probabilities”
“Statistics may be defined as the collection, presentation, analysis, and
interpretation of numerical data”
“Statistics is a science which deals with the methods of collecting,
classifying, presenting, comparing, analyzing, and interpreting
numerical data, collected to throw light on an enquiry”.
4. Functions of Statistics
• To collect and present facts in a systematic manner.
• To help in the formulation and testing of the hypothesis.
• To help in facilitating the comparison of data.
• To help in predicting future trends.
• To help to find the relationship between the variables.
• Simplifies the mass of complex data.
• To help to formulate policies.
• To help governments to make decisions.
5. Limitations of Statistics
• Does not study the qualitative phenomenon
• Does not deal with individual items
• Statistical results are true only on an average.
• Statistical data should be uniform and homogeneous
• Statistical Results depend on the accuracy of the data
• Statistical conclusions are not universally true.
• Statistical results can be interpreted only if a person has sound
knowledge of statistics
6. Collection and presentation of data
• Data Collection
• Primary data – Collected for the first time by the investigator; they are in the
shape of raw materials
• Secondary Data – Already collected data for a purpose other than the
problem at hand.
Aspect              | Primary Data                    | Secondary Data
Collection Purpose  | For the problem in hand         | For other problems
Collection Process  | Very involved                   | Rapid and easy
Collection Cost     | High                            | Relatively low
Collection Time     | Long                            | Short
Suitability         | Suits the object of the survey  | May or may not suit the object of the survey
Originality         | It is original                  | It is not original
Precautions         | No extra precautions required   | It should be used with extra care
8. Measure of Central Tendency
• Meaning:
A measure of central tendency is a single value that describes the way
in which a group of data clusters around a central value. In other
words, it is a way to describe the centre of a data set. A measure of
central tendency is a measure that tells us where the middle of a bunch
of data lies.
9. Application of Central Value:
• Central tendency also allows you to compare one data set to another.
• Central tendency is also useful when you want to compare one piece
of data to the entire data set.
10. Different Measures of Central Tendency:
• Mean
• Median
• Mode
• Geometric Mean
• Harmonic Mean
11. Mean
• Mean: Mean is the most common measure of central tendency. It is
simply the sum of the values divided by the number of values in
a set of data. This is also known as the average.
• The Arithmetic Mean is a good measure of central tendency
Reasons:
• It takes all the observations into account while calculating
• It can be used for further mathematical treatment
12. Mean
• Mathematical characteristics of Arithmetic Mean
• The sum of the deviations, of all the values of x, from their arithmetic
mean, is zero.
• The sum of squared deviations taken from the AM is always the least
among such deviations taken from any other measure of central tendency
• Mean of the combined series: If we know the sizes and means of two
component series, then we can find the mean of the resultant series
obtained on combining the given series.
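The properties above can be checked with a short Python sketch (the sample values are invented, not from the slides):

```python
# Minimal sketch verifying two properties of the arithmetic mean; data invented.
from statistics import mean

data = [4, 8, 15, 16, 23, 42]
x_bar = mean(data)                     # sum of values / number of values = 18

# Property: the deviations from the mean sum to zero.
deviations = [x - x_bar for x in data]
print(sum(deviations))                 # 0

# Combined series: mean computed from the sizes and means of two component series.
extra = [10, 20]
n1, n2 = len(data), len(extra)
combined = (n1 * x_bar + n2 * mean(extra)) / (n1 + n2)
print(combined == mean(data + extra))  # True
```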
13. Median
• Median is another measure of Central Tendency which locates the
middlemost value in a given set of data
• Median is the measure of Central Tendency different from any of the
means
• Median is a single value from the data set that measures the central
item in the data
• Median is that value of the variable which divides the group into two
equal parts, one part comprising the values greater than the Median and
the other the values less than it
• This single item is the middlemost or most central item in the set of
numbers. As said earlier half of the items lie above this point and the
other half lie below it
14. Median
Median M = L + ((N/2 − m) / f) × C
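The grouped-data median formula can be sketched in Python; the class table below is invented for illustration (L = lower limit of the median class, m = cumulative frequency before it, f = its frequency, C = class width):

```python
# Hedged sketch of the grouped-data median: M = L + ((N/2 - m) / f) * C.
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]  # (low, high, freq)

N = sum(f for _, _, f in classes)      # total frequency = 30
half = N / 2                           # 15
cum = 0
for low, high, f in classes:
    if cum + f >= half:                # first class whose cum. frequency reaches N/2
        L, C, m = low, high - low, cum # median class found
        break
    cum += f

median = L + (half - m) / f * C
print(round(median, 2))                # 20 + (15 - 13)/12 * 10 = 21.67
```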
15. Median
Merits:
• Rigidly defined
• Easy to calculate for non-mathematical person
• Since, it is a positional average, not affected by the extreme
observations. Useful in the skewed distribution
• Computed while dealing with open ended classes
• Located by simple inspection and even graphically
• This is the only average which will deal with qualitative characteristics
Demerits:
It doesn’t take all the observation into account while calculating
average
16. Mode
• Mode is one of the measures of central tendency; it is different from the
mean and somewhat like the median
• The mode is the value that is repeated most often in the data set
• The mode is defined as the highest or the most popular value in the given data
• Mode is the value which occurs most frequently in a set of observations and
around which the other items of the set cluster most densely
• It is the value at the point around which the items tend to be most heavily
concentrated. It is regarded as the most typical of a series of values
• Mode is the value which has the greatest frequency density in its immediate
neighbourhood
• Mode is termed as the fashionable value of the distribution
• Example: Average size of the shoe sold in a shop is 7
• Average Indian Male is 5 feet 6 inch
17. Mode
Z = L + ((f − f1) / (2f − f1 − f2)) × C
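The grouped-data mode formula can be sketched the same way (frequencies invented; f is the modal-class frequency, f1/f2 the frequencies just before/after it, and the modal class is assumed not to be the first or last class):

```python
# Hedged sketch of the grouped-data mode: Z = L + (f - f1)/(2f - f1 - f2) * C.
freqs = [5, 8, 12, 5]         # class width C = 10, classes start at 0; invented
C = 10
i = freqs.index(max(freqs))   # modal class: the one with the highest frequency
L = i * C                     # lower limit of the modal class = 20
f, f1, f2 = freqs[i], freqs[i - 1], freqs[i + 1]

mode = L + (f - f1) / (2 * f - f1 - f2) * C
print(round(mode, 2))         # 20 + 4/11 * 10 = 23.64
```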
18. Mode
• Merits and Demerits
• Merits:
• Easy to calculate and understand; can be done by mere inspection
• Not affected by extreme observations
• Convenient for open-ended classes
• Demerits:
• Mode is not rigidly defined
• Mode is not suitable for further mathematical treatment
• Affected to a greater extent by sampling fluctuations
19. Empirical Relationship between Mean (X̄), Median (M) and Mode (Z) (Slightly Skewed)
Z = 3M − 2X̄
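A quick numeric check of the relation (sample invented; the relation is an approximation for moderately skewed data, not an identity):

```python
# Hedged check of Z = 3M - 2*mean on slightly skewed invented data.
from statistics import mean, median, mode

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
Z_est = 3 * median(data) - 2 * mean(data)
print(round(Z_est, 1), mode(data))   # estimate 3.4 vs the actual mode 4
```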
20. Geometric Mean
• GM is the nth root of the product of the quantities of the series. It is obtained
by multiplying the values of the items together and extracting the root of the
product corresponding to the number of items.
• Thus, the square root of the product of two items, or the cube root of the
product of three items, is the geometric mean
• It is never larger than the arithmetic mean
• If there are zeroes and negative numbers in the series, the geometric mean
cannot be used.
• Logarithms can be used to find the geometric mean to reduce large
numbers and to save time
• Appropriate in situations where there is an average percentage rate of
change over a period of time.
21. Geometric Mean
• GM = Antilog( Σ f log x / N )
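The log formula can be sketched with natural logs, where "antilog" becomes `exp` (values invented):

```python
# Sketch of GM = antilog( Σ f·log x / N ) using natural logarithms.
import math

values = [2, 4, 8]                      # invented example
freqs  = [1, 1, 1]
N = sum(freqs)

log_sum = sum(f * math.log(x) for f, x in zip(freqs, values))
gm = math.exp(log_sum / N)              # antilog of the mean log
print(round(gm, 6))                     # 4.0: the cube root of 2*4*8 = 64
```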
22. Merits and demerits of GM
• Merits of GM
• It is based on all the observations in the series
• It is rigidly defined
• It is suited for averages and ratios
• It is less affected by extreme values
• It is useful for studying social and economic data
• Demerits of GM
• It is not simple to understand
• It requires computational skill
• It cannot be computed if any items are zero or negative
• It has restricted applications
23. Harmonic Mean
• It is the total number of items divided by the sum of the
reciprocals of the values of the variable
• It is a specialized average which solves problems involving
variables expressed in “time rates” that vary according to time
• Example: speed in km/hr., min/day, price/chapter
• Harmonic mean (HM) is suitable only when the time factor is
variable and the act being performed remains constant
25. Harmonic Mean
• Merits of Harmonic Mean
• It is based on all observations
• It is rigidly defined
• Suitable in the case of a series having wide dispersion
• It is suitable for further mathematical treatment
• Demerits of Harmonic Mean
• It is not easy to compute
• Cannot be used when one of the items is zero
• It cannot represent distribution
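The "time rates" point can be sketched with a speed example (numbers invented): equal distances travelled at 60 km/h and 30 km/h average to the harmonic, not arithmetic, mean:

```python
# Sketch of the harmonic mean: n divided by the sum of reciprocals.
from statistics import harmonic_mean

speeds = [60, 30]
hm = len(speeds) / sum(1 / v for v in speeds)
print(round(hm, 6))                     # 40.0 km/h, while the AM would say 45
print(round(harmonic_mean(speeds), 6))  # stdlib equivalent, same result
```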
26. Relationship between AM, HM and GM
The relationship between AM, GM, and HM can be represented by the
formula AM × HM = GM². That is, the geometric mean (GM) is the square
root of the product of the arithmetic mean (AM) and the harmonic mean (HM).
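For two positive numbers the relation holds exactly, as a short check shows (numbers invented; for more than two values it holds only approximately):

```python
# Check of AM * HM = GM^2 for two positive numbers.
import math

a, b = 4, 16
am = (a + b) / 2                        # 10.0
gm = math.sqrt(a * b)                   # 8.0
hm = 2 / (1 / a + 1 / b)                # 6.4
print(math.isclose(am * hm, gm ** 2))   # True: both sides are 64
```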
27. Characteristics of Good Average
• It should be easy to calculate and simple to follow.
• An average should represent the entire mass of data.
• An average should be capable of further algebraic treatment.
• A good average should be an absolute number.
• A good average is one which is not affected by skewness in the
distribution.
• It should not be unduly affected by extreme values.
28. Pre-requisites of Good Measures of Central
Tendency
• It should be rigidly defined
• It should be based on all observations
• It should be easy to understand and calculate
• It should have sampling stability
• It should not be unduly affected by extreme observations
29. Measures of Dispersion
• Meaning
• Dispersion is the scatteredness of the data series around its average
• Dispersion is the extent to which values in a distribution differ from
the average of the distribution
30. Measures of Dispersion
• Why Dispersion?
• Determine the reliability of an average
• Serve as a basis for the control of the variability
• To compare the variability of two or more series and
• Facilitate the use of other statistical measures.
31. Measures of Dispersion
• Characteristics of an Ideal Measure of Dispersion?
• It should be rigidly defined.
• It should be easy to understand and easy to calculate.
• It should be based on all the observations of the data.
• It should be easily subjected to further mathematical treatment.
• It should be least affected by sampling fluctuation.
• It should not be unduly affected by the extreme values.
32. Measures of Dispersion
• Different Measures of Dispersion
• The range
• The inter quartile range and quartile deviation
• Percentile
• Decile
• The mean deviation or average deviation
• The standard deviation
33. The Range
Range is a crude measure of dispersion,
calculated as
R = Highest − Lowest = H − L
Its relative measure is called the co-efficient of Range:
Co-efficient of Range = (H − L) / (H + L)
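A minimal sketch of both measures (sample invented):

```python
# Range and co-efficient of range for a small invented sample.
data = [12, 7, 21, 3, 18]
H, L = max(data), min(data)

R = H - L
coeff_R = (H - L) / (H + L)
print(R, coeff_R)              # 18 and 18/24 = 0.75
```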
34. Quartile Deviation
Inter-quartile range = Q3 − Q1
Quartile Deviation (Semi-inter-quartile range) QD = (Q3 − Q1) / 2
Co-efficient of QD = (Q3 − Q1) / (Q3 + Q1)
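These can be sketched with `statistics.quantiles` (Python 3.8+); note that Q1/Q3 values can differ slightly by interpolation method, and the data here are invented:

```python
# Quartile deviation via the stdlib; default "exclusive" quantile method.
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]
q1, _, q3 = quantiles(data, n=4)   # Q1 = 2.0, Q3 = 6.0 for this sample

iqr = q3 - q1                      # inter-quartile range = 4.0
qd = (q3 - q1) / 2                 # semi-inter-quartile range = 2.0
coeff_qd = (q3 - q1) / (q3 + q1)   # relative measure = 0.5
print(iqr, qd, coeff_qd)
```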
35. Mean Deviation
Mean Deviation = Σ f |X − A| / n  or  Σ f |d| / n
Relative Measure of Mean Deviation:
Co-efficient of Mean Deviation = Mean Deviation / (Average about which it is calculated)
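A sketch of mean deviation about the median, with all frequencies equal to 1 and A taken as the median (data invented):

```python
# Mean deviation MD = Σ|X - A| / n with A = median of the series.
from statistics import median

data = [2, 4, 6, 8, 10]
A = median(data)                                   # 6

md = sum(abs(x - A) for x in data) / len(data)     # (4+2+0+2+4)/5 = 2.4
coeff_md = md / A                                  # relative measure
print(md, round(coeff_md, 4))                      # 2.4 0.4
```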
36. Standard Deviation
σ = √( Σ f x² / N − ( Σ f x / N )² )
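The grouped-data formula can be sketched directly (x and f values invented):

```python
# σ = sqrt( Σ f·x²/N − (Σ f·x/N)² ) for a small frequency table.
import math

x = [1, 2, 3, 4]
f = [2, 3, 3, 2]
N = sum(f)                                         # 10

mean_sq = sum(fi * xi * xi for fi, xi in zip(f, x)) / N   # 73/10 = 7.3
sq_mean = (sum(fi * xi for fi, xi in zip(f, x)) / N) ** 2 # 2.5² = 6.25
sigma = math.sqrt(mean_sq - sq_mean)
print(round(sigma, 4))                             # sqrt(1.05) ≈ 1.0247
```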
37. Standard Deviation
Characteristics of Standard Deviation:
- SD is the most satisfactory and most widely used measure of
dispersion
- Amenable to mathematical manipulation
- It is independent of origin, but not of scale
- If SD is small, there is a high probability of getting a value close to
the mean; if it is large, values lie farther away from the mean
38. Co-efficient of Variation
If the co-efficient of variation for a given data set is larger, the data are
said to be less consistent; on the other hand, if the C.V. is smaller, the
variability in the data is less and the data are more consistent.
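A sketch comparing two invented series with the same mean, where the smaller CV marks the more consistent series:

```python
# Co-efficient of variation: CV = σ / mean * 100 (population SD used here).
from statistics import mean, pstdev

def cv(series):
    return pstdev(series) / mean(series) * 100

series_a = [50, 52, 48, 51, 49]   # tightly clustered around 50
series_b = [30, 70, 50, 90, 10]   # widely spread around the same mean

print(round(cv(series_a), 2), round(cv(series_b), 2))   # 2.83 56.57
```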
39. Unit – 2
Correlation and Regression
40. Correlation and Correlation Co-efficient:
• Correlation
• Correlation is a statistical measure that indicates the extent to which
two or more variables fluctuate together.
• A positive correlation indicates the extent to which those variables
increase or decrease in parallel;
• A negative correlation indicates the extent to which one variable
increases as the other decreases.
41. Correlation and Correlation Co-efficient:
• Correlation Co-efficient
• A correlation coefficient is a statistical measure of the degree to
which changes in the value of one variable predict changes in the
value of another.
• When the fluctuation of one variable reliably predicts a similar
fluctuation in another variable, there’s often a tendency to think that
means that the change in one causes the change in the other.
• However, correlation does not imply causation. There may be, for
example, an unknown factor that influences both variables similarly.
42. Correlation Co-efficient Value:
r lies between -1 and +1
• If r lies between 0 and 1, positive correlation exists
• If r is exactly 1, the correlation is perfect positive correlation
• If r lies between -1 and 0, negative correlation exists
• If r is -1, that implies perfect negative correlation
43. Correlation Analysis
• Correlation Analysis
• Correlation analysis is a method of statistical evaluation used to study
the strength of a relationship between two numerically measured,
continuous variables (e.g. height and weight).
• Applications of Correlation:
• The most valuable use of correlation is in predicting the future direction of a
business.
• Correlation is used to assess the direction of change
• It is used in performance measurement and in data mining
44. Different types of Correlation
• Positive Correlation
• Positive correlation occurs when an increase in one variable increases
the value of another.
45. Different types of Correlation
• Negative Correlation
• Negative correlation occurs when an increase in one variable
decreases the value of another.
46. Different types of Correlation
• No Correlation
• No correlation occurs when there is no linear dependency between
the variables.
47. Different types of Correlation
• Perfect Positive Correlation
• Perfect correlation occurs when there is a functional dependency
between the variables.
48. Different types of Correlation
• High degree of Positive Correlation
• A correlation is stronger the closer the points are located to one
another on the line.
49. Different types of Correlation
• Low degree of Positive Correlation
• A correlation is weaker the farther the points are located from one
another on the line.
50. Different Methods of Studying Correlation Analysis
• Scatter diagram method
• Karl Pearson’s Co-efficient of Correlation (Covariance method)
• Two way frequency table (Bivariate correlation method)
• Ranks method or Spearman’s Rank Correlation
• Concurrent Deviation Method
51. Different Methods of Studying Correlation Analysis
• Scatter diagram method
• It is one of the simplest methods of diagrammatic representation of a
bivariate distribution and provides one of the simplest tools for ascertaining the
correlation between two variables
• The “n” points are plotted as dots for the two variables (e.g. height and weight).
The diagram of dots so obtained is known as a “Scatter Diagram”
• From the scatter diagram, we can form a fairly good, though rough, idea about the
relationship between the two variables.
52. Different Methods of Studying Correlation Analysis
• Scatter diagram
53. Different Methods of Studying Correlation Analysis
• Karl Pearson’s Co-efficient of Correlation
r = (n Σxy − Σx · Σy) / √[ (n Σx² − (Σx)²) · (n Σy² − (Σy)²) ]
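Pearson's formula can be sketched with the textbook sums (x, y values invented):

```python
# r = (nΣxy − Σx·Σy) / sqrt( (nΣx² − (Σx)²)(nΣy² − (Σy)²) )
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))   # Σxy = 66
sxx = sum(a * a for a in x)              # Σx² = 55
syy = sum(b * b for b in y)              # Σy² = 86

r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(r, 4))                       # 30 / sqrt(50 * 30) ≈ 0.7746
```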
54. Different Methods of Studying Correlation Analysis
• Ranks method or Spearman’s Rank Correlation
ρ = 1 − 6 ΣD² / (n³ − n)
ρ = 1 − 6 (ΣD² + CF) / (n³ − n)
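A sketch of the first (no-ties) formula, with invented ranks; the second formula adds the correction factor CF when ranks are tied:

```python
# Spearman's rho = 1 - 6ΣD² / (n³ - n), D = difference of paired ranks.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

n = len(rank_x)
d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))   # ΣD² = 4
rho = 1 - 6 * d2 / (n ** 3 - n)
print(rho)                                               # 1 - 24/120 = 0.8
```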
55. Regression Analysis
• Regression Analysis
• The regression analysis is a statistical process for estimating the relationships
among variables. Regression is the attempt to explain the variation in a
dependent variable using the variation in independent variables.
• Uses or Application of Regression:
• The most common use of regression in business is to predict events that have
yet to occur. Demand analysis, for example, predicts how many units consumers
will purchase.
• Another key use of regression models is the optimization of business processes.
A factory manager might, for example, build a model to understand the
relationship between oven temperature and the shelf life of the cookies baked
in those ovens.
56. Regression Equation
• X on Y
(x − x̄) = bxy (y − ȳ)
• Y on X
(y − ȳ) = byx (x − x̄)
57. Regression Coefficients
bxy = (n Σxy − Σx · Σy) / (n Σy² − (Σy)²)  or  bxy = r · (σx / σy)
byx = (n Σxy − Σx · Σy) / (n Σx² − (Σx)²)  or  byx = r · (σy / σx)
58. Simple and Multiple regression
• Simple regression:
• The linear regression model used to describe the relationship between a
dependent variable y and an independent variable x is given by
y=a+bx
• Multiple regression
• Multiple regression is a statistical technique that can be used to analyze the relationship
between a single dependent variable and several independent variables. The objective of
multiple regression analysis is to use the independent variables whose values are known to
predict the value of the single dependent variable.
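The simple-regression line y = a + b·x can be sketched by least squares, using the byx sum formula given above (data invented):

```python
# Fit y = a + b*x where b = byx = (nΣxy − Σx·Σy) / (nΣx² − (Σx)²).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(p * q for p, q in zip(x, y))
sxx = sum(p * p for p in x)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope: 30/50 = 0.6
a = sy / n - b * sx / n                         # the line passes through (x̄, ȳ)
print(round(a, 2), b)                           # 2.2 0.6, i.e. y = 2.2 + 0.6x
```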
59. Unit – 3
Probability Distribution
60. Important terminologies
• Experiment
• Trial
• Event
• Mutually Exclusive Event
• Dependent and independent event
• Equally Likely event
• Simple and Compound events
• Exhaustive events
• Complementary events
61. Important terminologies
• Experiment
The term experiment refers to an act which can be repeated under the same
given conditions.
Random experiment: An experiment is called a random experiment if, when
conducted repeatedly under essentially homogeneous conditions, the result is not
unique or certain but may be any one of the various possible
outcomes.
Or
An experiment having random outcomes
Or
Experiments whose results depend on chance
Example: Tossing a coin, rolling a dice
62. Important terminologies
• Trial
Performing a random experiment is called a trial
Example: If the coin-tossing experiment is performed twice, that means two
trials
63. Important terminologies
• Event
An outcome or a combination of outcomes of an experiment is termed an
event
Example: Tossing a coin – You may get H or T – These are events
64. Important terminologies
• Mutually Exclusive Events:
Two events are said to be mutually exclusive or incompatible when both cannot
happen simultaneously in a single trial; in other words, the occurrence of any
one of them precludes the occurrence of the other.
In other words, “if the happening of one event prevents the happening of the other
event, such events are called mutually exclusive events”
Example: Tossing a coin leads to two events, Head (H) or Tail (T)
If head turns up in tossing a coin, then head prevents tail from turning up, and vice-versa
65. Important terminologies
• Independent and Dependent Event
Two or more events are said to be independent when the outcome of one
doesn’t affect, and is not affected by, the other.
Example: In tossing a coin twice, the happening of a head in the first trial will
not affect the outcome of the next trial
Events are dependent when the occurrence or non-occurrence of one event in
any trial affects the probability of the other event in another trial
Example: Drawing a card without replacement.
66. Important terminologies
• Equally Likely
Events are said to be equally likely when one doesn’t occur more often than
the others. This means none of them is expected to occur in preference to the
others.
In other words – equal chance of occurrence and importance for all the
events to occur
Example: When you roll a dice, occurrence of all the 6 faces i.e. 1, 2, 3, 4, 5,
6 are equally likely
67. Important terminologies
• Simple and Compound Events
In the case of simple events, we consider the probability of the happening or
not happening of a single event
In the case of compound events, we consider the joint occurrence of two or more events
68. Important terminologies
• Exhaustive Events
Events are said to be exhaustive when their totality includes all the possible
outcomes of a random experiment.
In other words, if the sum of individual chance of occurrence is equal to 1
Example 1: Rolling a dice once, the possible outcomes are 1, 2, 3, 4, 5 and 6,
hence the exhaustive number of cases is 6
Example 2: If we roll two dice once, the exhaustive number of cases is 6² = 36
Similarly, rolling three dice leads to 6³ = 216 outcomes, and the summation of
the probabilities of occurrence of all these events is 1
69. Important terminologies
• Complementary events
Let there be two events A and B, A is called the complementary event of B
(and Vice versa), if A and B are mutually exclusive and exhaustive.
Example: When the dice is thrown, the occurrence of an even number
and odd number are complementary events.
Simultaneous occurrence of two events A and B is generally written as AB
70. Definition of Mathematical Probability
• If there be a random experiment with “N” outcomes which are mutually
exclusive, exhaustive and equally likely
• Let there be an event “A”, and let “m” outcomes be favourable to “A”;
then the probability of occurrence of “A” can be
written as follows
P(A) = m / N = (Outcomes favourable to “A”) / (Total number of outcomes)
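The classical definition can be sketched by counting outcomes, e.g. for the event "sum is 7" when two dice are rolled (example invented, not from the slides):

```python
# P(A) = m / N for equally likely outcomes of two dice.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))    # N = 36 equally likely pairs
favourable = [o for o in outcomes if sum(o) == 7]  # m = 6

p = len(favourable) / len(outcomes)
print(p)                                           # 6/36 ≈ 0.1667
```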
71. Theorems of Probability or Rules of Probability
The two important theorems of probability
• The addition theorem
• The multiplication theorem
72. Theorems of Probability or Rules of Probability
The two important theorems of probability
• The addition theorem
P(A or B) = P(A U B) = P(A) + P(B) – Mutually exclusive events
P(A or B) = P(A U B) = P(A) + P(B) – P(A ∩ B) – Events that overlap
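The overlapping case can be checked on one roll of a dice, with invented events A = "even number" and B = "greater than 3" (they overlap at 4 and 6):

```python
# Check of P(A U B) = P(A) + P(B) - P(A ∩ B) with exact fractions.
from fractions import Fraction

faces = range(1, 7)
A = {f for f in faces if f % 2 == 0}          # {2, 4, 6}
B = {f for f in faces if f > 3}               # {4, 5, 6}

def p(event):
    return Fraction(len(event), 6)            # equally likely faces

lhs = p(A | B)                                # P(A U B) = 4/6
rhs = p(A) + p(B) - p(A & B)                  # 3/6 + 3/6 - 2/6
print(lhs == rhs)                             # True
```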
73. Theorems of Probability or Rules of Probability
The two important theorems of probability
• The Multiplication theorem
P(A and B) = P(A) × P(B) - Independent events
• P(A ∩ B) = P(A) × P(B/A) ; P(A) ≠ 0
• P(B ∩ A) = P(B) × P(A/B) ; P(B) ≠ 0
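The independent-events case can be checked on two coin tosses (example invented):

```python
# Check of P(A and B) = P(A) × P(B) for independent events.
from fractions import Fraction
from itertools import product

sample = list(product("HT", repeat=2))            # 4 equally likely outcomes
p_both = Fraction(sum(o == ("H", "H") for o in sample), len(sample))
print(p_both == Fraction(1, 2) * Fraction(1, 2))  # True: both sides are 1/4
```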
74. Bayes Theorem of Probability
• P(A / B) = P(A ∩ B) / P(B)
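This conditional-probability form can be checked on one dice roll, with the invented events A = "even" and B = "greater than 3":

```python
# P(A|B) = P(A ∩ B) / P(B) with exact fractions.
from fractions import Fraction

p_b = Fraction(3, 6)        # B = {4, 5, 6}
p_a_and_b = Fraction(2, 6)  # A ∩ B = {4, 6}

p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)          # 2/3
```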
75. Random Variable
Random means “Unpredictable”
A random variable x is a variable whose possible values are numerical
outcomes of a random phenomenon.
There are two types of a random variable
- Discrete Random variable
- Continuous random variable
76. Discrete and Continuous Random Variable
• Discrete Random Variable
• Continuous Random Variable
78. Theoretical Probability Distribution
•Binomial Distribution
• It is also known as the “Bernoulli Distribution”, a probability distribution
expressing the probability of one set of dichotomous alternatives, i.e.
success or failure
• Bernoulli trial: A trial having only two outcomes
Example: Tossing a coin: H or T
79. Theoretical Probability Distribution
• Binomial Distribution
• Binomial Probability Distribution
Let “x” be a binomial random variable with “n” trials and P(Success) = p;
then the probability of “x” successes is given by
P(x) = nCx · pˣ · qⁿ⁻ˣ
• Where
x = Number of successes in “n” trials
n = Number of trials
p = probability of success in a single trial
q = (1 − p) = probability of failure in a single trial
80. Theoretical Probability Distribution
•Binomial Distribution
• Constants of Binomial Distribution
• Mean = np
• Variance = npq
• Standard Deviation = √(npq)
• Parameters of Binomial Distribution
n and p (q = 1 − p follows from p)
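The mass function and its constants can be checked together (n and p invented, e.g. four fair coin tosses):

```python
# Binomial P(x) = nCx p^x q^(n-x), checked against mean = np and variance = npq.
from math import comb

n, p = 4, 0.5
q = 1 - p

pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
print(pmf)                        # [0.0625, 0.25, 0.375, 0.25, 0.0625]

mean = sum(x * pr for x, pr in enumerate(pmf))
var = sum((x - mean) ** 2 * pr for x, pr in enumerate(pmf))
print(mean, var)                  # np = 2.0, npq = 1.0
```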
81. Theoretical Probability Distribution
•Poisson Distribution
• Poisson distribution may be expected in cases where the chance of any
individual event being a success is small.
• The distribution is used to describe the behaviour of rare events such as
the number of accidents on road, Number of printing mistakes in books.
• It has been called “the Law of Impossible Events”
• P(x) = e^(−λ) · λˣ / x!
Where x = 0, 1, 2, 3, 4…
λ = Parameter of the Poisson distribution
82. Theoretical Probability Distribution
•Poisson Distribution
• Constants of Poisson Distribution
• The mean of the Poisson distribution = λ
• The standard deviation = √λ
Parameter of the Poisson distribution − λ
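The mass function and the mean = λ property can be sketched with an invented λ:

```python
# Poisson P(x) = e^(-λ) λ^x / x!, with λ = 2 chosen for illustration.
from math import exp, factorial

lam = 2.0

def pmf(x):
    return exp(-lam) * lam**x / factorial(x)

print(round(pmf(0), 4), round(pmf(1), 4), round(pmf(2), 4))
# 0.1353 0.2707 0.2707

mean = sum(x * pmf(x) for x in range(50))   # truncated sum; tail is negligible
print(round(mean, 6))                       # 2.0 = λ
```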
83. Theoretical Probability Distribution
• Normal Distribution
• The Normal curve is “bell shaped” and symmetrical in its appearance
• The height of the normal curve is at its maximum at the mean. Hence the mean and
mode of the normal distribution coincide. Thus, for a Normal Distribution the Mean,
Median and Mode are all equal.
• There is one maximum point of the normal curve, which occurs at the mean
• Since there is only one maximum point, the normal curve is uni-modal, i.e. it has only
one Mode
• As distinguished from the Binomial and Poisson distributions, where the variable is
discrete, the variable distributed according to the normal curve is continuous.
• The first and third quartiles are equidistant from the Median
• The area under the normal curve is distributed as follows
• Mean ± 1σ covers 68.27% of the area, and 34.135% of the area lies on either side of the Mean
• Mean ± 2σ covers 95.45% of the area
• Mean ± 3σ covers 99.73% of the area
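The area figures can be verified with the normal CDF from Python's statistics module (3.8+):

```python
# Empirical-rule areas from the standard normal CDF.
from statistics import NormalDist

z = NormalDist()                                  # mean 0, sd 1
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, a in areas.items():
    print(k, round(a * 100, 2))                   # 68.27, 95.45, 99.73
```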
85. Unit – 4
Time Series Analysis
86. Objective of Time Series Analysis
• The assumption underlying time series analysis is that time
series data behave the same in the future as in the past.
Time series analysis is used to detect the pattern underlying the
data and isolate the influencing factors, which are in turn used to
estimate the future accurately. Thus, time series data help
us to cope with uncertainty about the future.
• Reviews and evaluations of the progress made in plans are
based on time series data. For example, the Finance Ministry of
the Govt. of India (GOI) reviews the gross domestic product
(GDP) of the economy during the financial year and chalks out
strategies to further the growth.
87. Variations/Components in Time Series Analysis
• In a typical time series there are four main components
which seem to be independent of one another and to
influence the time-series data.
• An important step in analysing a time series is to consider
the types of data patterns. Time series data can contain
some or all of the following elements. They are:
• Trend (T)
• Cyclical (C)
• Seasonal (S)
• Irregular (I)
88. Variations/Components in Time Series Analysis
• Trend (T)
Trend (T): The trend is the long-term pattern of a time series. A
trend can be positive or negative depending on whether the
time series exhibits an increasing long-term pattern or a
decreasing long-term pattern. The rate of trend growth usually
varies over time.
89. Variations/Components in Time Series Analysis
• Cyclical (C)
• Cyclical (C): Time series data may show up-and-down movement around a
given trend. For example, a business cycle over the years shows an upward
trend, touches its peak, and then may show a slump and hit the
bottom. The pattern repeats, but not at a regular interval of time. The
duration of a cycle depends on the type of business or industry.
90. Variations/Components in Time Series Analysis
• Seasonal
• Seasonal (S): It is a special case of a cycle component of time series in
which the magnitude and duration of the cycle do not vary but happen at
a regular interval each year. Seasonality occurs when the time series
exhibits regular variation during the same periods (Month, Year or same
quarter every year)
91. Variations/Components in Time Series Analysis
• Irregular
• Irregular or Random (I): This type of variation is unpredictable. It is
caused by short-term, unanticipated, and non-recurring factors. These
variations follow no specific pattern.
92. Methods of Evaluating the Trend
• These are also called the forecasting methods of Time Series Analysis.
Some of them are:
• Freehand Method
• Moving Average Method
• Semi-average Method
• Least-Square Method
93. Methods of Estimating Seasonal Index
• Method of Simple Averages
• Ratio to trend method
• Ratio to moving average method
94. Unit – 5
Hypothesis Testing
95. Hypothesis
• A Hypothesis is an assumption or a statement that may or may not be
true
• It is tested based on the data / information obtained from a sample
• It is used to make decisions related to business
Example:
1. Whether a new drug is more effective than the existing drug
2. Whether the proportion of smokers in a class is different from 0.30
96. Characteristics of a good hypothesis
• Conceptually clear
• Specificity
• Testability
• Availability of techniques
• Theoretical relevance
• Consistency
• Objectivity
• Simplicity
97. Sources
• Theory: e.g., from the goals of business (theory), one may hypothesise
that the rate of return on capital employed (CE) is an index of business
success, or that the higher the EPS, the more favourable the financial
leverage
• Observation: Ex: price & demand for a product
• Intuition & Personal experience
• Findings of Studies
• Continuity of research
98. One Tailed and Two Tailed Test
• One Tailed Test
A test is called one-sided (one-tailed) if the null hypothesis is rejected
only when the value of the test statistic falls in one specific tail of the
distribution.
99. One Tailed and Two Tailed Test
• Two Tailed Test
A test is called two-sided (two-tailed) if the null hypothesis is rejected
when the value of the test statistic falls in either of the two tails of its
sampling distribution.
100. Formulation of Hypothesis
• Criteria to fulfil while formulating the hypothesis
• A hypothesis must be formulated in simple, clear and declarative form
• A broad hypothesis might not be empirically testable
• A hypothesis must be measurable and quantifiable so that the statistical
authenticity of the relationship can be established
• A hypothesis is a conjectural statement based on the existing literature and
theories about the topic, and not on the gut feel or subjective judgement
of the researcher
• Validation of the hypothesis would necessarily involve testing the statistical
significance of the hypothesized relation.
101. Formulation of Hypothesis
• Null Hypothesis
• is a statement about a population parameter that is assumed to be true.
• Null hypotheses are formulated for testing statistical significance
• It is the presumption that is accepted as correct unless there is strong evidence against it.
• It is a starting point: the researcher tests whether the value stated in the null hypothesis is true.
Example: There is no relationship between families’ income level and expenditure on recreation
• Alternate Hypothesis
• Is not specific and is not directly tested.
• It is complementary to null hypothesis.
• It is accepted when null hypothesis (H0) is rejected.
Example: There is a relationship between families’ income level and expenditure on recreation
102. Functions / Role of Hypothesis
• Guides the direction of study
• Gives an idea for setting order among facts
• Specifies sources of data
• Determines data needs
• Suggests type of research
• Determines the technique of analysis
• Helps in development of theories
104. Errors in Hypothesis – Type 1 and Type 2
A Type I error means rejecting the null hypothesis when it’s actually true. It
means concluding that results are statistically significant when, in reality,
they came about purely by chance or because of unrelated factors.
A Type II error means not rejecting the null hypothesis when it’s actually
false. This is not quite the same as “accepting” the null hypothesis, because
hypothesis testing can only tell you whether to reject the null hypothesis,
not whether to accept it.
105. Parametric and Non Parametric Test
Parametric Test | Non-Parametric Test
Parametric analysis tests group means | Nonparametric analysis tests group medians
Information about the population is completely known | No information about the population is available
Specific assumptions are made regarding the population | No assumptions are made regarding the population
Applicable only for variables | Applicable to both variables and attributes
Samples are independent | Samples are not necessarily independent
Assumes a normal distribution | No assumed shape / distribution
Handles interval or ratio data | Handles ordinal, nominal (or interval or ratio), ranked data
Results can be significantly affected by outliers | Results are not seriously affected by outliers
Performs well when the spread of each group is different; might not provide valid results if the groups have the same spread | Performs well when the spread of each group is the same; might not provide valid results if the groups have a different spread
Has more statistical power | Not as powerful as the parametric test
106. Z Test
Formulas to remember – Testing of Hypotheses
Z Test – Test for equality of mean (one sample)
Two-Tailed Test
H0: µ = µ0; H1: µ ≠ µ0
Z_cal = (x̄ − µ0) / (σ/√n)
where x̄ = sample mean, µ0 = hypothesised value of the population mean,
and σ/√n = standard error of the sample mean
Decision rule (two-tailed): reject H0 when |Z_cal| ≥ 1.960 at the 5% level
of significance, or |Z_cal| ≥ 2.58 at the 1% level.
One-Tailed Test
Upper-tailed: H0: µ ≤ µ0; H1: µ > µ0.
Reject H0 when Z_cal ≥ 1.645 at the 5% LOS, or Z_cal ≥ 2.326 at the 1% LOS.
Lower-tailed: H0: µ ≥ µ0; H1: µ < µ0.
Reject H0 when Z_cal ≤ −1.645 at the 5% LOS, or Z_cal ≤ −2.326 at the 1% LOS.
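The one-sample Z test above can be sketched in a few lines of code. The figures used here (x̄ = 52, µ0 = 50, σ = 10, n = 100) are assumed for illustration only; they do not come from the slides.

```python
import math

def z_test_one_sample(x_bar, mu0, sigma, n):
    """Z statistic for H0: mu = mu0 when the population sigma is known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

# Assumed illustrative numbers: sample mean 52, hypothesised mean 50,
# population standard deviation 10, sample size 100.
z = z_test_one_sample(52, 50, 10, 100)
reject_5pct_two_tail = abs(z) >= 1.960   # two-tailed decision rule at 5%
reject_1pct_two_tail = abs(z) >= 2.58    # two-tailed decision rule at 1%
print(z, reject_5pct_two_tail, reject_1pct_two_tail)  # 2.0 True False
```

With these numbers the statistic is exactly 2.0, so H0 is rejected at the 5% level but not at the 1% level.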
107. Z Test – Test for Equality of Two Means
Two-Tailed Test
H0: µ1 = µ2; H1: µ1 ≠ µ2
Z_cal = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)
where x̄1, x̄2 = sample means of populations I and II respectively,
n1, n2 = sample sizes of populations I and II, and
σ1, σ2 = standard deviations of populations I and II
Decision rule (two-tailed): reject H0 when |Z_cal| ≥ 1.960 at the 5% level
of significance, or |Z_cal| ≥ 2.58 at the 1% level.
One-Tailed Test
Upper-tailed: H0: µ1 ≤ µ2; H1: µ1 > µ2.
Reject H0 when Z_cal ≥ 1.645 at the 5% LOS, or Z_cal ≥ 2.326 at the 1% LOS.
Lower-tailed: H0: µ1 ≥ µ2; H1: µ1 < µ2.
Reject H0 when Z_cal ≤ −1.645 at the 5% LOS, or Z_cal ≤ −2.326 at the 1% LOS.
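A minimal sketch of the two-means Z test; the sample figures (means 100 and 95, sigmas 8 and 6, sizes 64 and 36) are assumptions made up for illustration.

```python
import math

def z_test_two_means(x1, x2, s1, s2, n1, n2):
    """Z statistic for H0: mu1 = mu2 with known population SDs s1 and s2."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # sqrt(sigma1^2/n1 + sigma2^2/n2)
    return (x1 - x2) / se

# Assumed illustrative numbers.
z = z_test_two_means(100, 95, 8, 6, 64, 36)
print(round(z, 4))  # 3.5355 -> exceeds 1.960, so reject H0 at the 5% level
```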
108. Z Test – Test for a Population Proportion
Two-Tailed Test
H0: P = P0; H1: P ≠ P0
Z_cal = (p − P0) / √(P0·Q0/n)
where P0 = hypothesised population proportion, Q0 = 1 − P0,
p = x/n = sample proportion, and √(P0·Q0/n) = standard error of the sample proportion
Decision rule (two-tailed): reject H0 when |Z_cal| ≥ 1.960 at the 5% level
of significance, or |Z_cal| ≥ 2.58 at the 1% level.
One-Tailed Test
Upper-tailed: H0: P ≤ P0; H1: P > P0.
Reject H0 when Z_cal ≥ 1.645 at the 5% LOS, or Z_cal ≥ 2.326 at the 1% LOS.
Lower-tailed: H0: P ≥ P0; H1: P < P0.
Reject H0 when Z_cal ≤ −1.645 at the 5% LOS, or Z_cal ≤ −2.326 at the 1% LOS.
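The proportion test can be sketched using the smokers example from the hypothesis slide (is the proportion of smokers different from 0.30?). The sample counts, 40 smokers out of 100, are an assumption added here for illustration.

```python
import math

def z_test_proportion(x, n, p0):
    """Z statistic for H0: P = p0, with sample proportion p = x/n."""
    p = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # sqrt(P0*Q0/n)
    return (p - p0) / se

# Assumed sample: 40 smokers observed in a class of 100.
z = z_test_proportion(40, 100, 0.30)
print(round(z, 3))  # about 2.182 -> exceeds 1.960, reject H0 (two-tailed, 5%)
```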
109. Z Test – Test for Equality of Two Proportions
Z_cal = (p1 − p2) / √(p1·q1/n1 + p2·q2/n2)
where p1, p2 = sample proportions, q1 = 1 − p1, q2 = 1 − p2, and
n1, n2 = sample sizes.
110. t Test
Test for equality of mean (one-sample t test)
Two-Tailed Test
H0: µ = µ0; H1: µ ≠ µ0
t_cal = (x̄ − µ0) / (s/√(n−1))
where x̄ = sample mean, µ0 = hypothesised population mean,
s = sample standard deviation (computed with divisor n), and
s/√(n−1) = standard error of the sample mean
Decision rule: reject H0 when |t_cal| > t_table at (n − 1) degrees of freedom.
One-Tailed Test
Upper-tailed: H0: µ ≤ µ0; H1: µ > µ0.
Lower-tailed: H0: µ ≥ µ0; H1: µ < µ0.
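The same test is available in SciPy. Note that `ttest_1samp` computes t = (x̄ − µ0)/(s/√n) with s based on the n − 1 divisor, which is algebraically identical to the slide's s/√(n−1) when s is computed with the n divisor. The data below are assumed for illustration.

```python
from scipy import stats

# Assumed illustrative sample. Test H0: mu = 12 vs H1: mu != 12.
data = [10, 12, 14, 16, 18]
t_stat, p_value = stats.ttest_1samp(data, popmean=12)
print(round(t_stat, 4), round(p_value, 4))  # t ~ 1.4142; p > 0.05 -> keep H0
```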
111. t Test – Test for Equality of Two Means
Two-Tailed Test
H0: µ1 = µ2; H1: µ1 ≠ µ2
t_cal = (x̄1 − x̄2) / √( ((n1·s1² + n2·s2²)/(n1 + n2 − 2)) · (1/n1 + 1/n2) )
where x̄1, x̄2 = sample means of populations I and II respectively,
n1, n2 = sample sizes of populations I and II, and
s1, s2 = sample standard deviations
Decision rule: reject H0 when |t_cal| > t_table at (n1 + n2 − 2) degrees of freedom.
One-Tailed Test
Upper-tailed: H0: µ1 ≤ µ2; H1: µ1 > µ2.
Lower-tailed: H0: µ1 ≥ µ2; H1: µ1 < µ2.
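SciPy's `ttest_ind` with `equal_var=True` performs this pooled-variance test with n1 + n2 − 2 degrees of freedom. The two groups below are assumed sample data for illustration.

```python
from scipy import stats

# Assumed illustrative samples. H0: mu1 = mu2.
group1 = [12, 14, 15, 17, 18]
group2 = [11, 13, 14, 14, 15]
# equal_var=True selects the pooled-variance t test (df = 5 + 5 - 2 = 8).
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)
print(round(t_stat, 4), round(p_value, 4))  # t ~ 1.423; p > 0.05 -> keep H0
```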
112. t Test – Paired Sample t-Test
Two-Tailed Test
H0: µ1 = µ2; H1: µ1 ≠ µ2
t_cal = d̄ / (S_d/√(n−1))
where d = x − y (the paired difference), d̄ = Σd/n, and
S_d = standard deviation of d
Decision rule: reject H0 when |t_cal| > t_table at (n − 1) degrees of freedom.
One-Tailed Test
Upper-tailed: H0: µ1 ≤ µ2; H1: µ1 > µ2.
Lower-tailed: H0: µ1 ≥ µ2; H1: µ1 < µ2.
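A paired test operates on the differences d for the same subjects measured twice. The before/after scores below are assumed for illustration; SciPy's `ttest_rel` computes t = d̄/(S_d/√n) with S_d based on the n − 1 divisor.

```python
from scipy import stats

# Assumed before/after scores for the same three subjects.
before = [10, 12, 14]
after = [9, 11, 14]
# d = before - after = [1, 1, 0]; df = n - 1 = 2.
t_stat, p_value = stats.ttest_rel(before, after)
print(round(t_stat, 4))  # 2.0
```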
113. F Test
F value = σ1² / σ2² (with the larger sample variance taken as the numerator),
where σ² = Σ(x − x̄)² / (n − 1)
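The variance-ratio F statistic can be computed directly; the two samples below are assumed for illustration.

```python
import statistics

def f_ratio(sample1, sample2):
    """F = larger sample variance / smaller sample variance,
    each variance computed with the n - 1 divisor."""
    v1 = statistics.variance(sample1)
    v2 = statistics.variance(sample2)
    return max(v1, v2) / min(v1, v2)

# Assumed illustrative samples.
a = [20, 22, 24, 26, 28]   # sample variance 10
b = [21, 23, 25]           # sample variance 4
print(f_ratio(a, b))       # 2.5
```

The computed ratio is then compared with the F table value at (n1 − 1, n2 − 1) degrees of freedom.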
114. Mann- Whitney U Test
u1 = n1·n2 + n1(n1 + 1)/2 − R1
u2 = n1·n2 + n2(n2 + 1)/2 − R2
U = min(u1, u2)
Z_cal = (U − E(U)) / σ_U
where E(U) = n1·n2/2 and σ_U = √( n1·n2(n1 + n2 + 1) / 12 )
(R1, R2 are the rank sums of the two samples in the combined ranking.)
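The U formulas above can be implemented directly (the normal approximation for Z is really intended for larger samples; the tiny samples here are assumptions used only to make the arithmetic easy to check).

```python
import math
from scipy.stats import rankdata

def mann_whitney(x, y):
    """U statistic and normal-approximation Z, following the formulas above."""
    n1, n2 = len(x), len(y)
    ranks = rankdata(x + y)            # ranks over the combined sample
    r1 = ranks[:n1].sum()              # rank sum of the first sample
    r2 = ranks[n1:].sum()              # rank sum of the second sample
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u = min(u1, u2)
    e_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - e_u) / sigma_u

# Assumed samples: every value of x lies below every value of y, so U = 0.
u, z = mann_whitney([1, 2, 3], [4, 5, 6])
print(u, round(z, 3))  # 0.0 -1.964
```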
116. Normality and Reliability of Hypothesis Testing
• Normality and reliability checks are carried out to ensure that the hypothesis test is consistent and that
the intended quantity is actually being measured during the process.
• A normality test is used to determine whether sample data has been
drawn from a normally distributed population (within some
tolerance).
• Reliability is the extent to which the measure will give the same response under similar circumstances. In
other words, reliability shows consistency in measuring the same phenomenon.
117. Methods to check the Reliability of Hypothesis
Testing
- Test-retest method
- Alternate or parallel forms
- Split-half techniques
- Kuder-Richardson Reliability and coefficient alpha
118. Bivariate Analysis
- Bivariate analysis is slightly more analytical than univariate analysis. When the data set contains two
variables and the researcher aims to compare them, bivariate analysis is the right type of analysis
technique.
- For example, in a survey of a classroom, the researcher may want to analyse the proportion of students who
scored above 85% by gender. In this case, there are two variables: gender = X
(independent variable) and result = Y (dependent variable).
- Linear regression
- Simple regression
- Correlation
119. Multivariate Analysis
• Multivariate analysis is a more complex form of statistical analysis technique, used when there are more
than two variables in the data set. Here is an example:
• A doctor has collected data on cholesterol, blood pressure, and weight. She also collected data on the
eating habits of the subjects (e.g., how many ounces of red meat, fish, dairy products, and chocolate are
consumed per week). She wants to investigate the relationship between the three measures of health and
eating habits.
• Factor Analysis
• Cluster Analysis
• Variance Analysis
• Discriminant Analysis
• Multidimensional Scaling
• Principal Component Analysis
• Redundancy Analysis
120. ANOVA – One Way
• A one-way ANOVA is a type of statistical test that compares the variance in the group means within a sample
while considering only one independent variable or factor.
• It is a hypothesis-based test, meaning that it aims to evaluate multiple mutually exclusive theories about our
data.
• A one-way ANOVA compares three or more categorical groups to establish whether there is a
difference between them. Within each group there should be three or more observations, and the
means of the samples are compared.
• In a one-way ANOVA there are two possible hypotheses.
- The null hypothesis (H0) is that there is no difference between the groups and equality between means.
(Walruses weigh the same in different months)
- The alternative hypothesis (H1) is that there is a difference between the means and groups. (Walruses have
different weights in different months)
121. ANOVA – One Way - Assumptions
- Normality – That each sample is taken from a normally distributed population
- Sample independence – that each sample has been drawn independently of the other samples
- Variance Equality – That the variance of data in the different groups should be the same
- Your dependent variable – here, “weight”, should be continuous – that is, measured on a scale which can
be subdivided using increments (i.e. grams, milligrams)
122. ANOVA – Two Way
• A two-way ANOVA is, like a one-way ANOVA, a hypothesis-based test. However, in the two-way ANOVA each
sample is defined in two ways, and as a result falls into two categorical groups.
• The two-way ANOVA therefore examines the effect of two factors (e.g., month and gender) on a dependent
variable (in this case weight), and also examines whether the two factors interact with each other to
influence the continuous variable.
123. ANOVA – Two Way - Assumptions
- Your dependent variable – here, “weight”, should be continuous – that is, measured on a scale which can be
subdivided using increments (i.e. grams, milligrams)
- Your two independent variables – here, “month” and “gender”, should be in categorical, independent
groups.
- Sample independence – that each sample has been drawn independently of the other samples
- Variance Equality – That the variance of data in the different groups should be the same
- Normality – That each sample is taken from a normally distributed population
124. One-Way vs Two-Way ANOVA Differences Chart
One-Way ANOVA | Two-Way ANOVA
Definition: A test that allows one to make comparisons between the means of three or more groups of data. | A test that allows one to make comparisons between the means of three or more groups of data, where two independent variables are considered.
Number of independent variables: One. | Two.
What is being compared: The means of three or more groups of an independent variable on a dependent variable. | The effect of multiple groups of two independent variables on a dependent variable and on each other.
Number of groups of samples: Three or more. | Each variable should have multiple samples.
127. • A key statistical test in research fields including biology, economics, and
psychology
• Analysis of Variance (ANOVA) is very useful for analyzing datasets.
• It allows comparisons to be made between three or more groups of data.
• In a given data set, one can observe two main variations. One due to chance
and the other due to some specific reasons.
• These variations are studied separately in ANOVA to identify the actual cause of
the variation and help the researcher to make effective decisions.
• Two types of ANOVA are commonly used, One-Way ANOVA and Two-Way
ANOVA.
128. Analysis of Variance
• ANOVA is an inferential statistics technique that allows you to
compare the mean level on one interval-ratio variable (such as
income) for each group relative to the others in a nominal variable
(such as degree).
• If you had only two groups to compare, ANOVA would give the same
answer as an independent samples t-test.
129. ANOVA
Isn’t it conceivable that the differences are due to natural random variability between samples? Would you
want to claim they are different in the population?
Marks scored by the students
Marks scored by the students
Just Imagine that the following distribution represents the distribution of marks scored by the students
belonging to a different section.
How do you interpret the data presentation?
Groups Broken Down
All Groups
130. ANOVA
Now… what if the three sections had scores distributed like this in your sample?
Doesn't it now appear that the groups may be different regardless of sampling variability? Would you
feel comfortable claiming the groups are different in the population?
[Figure: marks scored by the students, shown for all groups combined and separated out]
131. ANOVA
Conceptually, ANOVA compares the variance within groups to the overall variance
between all the groups to determine whether the groups appear distinct from each
other or if they look quite the same.
Categories of a nominal variable are plotted against measures on a continuous variable.
[Figure: different groups with different means (Ȳ marked for each group) versus similar groups with similar means]
132. ANOVA
• When the groups have little variation within themselves, but large
variation between them, it would appear that they are distinct and
that their means are different.
[Figure: different groups with different means (Ȳ) versus similar groups with similar means]
133. ANOVA
• When the groups have a lot of variation within themselves, but little
variation between them, it would appear that they are similar and
that their means are not really different (perhaps they differ only
because of peculiarities of the particular sample).
[Figure: similar groups with similar means (Ȳ) versus different groups with different means]
134. One – Way ANOVA
• One-way Analysis of Variance (ANOVA) is used to test whether the
means of two or more independent (Unrelated) groups are
statistically significantly different
• A table of variation, the ANOVA table, is presented as follows:
Sources of Variance | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS) | F Ratio
Between the samples | SSB (sum of squares between the samples) | (k − 1) | MSB = SSB/(k − 1) | F = MSB/MSW (mean sum of squares between / mean sum of squares within)
Within the samples | SSW (sum of squares within the samples) | (n − k) | MSW = SSW/(n − k) |
Total | Total sum of squares | (n − 1) | |
135. Assumptions of One way ANOVA
• Normally distributed outcome
• Equal variances between the groups
• Groups are independent
Hypothesis of One-way ANOVA
H0: µ1 = µ2 = µ3 (all the population means are the same)
H1: Not all of the population means are the same
136. The process of carrying out one-way ANOVA
• Calculate the mean of each sample
• Calculate the mean of all sample means (the grand mean)
• Calculate the variation between the samples, known as SSB (sum of
squares between)
• Divide SSB by its degrees of freedom (k − 1) to get the mean square
between (MSB)
• The mean square between is the mean of the variation between the samples
• Calculate the variation within the samples, known as SSW (sum of squares within)
• Divide SSW by its degrees of freedom (n − k) to get the mean square
within (MSW)
• Add SSB and SSW to get the total variation in the sample
• Calculate the F ratio = MSB/MSW
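The steps above can be sketched as a small function; the three groups passed in at the end are assumed toy data for illustration.

```python
def one_way_anova(groups):
    """F ratio via the SSB/SSW decomposition described in the steps above."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # SSB: weighted squared deviation of each group mean from the grand mean.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # SSW: squared deviation of each observation from its own group mean.
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return msb / msw

# Assumed toy groups.
f = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 2))  # 13.0
```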
137. Problem
• The researcher observed the sale of products of a particular brand in
six big retail houses in three cities. He/she wants to determine
whether the mean sale is the same across the cities. Use the data
shown in the following table to calculate one-way ANOVA:
Retail Houses City A City B City C
1 3 6 9
2 8 9 8
3 4 8 6
4 9 5 7
5 6 7 5
6 7 4 7
138. Steps
Step 1: Defining the hypothesis
H0: There is no significant difference in sales between the three
cities / The sales in the three cities are the same.
Step 2: Calculate the mean sales of three cities separately, and the total
sample mean
Retail Houses City A City B City C
1 3 6 9
2 8 9 8
3 4 8 6
4 9 5 7
5 6 7 5
6 7 4 7
Mean 6.17 6.5 7
Mean of Samples 6.556666667
139. Steps
• Step 3: Calculate Sample Square Between
• Step 4: Calculate the sample Square WITHIN
140. Steps
• Step 5: Calculate the total Variance
• Step 6: Creating an ANOVA table
Sources of Variance | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS) | F Ratio | 5% F Limit
Between the samples | 2.1 | (3 − 1) = 2 | MSB = 2.1/2 = 1.06 | F = MSB/MSW = 1.06/3.64 = 0.29 | 3.68
Within the samples | 54.34 | (18 − 3) = 15 | MSW = 54.34/15 = 3.64 | |
Total | 56.44 | (18 − 1) = 17 | | |
141. The calculated F ratio (0.29) is less than the critical/table value (3.68),
hence the null hypothesis is accepted and H1 is rejected. There is no
significant difference in sales among these cities: the product's sales are
almost the same in the three cities.
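The worked example can be checked with SciPy: `f_oneway` reproduces the F ratio of about 0.29 from the table above directly from the city sales data.

```python
from scipy import stats

# Sales of the product in six retail houses in each city (from the problem).
city_a = [3, 8, 4, 9, 6, 7]
city_b = [6, 9, 8, 5, 7, 4]
city_c = [9, 8, 6, 7, 5, 7]
f_stat, p_value = stats.f_oneway(city_a, city_b, city_c)
print(round(f_stat, 2), round(p_value, 3))  # F ~ 0.29, p > 0.05 -> keep H0
```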
145. Homework
• How much of the variance in height is explained by the treatment group?
Treatment 1 Treatment 2 Treatment 3 Treatment 4
60 inches 50 48 47
67 52 49 67
42 43 50 54
67 67 55 67
56 67 56 68
62 59 61 65
64 67 61 65
59 64 60 56
72 63 59 60
71 65 64 65
147. Two Way ANOVA – Steps Involved
• Step 1: Find the Correction term
• Step 2: Find the Sum of Squares of the total (SST)
• Step 3: Sum of Squares of Column
• Step 4: Sum of Squares of Rows
• Step 5: Find the Sum of the Square Residual
• Step 6: Creating ANOVA Table
148. Problem
• Three respondents have rated three small cars of different brands on
a five-point scale (5 being the highest) concerning their features. The
ratings and features are provided in the following table.
Respondent | Car | Mileage | Durability | Maintenance Cost | Technology | Price
1 | Zen | 3 | 2 | 4 | 3 | 5
1 | I10 | 4 | 4 | 4 | 5 | 4
1 | Alto | 4 | 3 | 5 | 2 | 4
2 | Zen | 2 | 4 | 3 | 1 | 4
2 | I10 | 4 | 5 | 3 | 4 | 4
2 | Alto | 3 | 1 | 2 | 5 | 3
3 | Zen | 4 | 5 | 3 | 2 | 4
3 | I10 | 3 | 2 | 4 | 5 | 3
3 | Alto | 4 | 5 | 4 | 5 | 5
149. Steps
• Step 1: Find the correction term
• Step 2: Find the total sum of squares (SST)
Total of all the observations: 162
Square of the total: 162 × 162 = 26244
Number of observations: 45
Correction term = square of total / number of observations = 26244/45 = 583.2
Sum of squares of all the individual observations: 638
SST = 638 − 583.2 = 54.8
Here the researcher wants to know the difference between the brands in terms of features.
H0: There is no difference in the means of the five features of the cars.
150. Steps
• Step 3: Sum of squares of columns (i.e. between the features)
Sums of the columns: 31, 31, 32, 32, 36
Squares of the column sums: 961, 961, 1024, 1024, 1296
Sum of the squared column sums / observations per column = 5266/9 = 585.11
Correction term = 583.2
Sum of squares between columns (SSB) = 585.11 − 583.2 ≈ 2
• Step 4: Sum of squares of rows (i.e. between the respondents)
Sums of the rows: respondent 1 = 56, respondent 2 = 48, respondent 3 = 58
Squares of the row sums: 3136, 2304, 3364; total = 8804
Sum of the squared row sums / observations per row = 8804/15 = 586.93
Correction term = 583.2
Sum of squares between rows = 586.93 − 583.2 ≈ 3.8
151. Steps
• Step 5: Find the residual sum of squares: 54.8 − 2 − 3.8 = 49
• Step 6: Creating the ANOVA table
Sources of Variance | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS) | F Ratio | 5% F Limit
Between columns | 2 | (5 − 1) = 4 | 2/4 = 0.5 | 0.5/6.125 = 0.08 | F(4,8) = 3.84
Between rows | 3.8 | (3 − 1) = 2 | 3.8/2 = 1.9 | 1.9/6.125 = 0.31 | F(2,8) = 4.46
Residual | 49 | (5 − 1)(3 − 1) = 8 | 49/8 = 6.125 | |
Total | 54.8 | (45 − 1) = 44 | | |
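The sums of squares in the steps above can be recomputed from the rating matrix; the exact values come out as SSC ≈ 1.91 and SSR ≈ 3.73, which the worked example rounds to 2 and 3.8.

```python
import numpy as np

# The 9 x 5 rating matrix from the problem: rows are respondent-car
# combinations, columns are the five features.
ratings = np.array([
    [3, 2, 4, 3, 5], [4, 4, 4, 5, 4], [4, 3, 5, 2, 4],   # respondent 1
    [2, 4, 3, 1, 4], [4, 5, 3, 4, 4], [3, 1, 2, 5, 3],   # respondent 2
    [4, 5, 3, 2, 4], [3, 2, 4, 5, 3], [4, 5, 4, 5, 5],   # respondent 3
])
n = ratings.size                                  # 45 observations
cf = ratings.sum() ** 2 / n                       # correction term = 583.2
sst = (ratings ** 2).sum() - cf                   # total sum of squares
# Columns = features (9 observations per column).
ssc = (ratings.sum(axis=0) ** 2).sum() / 9 - cf
# Rows = respondents (15 observations per respondent).
resp_sums = ratings.reshape(3, 15).sum(axis=1)    # 56, 48, 58
ssr = (resp_sums ** 2).sum() / 15 - cf
residual = sst - ssc - ssr
print(round(sst, 1), round(ssc, 2), round(ssr, 2), round(residual, 2))
```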
153. F Calculated value is less than the F Critical Value / Table Value; the Null hypothesis is accepted.
F Value lies in the acceptance region; hence H0 is accepted, and H1 is rejected.
So we can state that there is no difference in the means of the five features of the cars.
155. Parametric and Nonparametric Tests (cont.)
• The term "non-parametric" refers to the fact that the chi-square tests
do not require assumptions about population parameters, nor do
they test hypotheses about population parameters.
• Previous examples of hypothesis tests, such as the t tests and analysis
of variance, are parametric tests and they do include assumptions
about parameters and hypotheses about parameters.
• The most obvious difference between the chi-square tests and the
other hypothesis tests we have considered (t and ANOVA) is the
nature of the data.
• For chi-square, the data are frequencies rather than numerical scores.
157. Chi-Square Test
• This statistical test compares the observed results with the
expected results.
• The purpose is to determine whether a difference is due to chance
or to a relationship among the variables we are studying.
• Chi-square enables us to understand and interpret the relationship
between two categorical variables.
• The chi-square test is denoted by the symbol χ2
• This test is performed on categorical data rather than
numerical data
• The formula for the chi-square statistic is χ2 = Σ (O − E)^2 / E
158. Applications of Chi-Square test
• To test the divergence of observed results from the expected results
when our expectations are based on the hypothesis of equal
probability
• The chi-square test is used to determine the degree of association
between two variables.
159. O = Observed or actual values
E = Expected Value
161. Chi-Square Test for Goodness of Fit
• This test helps the researcher to know whether the theoretical
distribution is fitted to the observed data and to what extent.
• It allows you to draw conclusions about the distribution of a population
based on a sample. Using the chi-square goodness of fit test, you can
test whether the goodness of fit is “good enough” to conclude that the
population follows the distribution.
• Goodness-of-Fit is a statistical hypothesis test used to see how closely
observed data mirrors expected data.
162. Assumptions
• One or more categories
• Independent observations
• A sample size of at least 10
• Random sampling
• All observations must be used
• For the test to be accurate, the expected frequency should be at least 5
163. Chi-Square Test for Goodness of Fit - Problems
Test the hypothesis that the customers have no preference for any particular products. Use a 5% level of significance
164. Solution:
Step 1: Formulating the hypothesis:
Ho: The customers have no preference for any particular products
H1: Customers have a preference for a particular product
Step 2: Level of Significance, In the problem, it was given as 5%
The degrees of freedom (n-1) = (4-1) = 3
Step 3: Calculate χ2 Value
165. Solution:
Step 3: Calculate the χ2 value
The expected value for each product is the average: 1000/4 = 250.

Product | Customers Preferred (O) | Expected Value (E) | (O − E) | (O − E)^2 | (O − E)^2/E
Product A | 300 | 250 | 50 | 2500 | 10
Product B | 280 | 250 | 30 | 900 | 3.6
Product C | 220 | 250 | −30 | 900 | 3.6
Product D | 200 | 250 | −50 | 2500 | 10
Total | 1000 | | | | Σ(O − E)^2/E = 27.2

χ2 = 27.2
Step 4: Compare the χ2 value with the critical value at the 5% level of significance and 3 degrees of freedom.
Here the critical value / table value = 7.81.
The calculated chi-square (27.2) is greater than the table value (7.81), hence the null hypothesis is rejected.
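The goodness-of-fit computation above can be verified with SciPy; when no expected frequencies are passed, `chisquare` uses the average of the observed counts (250), just as the worked example does.

```python
from scipy import stats

observed = [300, 280, 220, 200]            # customers preferring A..D
chi2, p_value = stats.chisquare(observed)  # expected defaults to the mean, 250
print(round(chi2, 1), round(p_value, 6))   # 27.2; p < 0.05 -> reject H0
```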
167. Example 2:
The following table gives the number of defective items in a factory on
various days in a week.
Using the chi-square test, check whether the defective items are
uniformly distributed or not at the 5% level of significance.
Days Number of defective Items
Monday 14
Tuesday 22
Wednesday 16
Thursday 18
Friday 12
Saturday 19
Sunday 11
168. Solution:
Step 1: Formulating the hypothesis:
Ho: The defective items are uniformly distributed across the days
H1: The defective items are not uniformly distributed across the days
Step 2: Level of Significance, In the problem, it was given as 5%
The degrees of freedom (n-1) = (7-1) = 6
Step 3: Calculate χ2 Value
169. Solution:
Step 3: Calculate the χ2 value
The expected value for each day is the average: 112/7 = 16.

Days | Number of defective Items (O) | Expected Value (E) | (O − E) | (O − E)^2 | (O − E)^2/E
Monday | 14 | 16 | −2 | 4 | 0.25
Tuesday | 22 | 16 | 6 | 36 | 2.25
Wednesday | 16 | 16 | 0 | 0 | 0
Thursday | 18 | 16 | 2 | 4 | 0.25
Friday | 12 | 16 | −4 | 16 | 1
Saturday | 19 | 16 | 3 | 9 | 0.5625
Sunday | 11 | 16 | −5 | 25 | 1.5625
Total | 112 | | | | Σ(O − E)^2/E = 5.875

χ2 = 5.875
Step 4: Compare the χ2 value with the critical value at the 5% level of significance and 6 degrees of freedom.
Here the critical value / table value = 12.59.
The calculated chi-square (5.875) is less than the table value (12.59), hence the null hypothesis is accepted.
Therefore, the defective items are uniformly distributed across the days.
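The same check in SciPy reproduces the statistic of 5.875 for the defective-items example.

```python
from scipy import stats

defects = [14, 22, 16, 18, 12, 19, 11]     # Monday..Sunday
chi2, p_value = stats.chisquare(defects)   # expected = 112/7 = 16 per day
print(chi2, round(p_value, 3))  # 5.875; p > 0.05 -> H0 accepted (uniform)
```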
171. Chi-Square Test for Independence
• Here, the two attributes/variables are tested to determine whether they are
associated.
• Example: Whether introducing a training program increases the efficiency of
employees. The intent is to establish a relationship between training and the
efficiency of employees.
• It allows you to draw conclusions about a population based on a sample.
Specifically, it allows you to conclude whether two variables are related in
the population.
• The test can be used and interpreted in two different ways:
1. Testing hypotheses about the relationship between two variables in
a population, or
2. Testing hypotheses about differences between proportions for two
or more populations.
172. Chi-Square Test for Independence - Problems
Example 1: The researcher has the data for the preference of men and women
regarding joint and nuclear families, as shown in the table
The researcher wants to know whether the preference of men and women
about the type of family is the same or not at 5% Level of Significance
Joint Family Nuclear Family Total
Men 96 35 131
Women 170 360 530
Total 266 395 661
173. Solution:
Step 1: Formulating the hypothesis:
Ho: The opinion of men and women about the type of family is the same
H1: The opinion of men and women about the type of family is different
Step 2: Level of Significance, In the problem, it was given as 5%
The degrees of freedom (r-1)(c-1) = 1
Step 3: Calculate χ2 Value
Expected Value = Row Total * Column Total / Grand Total
174. Solution:
Step 3: Calculate the χ2 value
Expected value = (Row Total × Column Total) / Grand Total
Example: the expected value for “men towards joint family” is E = (131 × 266)/661 = 52.72

Items | Number of Preferences (O) | Expected Value (E) | (O − E) | (O − E)^2 | (O − E)^2/E
Men towards joint family | 96 | 52.72 | 43.28 | 1873.41 | 35.54
Women towards joint family | 170 | 213.28 | −43.28 | 1873.41 | 8.78
Men towards nuclear family | 35 | 78.28 | −43.28 | 1873.41 | 23.93
Women towards nuclear family | 360 | 316.72 | 43.28 | 1873.41 | 5.92
Total | | | | | Σ(O − E)^2/E = 74.17

χ2 = 74.17
Step 4: Compare the χ2 value with the critical value at the 5% level of significance and 1 degree of freedom.
Here the critical value / table value = 3.84.
The calculated chi-square (74.17) is greater than the table value (3.84), hence the null hypothesis is rejected.
Therefore, the opinion of men and women about the type of family is different.
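The independence test on this 2 × 2 table can be verified with SciPy; `correction=False` matches the uncorrected chi-square computed in the worked example, and the returned expected-frequency table reproduces the E values above.

```python
from scipy import stats

table = [[96, 35],     # men: joint family, nuclear family
         [170, 360]]   # women: joint family, nuclear family
# correction=False disables Yates' continuity correction, matching the
# hand computation in the slides.
chi2, p_value, dof, expected = stats.chi2_contingency(table, correction=False)
print(round(chi2, 2), dof)  # ~74.17 with 1 degree of freedom -> reject H0
```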