Business Statistics
22MBA14
Theory
Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
Unit – 1
Measures of Central Tendency
&
Measures of Dispersion
Statistics: Meaning and Definition
“Statistics is the science of estimates and probabilities”
“Statistics may be defined as the collection, presentation, analysis, and
interpretation of numerical data”
“Statistics is a science which deals with the methods of collecting,
classifying, presenting, comparing, analyzing, and interpreting
numerical data collected to throw light on an enquiry”.
Functions of Statistics
• To collect and present facts in a systematic manner.
• To help in the formulation and testing of the hypothesis.
• To help in facilitating the comparison of data.
• To help in predicting future trends.
• To help to find the relationship between the variables.
• To simplify the mass of complex data.
• To help to formulate policies.
• To help governments to make decisions.
Limitations of Statistics
• Does not study qualitative phenomena
• Does not deal with individual items
• Statistical results are true only on an average.
• Statistical data should be uniform and homogeneous
• Statistical results depend on the accuracy of the data
• Statistical conclusions are not universally true.
• Statistical results can be interpreted only if a person has sound
knowledge of statistics
Collection and presentation of data
• Data Collection
• Primary data – Collected for the first time by the investigator; they are in the
shape of raw material
• Secondary data – Data already collected for a purpose other than the
problem at hand.

Basis              | Primary Data                                  | Secondary Data
Collection purpose | For the problem at hand                       | For other problems
Collection process | Very involved                                 | Rapid and easy
Collection cost    | High                                          | Relatively low
Collection time    | Long                                          | Short
Suitability        | Its suitability is positive                   | It may or may not suit the object of the survey
Originality        | It is original                                | It is not original
Precautions        | No extra precaution required to use the data  | It should be used with extra care
Measure of Central Tendency
• Meaning:
A measure of central tendency is a single value that describes the way
in which a group of data clusters around a central value. To put it in other
words, it is a way to describe the centre of a data set. A measure of
central tendency is a measure that tells us where the middle of a set
of data lies.
Application of Central Value:
• Central tendency also allows you to compare one data set to another.
• Central tendency is also useful when you want to compare one piece
of data to the entire data set.
Different Measures of Central Tendency:
• Mean
• Median
• Mode
• Geometric Mean
• Harmonic Mean
Mean
• Mean: Mean is the most common measure of central tendency. It is
simply the sum of the numbers divided by the number of numbers in
a set of data. This is also known as average.
• The Arithmetic Mean is a good measure of central tendency
Reasons:
• It takes all the observations into account while calculating
• It can be used for further mathematical treatment
Mean
• Mathematical characteristics of Arithmetic Mean
• The sum of the deviations, of all the values of x, from their arithmetic
mean, is zero.
• The sum of squared deviations taken from the AM is always the least among
such sums taken from any other value, including other measures of central tendency
• Mean of the combined series: If we know the sizes and means of two
component series, then we can find the mean of the resultant series
obtained on combining the given series.
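The three properties above can be checked numerically. The sketch below is only an illustration: the two series are hypothetical values, not data from the slides, and the helper names are made up for the example.

```python
# Hypothetical data, chosen only to illustrate the properties.
series_a = [10, 20, 30, 40]
series_b = [5, 15, 25]

def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def sum_sq_dev(values, centre):
    """Sum of squared deviations of the values from a chosen centre."""
    return sum((x - centre) ** 2 for x in values)

mean_a = mean(series_a)
mean_b = mean(series_b)

# Property 1: deviations from the arithmetic mean add up to zero.
deviation_sum = sum(x - mean_a for x in series_a)

# Property 2: squared deviations are smallest when taken from the mean.
least = sum_sq_dev(series_a, mean_a)
other = sum_sq_dev(series_a, mean_a - 1)   # any other centre gives more

# Property 3: combined mean from the sizes and means of the two series.
n_a, n_b = len(series_a), len(series_b)
combined_mean = (n_a * mean_a + n_b * mean_b) / (n_a + n_b)
```

The combined mean computed this way agrees with the mean of the pooled observations, which is why only the sizes and means of the component series are needed.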
Median
• Median is another measure of Central Tendency which locates the
middlemost value in a given set of data
• Median is the measure of Central Tendency different from any of the
means
• Median is a single value from the data set that measures the central
item in the data
• Median is that value of the variable which divides the group into two
equal parts, one part comprising the values greater than and the
other the values less than the Median
• This single item is the middlemost or most central item in the set of
numbers. As said earlier half of the items lie above this point and the
other half lie below it
Median
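Median $M = L + \frac{\frac{N}{2} - m}{f} \times C$
where L is the lower limit of the median class, N the total frequency, m the
cumulative frequency of the class preceding the median class, f the frequency
of the median class and C the class width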
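As a rough sketch of how the grouped-data formula is applied, the function below finds the median class and then interpolates inside it. The class limits and frequencies are hypothetical, equal class widths are assumed, and `grouped_median` is an illustrative name.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median of a continuous frequency distribution with equal class widths."""
    total = sum(frequencies)      # N
    half = total / 2              # N/2
    cumulative = 0                # cumulative frequency before the current class
    for lower, freq in zip(lower_limits, frequencies):
        if cumulative + freq >= half:
            # Median class found: L + ((N/2 - m) / f) * C
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("frequencies must be non-empty and positive")

# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
median = grouped_median([0, 10, 20, 30, 40], [5, 8, 12, 10, 5], 10)
```

Here N = 40, so N/2 = 20 falls in the 20-30 class, which has 13 observations before it and a frequency of 12.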
Median
Merits:
• Rigidly defined
• Easy to calculate, even for a non-mathematical person
• Since it is a positional average, it is not affected by extreme
observations. Useful in skewed distributions
• Can be computed even when dealing with open-ended classes
• Can be located by simple inspection and even graphically
• It can deal with qualitative characteristics that can be ranked
Demerits:
• It doesn’t take all the observations into account while calculating the
average
Mode
• Mode is a measure of central tendency that is different from the
mean but somewhat like the median
• The mode is the value that is repeated most often in the data set
• The mode is defined as the most popular value in the given data
• Mode is the value which occurs most frequently in a set of observations and
around which the other items of the set cluster most densely
• It is the value at the point around which the items tend to be most heavily
concentrated. It is regarded as the most typical of a series of values
• Mode is the value which has the greatest frequency density in its immediate
neighbourhood
• Mode is termed the fashionable value of the distribution
• Example: The average size of shoe sold in a shop is 7
• The average Indian male is 5 feet 6 inches tall
Mode
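$Z = L + \frac{f - f_1}{2f - f_1 - f_2} \times C$
where L is the lower limit of the modal class, f the frequency of the modal
class, f1 and f2 the frequencies of the preceding and succeeding classes, and
C the class width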
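A minimal sketch of the grouped-data mode formula follows. The frequency table is hypothetical, equal class widths are assumed, and a missing neighbouring class is treated as having zero frequency (an assumption of this example).

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode of a continuous frequency distribution with equal class widths."""
    i = frequencies.index(max(frequencies))       # position of the modal class
    f = frequencies[i]
    f1 = frequencies[i - 1] if i > 0 else 0       # preceding class frequency
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0  # succeeding class
    # Z = L + ((f - f1) / (2f - f1 - f2)) * C
    return lower_limits[i] + (f - f1) / (2 * f - f1 - f2) * width

# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
mode = grouped_mode([0, 10, 20, 30, 40], [5, 8, 12, 10, 5], 10)
```

The denominator becomes zero when the modal class and both neighbours have the same frequency; in that case the formula cannot be used.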
Mode
• Merits and Demerits
• Merits:
• Easy to calculate and understand; it can be found by mere
inspection
• Not affected by extreme observations
• Convenient for open-ended classes
• Demerits:
• Mode is not rigidly defined
• Mode is not suitable for further mathematical treatment
• Affected to a greater extent by sampling fluctuations
Empirical Relationship between Mean ($\bar{X}$), Median
(M) and Mode (Z) (Slightly Skewed)
$Z = 3M - 2\bar{X}$
Geometric Mean
• GM is the nth root of the product of the n values of the series. It is obtained by
multiplying the values of the items together and extracting the root of the
product corresponding to the number of items.
• Thus, square root of the products of two items and cube root of the
products of the three items are the geometric mean
• It is never larger than the arithmetic mean
• If there are zeroes and negative numbers in the series, the geometric mean
cannot be used.
• Logarithms can be used to find the geometric mean to reduce large
numbers and to save time
• Appropriate in situations where, there is an average percentage rate of
change over a period of time.
Geometric Mean
• $GM = \text{Antilog}\left(\frac{\sum f \log x}{N}\right)$
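The logarithmic form can be sketched as below, using base-10 logarithms so that the antilog is a power of 10. The growth factors are hypothetical and `geometric_mean` is an illustrative helper, not a library function.

```python
import math

def geometric_mean(values, frequencies=None):
    """GM = antilog(sum(f * log x) / N), with base-10 logarithms."""
    if frequencies is None:
        frequencies = [1] * len(values)        # ungrouped data: every f is 1
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    n = sum(frequencies)                       # N
    log_sum = sum(f * math.log10(x) for x, f in zip(values, frequencies))
    return 10 ** (log_sum / n)                 # antilog

# Hypothetical yearly growth factors: +10%, +20%, +30%.
average_factor = geometric_mean([1.10, 1.20, 1.30])
average_growth_percent = (average_factor - 1) * 100
```

Compounding the average factor over three years reproduces the overall growth, which is why the GM suits average rates of change.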
Merits and demerits of GM
• Merits of GM
• It is based on all the observations in the series
• It is rigidly defined
• It is suited for averaging ratios and percentages
• It is less affected by extreme values
• It is useful for studying social and economic data
• Demerits of GM
• It is not simple to understand
• It requires computational skill
• It cannot be computed if any items are zero or negative
• It has restricted applications
Harmonic Mean
• It is the total number of items divided by the sum of the
reciprocals of the values of the variable
• It is a specialised average which solves problems involving
variables expressed in “time rates” that vary according to time
• Example: Speed in km/hr., min/day, Price/chapter
• Harmonic mean (HM) is suitable only when time factor is a
variable and the act being performed remains constant
Harmonic Mean
$HM = \frac{N}{\sum \frac{1}{x}}$
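A short sketch of the harmonic mean applied to a speed problem. The trip figures are hypothetical and `harmonic_mean` is an illustrative helper.

```python
def harmonic_mean(values):
    """HM = N / sum(1/x); undefined if any value is zero."""
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return len(values) / sum(1 / x for x in values)

# Hypothetical trip: the same distance covered at 40 km/hr and then at 60 km/hr.
average_speed = harmonic_mean([40, 60])
arithmetic_average = (40 + 60) / 2
```

The harmonic mean gives about 48 km/hr, lower than the arithmetic average of 50 km/hr, because more time is spent travelling at the slower speed.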
Harmonic Mean
• Merits of Harmonic Mean
• It is based on all observations
• It is rigidly defined
• Suitable in the case of series having wide dispersion
• It is suitable for further mathematical treatment
• Demerits of Harmonic Mean
• It is not easy to compute
• Cannot be used when one of the items is zero
• It cannot represent distribution
Relationship between AM, HM and GM
For two positive values, the relationship between AM, GM, and HM can be
represented by the formula AM × HM = GM². The geometric mean (GM) then equals
the square root of the product of the arithmetic mean (AM) and the harmonic
mean (HM). In general, for positive values, AM ≥ GM ≥ HM.
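The relationship can be verified numerically. In this sketch the values are hypothetical; the product identity is exact for a pair of positive values, while for three or more values only the ordering of the three means is guaranteed.

```python
import math

a, b = 4.0, 9.0                       # hypothetical pair of positive values
am = (a + b) / 2                      # arithmetic mean
gm = math.sqrt(a * b)                 # geometric mean
hm = 2 / (1 / a + 1 / b)              # harmonic mean
product_identity_gap = abs(am * hm - gm ** 2)

# With three values the identity generally fails, but AM >= GM >= HM holds.
values = [1.0, 2.0, 3.0]
am3 = sum(values) / 3
gm3 = math.prod(values) ** (1 / 3)
hm3 = 3 / sum(1 / v for v in values)
```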
Characteristics of Good Average
• It should be easy to calculate and simple to follow.
• It should represent the entire mass of data.
• It should be capable of further algebraic treatment.
• A good average should be an absolute number.
• A good average is one which is not affected by skewness in the
distribution.
• It should not be unduly affected by extreme values.
Pre-requisites of Good Measures of Central
Tendency
• It should be rigidly defined
• It should be based on all observations
• It should be easy to understand and calculate
• It should have sampling stability
• It should not be unduly affected by extreme observations
Measures of Dispersion
• Meaning
• Dispersion is the scatteredness of the data series around its average
• Dispersion is the extent to which values in a distribution differ from
the average of the distribution
Measures of Dispersion
• Why Dispersion?
• Determine the reliability of an average
• Serve as a basis for the control of the variability
• To compare the variability of two or more series and
• Facilitate the use of other statistical measures.
Measures of Dispersion
• Characteristics of an Ideal Measure of Dispersion?
• It should be rigidly defined.
• It should be easy to understand and easy to calculate.
• It should be based on all the observations of the data.
• It should be easily subjected to further mathematical treatment.
• It should be least affected by sampling fluctuations.
• It should not be unduly affected by the extreme values.
Measures of Dispersion
• Different Measures of Dispersion
• The range
• The inter quartile range and quartile deviation
• Percentile
• Decile
• The mean deviation or average deviation
• The standard deviation
The Range
Range is the crude measure of dispersion;
Calculated as
R = High – Low
R = H – L
Its relative measure is called the co-efficient of Range:
Co-efficient of Range = $\frac{H - L}{H + L}$
Quartile Deviation
Inter-quartile range: IQR = Q3 – Q1
Quartile Deviation (semi-inter-quartile range): QD = $\frac{Q_3 - Q_1}{2}$
Co-efficient of QD = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
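The range and quartile measures can be sketched with Python's standard library. The observations are hypothetical, and `statistics.quantiles` with its default "exclusive" method is used here because it places the quartiles at the (N + 1)/4 positions common in textbooks; other quartile conventions give slightly different values.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14]        # hypothetical observations

high, low = max(data), min(data)
value_range = high - low                       # R = H - L
coeff_range = (high - low) / (high + low)      # co-efficient of range

q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles Q1, Q2 (median), Q3
iqr = q3 - q1                                  # inter-quartile range
quartile_deviation = iqr / 2                   # semi-inter-quartile range
coeff_qd = (q3 - q1) / (q3 + q1)               # co-efficient of QD
```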
Mean Deviation
Mean Deviation = $\frac{\sum f \, |X - A|}{n}$ or $\frac{\sum f \, |d|}{n}$
Relative Measure of Mean Deviation
Co-efficient of Mean Deviation = $\frac{\text{Mean Deviation}}{\text{Average about which it is calculated}}$
Standard Deviation
$\sigma = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$
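A small sketch applying the mean deviation and standard deviation formulas to one frequency distribution. The values and frequencies are hypothetical, and the mean deviation is taken about the arithmetic mean (one of the possible choices of average).

```python
import math

x = [10, 20, 30, 40, 50]      # hypothetical values (or class mid-points)
f = [2, 3, 5, 3, 2]           # hypothetical frequencies

n = sum(f)                                              # N
mean = sum(fi * xi for fi, xi in zip(f, x)) / n         # sum(f x) / N

# Mean deviation about the mean: sum(f |x - A|) / N
mean_deviation = sum(fi * abs(xi - mean) for fi, xi in zip(f, x)) / n
coeff_mean_deviation = mean_deviation / mean

# Standard deviation: sqrt(sum(f x^2) / N - (sum(f x) / N)^2)
variance = sum(fi * xi ** 2 for fi, xi in zip(f, x)) / n - mean ** 2
sigma = math.sqrt(variance)
```

The shortcut form used for the variance gives the same result as averaging the squared deviations from the mean directly.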
Standard Deviation
Characteristics of Standard Deviation:
- SD is a very satisfactory and the most widely used measure of
dispersion
- Amenable to mathematical manipulation
- It is independent of origin, but not of scale
- If SD is small, there is a high probability of getting a value close to
the mean; if it is large, values tend to lie farther away from the mean
Co-efficient of Variation
If the Co-efficient of Variation (C.V.) for a given data set is higher, the data
is said to be less consistent; on the other hand, if the C.V. is lower, it means
that variability in the data is lower and the data is more consistent around the mean
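The comparison can be sketched as follows, taking the usual definition C.V. = (standard deviation / mean) × 100, which the slide does not spell out. The scores are hypothetical and the population standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """C.V. = (standard deviation / mean) * 100, using the population SD."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100

# Hypothetical scores of two batsmen over five innings (same mean of 50).
batsman_a = [48, 52, 50, 49, 51]
batsman_b = [10, 90, 40, 70, 40]

cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)
more_consistent = "A" if cv_a < cv_b else "B"
```

Both series have the same mean, so the lower C.V. of the first series reflects only its smaller spread.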
Unit – 2
Correlation and Regression
Correlation and Correlation Co-efficient:
• Correlation
• Correlation is a statistical measure that indicates the extent to which
two or more variables fluctuate together.
• A positive correlation indicates the extent to which those variables
increase or decrease in parallel;
• A negative correlation indicates the extent to which one variable
increases as the other decreases.
Correlation and Correlation Co-efficient:
• Correlation Co-efficient
• A correlation coefficient is a statistical measure of the degree to
which changes to the value of one variable predict change to the
value of another.
• When the fluctuation of one variable reliably predicts a similar
fluctuation in another variable, there’s often a tendency to think that
means that the change in one causes the change in the other.
• However, correlation does not imply causation. There may be, for
example, an unknown factor that influences both variables similarly.
Correlation Co-efficient Value:
r lies between -1 and +1
• If r lies between 0 and 1, a positive correlation exists
• If r is exactly +1, there is perfect positive correlation
• If r lies between -1 and 0, a negative correlation exists
• If r is exactly -1, there is perfect negative correlation
Correlation Analysis
• Correlation Analysis
• Correlation analysis is a method of statistical evaluation used to study
the strength of a relationship between two, numerically measured,
continuous variables (e.g. height and weight).
• Applications of Correlation:
• The most valuable use of correlation is in predicting the future direction of a
business.
• Correlation is used to assess the direction of change
• It is used to evaluate performance measures and in data mining
Different types of Correlation
• Positive Correlation
• Positive correlation occurs when an increase in one variable is accompanied
by an increase in the value of another.
Different types of Correlation
• Negative Correlation
• Negative correlation occurs when an increase in one variable is accompanied
by a decrease in the value of another.
Different types of Correlation
• No Correlation
• No correlation occurs when there is no linear dependency between
the variables.
Different types of Correlation
• Perfect Positive Correlation
• Perfect correlation occurs when there is an exact linear (functional) dependency
between the variables.
Different types of Correlation
• High degree of Positive Correlation
• A correlation is stronger the closer the points are located to one
another on the line.
Different types of Correlation
• Low degree of Positive Correlation
• A correlation is weaker the farther apart the points are located from one
another around the line.
Different Methods of Studying Correlation Analysis
• Scatter diagram method
• Karl Pearson’s Co-efficient of Correlation (Covariance method)
• Two way frequency table (Bivariate correlation method)
• Ranks method or Spearman’s Rank Correlation
• Concurrent Deviation Method
Different Methods of Studying Correlation Analysis
• Scatter diagram method
• It is one of the simplest methods of diagrammatic representation of a
bivariate distribution and provides us with one of the simplest tools for ascertaining the
correlation between two variables
• The “n” pairs of values of two variables (for example, heights and weights) are plotted as dots.
The diagram of dots so obtained is known as a “Scatter Diagram”
• From the scatter diagram, we can form a fairly good, though rough, idea about the
relationship between the two variables.
Different Methods of Studying Correlation Analysis
• Scatter diagram
Different Methods of Studying Correlation Analysis
• Karl Pearson’s Co-efficient of Correlation
$r = \frac{n \sum xy - \sum x \cdot \sum y}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right] \left[n \sum y^2 - (\sum y)^2\right]}}$
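A minimal sketch of Karl Pearson's formula computed directly from the sums. The paired observations are hypothetical and `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's co-efficient of correlation from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
```

For this data r is roughly 0.77, a fairly high degree of positive correlation; an exactly linear pair such as y = 2x gives r = 1.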
Different Methods of Studying Correlation Analysis
• Ranks method or Spearman’s Rank Correlation
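When ranks are not repeated:
$\rho = 1 - \frac{6 \sum D^2}{n^3 - n}$
When ranks are repeated (CF is the correction factor for tied ranks):
$\rho = 1 - \frac{6 \left(\sum D^2 + CF\right)}{n^3 - n}$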
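The untied-ranks formula can be sketched as below. The two sets of ranks are hypothetical, and the tied-ranks correction factor is not handled in this example.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rank correlation for ranks without ties."""
    n = len(ranks_x)
    # D is the difference between the two ranks of each item.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Hypothetical ranks given by two judges to five contestants.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_1, judge_2)
```

Here the sum of squared rank differences is 4, so the co-efficient is 1 − 24/120 = 0.8.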
Regression Analysis
• Regression Analysis
• The regression analysis is a statistical process for estimating the relationships
among variables. Regression is the attempt to explain the variation in a
dependent variable using the variation in independent variables.
• Uses or Application of Regression:
• The most common use of regression in business is to predict events that have
yet to occur. Demand analysis, for example, predicts how many units consumers
will purchase.
• Another key use of regression models is the optimization of business processes.
A factory manager might, for example, build a model to understand the
relationship between oven temperature and the shelf life of the cookies baked
in those ovens.
Regression Equation
• X on Y
$(x - \bar{x}) = b_{xy} \, (y - \bar{y})$
• Y on X
$(y - \bar{y}) = b_{yx} \, (x - \bar{x})$
Regression Coefficients
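$b_{xy} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum y^2 - (\sum y)^2}$ or $b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}$
$b_{yx} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$ or $b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x}$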
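The two regression co-efficients and the Y on X line can be sketched as follows. The data are hypothetical, and `predict_y` is an illustrative helper built from the regression equation of Y on X.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Regression co-efficients from the raw-sum formulas.
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b_xy = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)

mean_x, mean_y = sum_x / n, sum_y / n

def predict_y(x_value):
    """Regression line of Y on X: (y - mean_y) = b_yx * (x - mean_x)."""
    return mean_y + b_yx * (x_value - mean_x)

# The product of the two co-efficients equals r squared
# (both are positive here, so r takes the positive root).
r = math.sqrt(b_xy * b_yx)
```

With this data b_yx = 0.6 and b_xy = 1.0, and the value of r recovered from them matches the one given by Karl Pearson's formula for the same observations.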
Simple and Multiple regression
• Simple regression:
• The linear regression model used to describe the relationship between a
dependent variable y and an independent variable x is given by
y=a+bx
• Multiple regression
• Multiple regression is a statistical technique that can be used to analyze the relationship
between a single dependent variable and several independent variables. The objective of
multiple regression analysis is to use the independent variables, whose values are known, to
predict the value of the single dependent variable.
Unit – 3
Probability Distribution
Important terminologies
• Experiment
• Trial
• Event
• Mutually Exclusive Event
• Dependent and independent event
• Equally Likely event
• Simple and Compound events
• Exhaustive events
• Complementary events
Important terminologies
• Experiment
The term experiment refers to an act which can be repeated under the same
given conditions.
Random experiment: An experiment is called a random experiment if, when
conducted repeatedly under essentially homogeneous conditions, the result is not
unique or certain but may be any one of the various possible
outcomes.
Or
An experiment having random outcomes
Or
An experiment whose results depend on chance
Example: Tossing a coin, rolling a die
Important terminologies
• Trial
Performing a random experiment is called a trial
Example: If a coin is tossed two times, that means two
trials
Important terminologies
• Event
An outcome or combination of outcomes of an experiment is termed an
event
Example: Tossing a coin – You may get H or T – These are events
Important terminologies
• Mutually Exclusive Events:
Two events are said to be mutually exclusive or incompatible when both cannot
happen simultaneously in a single trial; in other words, the occurrence of any
one of them excludes the occurrence of the other.
In other words, “if the happening of one event prevents the happening of the other
events, such events are called mutually exclusive events”
Example: Tossing a coin leads to two events, Head (H) or Tail (T)
If a head turns up in tossing a coin, then it prevents a tail from turning up and vice-versa
Important terminologies
• Independent and Dependent Event
Two or more events are said to be independent when the outcome of one
doesn’t affect, and is not affected by, the other.
Example: When a coin is tossed twice, a head in the first trial will
not affect the outcome of the next trial
Events are said to be dependent when the occurrence or non-occurrence of one
event in any one trial affects the probability of the other event in other trials
Example: Drawing a card without replacement.
Important terminologies
• Equally Likely
Events are said to be equally likely when one doesn’t occur more often than
the others. This means none of them is expected to occur in preference to the
others.
In other words, all the events have an equal chance of
occurrence
Example: When you roll a die, all the 6 faces, i.e. 1, 2, 3, 4, 5,
6, are equally likely
Important terminologies
• Simple and Compound Events
In the case of simple events, we consider the probability of the happening or not
happening of a single event
In the case of compound events, we consider the joint occurrence of two or more events
Important terminologies
• Exhaustive Events
Events are said to be exhaustive when their totality includes all the possible
outcomes of a random experiment.
In other words, mutually exclusive events are exhaustive if the sum of their
individual chances of occurrence is equal to 1
Example 1: When a die is rolled once, the possible outcomes are 1, 2, 3, 4, 5 and 6,
hence the exhaustive number of cases is 6
Example 2: If we roll two dice once, the exhaustive number of cases is 6^2
= 36
Similarly, rolling three dice leads to 216 outcomes, and the sum of the
probabilities of occurrence of all these outcomes is 1
Important terminologies
• Complementary events
Let there be two events A and B, A is called the complementary event of B
(and Vice versa), if A and B are mutually exclusive and exhaustive.
Example: When a die is thrown, the occurrence of an even number
and the occurrence of an odd number are complementary events.
Simultaneous occurrence of two events A and B is generally written as AB
Definition of Mathematical Probability
• Let there be a random experiment with “N” outcomes which are mutually
exclusive, exhaustive and equally likely
• Let there be an event “A”, and let “m” of these outcomes be favourable to the
event “A”; then the probability of occurrence of “A” can be
written as follows
$P(A) = \frac{m}{N} = \frac{\text{Outcomes favourable to } A}{\text{Total number of outcomes}}$
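The definition can be sketched by counting outcomes. The die example and the helper name `classical_probability` are illustrative assumptions; exact fractions are used so that m/N is not rounded.

```python
from fractions import Fraction

def classical_probability(favourable, sample_space):
    """P(A) = m / N for equally likely, mutually exclusive, exhaustive outcomes."""
    return Fraction(len(favourable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]                       # N = 6 outcomes
even = [face for face in die if face % 2 == 0] # m = 3 favourable outcomes
p_even = classical_probability(even, die)

# Two dice: N = 36 outcomes; event "sum is 7".
two_dice = [(a, b) for a in die for b in die]
sum_seven = [pair for pair in two_dice if sum(pair) == 7]
p_sum_seven = classical_probability(sum_seven, two_dice)
```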
Theorems of Probability or Rules of Probability
The two important theorems of probability
• The addition theorem
• The multiplication theorem
Theorems of Probability or Rules of Probability
The two important theorems of probability
• The addition theorem
P(A or B) = P(A ∪ B) = P(A) + P(B)   (mutually exclusive events)
P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)   (overlapping events)
Theorems of Probability or Rules of Probability
The two important theorems of probability
• The Multiplication theorem
P(A and B) = P(A ∩ B) = P(A) × P(B)   (independent events)
• P(A ∩ B) = P(A) × P(B/A) ; P(A) ≠ 0   (dependent events)
• P(B ∩ A) = P(B) × P(A/B) ; P(B) ≠ 0
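Both theorems can be checked by counting cards in a standard 52-card deck. This is only a sketch: the card events are chosen for illustration and `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))

def prob(event):
    """Classical probability of an event (a predicate on a card) for one draw."""
    favourable = sum(1 for card in DECK if event(card))
    return Fraction(favourable, len(DECK))

def is_king(card):
    return card[0] == "K"

def is_queen(card):
    return card[0] == "Q"

def is_heart(card):
    return card[1] == "hearts"

# Addition theorem, mutually exclusive events: P(K or Q) = P(K) + P(Q).
p_king_or_queen = prob(is_king) + prob(is_queen)

# Addition theorem, overlapping events: subtract P(K and heart) once.
p_king_and_heart = prob(lambda c: is_king(c) and is_heart(c))
p_king_or_heart = prob(is_king) + prob(is_heart) - p_king_and_heart

# Multiplication theorem, independent events: rank and suit of one card.
p_independent = prob(is_king) * prob(is_heart)

# Multiplication theorem, dependent events: two kings without replacement,
# P(A and B) = P(A) * P(B/A) = 4/52 * 3/51.
p_two_kings = Fraction(4, 52) * Fraction(3, 51)
```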
Bayes Theorem of Probability
• $P(A / B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \times P(B / A)}{P(B)}$ ; P(B) ≠ 0
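A small worked sketch of the theorem with two machines. All figures are hypothetical: machine A is assumed to make 60% of the output with 2% defectives, and machine B 40% with 5% defectives.

```python
from fractions import Fraction

# Hypothetical prior shares of output and defect rates.
p_a = Fraction(60, 100)            # P(A): item made by machine A
p_b = Fraction(40, 100)            # P(B): item made by machine B
p_def_given_a = Fraction(2, 100)   # P(D/A)
p_def_given_b = Fraction(5, 100)   # P(D/B)

# P(D) by adding the two mutually exclusive ways of getting a defective item.
p_def = p_a * p_def_given_a + p_b * p_def_given_b

# P(A/D) = P(A and D) / P(D) = P(A) * P(D/A) / P(D)
p_a_given_def = p_a * p_def_given_a / p_def
p_b_given_def = p_b * p_def_given_b / p_def
```

Although machine A makes most of the items, a defective item is more likely to have come from machine B (5/8 against 3/8).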
Random Variable
Random means “Unpredictable”
A random variable x is a variable whose possible values are numerical
outcomes of a random phenomenon.
There are two types of a random variable
- Discrete random variable
- Continuous random variable
Discrete and Continuous Random Variable
• Discrete Random Variable
• Continuous Random Variable
Theoretical Probability Distribution
•Binomial Distribution
•Poisson Distribution
•Normal Distribution
Theoretical Probability Distribution
•Binomial Distribution
• It is also known as the “Bernoulli Distribution”: a probability distribution
expressing the probability of one set of dichotomous alternatives, i.e.
success or failure
• Bernoulli trial: A trial having only two outcomes
Example: Tossing a coin: H or T
Theoretical Probability Distribution
• Binomial Distribution
• Binomial Probability Distribution
Let “x” be a binomial random variable with “n” trials and P(Success) = p;
then the probability of “x” successes is given by
$P(x) = {}^{n}C_{x} \, p^{x} \, q^{n-x}$
• Where
x = Number of successes in “n” trials
n = Number of trials
p = Probability of success in a single trial
q = (1 – p) = Probability of failure in a single trial
Theoretical Probability Distribution
•Binomial Distribution
• Constants of Binomial Distribution
• Mean = np
• Variance = npq
• Standard Deviation = $\sqrt{npq}$
• Parameters of Binomial Distribution
n, p (with q = 1 – p)
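The probability function and the constants can be sketched as below. The coin-tossing figures (n = 10, p = 0.5) are hypothetical and `binomial_pmf` is an illustrative helper.

```python
import math

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

# Hypothetical experiment: 10 tosses of a fair coin, success = head.
n, p = 10, 0.5
p_exactly_3 = binomial_pmf(3, n, p)

mean = n * p                        # np
variance = n * p * (1 - p)          # npq
std_dev = math.sqrt(variance)       # sqrt(npq)

# The probabilities over x = 0, 1, ..., n add up to 1.
total_probability = sum(binomial_pmf(x, n, p) for x in range(n + 1))
```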
Theoretical Probability Distribution
•Poisson Distribution
• Poisson distribution may be expected in cases where the chance of any
individual event being a success is small.
• The distribution is used to describe the behaviour of rare events such as
the number of accidents on a road or the number of printing mistakes in a book.
• It has been called “the Law of Improbable Events”
• $P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$
Where x = 0, 1, 2, 3, 4…
λ = Parameter of the Poisson distribution (the mean number of occurrences)
Theoretical Probability Distribution
•Poisson Distribution
• Constants of Poisson Distribution
• The mean of Poisson distribution = λ
• The standard deviation = $\sqrt{\lambda}$
Parameter of the Poisson Distribution: λ
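A short sketch of the Poisson probability function and its constants. The average of 2 printing mistakes per page is a hypothetical figure and `poisson_pmf` is an illustrative helper.

```python
import math

def poisson_pmf(x, lam):
    """P(x) = e^(-lambda) * lambda^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Hypothetical rate: on average 2 printing mistakes per page.
lam = 2
p_no_mistake = poisson_pmf(0, lam)
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))

mean = lam                   # mean of the distribution
std_dev = math.sqrt(lam)     # standard deviation
```

Note that x = 0 is a valid outcome: the chance of a page with no mistakes is e^(-2), about 0.135.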
Theoretical Probability Distribution
• Normal Distribution
• The normal curve is “bell shaped” and symmetrical in its appearance
• The height of the normal curve is at its maximum at the mean. Hence the mean and
mode of the normal distribution coincide. Thus for a Normal Distribution Mean,
Median and Mode are all equal.
• There is one maximum point of the normal curve which occurs at the mean
• Since there is only one maximum point, the normal curve is uni-modal i.e. it has only
one Mode
• As distinguished from the Binomial and Poisson distributions, where the variable is discrete,
the variable distributed according to the normal curve is continuous.
• The first and third quartiles are equidistant from the Median
• The area under the normal curve is distributed as follows
• Mean ± 1σ covers 68.27% of the area, with 34.135% of the area lying on either side of the Mean
• Mean ± 2σ covers 95.45% of the area
• Mean ± 3σ covers 99.73% of the area
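The area figures above can be reproduced with the error function from Python's standard library, using the identity that the area within k standard deviations of the mean equals erf(k / √2). The helper name `area_within` is an assumption of this sketch.

```python
import math

def area_within(k):
    """Proportion of the normal curve lying within mean ± k standard deviations."""
    return math.erf(k / math.sqrt(2))

# Percentage areas for 1, 2 and 3 standard deviations, rounded to 2 decimals.
areas = {k: round(area_within(k) * 100, 2) for k in (1, 2, 3)}

# By symmetry, half of the 1-sigma area lies on each side of the mean.
one_side = area_within(1) * 100 / 2
```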
Theoretical Probability Distribution
•Normal Distribution
Unit – 4
Time Series Analysis
Objective of Time Series Analysis
• The assumption underlying time series analysis is that the time
series data will behave in the future as it did in the past.
Time series analysis is used to detect the pattern underlying the
data and to isolate the influencing factors, which in turn are used to
estimate the future more accurately. Thus, time series data helps
us to cope with uncertainty about the future.
• Reviews and evaluations of the progress made on plans are
based on time series data. For example, the Finance Ministry of the
Govt. of India (GOI) reviews the gross domestic product (GDP)
of the economy during the financial year and chalks out
strategies to further the growth.
Variations/Components in Time Series Analysis
• In a typical time series there are four main components,
which seem to be independent of one another and which
influence the time-series data.
• An important step in analysing time series is to consider
the types of data patterns. A time series data can contain
some or all of the following elements. They are:
• Trend (T)
• Cyclical (C)
• Seasonal (S)
• Irregular (I)
Variations/Components in Time Series Analysis
• Trend (T)
Trend (T): The trend is the long-term pattern of a time series. A
trend can be positive or negative depending on whether the
time series exhibits an increasing long-term pattern or a
decreasing long-term pattern. The rate of trend growth usually
varies over time.
Variations/Components in Time Series Analysis
• Cyclical (C)
• Cyclical (C): Time series data may show up and down movement around a
given trend. For example, a business cycle shows an upward trend over the
years, touches its peak, and then may slump and hit the
bottom. The pattern repeats, but not at regular intervals of time. The
duration of a cycle depends on the type of business or industry.
Variations/Components in Time Series Analysis
• Seasonal (S): A special case of the cyclical component of a time series in
which the magnitude and duration of the cycle do not vary and the cycle recurs at
a regular interval each year. Seasonality occurs when the time series
exhibits regular variation during the same periods (the same month or
quarter) every year.
Variations/Components in Time Series Analysis
• Irregular or Random (I): This type of variation is unpredictable. It is caused
by short-term, unanticipated and non-recurring factors and follows no
specific pattern.
Methods of Evaluating the Trend
• These are also called the forecasting methods of Time Series Analysis.
Some of them are:
• Freehand Method
• Moving Average Method
• Semi-average Method
• Least-Square Method
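Two of the methods listed above can be sketched in a few lines of Python. This is only an illustration: the `sales` figures are hypothetical, the moving average uses a simple 3-period window, and the least-squares line is fitted against time coded as t = 0, 1, 2, ... (an assumption made here for convenience).

```python
def moving_average(series, window):
    """Simple moving averages over consecutive windows of the series."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def least_squares_trend(series):
    """Fit the trend line y = a + b*t by least squares, with t = 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    return intercept, slope


sales = [12, 15, 14, 18, 20, 19, 23]  # hypothetical yearly sales
ma3 = moving_average(sales, 3)        # 3-year moving averages
a, b = least_squares_trend(sales)     # trend value in period t is a + b*t
print(ma3)
print(round(a, 2), round(b, 2))
```

A positive slope `b` indicates an upward (positive) trend; the moving averages smooth out short-term fluctuations so that the trend is easier to see.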
Methods of Estimating Seasonal Index
• Method of Simple Averages
• Ratio to trend method
• Ratio to moving average method
Unit – 5
Hypothesis Testing
Hypothesis
• A hypothesis is an assumption or a statement that may or may not be
true.
• It is tested based on the data / information obtained from a sample.
• It is used to make decisions related to business.
Example:
1. Whether a new drug is more effective than the existing drug
2. Whether the proportion of smokers in a class is different from 0.30
Characteristics of a good hypothesis
• Conceptually clear
• Specificity
• Testability
• Availability of techniques
• Theoretical relevance
• Consistency
• Objectivity
• Simplicity
Sources
• Theory: e.g. the theory of business goals. Hypotheses: the rate of return on capital employed (CE) is an
index of business success; the higher the EPS, the more favourable is the
financial leverage
• Observation: Ex: price & demand for a product
• Intuition & Personal experience
• Findings of Studies
• Continuity of research
One Tailed and Two Tailed Test
• One Tailed Test
The test is called one-sided (one-tailed) if the null hypothesis is
rejected only when the value of the test statistic falls in one specific tail of the
distribution.
One Tailed and Two Tailed Test
• Two Tailed Test
The test is called two-sided (two-tailed) if the null hypothesis is rejected when the value of the test statistic falls in
either of the two tails of its sampling distribution.
Formulation of Hypothesis
• Criteria to fulfil while formulating the hypothesis
• A hypothesis must be formulated in simple, clear and declarative form
• A broad hypothesis might not be empirically testable
• A hypothesis must be measurable and quantifiable so that the statistical
authenticity of the relationship can be established
• A hypothesis is a conjectural statement based on the existing literature and
theories about the topic, not on the gut feeling or subjective
judgement of the researcher
• Validation of the hypothesis would necessarily involve testing the statistical
significance of the hypothesized relation.
Formulation of Hypothesis
• Null Hypothesis
• Is a statement about a population parameter that is assumed to be true.
• Null hypotheses are formulated for testing statistical significance.
• It is the presumption that is accepted as correct unless there is strong evidence against it.
• It is a starting point. The researcher tests whether the value stated in the null hypothesis is true.
Example: There is no relationship between families’ income level and expenditure on recreation
• Alternate Hypothesis
• Is not specific and is not tested directly.
• It is complementary to the null hypothesis.
• It is accepted when the null hypothesis (H0) is rejected.
Example: There is a relationship between families’ income level and expenditure on recreation
Functions / Role of Hypothesis
• Guides the direction of study
• Gives an idea for setting order among facts
• Specifies sources of data
• Determines data needs
• Suggests type of research
• Determines the technique of analysis
• Helps in development of theories
Errors in Hypothesis – Type 1 and Type 2
A Type I error means rejecting the null hypothesis when it’s actually true. It
means concluding that results are statistically significant when, in reality,
they came about purely by chance or because of unrelated factors.
A Type II error means not rejecting the null hypothesis when it’s actually
false. This is not quite the same as “accepting” the null hypothesis, because
hypothesis testing can only tell you whether or not to reject the null hypothesis.
Parametric and Non-Parametric Tests
Parametric Test
• Parametric analysis to test group means
• Information about the population is completely known
• Specific assumptions are made regarding the population
• Applicable only to variables
• Samples are independent
• A normal distribution is assumed
• Handles interval or ratio data
• Results can be significantly affected by outliers
• Can perform well even when the spread of each group is different
• Has more statistical power
Non-Parametric Test
• Nonparametric analysis to test group medians
• No information about the population is available
• No assumptions are made regarding the population
• Applicable to both variables and attributes
• Samples are not necessarily independent
• No assumed shape / distribution
• Handles ordinal, nominal and ranked data (as well as interval or ratio data)
• Results are not seriously affected by outliers
• Performs well when the spread of each group is the same; might not provide valid results if the groups have
different spreads
• Is less powerful than a parametric test
Z Test
Formulas to remember – Testing of Hypotheses
Z Test: Test for equality of mean (one sample)
Test statistic:
$$Z_{cal} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
where
$\bar{x}$ = sample mean
$\mu_0$ = known (hypothesised) value of the population mean
$\sigma / \sqrt{n}$ = standard error of the sample mean ($\sigma$ = population standard deviation, $n$ = sample size)
Two Tail Test
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
Reject $H_0$ when
$Z_{cal} \le -1.960$ or $Z_{cal} \ge 1.960$ at 5% Level of Significance (LOS)
$Z_{cal} \le -2.58$ or $Z_{cal} \ge 2.58$ at 1% Level of Significance
One Tail Test
- Upper tailed Z test
$H_0: \mu \le \mu_0$
$H_1: \mu > \mu_0$
Reject $H_0$ when
$Z_{cal} \ge 1.645$ at 5% LOS
$Z_{cal} \ge 2.326$ at 1% LOS
- Lower tailed Z test
$H_0: \mu \ge \mu_0$
$H_1: \mu < \mu_0$
Reject $H_0$ when
$Z_{cal} \le -1.645$ at 5% LOS
$Z_{cal} \le -2.326$ at 1% LOS
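The decision rules for the one-sample Z test can be checked with a short Python sketch. The critical values are the ones quoted on the slide; the sample figures (mean 52, hypothesised mean 50, population standard deviation 6, sample size 36) are hypothetical, and the lower-tail rule is taken with a negative critical value, as is standard.

```python
import math

# Critical values of Z quoted on the slide
TWO_TAIL = {0.05: 1.960, 0.01: 2.58}
ONE_TAIL = {0.05: 1.645, 0.01: 2.326}


def z_one_sample(sample_mean, mu0, sigma, n):
    """Z statistic for a single mean when the population SD is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def reject_h0(z, los=0.05, tail="two"):
    """Apply the decision rule for a two-tailed, upper-tailed or lower-tailed test."""
    if tail == "two":
        return abs(z) >= TWO_TAIL[los]
    if tail == "upper":
        return z >= ONE_TAIL[los]
    if tail == "lower":
        return z <= -ONE_TAIL[los]
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


z = z_one_sample(sample_mean=52, mu0=50, sigma=6, n=36)  # hypothetical data
print(z, reject_h0(z, 0.05, "two"), reject_h0(z, 0.01, "two"))
```

With these figures the standard error is 6/6 = 1, so Z = 2.0: H0 is rejected at the 5% level in a two-tailed test but not at the 1% level.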
Z Test: Test for equality of two means
Test statistic:
$$Z_{cal} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
where
$\bar{X}_1, \bar{X}_2$ = sample means of populations I and II respectively
$n_1$ = sample size of population I
$n_2$ = sample size of population II
$\sigma_1, \sigma_2$ = standard deviations of populations I and II
Two Tail Test
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Reject $H_0$ when
$Z_{cal} \le -1.960$ or $Z_{cal} \ge 1.960$ at 5% Level of Significance (LOS)
$Z_{cal} \le -2.58$ or $Z_{cal} \ge 2.58$ at 1% Level of Significance
One Tail Test
- Upper tailed Z test
$H_0: \mu_1 \le \mu_2$
$H_1: \mu_1 > \mu_2$
Reject $H_0$ when
$Z_{cal} \ge 1.645$ at 5% LOS
$Z_{cal} \ge 2.326$ at 1% LOS
- Lower tailed Z test
$H_0: \mu_1 \ge \mu_2$
$H_1: \mu_1 < \mu_2$
Reject $H_0$ when
$Z_{cal} \le -1.645$ at 5% LOS
$Z_{cal} \le -2.326$ at 1% LOS
Z Test: Test for a population proportion
Test statistic:
$$Z_{cal} = \frac{P - P_0}{\sqrt{\dfrac{P_0 Q_0}{n}}}$$
where
$P_0$ = population proportion (hypothesised), $Q_0 = 1 - P_0$
$P = X/n$ = sample proportion
$\sqrt{P_0 Q_0 / n}$ = standard error of the sample proportion
Two Tail Test
$H_0: P = P_0$
$H_1: P \neq P_0$
Reject $H_0$ when
$Z_{cal} \le -1.960$ or $Z_{cal} \ge 1.960$ at 5% Level of Significance (LOS)
$Z_{cal} \le -2.58$ or $Z_{cal} \ge 2.58$ at 1% Level of Significance
One Tail Test
- Upper tailed Z test
$H_0: P \le P_0$
$H_1: P > P_0$
Reject $H_0$ when
$Z_{cal} \ge 1.645$ at 5% LOS
$Z_{cal} \ge 2.326$ at 1% LOS
- Lower tailed Z test
$H_0: P \ge P_0$
$H_1: P < P_0$
Reject $H_0$ when
$Z_{cal} \le -1.645$ at 5% LOS
$Z_{cal} \le -2.326$ at 1% LOS
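As a small illustration of the proportion test, the sketch below tests whether the proportion of smokers in a class differs from 0.30 (the example hypothesis given earlier in this unit). The sample counts (42 smokers among 100 students) are hypothetical.

```python
import math


def z_proportion(successes, n, p0):
    """Z statistic for a single proportion: (P - P0) / sqrt(P0 * Q0 / n)."""
    p = successes / n  # sample proportion P = X/n
    q0 = 1 - p0
    return (p - p0) / math.sqrt(p0 * q0 / n)


# Hypothetical sample: 42 smokers among 100 students, H0: P = 0.30
z = z_proportion(successes=42, n=100, p0=0.30)
print(round(z, 4))
```

Here Z is about 2.62, which exceeds both 1.960 and 2.58, so under these assumed figures H0 would be rejected at the 5% and the 1% level in a two-tailed test.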
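Z Test: Test for equality of two proportions
$$Z_{cal} = \frac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$$
where $P_1, P_2$ = sample proportions, $Q_1 = 1 - P_1$, $Q_2 = 1 - P_2$, and $n_1, n_2$ = sample sizes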
t Test: Test for equality of mean (one sample)
Test statistic:
$$t_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n-1}}$$
where
$\bar{x}$ = sample mean
$\mu_0$ = population mean (hypothesised)
$s$ = sample standard deviation (computed with divisor $n$)
$s / \sqrt{n-1}$ = standard error of the sample mean
Two Tail Test
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
Reject $H_0$ when
$|t_{cal}|$ value > $t_{table}$ value
at (n-1) degrees of freedom
One Tail Test
- Upper Tailed Test
$H_0: \mu \le \mu_0$
$H_1: \mu > \mu_0$
- Lower Tailed Test
$H_0: \mu \ge \mu_0$
$H_1: \mu < \mu_0$
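The one-sample t statistic in the form s/√(n−1) can be sketched as below. The sample values and the hypothesised mean of 48 are hypothetical; `s` is taken as the standard deviation with divisor n (`statistics.pstdev`), which is the convention under which the s/√(n−1) form agrees with the usual t statistic. The table value 2.365 is the two-tailed 5% value of t for 7 degrees of freedom.

```python
import math
import statistics


def t_one_sample(sample, mu0):
    """t statistic in the form (x_bar - mu0) / (s / sqrt(n - 1)), s with divisor n."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.pstdev(sample)  # standard deviation with divisor n
    return (x_bar - mu0) / (s / math.sqrt(n - 1))


sample = [48, 52, 51, 49, 50, 53, 47, 50]  # hypothetical observations
t_cal = t_one_sample(sample, mu0=48)
t_table = 2.365                            # two-tailed, 5% LOS, 7 degrees of freedom
print(round(t_cal, 4), abs(t_cal) > t_table)
```

Since |t_cal| is about 2.83 and exceeds 2.365, H0 would be rejected for this assumed sample.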
t Test: Test for Equality of two means (t-Test)
Test statistic:
$$t_{cal} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$\bar{X}_1, \bar{X}_2$ = sample means of populations I and II respectively
$n_1$ = sample size of population I
$n_2$ = sample size of population II
$s_1, s_2$ = sample standard deviations of samples I and II
Two Tail Test
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Reject $H_0$ when
$|t_{cal}|$ value > $t_{table}$ value
at ($n_1 + n_2 - 2$) degrees of freedom
One Tail Test
- Upper Tailed Test
$H_0: \mu_1 \le \mu_2$
$H_1: \mu_1 > \mu_2$
- Lower Tailed Test
$H_0: \mu_1 \ge \mu_2$
$H_1: \mu_1 < \mu_2$
t Test: Paired Sample t-Test
Test statistic:
$$t_{cal} = \frac{\bar{d}}{S_d / \sqrt{n-1}}$$
where
$d = x - y$
$\bar{d} = \dfrac{\sum d}{n}$
$S_d$ = standard deviation of “d”
Two Tail Test
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Reject $H_0$ when
$|t_{cal}|$ value > $t_{table}$ value
at (n-1) degrees of freedom
One Tail Test
- Upper Tailed Test
$H_0: \mu_1 \le \mu_2$
$H_1: \mu_1 > \mu_2$
- Lower Tailed Test
$H_0: \mu_1 \ge \mu_2$
$H_1: \mu_1 < \mu_2$
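A paired-sample t statistic can be computed from the differences d = x − y as sketched below. The before/after scores are hypothetical, and the standard deviation of d is taken with divisor n so that it matches the S_d/√(n−1) form; the table value 2.776 is the two-tailed 5% value of t for 4 degrees of freedom.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic: d_bar / (S_d / sqrt(n - 1)), with S_d computed with divisor n."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n
    s_d = statistics.pstdev(d)
    return d_bar / (s_d / math.sqrt(n - 1))


after = [74, 70, 78, 81, 77]   # hypothetical scores after training
before = [70, 68, 75, 80, 72]  # hypothetical scores before training
t_cal = paired_t(after, before)
t_table = 2.776                # two-tailed, 5% LOS, n - 1 = 4 degrees of freedom
print(round(t_cal, 4), abs(t_cal) > t_table)
```

The differences are 4, 2, 3, 1 and 5, giving a mean difference of 3 and t of about 4.24, so H0 would be rejected for this assumed data.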
F Test
$$F\ \text{value} = \frac{\sigma_1^2}{\sigma_2^2}$$
where
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
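Mann-Whitney U Test
$$u_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$$
$$u_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$
$$U = \min(u_1, u_2)$$
$$Z_{cal} = \frac{U - E(U)}{\sigma_U}$$
where
$$E(U) = \frac{n_1 n_2}{2}$$
$$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
$R_1, R_2$ = sums of the ranks of sample 1 and sample 2 in the combined ranking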
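The Mann-Whitney calculation can be sketched as follows. The two groups are hypothetical, ties (if any) receive average ranks, and the standard deviation uses the standard form √(n1·n2·(n1+n2+1)/12). The normal approximation is normally applied to larger samples; small samples are used here only to keep the arithmetic easy to follow.

```python
import math


def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1  # 1-based position of the first occurrence
        count = sorted_vals.count(v)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def mann_whitney(sample1, sample2):
    """Return u1, u2, U = min(u1, u2) and the normal-approximation Z value."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u2, u, (u - mean_u) / sd_u


group_a = [12, 15, 11, 18, 14]  # hypothetical scores
group_b = [20, 17, 22, 19, 16]  # hypothetical scores
u1, u2, u, z = mann_whitney(group_a, group_b)
print(u1, u2, u, round(z, 4))
```

For these groups R1 = 17 and R2 = 38, so u1 = 23, u2 = 2 and U = 2; note that u1 + u2 always equals n1·n2, which is a useful arithmetic check.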
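K-W Test (Kruskal-Wallis)
$$H = \frac{12}{n(n+1)} \sum \frac{r_i^2}{n_i} - 3(n+1)$$
where $r_i$ = sum of the ranks in group $i$, $n_i$ = number of observations in group $i$, and $n$ = total number of observations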
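A minimal sketch of the K-W statistic is given below. The three groups are hypothetical; all observations are ranked together (ties would share the average rank) and H is then computed from the rank sums. The comparison value 5.99 is the 5% chi-square table value for 2 degrees of freedom, against which H is usually compared when there are three groups.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1
        count = sorted_vals.count(v)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def kruskal_wallis_h(groups):
    """H = 12 / (n(n+1)) * sum(r_i^2 / n_i) - 3(n+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r_i ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


groups = [[27, 31, 29], [35, 38, 33], [22, 25, 24]]  # hypothetical data
h = kruskal_wallis_h(groups)
print(round(h, 2), h > 5.99)
```

The rank sums here are 15, 24 and 6, which gives H = 7.2; this is above 5.99, so the hypothesis of identical groups would be rejected for this assumed data.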
Normality and Reliability of Hypothesis Testing
• Normality and reliability tests are carried out to ensure that the hypothesis test is consistent and that the
intended quantity is actually measured during the process.
• A normality test is used to determine whether sample data have been
drawn from a normally distributed population (within some
tolerance).
• Reliability is the extent to which a measure gives the same response under similar circumstances. In
other words, reliability indicates how consistently a measure captures the same phenomenon.
Methods to check the Reliability of Hypothesis Testing
- Test-retest method
- Alternate or parallel forms
- Split-half techniques
- Kuder-Richardson Reliability and coefficient alpha
Bivariate Analysis
- Bivariate analysis is slightly more analytical than univariate analysis. When the data set contains two
variables and the researcher aims to compare them, bivariate analysis is
the right type of analysis technique.
- For example, in a survey of a classroom, the researcher may want to analyse the proportion of students who
scored above 85% by gender. In this case, there are two variables: gender = X
(independent variable) and result = Y (dependent variable).
- Linear regression
- Simple regression
- Correlation
Multivariate Analysis
• Multivariate analysis is a more complex form of statistical analysis, used when there are more
than two variables in the data set. Here is an example:
• A doctor has collected data on cholesterol, blood pressure, and weight. She has also collected data on the
eating habits of the subjects (e.g., how many ounces of red meat, fish, dairy products, and chocolate
are consumed per week). She wants to investigate the relationship between the three measures of health and
eating habits.
• Factor Analysis
• Cluster Analysis
• Variance Analysis
• Discriminant Analysis
• Multidimensional Scaling
• Principal Component Analysis
• Redundancy Analysis
ANOVA – One Way
• A one-way ANOVA is a type of statistical test that compares the variance in the group means within a sample
while considering only one independent variable or factor.
• It is a hypothesis-based test, meaning that it aims to evaluate multiple mutually exclusive theories about our
data.
• A one-way ANOVA compares three or more categorical groups to establish whether there is a
difference between them. Within each group there should be three or more observations (here, this means
walruses), and the means of the samples are compared.
• In a one-way ANOVA there are two possible hypotheses.
- The null hypothesis (H0) is that there is no difference between the groups and equality between means.
(Walruses weigh the same in different months)
- The alternative hypothesis (H1) is that there is a difference between the means and groups. (Walruses have
different weights in different months)
ANOVA – One Way - Assumptions
- Normality – That each sample is taken from a normally distributed population
- Sample independence – that each sample has been drawn independently of the other samples
- Variance Equality – That the variance of data in the different groups should be the same
- Your dependent variable – here, “weight”, should be continuous – that is, measured on a scale which can
be subdivided using increments (i.e. grams, milligrams)
ANOVA – Two Way
• A two-way ANOVA is, like a one-way ANOVA, a hypothesis-based test. However, in the two-way ANOVA each
sample is defined in two ways, and resultingly put into two categorical groups.
• The two-way ANOVA therefore examines the effect of two factors (month and gender) on a dependent
variable – in this case weight, and also examines whether the two factors affect each other to influence the
continuous variable.
ANOVA – Two Way - Assumptions
- Your dependent variable – here, “weight”, should be continuous – that is, measured on a scale which can be
subdivided using increments (i.e. grams, milligrams)
- Your two independent variables – here, “month” and “gender”, should be in categorical, independent
groups.
- Sample independence – that each sample has been drawn independently of the other samples
- Variance Equality – That the variance of data in the different groups should be the same
- Normality – That each sample is taken from a normally distributed population
One-Way vs Two-Way ANOVA Differences Chart
Definition
- One-Way ANOVA: A test that allows one to make comparisons between the means of three
or more groups of data.
- Two-Way ANOVA: A test that allows one to make comparisons between the means of three or more groups
of data, where two independent variables are considered.
Number of Independent Variables
- One-Way ANOVA: One.
- Two-Way ANOVA: Two.
What is Being Compared?
- One-Way ANOVA: The means of three or more groups of an independent variable on a dependent
variable.
- Two-Way ANOVA: The effect of multiple groups of two independent variables on a dependent
variable and on each other.
Number of Groups of Samples
- One-Way ANOVA: Three or more.
- Two-Way ANOVA: Each variable should have multiple samples.
Chi-Square Test and Analysis of
Variance
Vijay K S
Analysis of Variance (ANOVA)
• A key statistical test in research fields including biology, economics, and
psychology
• Analysis of Variance (ANOVA) is very useful for analyzing datasets.
• It allows comparisons to be made between three or more groups of data.
• In a given data set, one can observe two main variations. One due to chance
and the other due to some specific reasons.
• These variations are studied separately in ANOVA to identify the actual cause of
the variation and help the researcher to make effective decisions.
• Two types of ANOVA are commonly used, One-Way ANOVA and Two-Way
ANOVA.
Analysis of Variance
• ANOVA is an inferential statistics technique that allows you to
compare the mean level on one interval-ratio variable (such as
income) for each group relative to the others in a nominal variable
(such as degree).
• If you had only two groups to compare, ANOVA would give the same
answer as an independent samples t-test.
ANOVA
Just imagine that the following distributions represent the marks scored by students
belonging to different sections.
How do you interpret the data presentation?
[Figure: distribution of marks scored by the students. Panels: All Groups; Groups Broken Down]
Isn’t it conceivable that the differences are due to natural random variability between samples? Would you
want to claim they are different in the population?
ANOVA
Now… what if three sections had scores distributed like this in your sample?
[Figure: distribution of marks scored by the students. Panels: All Groups Combined; Groups Separated Out]
Doesn’t it now appear that the groups may be different regardless of sampling variability? Would you
feel comfortable claiming the groups are different in the population?
ANOVA
Conceptually, ANOVA compares the variance within groups to the overall variance
between all the groups to determine whether the groups appear distinct from each
other or if they look quite the same.
[Figure: two panels, “Different groups, different means” and “Similar groups, similar means”, showing the group means (Y-bar) for the categories of a nominal variable against measures on a continuous variable]
ANOVA
• When the groups have little variation within themselves, but large
variation between them, it would appear that they are distinct and
that their means are different.
ANOVA
• When the groups have a lot of variation within themselves, but little
variation between them, it would appear that they are similar and
that their means are not really different (perhaps they differ only
because of peculiarities of the particular sample).
One – Way ANOVA
• One-way Analysis of Variance (ANOVA) is used to test whether the
means of two or more independent (Unrelated) groups are
statistically significantly different
• The table of variation (ANOVA table) is laid out as follows:
Sources of Variance | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS) | F Ratio
Between the samples | Sum of Squares between the samples (SSB) | (k-1) | MSB = SSB/(k-1) | F Ratio = MSB/MSW
Within the samples | Sum of Squares within the samples (SSW) | (n-k) | MSW = SSW/(n-k) |
Total | Total Sum of Squares | (n-1) | |
where MSB = mean sum of squares between, MSW = mean sum of squares within, k = number of samples (groups) and n = total number of observations.
Assumptions of One way ANOVA
• Normally distributed outcome
• Equal variances between the groups
• Groups are independent
Hypothesis of One way ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots$
$H_1$: Not all of the population means are the same
The process of carrying out one-way ANOVA
• Calculate the mean of each sample
• Calculate the grand mean (with equal sample sizes, this is the mean of all the sample means)
• Calculate the variation between the samples, known as SSB (Sum of
Squares between)
• Divide SSB by its degrees of freedom (k-1) to get the mean square
between (MSB), i.e. the average variation between the samples
• Calculate the variation within the samples, known as SSW (Sum of Squares within)
• Divide SSW by its degrees of freedom (n-k) to get the mean square
within (MSW)
• Add the squared deviations of all observations from the grand mean to get the total variation (SST = SSB + SSW)
• Calculate the F Ratio = MSB/MSW
Problem
• The researcher observed the sale of products of a particular brand in
six big retail houses in three cities. He/she wants to determine
whether the mean sale is the same across the cities. Use the data
shown in the following table to calculate one-way ANOVA:
Retail Houses City A City B City C
1 3 6 9
2 8 9 8
3 4 8 6
4 9 5 7
5 6 7 5
6 7 4 7
Steps
Step 1: Defining the hypothesis
H0: There is no significant difference in sales between the three
cities / The sales in the three cities are the same.
Step 2: Calculate the mean sales of three cities separately, and the total
sample mean
Retail Houses City A City B City C
1 3 6 9
2 8 9 8
3 4 8 6
4 9 5 7
5 6 7 5
6 7 4 7
Mean 6.17 6.50 7.00
Grand mean (mean of the sample means) 6.56
Steps
• Step 3: Calculate the Sum of Squares Between (SSB)
SSB = 6 × [(6.167 − 6.556)² + (6.500 − 6.556)² + (7.000 − 6.556)²] = 2.11
• Step 4: Calculate the Sum of Squares Within (SSW)
SSW = 26.83 (City A) + 17.50 (City B) + 10.00 (City C) = 54.33
Steps
• Step 5: Calculate the total variation
SST = SSB + SSW = 2.11 + 54.33 = 56.44
• Step 6: Creating the ANOVA table
Sources of Variance | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS) | F Ratio | 5% F Limit
Between the samples | 2.11 | (3-1) = 2 | MSB = 2.11/2 = 1.06 | F Ratio = MSB/MSW = 1.06/3.62 = 0.29 | F(2,15) = 3.68
Within the samples | 54.33 | (18-3) = 15 | MSW = 54.33/15 = 3.62 | |
Total | 56.44 | (18-1) = 17 | | |
The F Ratio (0.29) is less than the critical / table value (3.68).
Hence H0 is not rejected: there is no significant difference in sales among the three cities, i.e. the product’s mean sales are statistically the same in the three cities.
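The one-way ANOVA for the retail-house problem can be reproduced with the short Python sketch below. It uses the city sales figures from the problem table; the function name and the dictionary keys are illustrative choices, not part of the original material.

```python
def one_way_anova(groups):
    """Return the sums of squares, mean squares and F ratio for a one-way ANOVA."""
    k = len(groups)                                   # number of samples (groups)
    n = sum(len(g) for g in groups)                   # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return {"SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw}


city_a = [3, 8, 4, 9, 6, 7]
city_b = [6, 9, 8, 5, 7, 4]
city_c = [9, 8, 6, 7, 5, 7]

result = one_way_anova([city_a, city_b, city_c])
for name, value in result.items():
    print(name, round(value, 2))
```

The computed F ratio is about 0.29, well below the 5% table value of 3.68 for (2, 15) degrees of freedom, which agrees with the conclusion that the mean sales do not differ significantly between the cities.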
Homework
• How much of the variance in height is explained by the treatment group?
Treatment 1 Treatment 2 Treatment 3 Treatment 4
60 inches 50 48 47
67 52 49 67
42 43 50 54
67 67 55 67
56 67 56 68
62 59 61 65
64 67 61 65
59 64 60 56
72 63 59 60
71 65 64 65
Two Way ANOVA
Two Way ANOVA – Steps Involved
• Step 1: Find the Correction term
• Step 2: Find the Sum of Squares of the total (SST)
• Step 3: Sum of Squares of Column
• Step 4: Sum of Squares of Rows
• Step 5: Find the Sum of the Square Residual
• Step 6: Creating ANOVA Table
Problem
• Three respondents have rated three small cars of different brands on
a five-point scale (5 being the highest) concerning their features. The
ratings and features are provided in the following table.
Respondent | Car | Mileage | Durability | Maintenance Cost | Technology | Price
1 | Zen | 3 | 2 | 4 | 3 | 5
1 | I10 | 4 | 4 | 4 | 5 | 4
1 | Alto | 4 | 3 | 5 | 2 | 4
2 | Zen | 2 | 4 | 3 | 1 | 4
2 | I10 | 4 | 5 | 3 | 4 | 4
2 | Alto | 3 | 1 | 2 | 5 | 3
3 | Zen | 4 | 5 | 3 | 2 | 4
3 | I10 | 3 | 2 | 4 | 5 | 3
3 | Alto | 4 | 5 | 4 | 5 | 5
Steps
Here the researcher wants to know the difference between the brands in terms of features.
H0: There is no difference in the means of the five features of the cars.
• Step 1: Find the correction term
Total of all the observations = 162
Square of total = 162 × 162 = 26244
Number of observations = 45
Correction term = Square of total / No. of observations = 26244/45 = 583.2
• Step 2: Find the Sum of Squares of total (SST)
Sum of squares of all the individual observations = 638
SST = 638 − 583.2 = 54.8
Steps
• Step 3: Sum of Squares of Columns (i.e. between the features)
Sum of columns: 31, 31, 32, 32, 36
Squares of column sums: 961, 961, 1024, 1024, 1296 (total 5266)
Sum of squared column sums / observations per column = 5266/9 = 585.11
Correction term = 583.2
Sum of Squares between columns (SSC) = 585.11 − 583.2 = 1.91
• Step 4: Sum of Squares of Rows (i.e. between the respondents)
Sum of rows: Respondent 1 = 56, Respondent 2 = 48, Respondent 3 = 58
Squares of row sums: 3136, 2304, 3364 (total 8804)
Sum of squared row sums / observations per row = 8804/15 = 586.93
Correction term = 583.2
Sum of Squares between rows (SSR) = 586.93 − 583.2 = 3.73
Steps
• Step 5: Find the Residual Sum of Squares
Residual = SST − SSC − SSR = 54.8 − 1.91 − 3.73 = 49.16
• Step 6: Creating the ANOVA Table
Sources of Variance | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS) | F Ratio | 5% F Limit
Between Columns | 1.91 | (5-1) = 4 | 1.91/4 = 0.48 | 0.48/1.29 = 0.37 | F(4,38) ≈ 2.62
Between Rows | 3.73 | (3-1) = 2 | 3.73/2 = 1.87 | 1.87/1.29 = 1.45 | F(2,38) ≈ 3.24
Residual | 49.16 | 44 − 4 − 2 = 38 | 49.16/38 = 1.29 | |
Total | 54.8 | (45-1) = 44 | | |
In both cases the calculated F value is less than the F critical / table value, so it lies in the acceptance region and the null hypothesis is not rejected.
So we can state that there is no significant difference in the means of the five features of the cars.
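The sums of squares for the car-rating problem can be checked with the sketch below, which follows the correction-term steps listed above. It uses the ratings from the problem table; the residual degrees of freedom are taken here as total − columns − rows = 44 − 4 − 2 = 38, which is an assumption of this sketch (an additive model with 45 observations), and the variable names are illustrative.

```python
# Each entry: (respondent, [Mileage, Durability, Maintenance Cost, Technology, Price])
data = [
    (1, [3, 2, 4, 3, 5]), (1, [4, 4, 4, 5, 4]), (1, [4, 3, 5, 2, 4]),
    (2, [2, 4, 3, 1, 4]), (2, [4, 5, 3, 4, 4]), (2, [3, 1, 2, 5, 3]),
    (3, [4, 5, 3, 2, 4]), (3, [3, 2, 4, 5, 3]), (3, [4, 5, 4, 5, 5]),
]

values = [x for _, row in data for x in row]
n = len(values)                                      # 45 observations
correction = sum(values) ** 2 / n                    # Step 1: correction term
sst = sum(x * x for x in values) - correction        # Step 2: total sum of squares

# Step 3: between columns (features); each column holds 9 ratings
col_sums = [sum(row[j] for _, row in data) for j in range(5)]
ssc = sum(s * s for s in col_sums) / 9 - correction

# Step 4: between rows (respondents); each respondent gave 15 ratings
row_sums = {}
for respondent, row in data:
    row_sums[respondent] = row_sums.get(respondent, 0) + sum(row)
ssr = sum(s * s for s in row_sums.values()) / 15 - correction

# Step 5: residual sum of squares and its degrees of freedom (assumed additive model)
residual = sst - ssc - ssr
df_cols, df_rows = 5 - 1, 3 - 1
df_residual = (n - 1) - df_cols - df_rows
ms_residual = residual / df_residual
f_cols = (ssc / df_cols) / ms_residual
f_rows = (ssr / df_rows) / ms_residual
print(round(ssc, 2), round(ssr, 2), round(residual, 2), df_residual)
print(round(f_cols, 2), round(f_rows, 2))
```

Both F ratios come out well below typical 5% table values, so the conclusion that the feature means do not differ significantly is unchanged.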
Parametric and Nonparametric Tests
• The term "non-parametric" refers to the fact that the chi-square tests
do not require assumptions about population parameters, nor do
they test hypotheses about population parameters.
• Previous examples of hypothesis tests, such as the t tests and analysis
of variance, are parametric tests and they do include assumptions
about parameters and hypotheses about parameters.
• The most obvious difference between the chi-square tests and the
other hypothesis tests we have considered (t and ANOVA) is the
nature of the data.
• For chi-square, the data are frequencies rather than numerical scores.
Chi-Square Test
• This statistical test compares the observed results with the
expected results.
• The purpose is to determine whether the difference is due to chance
or to a relationship among the variables we are
studying.
• Chi-square enables us to understand and interpret the relationship
between two categorical variables.
• The chi-square test is denoted by the symbol χ2
• This test is performed on categorical data rather than
numerical data
• The formula to calculate the chi-square statistic is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = Observed or actual values
E = Expected Value
Applications of Chi-Square test
• To test the divergence of observed results from the expected results
when our expectations are based on the hypothesis of equal
probability
• To test the association
between two variables.
Types of Chi-Square Test
• Chi-Square Test for Goodness of Fit
• Chi-Square Test for Independence
Chi-Square Test for Goodness of Fit
• This test helps the researcher to know whether the theoretical
distribution is fitted to the observed data and to what extent.
• It allows you to draw conclusions about the distribution of a population
based on a sample. Using the chi-square goodness of fit test, you can
test whether the goodness of fit is “good enough” to conclude that the
population follows the distribution.
• Goodness-of-Fit is a statistical hypothesis test used to see how closely
observed data mirrors expected data.
Assumptions
• 1 or more categories
• Independent observations
• A sample size of at least 10
• Random sampling
• All observations must be used
• For the test to be accurate, the expected frequency should be at least 5
Chi-Square Test for Goodness of Fit - Problems
Example 1:
Test the hypothesis that the customers have no preference for any particular product. Use a 5% level of significance.
Solution:
Step 1: Formulating the hypothesis:
Ho: The customers have no preference for any particular product
H1: Customers have a preference for a particular product
Step 2: Level of Significance: in the problem, it is given as 5%
The degrees of freedom (n-1) = (4-1) = 3
Step 3: Calculate the χ2 value
Product | Number of Customers Preferred (O) | Expected Value (E) | (O-E) | (O-E)^2 | (O-E)^2/E
Product A | 300 | 250 | 50 | 2500 | 10
Product B | 280 | 250 | 30 | 900 | 3.6
Product C | 220 | 250 | -30 | 900 | 3.6
Product D | 200 | 250 | -50 | 2500 | 10
Total | 1000 | | | | Σ (O-E)^2/E = 27.2
Average (Expected Value) = 1000/4 = 250
χ2 = 27.2
Step 4: Compare the χ2 value with the critical value at 5% level of significance and 3 degrees of freedom
Here the critical value / table value = 7.81
The calculated chi-squared (27.2) is greater than the table value (7.81); hence the null hypothesis is rejected.
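The goodness-of-fit calculation for the product-preference problem can be sketched in Python as follows. The observed counts come from the table in the problem; the function name is illustrative, and the expected value defaults to the average of the observed counts, which corresponds to the hypothesis of equal preference.

```python
def chi_square_gof(observed, expected=None):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories.

    If no expected values are given, equal expected frequencies are assumed.
    """
    if expected is None:
        average = sum(observed) / len(observed)
        expected = [average] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


products = [300, 280, 220, 200]   # customers preferring products A, B, C, D
chi2 = chi_square_gof(products)
critical_value = 7.81             # table value at 5% LOS, 3 degrees of freedom
print(round(chi2, 2), chi2 > critical_value)
```

The statistic is 27.2, which is greater than 7.81, so the hypothesis of no preference is rejected.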
Example 2:
The following table gives the number of defective items in a factory on
various days in a week.
Using the chi-square test, check whether the defective items are
uniformly distributed or not at 5% level of significance.
Days | Number of defective Items
Monday | 14
Tuesday | 22
Wednesday | 16
Thursday | 18
Friday | 12
Saturday | 19
Sunday | 11
Solution:
Step 1: Formulating the hypothesis:
Ho: The defective items are uniformly distributed across the days
H1: The defective items are not uniformly distributed across the days
Step 2: Level of Significance: in the problem, it is given as 5%
The degrees of freedom (n-1) = (7-1) = 6
Step 3: Calculate the χ2 value
Days | Number of defective Items (O) | Expected Value (E) | (O-E) | (O-E)^2 | (O-E)^2/E
Monday | 14 | 16 | -2 | 4 | 0.25
Tuesday | 22 | 16 | 6 | 36 | 2.25
Wednesday | 16 | 16 | 0 | 0 | 0
Thursday | 18 | 16 | 2 | 4 | 0.25
Friday | 12 | 16 | -4 | 16 | 1
Saturday | 19 | 16 | 3 | 9 | 0.5625
Sunday | 11 | 16 | -5 | 25 | 1.5625
Total | 112 | | | | Σ (O-E)^2/E = 5.875
Average (Expected Value) = 112/7 = 16
χ2 = 5.875
Step 4: Compare the χ2 value with the critical value at 5% level of significance and 6 degrees of freedom
Here the critical value / table value = 12.59
The calculated chi-squared (5.875) is less than the table value (12.59); hence the null hypothesis is not rejected.
Therefore, the data are consistent with the defective items being uniformly distributed across the days.
Chi-Square Test for Independence
• Here, the two attributes/variables are tested to determine whether they are
associated.
• Example: Whether introducing a training program increases the efficiency of
employees. The aim is to establish whether there is a relationship between training and the
efficiency of employees.
• It allows you to draw conclusions about a population based on a sample.
Specifically, it allows you to conclude whether two variables are related in
the population.
• The test can be used and interpreted in two different ways:
1. Testing hypotheses about the relationship between two variables in
a population, or
2. Testing hypotheses about differences between proportions for two
or more populations.
Chi-Square Test for Independence - Problems
Example 1: The researcher has the data for the preference of men and women
regarding joint and nuclear families, as shown in the table
The researcher wants to know whether the preference of men and women
about the type of family is the same or not at 5% Level of Significance
Joint Family Nuclear Family Total
Men 96 35 131
Women 170 360 530
Total 266 395 661
Solution:
Step 1: Formulating the hypothesis:
Ho: The opinion of men and women about the type of family is indifferent
H1: The opinion of men and women about the type of family is different
Step 2: Level of Significance, In the problem, it was given as 5%
The degrees of freedom (r-1)(c-1) = 1
Step 3: Calculate the χ2 value
Expected Value (E) = (Row Total * Column Total) / Grand Total
Example: The expected value for "Men towards joint family" is E = (131 * 266) / 661 = 52.72
Items Number of Preference (O) Expected Value (E) (O-E) (O-E)^2 (O-E)^2/E
Men towards joint family 96 52.72 43.28 1873.41 35.54
Women towards joint family 170 213.28 -43.28 1873.41 8.78
Men towards nuclear family 35 78.28 -43.28 1873.41 23.93
Women towards nuclear family 360 316.72 43.28 1873.41 5.92
Σ (O-E)^2/E = 74.17
χ2 = 74.17
Step 4: Compare the χ2 value with the critical value at the 5% level of significance and 1 degree of freedom
Here the critical value (table value) = 3.84
The calculated chi-square value (74.17) is greater than the table value (3.84), so the null hypothesis is rejected.
Therefore, the opinions of men and women about the type of family are different.
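The steps of Example 1 can be reproduced with a minimal Python sketch. The observed counts and the critical value of 3.84 come from the problem; the function name `chi_square_independence` and the variable names are hypothetical, chosen only for illustration.

```python
def chi_square_independence(table):
    """Return (chi_square, degrees_of_freedom, expected) for a contingency table.

    `table` is a list of rows, each row a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected value for each cell: (row total * column total) / grand total
    expected = [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]

    # Chi-square statistic: sum of (O - E)^2 / E over every cell
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, dof, expected


# Rows: Men, Women. Columns: Joint Family, Nuclear Family.
observed = [
    [96, 35],
    [170, 360],
]

chi_square, dof, expected = chi_square_independence(observed)
critical_value = 3.84  # table value at 5% significance, 1 degree of freedom

reject_null = chi_square > critical_value
print(f"Expected (Men, Joint Family) = {expected[0][0]:.2f}")
print(f"Chi-square = {chi_square:.2f}, degrees of freedom = {dof}")
print("Reject Ho" if reject_null else "Fail to reject Ho")
```

The sketch computes the expected values from the row and column totals rather than hard-coding them, so the same function could be reused for a larger table, provided the matching critical value is looked up for its degrees of freedom.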
Statistics for Manager.pdf

Statistics for Manager.pdf

  • 1. Business Statistics 22MBA14 Theory Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 2. Unit – 1 Measures of Central Tendency & Measures of Dispersion Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 3. Statistics: Meaning and Definition “ Statistics is the science of estimates and probabilities” “Statistics may be defined as the collection, presentation, analysis, and interpretation of numerical data” “Statistics is a science which deals with the method of collecting, classifying, presenting, comparing, analyzing, and interpreting numerical data. Collected to through light on enquiry”. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 4. Functions of Statistics • To collect and present facts in a systematic manner. • To help in the formulation and testing of the hypothesis. • To help in facilitating the comparison of data. • To help in predicting future trends. • To help to find the relationship between the variables. • Simplifies the mass of complex data. • To help to formulate policies. • To help governments to make decisions. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 5. Limitations of Statistics • Does not study the qualitative phenomenon • Does not deal with individual items • Statistical results are true only on an average. • Statistical data should be uniform and homogeneous • Statistical Results depend on the accuracy of the data • Statistical conclusions are not universally true. • Statistical results can be interpreted only if a person has sound knowledge of statistics Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 6. Collection and presentation of data • Data Collection • Primary data – Collected first time by the investigator, They are in the shape of raw materials • Secondary Data – Already collected data for a purpose other than the problem at hand. Primary Data Secondary Data Collection Purpose For the problem in hand For the other problems Collection Process Very involved Rapid and easy Collection Cost High Relatively low Collection Time Long Short Suitability Its suitability is positive It may or may not suit the object of the survey Originality It is original It is not original Precautions Not Extra precaution required to use the data It should be used with extra care Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 7. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 8. Measure of Central Tendency • Meaning: A measure of central tendency is a single value that describes the way in which a group of data cluster around a central value. To put in other words, it is a way to describe the centre of a data set. A measure of central tendency is a measure that tells us where the middle of a bunch of data lies. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 9. Application of Central Value: • Central tendency also allows you to compare one data set to another. • Central tendency is also useful when you want to compare one piece of data to the entire data set. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 10. Different Measures of Central Tendency: • Mean • Median • Mode • Geometric Mean • Harmonic Mean Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 11. Mean • Mean: Mean is the most common measure of central tendency. It is simply the sum of the numbers divided by the number of numbers in a set of data. This is also known as average. • The Arithmetic Mena is a good measure of central tendency Reasons: • It takes all the observation into account while calculating • It is can be used for further mathematical treatment Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 12. Mean • Mathematical characteristics of Arithmetic Mean • The sum of the deviations, of all the values of x, from their arithmetic mean, is zero. • Sum of square deviation taken from the AM is always least among such deviations taken from other measures of other tendency • Mean of the combined series: If we know the sizes and means of two component series, then we can find the mean of the resultant series obtained on combining the given series. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 13. Median • Median is another measure of Central Tendency which locates the middle most value in given set of data • Median is the measure of Central Tendency different from any of the means • Median is a single value from the data set that measures the central item in the data • Median is that value of the variable which divides the group in two equal parts, one part comprising of the values greater than and the other less than Median • This single item is the middlemost or most central item in the set of numbers. As said earlier half of the items lie above this point and the other half lie below it Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 14. Median Median M = L + 𝑁 2 −𝑀 ∗𝐶 𝑓 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 15. Median Merits: • Rigidly defined • Easy to calculate for non-mathematical person • Since, it is a positional average, not affected by the extreme observations. Useful in the skewed distribution • Computed while dealing with open ended classes • Located by simple inspection and even graphically • This is the only average which will deal with qualitative characteristics Demerits: It doesn’t take all the observation into account while calculating average Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 16. Mode • Mode is one of the measure of central tendency that is different from the mean that somewhat like the median • The mode is the value that is repeated most often in the data set • The mode is defined as the highest or the most popular value in the given data • Mode is the value which occurs most frequently in a set of observations and around which the other items of the set clusters densely located • It is the value at the point around which the items tend to be most heavily concentrated. It is regarded as the most typical of a series of values • Mode is the value which has the greatest frequency density in its immediate neighbourhood • Mode is termed as the fashionable value of the distribution • Example: Average size of the shoe sold in a shop is 7 • Average Indian Male is 5 feet 6 inch Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 17. Mode Z = L + 𝑓−𝑓1 ∗𝐶 2𝑓−𝑓1−𝑓2 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 18. Mode • Merits and Demerits • Merits: • Easy to calculate and understand; done by merely inspection process • Not affected by observations • Convenient for open ended class • Demerits: • Mode is not rigidly defined • Mode is not suitable for further mathematical treatment • Affected to a greater extent with the fluctuation of samplings Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 19. Empirical Relationship between Mean (ഥ 𝑿 ), Median (M) and Mode (Z) (Slightly Skewed) Z = 3M – 2 ഥ 𝑿 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 20. Geometric Mean • GM is nth root of product of quantities of the series. It is observed by multiplying the values of items together and extracting the root of the product corresponding to the number of items. • Thus, square root of the products of two items and cube root of the products of the three items are the geometric mean • It is never larger than the arithmetic mean • If there are zeroes and negative numbers in the series, the geometric mean cannot be used. • Logarithms can be used to find the geometric mean to reduce large numbers and to save time • Appropriate in situations where, there is an average percentage rate of change over a period of time. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 21. Geometric Mean •GM = Antilog σ 𝑓 𝑙𝑜𝑔𝑥 𝑁 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 22. Merits and demerits of GM • Merits of GM • It is based on all the observation in the series • It is rigidly defined • It is suited for averages and ratios • It is less affected by extreme values • It is useful for studying social and economic data • Demerits of GM • It is not simple to understand • It requires computational skill • It cannot be computed if any items are zero or negative • It has restricted applications Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 23. Harmonic Mean • It is the total number of items of a value, divided by the sum of reciprocal of values of a variable • It is a specified average which solves problems involving variables expressed in “Time rates” that vary according to time • Example: Speed in km/hr., min/day, Price/chapter • Harmonic mean (HM) is suitable only when time factor is a variable and the act being performed remains constant Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 24. Harmonic Mean HM = 𝑁 σ 1 𝑥 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 25. Harmonic Mean • Merits of Harmonic Mean • It is based on all observation • It is rigidly defined • ‘Suitable in case of series having wide dispersion • It is suitable for further mathematical treatment • Demerits of Harmonic Mean • It is not easy to compute • Cannot be used when one of the items is zero • It cannot represent distribution Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 26. Relationship between AM, HM and GM The relationship between AM, GM, and HM can be represented by the formula AM x HM = GM2. The geometric mean (GM) equals the product of the arithmetic mean (AM) and the harmonic mean (HM) .
  • 27. Characteristics of Good Average • It should be easy calculate and simple to follow. • Average should represent the entire mass of data. • Averages are always capable of further algebraic treatment. • A good average should be an absolute number. • A good average is one which is not affected by skewness in the distribution. • It should not be unduly affected by extreme values. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 28. Pre-requisites of Good Measures of Central Tendency • It should be rigidly defined • It should be based on all observations • It should be easy to understand and calculate • It should have sampling stability • It should not be unduly affected by extreme observations Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 29. Measures of Dispersion • Meaning • Dispersion is the scattered ness of the data series around it average • Dispersion is the extent to which values in a distribution differ from the average of the distribution Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 30. Measures of Dispersion • Why Dispersion? • Determine the reliability of an average • Serve as a basis for the control of the variability • To compare the variability of two or more series and • Facilitate the use of other statistical measures. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 31. Measures of Dispersion • Characteristics of an Ideal Measure of Dispersion? • It should be rigidly defined. • It should be easy to understand and easy to calculate. • It should be based on all the observations of the data. • It should be easily subjected to further mathematical treatment. • It should be least affected by the sampling fluctuation . • It should not be unduly affected by the extreme values. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 32. Measures of Dispersion • Different Measures of Dispersion • The range • The inter quartile range and quartile deviation • Percentile • Decile • The mean deviation or average deviation • The standard deviation Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 33. The Range Range is the crude measure of dispersion; Calculated as R = High – Low R = H – L Its relative measure is called co-efficient of Range R = 𝐻−𝐿 𝐻+𝐿 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 34. Quartile Deviation Quartile Deviation - QD = Q3 – Q1 (Inter quartile range) QD = = 𝑄3−𝑄1 2 (Semi - Inter quartile range) Co-efficient of QD = = 𝑄3−𝑄1 𝑄3+𝑄1 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 35. Mean Deviation Mean Deviation = σ f I X−A I 𝑛 or σ f I d I 𝑛 Relative Measure of Mean Deviation Co-efficient of Mean Deviation = Mean Deviation Average about which it is calculated Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 36. Standard Deviation 𝜎 = σ ሻ 𝑓(𝑥 2 𝑁 − ( ቇ σ 𝑓𝑋 𝑁 2 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 37. Standard Deviation Characteristics of Standard Deviation: - SD is very satisfactory and most widely used measure of dispersion - Amenable for mathematical manipulation - It is independent of origin, but not of scale - If SD is small, there is a high probability for getting a value close to the mean and if it is large, the value is father away from the mean Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 38. Co-efficient of Variance If Co-efficient of Variance for a given data is more, the data is said to be less consistent, the other hand if C.V is less it means that variability in the data is less and more consistentfrom the mean Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 39. Unit – 2 Correlation and Regression Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 40. Correlation and Correlation Co-efficient: • Correlation • Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together. • A positive correlation indicates the extent to which those variables increase or decrease in parallel; • A negative correlation indicates the extent to which one variable increases as the other decreases. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 41. Correlation and Correlation Co-efficient: • Correlation Co-efficient • A correlation coefficient is a statistical measure of the degree to which changes to the value of one variable predict change to the value of another. • When the fluctuation of one variable reliably predicts a similar fluctuation in another variable, there’s often a tendency to think that means that the change in one causes the change in the other. • However, correlation does not imply causation. There may be, for example, an unknown factor that influences both variables similarly. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 42. Correlation Co-efficient Value: r lies between -1 and +1 • If r lies between 0 to 1, that means positive correlation exists • If r is exactly 1, the correlation is perfect positive correlation • If r lies between -1 to 0, that means negative correlation exists • If r is -1, that implies perfect negative correlation Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 43. Correlation Analysis • Correlation Analysis • Correlation analysis is a method of statistical evaluation used to study the strength of a relationship between two, numerically measured, continuous variables (e.g. height and weight). • Applications of Correlation: • The most valuable use of a correlation is in predicting the future of a business direction. • Correlation is used to assess the direction of change • It is used to measure the performance measures and for data mining Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 44. Different types of Correlation • Positive Correlation • Positive correlation occurs when an increase in one variable increases the value in another. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 45. Different types of Correlation • Negative Correlation • Negative correlation occurs when an increase in one variable decreases the value of another. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 46. Different types of Correlation • No Correlation • No correlation occurs when there is no linear dependency between the variables. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 47. Different types of Correlation • Perfect Positive Correlation • Perfect correlation occurs when there is a functional dependency between the variables. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 48. Different types of Correlation • High degree of Positive Correlation • A correlation is stronger the closer the points are located to one another on the line. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 49. Different types of Correlation • Low degree of Positive Correlation • A correlation is weaker the farther apart the points are located to one another on the line. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 50. Different Methods of Studying Correlation Analysis • Scatter diagram method • Karl Pearson’s Co-efficient of Correlation (Covariance method) • Two way frequency table (Bivariate correlation method) • Ranks method or Spearman’s Rank Correlation • Concurrent Deviation Method Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 51. Different Methods of Studying Correlation Analysis • Scatter diagram method • It is one of the simplest way or method of diagrammatic representation of a bivariate distribution and provides us one of the simplest tool of ascertaining the correlation between two variables • The “n” points are plotted as dots of two variables (Examples heights and weight). The diagram of dots so obtained is known as “Scatter Diagram” • From the scatter diagram, we can form a fairly good, tough rough idea about the relationship between the two variables. Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 52. Different Methods of Studying Correlation Analysis • Scatter diagram Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 53. Different Methods of Studying Correlation Analysis • Karl Pearson’s Co-efficient of Correlation r = 𝑛 σ 𝑥𝑦− σ 𝑥 . σ 𝑦 𝑛.σ 𝑥2−(σ 𝑥ሻ 2 ∗𝑛.σ 𝑦2−(σ 𝑦ሻ 2 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 54. Different Methods of Studying Correlation Analysis • Ranks method or Spearman’s Rank Correlation 𝜌 = 1 − 6 σ 𝐷2 𝑛3−𝑛 𝜌 = 1 − 6 (σ 𝐷2+𝐶𝐹ሻ 𝑛3−𝑛 Business Statistics and Analytics – BIET MBA Programme Prof. Vijay K S
  • 55. Regression Analysis • Regression analysis is a statistical process for estimating the relationships among variables. Regression is the attempt to explain the variation in a dependent variable using the variation in independent variables. • Uses or Application of Regression: • The most common use of regression in business is to predict events that have yet to occur. Demand analysis, for example, predicts how many units consumers will purchase. • Another key use of regression models is the optimization of business processes. A factory manager might, for example, build a model to understand the relationship between oven temperature and the shelf life of the cookies baked in those ovens.
  • 56. Regression Equation • X on Y: $(x - \bar{x}) = b_{xy}\,(y - \bar{y})$ • Y on X: $(y - \bar{y}) = b_{yx}\,(x - \bar{x})$
  • 57. Regression Coefficients • $b_{xy} = \dfrac{n \sum xy - (\sum x)(\sum y)}{n \sum y^2 - (\sum y)^2}$ or $b_{xy} = r \cdot \dfrac{\sigma_x}{\sigma_y}$ • $b_{yx} = \dfrac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$ or $b_{yx} = r \cdot \dfrac{\sigma_y}{\sigma_x}$
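The two regression coefficients and the Y-on-X regression equation can be sketched together. The function names and data below are hypothetical; the sketch also illustrates the standard identity that the product of the two coefficients equals r squared.

```python
def regression_coefficients(xs, ys):
    """Return (b_xy, b_yx) from the raw-sums formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    shared_numerator = n * sum_xy - sum_x * sum_y
    b_xy = shared_numerator / (n * sum_y2 - sum_y ** 2)
    b_yx = shared_numerator / (n * sum_x2 - sum_x ** 2)
    return b_xy, b_yx

def predict_y(x, xs, ys):
    """Y on X: (y - y_bar) = b_yx * (x - x_bar), solved for y."""
    _, b_yx = regression_coefficients(xs, ys)
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return y_bar + b_yx * (x - x_bar)

# Hypothetical paired observations (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_xy, b_yx = regression_coefficients(xs, ys)
r_squared = b_xy * b_yx
y_at_6 = predict_y(6, xs, ys)
print(b_xy, b_yx, round(r_squared, 4), round(y_at_6, 4))
```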
  • 58. Simple and Multiple regression • Simple regression: • The linear regression model used to describe the relationship between a dependent variable y and an independent variable x is given by $y = a + bx$ • Multiple regression: • Multiple regression is a statistical technique that can be used to analyze the relationship between a single dependent variable and several independent variables. The objective of multiple regression analysis is to use the independent variables, whose values are known, to predict the value of the single dependent variable.
  • 59. Unit – 3 Probability Distribution
  • 60. Important terminologies • Experiment • Trial • Event • Mutually Exclusive Event • Dependent and independent event • Equally Likely event • Simple and Compound events • Exhaustive events • Complementary events
  • 61. Important terminologies • Experiment: The term experiment refers to an act which can be repeated under the same given conditions. Random experiment: An experiment is called a random experiment if, when conducted repeatedly under essentially homogeneous conditions, the result is not unique or certain but may be any one of the various possible outcomes. In other words, it is an experiment whose outcomes are random, i.e. whose results depend on chance. Example: Tossing a coin, rolling a die
  • 62. Important terminologies • Trial: Performing a random experiment is called a trial. Example: If a coin is tossed two times, that means two trials.
  • 63. Important terminologies • Event: An outcome or a combination of outcomes of an experiment is termed an event. Example: Tossing a coin – you may get H or T – these are events.
  • 64. Important terminologies • Mutually Exclusive Events: Two events are said to be mutually exclusive or incompatible when both cannot happen simultaneously in a single trial; that is, the occurrence of any one of them rules out the occurrence of the other. In other words, “if the happening of one event prevents the happening of the other events, such events are called mutually exclusive events”. Example: Tossing a coin leads to two events, Head (H) or Tail (T). If head turns up, it prevents tail from turning up, and vice versa.
  • 65. Important terminologies • Independent and Dependent Events • Independent events: Two or more events are said to be independent when the outcome of one does not affect, and is not affected by, the other. Example: When a coin is tossed twice, getting a head in the first trial does not affect the outcome of the next trial. • Dependent events: The occurrence or non-occurrence of one event in any one trial affects the probability of the other event in another trial. Example: Drawing a card without replacement.
  • 66. Important terminologies • Equally Likely Events: Events are said to be equally likely when one does not occur more often than the others. This means none of them is expected to occur in preference to the others. In other words, all the events have an equal chance of occurring. Example: When you roll a die, all 6 faces, i.e. 1, 2, 3, 4, 5, 6, are equally likely.
  • 67. Important terminologies • Simple and Compound Events: In the case of simple events, we consider the probability of the happening or not happening of a single event. In the case of compound events, we consider the joint occurrence of two or more events.
  • 68. Important terminologies • Exhaustive Events: Events are said to be exhaustive when their totality includes all the possible outcomes of a random experiment. In other words, the probabilities of the individual (mutually exclusive) outcomes add up to 1. Example 1: When a die is rolled once, the possible outcomes are 1, 2, 3, 4, 5 and 6; hence the exhaustive number of cases is 6. Example 2: If we roll two dice once, the exhaustive number of cases is $6^2 = 36$. Similarly, rolling three dice leads to $6^3 = 216$ outcomes, and the sum of the probabilities of all these outcomes is 1.
  • 69. Important terminologies • Complementary events: Let there be two events A and B. A is called the complementary event of B (and vice versa) if A and B are mutually exclusive and exhaustive. Example: When a die is thrown, the occurrence of an even number and of an odd number are complementary events. The simultaneous occurrence of two events A and B is generally written as AB.
  • 70. Definition of Mathematical Probability • Let there be a random experiment with “N” outcomes which are mutually exclusive, exhaustive and equally likely. • Let there be an event “A”, and let “m” of the outcomes be favourable to “A”. Then the probability of occurrence of “A” can be written as $P(A) = \dfrac{m}{N} = \dfrac{\text{Outcomes favourable to } A}{\text{Total number of outcomes}}$
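The classical definition amounts to counting favourable outcomes among equally likely ones. A minimal sketch, assuming two fair dice and a hypothetical helper `classical_probability`:

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of rolling two fair dice: N = 6^2 = 36.
outcomes = list(product(range(1, 7), repeat=2))

def classical_probability(event):
    """P(A) = m / N, where m counts the outcomes favourable to the event."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

# Event A: the two faces add up to 7.
p_sum_is_seven = classical_probability(lambda roll: roll[0] + roll[1] == 7)
print(len(outcomes), p_sum_is_seven)  # 36 1/6
```

Using `Fraction` keeps the result exact, which makes it easy to compare with hand calculations.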
  • 71. Theorems of Probability or Rules of Probability The two important theorems of probability • The addition theorem • The multiplication theorem
  • 72. Theorems of Probability or Rules of Probability The two important theorems of probability • The addition theorem • Mutually exclusive events: $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$ • Overlapping events: $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  • 73. Theorems of Probability or Rules of Probability The two important theorems of probability • The multiplication theorem • Independent events: $P(A \text{ and } B) = P(A) \times P(B)$ • In general: $P(A \cap B) = P(A) \times P(B \mid A)$, with $P(A) \neq 0$ • $P(B \cap A) = P(B) \times P(A \mid B)$, with $P(B) \neq 0$
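  • 74. Bayes Theorem of Probability • Conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, with $P(B) \neq 0$ • Substituting $P(A \cap B) = P(A) \times P(B \mid A)$ from the multiplication theorem gives Bayes’ theorem: $P(A \mid B) = \dfrac{P(A)\,P(B \mid A)}{P(B)}$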
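The addition theorem, the multiplication theorem and conditional probability can be verified with exact fractions. The sketch below assumes a standard 52-card deck; the variable names are illustrative.

```python
from fractions import Fraction

DECK = 52
p_king = Fraction(4, DECK)
p_heart = Fraction(13, DECK)
p_king_and_heart = Fraction(1, DECK)  # only the king of hearts

# Addition theorem for overlapping events.
p_king_or_heart = p_king + p_heart - p_king_and_heart

# Conditional probability: P(King | Heart) = P(King and Heart) / P(Heart).
p_king_given_heart = p_king_and_heart / p_heart

# Multiplication theorem for dependent events:
# two aces drawn one after the other without replacement.
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

# Bayes' form: P(Heart | King) = P(Heart) * P(King | Heart) / P(King).
p_heart_given_king = p_heart * p_king_given_heart / p_king

print(p_king_or_heart, p_king_given_heart, p_two_aces, p_heart_given_king)
```

Because a king is equally likely to belong to any of the four suits, the last value comes out as one quarter, which is a quick sanity check on the Bayes step.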
  • 75. Random Variable • Random means “unpredictable”. A random variable x is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variable: - Discrete random variable - Continuous random variable
  • 76. Discrete and Continuous Random Variable • Discrete Random Variable • Continuous Random Variable
  • 77. Theoretical Probability Distribution • Binomial Distribution • Poisson Distribution • Normal Distribution
  • 78. Theoretical Probability Distribution • Binomial Distribution • It is also known as the “Bernoulli Distribution”. It is a probability distribution expressing the probability of one set of dichotomous alternatives, i.e. success or failure. • Bernoulli trial: A trial having only two outcomes. Example: Tossing a coin: H or T
  • 79. Theoretical Probability Distribution • Binomial Distribution • Binomial Probability Distribution: Let “x” be a binomial random variable with “n” trials and P(Success) = p; then the probability of “x” successes is given by $P(x) = {}^{n}C_{x}\, p^{x}\, q^{n-x}$ • Where x = number of successes in “n” trials, n = number of trials, p = probability of success in a single trial, q = (1 − p) = probability of failure in a single trial
  • 80. Theoretical Probability Distribution • Binomial Distribution • Constants of Binomial Distribution • Mean = $np$ • Variance = $npq$ • Standard Deviation = $\sqrt{npq}$ • Parameters of Binomial Distribution: n, p, q
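The binomial probability and its constants can be computed directly. A minimal sketch, assuming five tosses of a fair coin; the function name `binomial_pmf` is hypothetical.

```python
import math

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

# Five tosses of a fair coin; "success" means a head.
n, p = 5, 0.5
q = 1 - p
p_two_heads = binomial_pmf(2, n, p)  # 10 * 0.5^5

mean = n * p
variance = n * p * q
std_dev = math.sqrt(variance)
total_probability = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(p_two_heads, mean, variance, round(std_dev, 4), total_probability)
```

Summing the probabilities over x = 0, 1, ..., n gives 1, which confirms that the outcomes are exhaustive.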
  • 81. Theoretical Probability Distribution • Poisson Distribution • The Poisson distribution may be expected in cases where the chance of any individual event being a success is small. • The distribution is used to describe the behaviour of rare events such as the number of accidents on a road or the number of printing mistakes in a book. • It has been called “the Law of Improbable Events”. • $P(x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$, where x = 0, 1, 2, 3, … and $\lambda$ = parameter of the Poisson distribution
  • 82. Theoretical Probability Distribution • Poisson Distribution • Constants of Poisson Distribution • The mean of the Poisson distribution = $\lambda$ • The standard deviation = $\sqrt{\lambda}$ • Parameter of the Poisson distribution: $\lambda$
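A short sketch of the Poisson probability function follows. The rate of 2 printing mistakes per page is an assumed figure for illustration, and `poisson_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(x, lam):
    """P(x) = e^(-lambda) * lambda^x / x!, for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Assumed average of 2 printing mistakes per page.
lam = 2.0
p_zero = poisson_pmf(0, lam)
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))

mean = lam
std_dev = math.sqrt(lam)

print(round(p_zero, 4), round(p_at_most_two, 4), mean, round(std_dev, 4))
```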
  • 83. Theoretical Probability Distribution • Normal Distribution • The normal curve is “bell shaped” and symmetrical in its appearance. • The height of the normal curve is at its maximum at the mean. Hence the mean and mode of the normal distribution coincide. Thus for a normal distribution the Mean, Median and Mode are all equal. • Since there is only one maximum point, which occurs at the mean, the normal curve is uni-modal, i.e. it has only one mode. • As distinguished from the Binomial and Poisson distributions, where the variable is discrete, the variable distributed according to the normal curve is continuous. • The first and third quartiles are equidistant from the Median. • The area under the normal curve is distributed as follows: • Mean ± 1𝜎 covers 68.27% of the area, with 34.135% lying on either side of the Mean • Mean ± 2𝜎 covers 95.45% of the area • Mean ± 3𝜎 covers 99.73% of the area
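The three area figures can be reproduced from the error function, since for any normal distribution the area within k standard deviations of the mean equals erf(k / √2). The helper name `area_within` is an assumption for this sketch.

```python
import math

def area_within(k):
    """Area under a normal curve between mean - k*sigma and mean + k*sigma."""
    return math.erf(k / math.sqrt(2))

# Percentage of the total area within 1, 2 and 3 standard deviations.
areas = {k: round(100 * area_within(k), 2) for k in (1, 2, 3)}
print(areas)
```

The result does not depend on the particular mean or standard deviation, which is why the same percentages apply to every normal curve.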
  • 84. Theoretical Probability Distribution • Normal Distribution
  • 85. Unit – 4 Time Series Analysis
  • 86. Objective of Time Series Analysis • The assumption underlying time series analysis is that the data will behave in the future as they did in the past. Time series analysis is used to detect the pattern underlying the data and to isolate the influencing factors, which in turn are used to estimate the future. Thus, time series data help us to cope with uncertainty about the future. • Time series data are also used to review and evaluate the progress made against plans. For example, the Finance Ministry of the Govt. of India (GOI) reviews the gross domestic product (GDP) of the economy during the financial year and chalks out strategies to further the growth.
  • 87. Variations/Components in Time Series Analysis • In a typical time series there are four main components, which seem to be independent of one another and to influence the time-series data. • An important step in analysing a time series is to consider the types of data patterns. Time series data can contain some or all of the following elements: • Trend (T) • Cyclical (C) • Seasonal (S) • Irregular (I)
  • 88. Variations/Components in Time Series Analysis • Trend (T): The trend is the long-term pattern of a time series. A trend can be positive or negative depending on whether the time series exhibits an increasing or a decreasing long-term pattern. The rate of trend growth usually varies over time.
  • 89. Variations/Components in Time Series Analysis • Cyclical (C): Time series data may show up-and-down movement around a given trend. For example, a business cycle over the years shows an upward trend, touches its peak, and then may slump and hit the bottom. The pattern repeats, but not at regular intervals of time. The duration of a cycle depends on the type of business or industry.
  • 90. Variations/Components in Time Series Analysis • Seasonal (S): It is a special case of the cyclical component of a time series in which the magnitude and duration of the cycle do not vary but recur at a regular interval each year. Seasonality occurs when the time series exhibits regular variation during the same periods (the same month or the same quarter every year).
  • 91. Variations/Components in Time Series Analysis • Irregular or Random (I): This type of variation is unpredictable. It is caused by short-term, unanticipated and non-recurring factors. These variations follow no specific pattern.
  • 92. Methods of Evaluating the Trend • These are also called the forecasting methods of Time Series Analysis. Some of them are: • Freehand Method • Moving Average Method • Semi-average Method • Least-Square Method
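Of the methods listed, the moving average method is the simplest to sketch in code. The example below assumes a 3-period window and hypothetical yearly sales figures; the slides do not specify either.

```python
def moving_average(values, window=3):
    """Simple moving averages; each value is the mean of `window` consecutive items."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical yearly sales figures (illustrative only).
sales = [12, 15, 14, 18, 20, 19]
trend = moving_average(sales, window=3)
print([round(value, 2) for value in trend])
```

With an odd window, each average is conventionally placed against the middle period of its window, so a series of six values yields four trend values.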
  • 93. Methods of Estimating Seasonal Index • Method of Simple Averages • Ratio to trend method • Ratio to moving average method
  • 94. Unit – 5 Hypothesis Testing
  • 95. Hypothesis • A hypothesis is an assumption or a statement that may or may not be true. • It is tested based on the data / information obtained from a sample. • It is used to make decisions related to business. Example: 1. Whether a new drug is more effective than the existing drug 2. Whether the proportion of smokers in a class is different from 0.30
  • 96. Characteristics of a good hypothesis • Conceptually clear • Specificity • Testability • Availability of techniques • Theoretical relevance • Consistency • Objectivity • Simplicity
  • 97. Sources • Theory: Goal of business (theory). Hypothesis: the rate of return on CE is an index of business success; the higher the EPS, the more favorable the financial leverage • Observation: e.g. price and demand for a product • Intuition & personal experience • Findings of studies • Continuity of research
  • 98. One Tailed and Two Tailed Test • One Tailed Test: The test is called one-sided (one-tailed) if the null hypothesis gets rejected only when the value of the test statistic falls in one specific tail of the distribution.
  • 99. One Tailed and Two Tailed Test • Two Tailed Test: The test is called two-sided (two-tailed) if the null hypothesis gets rejected when the value of the test statistic falls in either one or the other of the two tails of its sampling distribution.
  • 100. Formulation of Hypothesis • Criteria to fulfil while formulating the hypothesis • A hypothesis must be formulated in a simple, clear and declarative form. • A broad hypothesis might not be empirically testable. • A hypothesis must be measurable and quantifiable so that the statistical authenticity of the relationship can be established. • A hypothesis is a conjectural statement based on the existing literature and theories about the topic, not on the gut feeling or subjective judgement of the researcher. • Validation of the hypothesis necessarily involves testing the statistical significance of the hypothesized relation.
  • 101. Formulation of Hypothesis • Null Hypothesis • It is a statement about a population parameter that is assumed to be true. • Null hypotheses are formulated for testing statistical significance. • It is the presumption that is accepted as correct unless there is strong evidence against it. • It is a starting point. The researcher tests whether the value stated in the null hypothesis is true. Example: There is no relationship between families’ income level and expenditure on recreation • Alternate Hypothesis • It is not specific and is not directly tested. • It is complementary to the null hypothesis. • It is accepted when the null hypothesis (H0) is rejected. Example: There is a relationship between families’ income level and expenditure on recreation
  • 102. Functions / Role of Hypothesis • Guides the direction of study • Gives an idea for setting order among facts • Specifies sources of data • Determines data needs • Suggests type of research • Determines the technique of analysis • Helps in development of theories
  • 103. Errors in Hypothesis
  • 104. Errors in Hypothesis – Type 1 and Type 2 A Type I error means rejecting the null hypothesis when it’s actually true. It means concluding that results are statistically significant when, in reality, they came about purely by chance or because of unrelated factors. A Type II error means not rejecting the null hypothesis when it’s actually false. This is not quite the same as “accepting” the null hypothesis, because hypothesis testing can only tell you whether to reject the null hypothesis
  • 105. Parametric and Non Parametric Test
    Parametric Test | Non-Parametric Test
    Parametric analysis to test group means | Nonparametric analysis to test group medians
    Information about the population is completely known | No information about the population is available
    Specific assumptions are made regarding the population | No assumptions are made regarding the population
    Applicable only to variables | Applicable to both variables and attributes
    Samples are independent | Samples are not necessarily independent
    Assumes a normal distribution | No assumed shape / distribution
    Handles interval or ratio data | Handles ordinal, nominal (or interval or ratio) and ranked data
    Results can be significantly affected by outliers | Results are not seriously affected by outliers
    Can perform well even when the spread of each group is different | Performs well when the spread of each group is the same; might not provide valid results if groups have a different spread
    Has more statistical power | Not as powerful as the parametric test
  • 106. Z Test • Formulas to remember – Testing of Hypotheses • Test for equality of mean
    • Test statistic: $Z_{Cal} = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, where $\bar{x}$ = sample mean, $\mu_0$ = known value of the population mean, $\sigma / \sqrt{n}$ = standard error of the sample mean
    • Two Tail Test: $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$. Reject $H_0$ when $Z_{Cal} \leq -1.960$ or $Z_{Cal} \geq 1.960$ at 5% Level of Significance (LOS); when $Z_{Cal} \leq -2.58$ or $Z_{Cal} \geq 2.58$ at 1% LOS
    • One Tail Test, upper tailed: $H_0: \mu \leq \mu_0$, $H_1: \mu > \mu_0$. Reject $H_0$ when $Z_{Cal} \geq 1.645$ at 5% LOS; when $Z_{Cal} \geq 2.326$ at 1% LOS
    • One Tail Test, lower tailed: $H_0: \mu \geq \mu_0$, $H_1: \mu < \mu_0$. Reject $H_0$ when $Z_{Cal} \leq -1.645$ at 5% LOS; when $Z_{Cal} \leq -2.326$ at 1% LOS
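The one-sample Z test can be sketched as a small calculation followed by a comparison with the two-tailed critical values 1.960 (5%) and 2.58 (1%). The sample figures and function names below are hypothetical.

```python
import math

def z_statistic(sample_mean, mu_0, sigma, n):
    """Z = (sample mean - hypothesised mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu_0) / (sigma / math.sqrt(n))

def reject_two_tailed(z_cal, critical_value):
    """Two-tailed rule: reject H0 when Z falls in either tail."""
    return abs(z_cal) >= critical_value

# Hypothetical data: sample of 100 with mean 52; H0 says the mean is 50, sigma = 10.
z_cal = z_statistic(sample_mean=52, mu_0=50, sigma=10, n=100)
reject_at_5_percent = reject_two_tailed(z_cal, 1.960)
reject_at_1_percent = reject_two_tailed(z_cal, 2.58)

print(z_cal, reject_at_5_percent, reject_at_1_percent)
```

In this illustration the same sample leads to rejection at the 5% level but not at the 1% level, which shows how the chosen level of significance changes the decision.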
  • 107. Z Test • Test for Equality of two means
    • Test statistic: $Z_{Cal} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$, where $\bar{X}_1, \bar{X}_2$ = sample means of populations I and II respectively, $n_1$ = sample size of population I, $n_2$ = sample size of population II, $\sigma_1, \sigma_2$ = standard deviations of populations I and II
    • Two Tail Test: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$. Reject $H_0$ when $Z_{Cal} \leq -1.960$ or $Z_{Cal} \geq 1.960$ at 5% Level of Significance (LOS); when $Z_{Cal} \leq -2.58$ or $Z_{Cal} \geq 2.58$ at 1% LOS
    • One Tail Test, upper tailed: $H_0: \mu_1 \leq \mu_2$, $H_1: \mu_1 > \mu_2$. Reject $H_0$ when $Z_{Cal} \geq 1.645$ at 5% LOS; when $Z_{Cal} \geq 2.326$ at 1% LOS
    • One Tail Test, lower tailed: $H_0: \mu_1 \geq \mu_2$, $H_1: \mu_1 < \mu_2$. Reject $H_0$ when $Z_{Cal} \leq -1.645$ at 5% LOS; when $Z_{Cal} \leq -2.326$ at 1% LOS
  • 108. Z Test • Test for a population proportion
    • Test statistic: $Z_{Cal} = \dfrac{P - P_0}{\sqrt{\dfrac{P_0 Q_0}{n}}}$, where $P_0$ = population proportion, $Q_0 = 1 - P_0$, $P = X/n$ = sample proportion, $\sqrt{P_0 Q_0 / n}$ = standard error of the sample proportion
    • Two Tail Test: $H_0: P = P_0$, $H_1: P \neq P_0$. Reject $H_0$ when $Z_{Cal} \leq -1.960$ or $Z_{Cal} \geq 1.960$ at 5% Level of Significance (LOS); when $Z_{Cal} \leq -2.58$ or $Z_{Cal} \geq 2.58$ at 1% LOS
    • One Tail Test, upper tailed: $H_0: P \leq P_0$, $H_1: P > P_0$. Reject $H_0$ when $Z_{Cal} \geq 1.645$ at 5% LOS; when $Z_{Cal} \geq 2.326$ at 1% LOS
    • One Tail Test, lower tailed: $H_0: P \geq P_0$, $H_1: P < P_0$. Reject $H_0$ when $Z_{Cal} \leq -1.645$ at 5% LOS; when $Z_{Cal} \leq -2.326$ at 1% LOS
  • 109. Z Test $Z_{Cal} = \dfrac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$
  • 110. t Test • Test for equality of mean
    • Test statistic: $t_{Cal} = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n-1}}$, where $\bar{x}$ = sample mean, $\mu_0$ = population mean, $s$ = sample standard deviation, $s / \sqrt{n-1}$ = standard error of the sample mean
    • Two Tail Test: $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$. Reject $H_0$ when $|t_{Cal}| > t_{table}$ at (n − 1) degrees of freedom
    • One Tail Test, upper tailed: $H_0: \mu \leq \mu_0$, $H_1: \mu > \mu_0$
    • One Tail Test, lower tailed: $H_0: \mu \geq \mu_0$, $H_1: \mu < \mu_0$
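The one-sample t statistic can be sketched as below. One assumption is made explicit here: with √(n − 1) in the denominator, s is taken to be the standard deviation computed with divisor n (`statistics.pstdev`), which gives the same t value as the more common form that uses divisor n − 1 together with √n. The data and the function name are illustrative.

```python
import math
import statistics

def t_statistic(sample, mu_0):
    """t = (x_bar - mu_0) / (s / sqrt(n - 1)), with s computed using divisor n."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.pstdev(sample)  # assumption: divisor n, to pair with sqrt(n - 1)
    return (x_bar - mu_0) / (s / math.sqrt(n - 1))

# Hypothetical sample; H0 says the population mean is 12.
sample = [10, 12, 14, 16, 18]
t_cal = t_statistic(sample, mu_0=12)

# Two-tailed table value for 4 degrees of freedom at the 5% level.
t_table = 2.776
reject_h0 = abs(t_cal) > t_table

print(round(t_cal, 4), reject_h0)
```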
  • 111. t Test • Test for Equality of two means (t-Test)
    • Test statistic: $t_{Cal} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\bar{X}_1, \bar{X}_2$ = sample means of populations I and II respectively, $n_1$ = sample size of population I, $n_2$ = sample size of population II
    • Two Tail Test: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$. Reject $H_0$ when $|t_{Cal}| > t_{table}$ at $(n_1 + n_2 - 2)$ degrees of freedom
    • One Tail Test, upper tailed: $H_0: \mu_1 \leq \mu_2$, $H_1: \mu_1 > \mu_2$
    • One Tail Test, lower tailed: $H_0: \mu_1 \geq \mu_2$, $H_1: \mu_1 < \mu_2$
  • 112. t Test • Paired Sample t-Test
    • Test statistic: $t_{Cal} = \dfrac{\bar{d}}{S_d / \sqrt{n-1}}$, where $d = x - y$, $\bar{d} = \dfrac{\sum d}{n}$, $S_d$ = standard deviation of “d”
    • Two Tail Test: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$. Reject $H_0$ when $|t_{Cal}| > t_{table}$ at (n − 1) degrees of freedom
    • One Tail Test, upper tailed: $H_0: \mu_1 \leq \mu_2$, $H_1: \mu_1 > \mu_2$
    • One Tail Test, lower tailed: $H_0: \mu_1 \geq \mu_2$, $H_1: \mu_1 < \mu_2$
  • 113. F Test $F = \dfrac{\sigma_1^2}{\sigma_2^2}$, where $\sigma^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
  • 114. Mann-Whitney U Test • $u_1 = n_1 n_2 + \dfrac{n_1 (n_1 + 1)}{2} - R_1$ • $u_2 = n_1 n_2 + \dfrac{n_2 (n_2 + 1)}{2} - R_2$ • $U = \min(u_1, u_2)$ • $Z_{cal} = \dfrac{U - E(U)}{\sigma_U}$, where $E(U) = \dfrac{n_1 n_2}{2}$ and $\sigma_U = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
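The Mann-Whitney calculation can be sketched step by step. The sketch assumes no tied values and uses the standard large-sample standard deviation √(n1·n2·(n1 + n2 + 1)/12); the function name and the two small samples are hypothetical, and samples this small serve only to show the arithmetic.

```python
import math

def mann_whitney(sample_1, sample_2):
    """Return (U, Z) for two independent samples, assuming no tied values."""
    n1, n2 = len(sample_1), len(sample_2)
    combined = sorted(sample_1 + sample_2)
    rank = {value: position + 1 for position, value in enumerate(combined)}

    r1 = sum(rank[value] for value in sample_1)  # rank sum of sample 1
    r2 = sum(rank[value] for value in sample_2)  # rank sum of sample 2

    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u = min(u1, u2)

    expected_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - expected_u) / sigma_u

u, z_cal = mann_whitney([12, 15, 18], [10, 20, 25, 30])
print(u, round(z_cal, 4))
```

A useful check on the arithmetic is that u1 + u2 always equals n1·n2.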
  • 115. K-W Test $H = \dfrac{12}{n(n+1)} \sum \dfrac{r_i^2}{n_i} - 3(n+1)$
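A minimal sketch of the Kruskal-Wallis H statistic follows, reading r_i as the rank sum of group i, n_i as its size and n as the total number of observations. It assumes no tied values; the function name and the three groups are illustrative.

```python
def kruskal_wallis_h(groups):
    """H = 12 / (n(n+1)) * sum(r_i^2 / n_i) - 3(n+1), assuming no tied values."""
    pooled = sorted(value for group in groups for value in group)
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    n = len(pooled)
    weighted_rank_sums = sum(
        sum(rank[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)

# Hypothetical scores from three independent groups.
groups = [[27, 31, 35], [28, 32, 36], [29, 33, 37]]
h = kruskal_wallis_h(groups)
print(round(h, 4))
```

The three groups here overlap heavily, so their rank sums (12, 15 and 18) are close together and H is small.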
  • 116. Normality and Reliability of Hypothesis Testing • Normality and reliability tests are done to ensure that the hypothesis test is consistent and that the intended quantity is actually measured during the process. • A normality test is used to determine whether sample data has been drawn from a normally distributed population (within some tolerance). • Reliability is the extent to which the measure will give the same response under similar circumstances. In other words, reliability is a measure of consistency in measuring the same phenomenon.
  • 117. Methods to check the Reliability of Hypothesis Testing - Test-retest method - Alternate or parallel forms - Split-half techniques - Kuder-Richardson Reliability and coefficient alpha
  • 118. Bivariate Analysis - Bivariate analysis is slightly more analytical than univariate analysis. When the data set contains two variables and researchers aim to compare them, bivariate analysis is the right type of analysis technique. - For example, in a survey of a classroom, the researcher may want to analyse the ratio of students who scored above 85% in relation to their gender. In this case, there are two variables: gender = X (independent variable) and result = Y (dependent variable). - Linear regression - Simple regression - Correlation
  • 119. Multivariate Analysis • Multivariate analysis is a more complex form of statistical analysis and is used when there are more than two variables in the data set. Here is an example: • A doctor has collected data on cholesterol, blood pressure, and weight. She has also collected data on the eating habits of the subjects (e.g., how many ounces of red meat, fish, dairy products, and chocolate are consumed per week). She wants to investigate the relationship between the three measures of health and eating habits. • Factor Analysis • Cluster Analysis • Variance Analysis • Discriminant Analysis • Multidimensional Scaling • Principal Component Analysis • Redundancy Analysis
  • 120. ANOVA – One Way • A one-way ANOVA is a type of statistical test that compares the variance in the group means within a sample while considering only one independent variable or factor. • It is a hypothesis-based test, meaning that it aims to evaluate multiple mutually exclusive theories about our data. • A one-way ANOVA compares three or more categorical groups to establish whether there is a difference between them. Within each group there should be three or more observations (here, this means walruses), and the means of the samples are compared. • In a one-way ANOVA there are two possible hypotheses. - The null hypothesis (H0) is that there is no difference between the groups, i.e. the means are equal. (Walruses weigh the same in different months) - The alternative hypothesis (H1) is that there is a difference between the means of the groups. (Walruses have different weights in different months)
  • 121. ANOVA – One Way - Assumptions - Normality – That each sample is taken from a normally distributed population - Sample independence – that each sample has been drawn independently of the other samples - Variance Equality – That the variance of data in the different groups should be the same - Your dependent variable – here, “weight”, should be continuous – that is, measured on a scale which can be subdivided using increments (i.e. grams, milligrams)
  • 122. ANOVA – Two Way • A two-way ANOVA is, like a one-way ANOVA, a hypothesis-based test. However, in the two-way ANOVA each sample is defined in two ways and, as a result, put into two categorical groups. • The two-way ANOVA therefore examines the effect of two factors (month and gender) on a dependent variable, in this case weight, and also examines whether the two factors affect each other to influence the continuous variable.
  • 123. ANOVA – Two Way - Assumptions - Your dependent variable – here, “weight”, should be continuous – that is, measured on a scale which can be subdivided using increments (i.e. grams, milligrams) - Your two independent variables – here, “month” and “gender”, should be in categorical, independent groups. - Sample independence – that each sample has been drawn independently of the other samples - Variance Equality – That the variance of data in the different groups should be the same - Normality – That each sample is taken from a normally distributed population
  • 124. One-Way vs Two-Way ANOVA Differences Chart
     | One-Way ANOVA | Two-Way ANOVA
    Definition | A test that allows one to make comparisons between the means of three or more groups of data. | A test that allows one to make comparisons between the means of three or more groups of data, where two independent variables are considered.
    Number of Independent Variables | One. | Two.
    What is Being Compared? | The means of three or more groups of an independent variable on a dependent variable. | The effect of multiple groups of two independent variables on a dependent variable and on each other.
    Number of Groups of Samples | Three or more. | Each variable should have multiple samples.
  • 125. Chi-Square Test and Analysis of Variance Vijay K S
  • 127. • A key statistical test in research fields including biology, economics, and psychology • Analysis of Variance (ANOVA) is very useful for analyzing datasets. • It allows comparisons to be made between three or more groups of data. • In a given data set, one can observe two main variations. One due to chance and the other due to some specific reasons. • These variations are studied separately in ANOVA to identify the actual cause of the variation and help the researcher to make effective decisions. • Two types of ANOVA are commonly used, One-Way ANOVA and Two-Way ANOVA.
  • 128. Analysis of Variance • ANOVA is an inferential statistics technique that allows you to compare the mean level on one interval-ratio variable (such as income) for each group relative to the others in a nominal variable (such as degree). • If you had only two groups to compare, ANOVA would give the same answer as an independent samples t-test.
  • 129. ANOVA Just imagine that the following distributions represent the marks scored by students belonging to different sections. How do you interpret the data presentation? Isn’t it conceivable that the differences are due to natural random variability between samples? Would you want to claim they are different in the population? [Figure: panels “All Groups” and “Groups Broken Down”; axis: marks scored by the students]
  • 130. ANOVA
    Now, what if three sections had scores distributed like this in your sample? Doesn't it now appear that the groups may be different regardless of sampling variability? Would you feel comfortable claiming the groups are different in the population?
    [Figure: two panels, "All Groups Combined" and "Groups Separated Out"; x-axis: marks scored by the students]
  • 131. ANOVA
    Conceptually, ANOVA compares the variance within groups to the variance between the groups to determine whether the groups appear distinct from each other or look much the same.
    [Figure: two panels, "Different groups, different means" and "Similar groups, similar means", with the group means marked Y-bar; x-axis: categories of the nominal variable; y-axis: measures on the continuous variable]
  • 132. ANOVA
    • When the groups have little variation within themselves, but large variation between them, it would appear that they are distinct and that their means are different.
    [Figure: two panels, "Different groups, different means" and "Similar groups, similar means", with the group means marked Y-bar]
  • 133. ANOVA
    • When the groups have a lot of variation within themselves, but little variation between them, it would appear that they are similar and that their means are not really different (perhaps they differ only because of peculiarities of the particular sample).
    [Figure: two panels, "Similar groups, similar means" and "Different groups, different means", with the group means marked Y-bar]
  • 134. One-Way ANOVA
    • One-way Analysis of Variance (ANOVA) is used to test whether the means of two or more independent (unrelated) groups are statistically significantly different.
    • The ANOVA table (table of variation) is laid out as follows, where k is the number of samples and n is the total number of observations:
    | Sources of Variance | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS) | F Ratio |
    |---|---|---|---|---|
    | Between the samples | Sum of squares between the samples (SSB) | k − 1 | MSB = SSB/(k − 1) | F = MSB/MSW (mean square between / mean square within) |
    | Within the samples | Sum of squares within the samples (SSW) | n − k | MSW = SSW/(n − k) | |
    | Total | Total sum of squares (SST) | n − 1 | | |
  • 135. Assumptions of One-Way ANOVA
    • Normally distributed outcome
    • Equal variances across the groups
    • Groups are independent
    Hypotheses of One-Way ANOVA
    $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots$
    $H_1$: Not all of the population means are the same
  • 136. The process of carrying out one-way ANOVA
    • Calculate the mean of each sample.
    • Calculate the mean of all sample means (the grand mean).
    • Calculate the variation between the samples, known as SSB (sum of squares between).
    • Divide SSB by its degrees of freedom (k − 1) to get the mean square between (MS between), the average variation between the samples.
    • Calculate the variation within the samples, known as SSW (sum of squares within).
    • Divide SSW by its degrees of freedom (n − k) to get the mean square within (MS within).
    • Add the squared deviations of all observations from the grand mean to get the total variation in the sample (SST = SSB + SSW).
    • Calculate the F ratio (MS between / MS within).
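The steps above can be written compactly as formulas. The notation here is an assumption for illustration and does not come from the slides: $x_{ij}$ is observation $i$ in sample $j$, $\bar{x}_j$ is the mean of sample $j$, $\bar{x}$ is the grand mean, $n_j$ is the size of sample $j$, $k$ is the number of samples, and $n$ is the total number of observations.

```latex
\begin{aligned}
SSB &= \sum_{j=1}^{k} n_j \,(\bar{x}_j - \bar{x})^2, &
MSB &= \frac{SSB}{k-1},\\
SSW &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2, &
MSW &= \frac{SSW}{n-k},\\
SST &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2 = SSB + SSW, &
F &= \frac{MSB}{MSW}.
\end{aligned}
```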
  • 137. Problem
    • The researcher observed the sale of products of a particular brand in six big retail houses in each of three cities. He/she wants to determine whether the mean sale is the same across the cities. Use the data shown in the following table to carry out a one-way ANOVA:
    | Retail House | City A | City B | City C |
    |---|---|---|---|
    | 1 | 3 | 6 | 9 |
    | 2 | 8 | 9 | 8 |
    | 3 | 4 | 8 | 6 |
    | 4 | 9 | 5 | 7 |
    | 5 | 6 | 7 | 5 |
    | 6 | 7 | 4 | 7 |
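  • 138. Steps
    Step 1: Define the hypotheses
    H0: There is no significant difference in sales between the three cities (the mean sales in the three cities are the same).
    H1: The mean sales are not the same in all three cities.
    Step 2: Calculate the mean sales of the three cities separately, and the mean of the sample means (grand mean)
    | Retail House | City A | City B | City C |
    |---|---|---|---|
    | 1 | 3 | 6 | 9 |
    | 2 | 8 | 9 | 8 |
    | 3 | 4 | 8 | 6 |
    | 4 | 9 | 5 | 7 |
    | 5 | 6 | 7 | 5 |
    | 6 | 7 | 4 | 7 |
    | Mean | 6.17 | 6.5 | 7 |
    Mean of sample means (grand mean) = 6.56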
  • 139. Steps
    • Step 3: Calculate the sum of squares between the samples (SSB)
    • Step 4: Calculate the sum of squares within the samples (SSW)
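The slide's own working for Steps 3 and 4 is not preserved in this transcript. As a sketch, assuming the standard definitions (SSB weights each squared deviation of a city mean from the grand mean by the 6 retail houses per city; SSW adds the squared deviations of each observation from its own city mean), the arithmetic for the retail data would be:

```latex
\begin{aligned}
SSB &= 6\left[(6.167 - 6.556)^2 + (6.5 - 6.556)^2 + (7 - 6.556)^2\right]\\
    &= 6\,(0.1512 + 0.0031 + 0.1975) \approx 2.11\\[4pt]
SSW &= \underbrace{26.83}_{\text{City A}} + \underbrace{17.50}_{\text{City B}} + \underbrace{10.00}_{\text{City C}} \approx 54.33
\end{aligned}
```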
  • 140. Steps
    • Step 5: Calculate the total variation (SST = SSB + SSW)
    • Step 6: Create the ANOVA table
    | Sources of Variance | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS) | F Ratio | 5% F Limit |
    |---|---|---|---|---|---|
    | Between the samples | 2.11 | (3 − 1) = 2 | MSB = 2.11/2 = 1.06 | F = MSB/MSW = 1.06/3.62 = 0.29 | 3.68 |
    | Within the samples | 54.33 | (18 − 3) = 15 | MSW = 54.33/15 = 3.62 | | |
    | Total | 56.44 | (18 − 1) = 17 | | | |
  • 141. Conclusion
    The F ratio (0.29) is less than the critical/table value (3.68), so the null hypothesis H0 is accepted and H1 is rejected. There is no significant difference in sales among the three cities; the product's sales are almost the same in each.
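The worked example can be checked with a few lines of code. The following is a minimal sketch in plain Python, not part of the original slides; the function name `one_way_anova` and the variable names are illustrative assumptions, and the 5% critical value 3.68 is simply copied from the table above rather than computed.

```python
# One-way ANOVA by hand for the retail-sales example (no external libraries).
sales = {
    "City A": [3, 8, 4, 9, 6, 7],
    "City B": [6, 9, 8, 5, 7, 4],
    "City C": [9, 8, 6, 7, 5, 7],
}


def one_way_anova(groups):
    """Return (SSB, SSW, MSB, MSW, F) for a dict of sample name -> observations."""
    all_values = [x for obs in groups.values() for x in obs]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    ssb = 0.0  # variation between the samples
    ssw = 0.0  # variation within the samples
    for obs in groups.values():
        mean = sum(obs) / len(obs)
        ssb += len(obs) * (mean - grand_mean) ** 2
        ssw += sum((x - mean) ** 2 for x in obs)
    msb = ssb / (k - 1)  # mean square between, d.f. = k - 1
    msw = ssw / (n - k)  # mean square within, d.f. = n - k
    return ssb, ssw, msb, msw, msb / msw


ssb, ssw, msb, msw, f_ratio = one_way_anova(sales)
F_CRITICAL = 3.68  # 5% table value for (2, 15) d.f., taken from the slide

print(f"SSB={ssb:.2f}  SSW={ssw:.2f}  MSB={msb:.2f}  MSW={msw:.2f}  F={f_ratio:.2f}")
print("Reject H0" if f_ratio > F_CRITICAL else "Do not reject H0")
```

Because the function works from the raw observations, it avoids the small rounding differences that arise when intermediate means are rounded to two decimals.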
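  • 145. Homework
    • How much of the variance in height is explained by the treatment group? (Heights in inches.)
    | Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
    |---|---|---|---|
    | 60 | 50 | 48 | 47 |
    | 67 | 52 | 49 | 67 |
    | 42 | 43 | 50 | 54 |
    | 67 | 67 | 55 | 67 |
    | 56 | 67 | 56 | 68 |
    | 62 | 59 | 61 | 65 |
    | 64 | 67 | 61 | 65 |
    | 59 | 64 | 60 | 56 |
    | 72 | 63 | 59 | 60 |
    | 71 | 65 | 64 | 65 |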
  • 147. Two-Way ANOVA: Steps Involved
    • Step 1: Find the correction term
    • Step 2: Find the Sum of Squares of the total (SST)
    • Step 3: Find the Sum of Squares of Columns
    • Step 4: Find the Sum of Squares of Rows
    • Step 5: Find the residual Sum of Squares
    • Step 6: Create the ANOVA table
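These steps correspond to the following formulas. The symbols are assumptions introduced here for illustration: $T$ is the total of all observations, $N$ the number of observations, $x$ an individual observation, $C_j$ the total of column $j$ (with $n_c$ observations per column), and $R_i$ the total of row $i$ (with $n_r$ observations per row). The abbreviations SSC, SSR and SSE are likewise labels chosen here, not taken from the slides.

```latex
\begin{aligned}
\text{Correction term: } CT &= \frac{T^2}{N}\\
SST &= \sum x^2 - CT\\
SSC &= \frac{\sum_j C_j^2}{n_c} - CT\\
SSR &= \frac{\sum_i R_i^2}{n_r} - CT\\
\text{Residual: } SSE &= SST - SSC - SSR
\end{aligned}
```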
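  • 148. Problem
    • Three respondents have rated three small cars of different brands on a five-point scale (5 being the highest) with respect to their features. The ratings and features are provided in the following table.
    | Respondent | Car | Mileage | Durability | Maintenance Cost | Technology | Price |
    |---|---|---|---|---|---|---|
    | 1 | Zen | 3 | 2 | 4 | 3 | 5 |
    | 1 | I10 | 4 | 4 | 4 | 5 | 4 |
    | 1 | Alto | 4 | 3 | 5 | 2 | 4 |
    | 2 | Zen | 2 | 4 | 3 | 1 | 4 |
    | 2 | I10 | 4 | 5 | 3 | 4 | 4 |
    | 2 | Alto | 3 | 1 | 2 | 5 | 3 |
    | 3 | Zen | 4 | 5 | 3 | 2 | 4 |
    | 3 | I10 | 3 | 2 | 4 | 5 | 3 |
    | 3 | Alto | 4 | 5 | 4 | 5 | 5 |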
  • 149. Steps
    Here the researcher wants to know the difference between the brands in terms of features.
    H0: There is no difference in the means of the five features of the cars.
    • Step 1: Find the correction term
    • Step 2: Find the Sum of Squares of the total (SST)
    | Quantity | Value |
    |---|---|
    | Total of all the observations | 162 |
    | Square of total | 162 × 162 = 26244 |
    | Number of observations | 45 |
    | Correction term = square of total / number of observations | 26244/45 = 583.2 |
    | Sum of squares of all the individual observations | 638 |
    | SST = 638 − 583.2 | 54.8 |
  • 150. Steps
    • Step 3: Sum of Squares of Columns (i.e. between the variables)
    | Quantity | Value |
    |---|---|
    | Column sums | 31, 31, 32, 32, 36 |
    | Squares of column sums | 961, 961, 1024, 1024, 1296 |
    | Sum of squared column sums / observations per column | 5266/9 = 585.11 |
    | Correction term | 583.2 |
    | Sum of Squares between columns | 585.11 − 583.2 = 1.91 |
    • Step 4: Sum of Squares of Rows (i.e. between the respondents)
    | Quantity | Value |
    |---|---|
    | Row sums (Respondents 1, 2, 3) | 56, 48, 58 |
    | Squares of row sums | 3136, 2304, 3364 |
    | Sum of squared row sums | 8804 |
    | Sum of squared row sums / observations per row | 8804/15 = 586.93 |
    | Correction term | 583.2 |
    | Sum of Squares between rows | 586.93 − 583.2 = 3.73 |
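  • 151. Steps
    • Step 5: Find the residual Sum of Squares (total minus columns minus rows)
    • Step 6: Create the ANOVA table
    | Sources of Variance | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS) | F Ratio | 5% F Limit |
    |---|---|---|---|---|---|
    | Between columns | 1.91 | (5 − 1) = 4 | 1.91/4 = 0.48 | 0.48/6.14 = 0.08 | F(4, 8) = 3.84 |
    | Between rows | 3.73 | (3 − 1) = 2 | 3.73/2 = 1.87 | 1.87/6.14 = 0.30 | F(2, 8) = 4.46 |
    | Residual | 49.16 | (5 − 1)(3 − 1) = 8 | 49.16/8 = 6.14 | | |
    | Total | 54.8 | (45 − 1) = 44 | | | |
    Note: the degrees of freedom listed above sum to 4 + 2 + 8 = 14, not 44. If all 45 ratings are treated as separate observations, the residual has 44 − 4 − 2 = 38 degrees of freedom, which lowers the residual mean square but does not change the conclusion in this example.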
  • 153. Conclusion
    The calculated F values are less than the corresponding critical (table) values, so they lie in the acceptance region. Hence H0 is accepted and H1 is rejected: there is no difference in the means of the five features of the cars.
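The sums of squares in this example can be reproduced from the raw ratings. The sketch below is plain Python and is not part of the original slides; the variable names (`ratings`, `ssc`, `ssr`, `sse`) are illustrative assumptions. It follows the slides' scheme of treating the five features as columns and the three respondents as rows. Computed without intermediate rounding, the column and row sums of squares come out at about 1.91 and 3.73.

```python
# Two-way ANOVA sums of squares for the car-rating example (no external libraries).
# Feature order in each list: Mileage, Durability, Maintenance Cost, Technology, Price.
ratings = {
    1: {"Zen": [3, 2, 4, 3, 5], "I10": [4, 4, 4, 5, 4], "Alto": [4, 3, 5, 2, 4]},
    2: {"Zen": [2, 4, 3, 1, 4], "I10": [4, 5, 3, 4, 4], "Alto": [3, 1, 2, 5, 3]},
    3: {"Zen": [4, 5, 3, 2, 4], "I10": [3, 2, 4, 5, 3], "Alto": [4, 5, 4, 5, 5]},
}
N_FEATURES = 5

# Flatten the ratings of each respondent (the "rows" of the analysis).
rows = {r: [x for car in cars.values() for x in car] for r, cars in ratings.items()}
all_values = [x for obs in rows.values() for x in obs]
n = len(all_values)

# Step 1: correction term = (grand total)^2 / number of observations.
correction = sum(all_values) ** 2 / n

# Step 2: total sum of squares.
sst = sum(x * x for x in all_values) - correction

# Step 3: sum of squares between columns (features).
column_sums = [
    sum(car[j] for cars in ratings.values() for car in cars.values())
    for j in range(N_FEATURES)
]
obs_per_column = n // N_FEATURES
ssc = sum(s * s for s in column_sums) / obs_per_column - correction

# Step 4: sum of squares between rows (respondents).
row_sums = [sum(obs) for obs in rows.values()]
obs_per_row = n // len(rows)
ssr = sum(s * s for s in row_sums) / obs_per_row - correction

# Step 5: residual sum of squares.
sse = sst - ssc - ssr

print(f"CT={correction:.1f}  SST={sst:.2f}  SSC={ssc:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}")
```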
  • 155. Parametric and Nonparametric Tests (cont.)
    • The term "non-parametric" refers to the fact that the chi-square tests do not require assumptions about population parameters, nor do they test hypotheses about population parameters.
    • Previous examples of hypothesis tests, such as the t tests and analysis of variance, are parametric tests: they include assumptions about parameters and hypotheses about parameters.
    • The most obvious difference between the chi-square tests and the other hypothesis tests we have considered (t and ANOVA) is the nature of the data.
    • For chi-square, the data are frequencies rather than numerical scores.
  • 157. Chi-Square Test
    • This statistical test compares the observed results with the expected results.
    • The purpose is to determine whether the difference between them is due to chance or to a relationship between the variables we are studying.
    • Chi-square enables us to understand and interpret the relationship between two categorical variables.
    • The chi-square test is denoted by the symbol $\chi^2$.
    • This test is performed on categorical data rather than numerical data.
    • The formula to calculate the chi-square statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$.
  • 158. Applications of Chi-Square test
    • To test the divergence of observed results from the expected results when our expectations are based on the hypothesis of equal probability.
    • To determine the degree of association between two variables.
  • 159. In the chi-square formula: O = observed (actual) value; E = expected value
  • 160. Chi-Square Test: two types
    • Chi-Square Test for Goodness of Fit
    • Chi-Square Test for Independence
  • 161. Chi-Square Test for Goodness of Fit
    • This test helps the researcher to know whether, and to what extent, the theoretical distribution fits the observed data.
    • It allows you to draw conclusions about the distribution of a population based on a sample. Using the chi-square goodness of fit test, you can test whether the goodness of fit is "good enough" to conclude that the population follows the distribution.
    • Goodness-of-fit is a statistical hypothesis test used to see how closely observed data mirror expected data.
  • 162. Assumptions
    • Two or more categories
    • Independent observations
    • A sample size of at least 10
    • Random sampling
    • All observations must be used
    • For the test to be accurate, the expected frequency in each category should be at least 5
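  • 163. Chi-Square Test for Goodness of Fit - Problems
    Test the hypothesis that the customers have no preference for any particular product. Use a 5% level of significance.
    | Product | Number of Customers Preferred |
    |---|---|
    | Product A | 300 |
    | Product B | 280 |
    | Product C | 220 |
    | Product D | 200 |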
  • 164. Solution:
    Step 1: Formulate the hypotheses
    H0: The customers have no preference for any particular product
    H1: Customers have a preference for a particular product
    Step 2: Level of significance: given in the problem as 5%. Degrees of freedom = (n − 1) = (4 − 1) = 3, where n is the number of categories
    Step 3: Calculate the $\chi^2$ value
  • 165. Solution:
    Step 3: Calculate the $\chi^2$ value
    | Product | Number of Customers Preferred (O) | Expected Value (E) | (O − E) | (O − E)² | (O − E)²/E |
    |---|---|---|---|---|---|
    | Product A | 300 | 250 | 50 | 2500 | 10 |
    | Product B | 280 | 250 | 30 | 900 | 3.6 |
    | Product C | 220 | 250 | −30 | 900 | 3.6 |
    | Product D | 200 | 250 | −50 | 2500 | 10 |
    | Total | 1000 | | | | Σ (O − E)²/E = 27.2 |
    Expected value (average) = 1000/4 = 250; $\chi^2$ = 27.2
    Step 4: Compare the $\chi^2$ value with the critical value at the 5% level of significance and 3 degrees of freedom
    The critical (table) value is 7.81. The calculated chi-square (27.2) is greater than the table value (7.81), so the null hypothesis is rejected: customers do have a preference for particular products.
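The goodness-of-fit calculation above is easy to reproduce in code. This is a minimal plain-Python sketch, not part of the original slides; the helper name `chi_square_statistic` is an illustrative assumption, and the critical value 7.81 is copied from the slide rather than computed.

```python
# Chi-square goodness-of-fit test for the product-preference example.
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = {"Product A": 300, "Product B": 280, "Product C": 220, "Product D": 200}

# Under H0 (no preference) every product is expected to get an equal share.
total = sum(observed.values())
expected = [total / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed.values(), expected)
degrees_of_freedom = len(observed) - 1
CRITICAL_VALUE = 7.81  # 5% table value for 3 d.f., taken from the slide

print(f"chi-square = {chi2:.2f} with {degrees_of_freedom} d.f.")
print("Reject H0" if chi2 > CRITICAL_VALUE else "Do not reject H0")
```

The same helper applies to any goodness-of-fit problem with equal expected frequencies: only the `observed` counts and the table value change.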
  • 167. Example 2: The following table gives the number of defective items in a factory on various days in a week. Using the chi-square test, check whether the defective items are uniformly distributed across the days at the 5% level of significance.
    | Day | Number of Defective Items |
    |---|---|
    | Monday | 14 |
    | Tuesday | 22 |
    | Wednesday | 16 |
    | Thursday | 18 |
    | Friday | 12 |
    | Saturday | 19 |
    | Sunday | 11 |
  • 168. Solution:
    Step 1: Formulate the hypotheses
    H0: The defective items are uniformly distributed across the days
    H1: The defective items are not uniformly distributed across the days
    Step 2: Level of significance: given in the problem as 5%. Degrees of freedom = (n − 1) = (7 − 1) = 6
    Step 3: Calculate the $\chi^2$ value
  • 169. Solution:
    Step 3: Calculate the $\chi^2$ value
    | Day | Number of Defective Items (O) | Expected Value (E) | (O − E) | (O − E)² | (O − E)²/E |
    |---|---|---|---|---|---|
    | Monday | 14 | 16 | −2 | 4 | 0.25 |
    | Tuesday | 22 | 16 | 6 | 36 | 2.25 |
    | Wednesday | 16 | 16 | 0 | 0 | 0 |
    | Thursday | 18 | 16 | 2 | 4 | 0.25 |
    | Friday | 12 | 16 | −4 | 16 | 1 |
    | Saturday | 19 | 16 | 3 | 9 | 0.5625 |
    | Sunday | 11 | 16 | −5 | 25 | 1.5625 |
    | Total | 112 | | | | Σ (O − E)²/E = 5.875 |
    Expected value (average) = 112/7 = 16; $\chi^2$ = 5.875
    Step 4: Compare the $\chi^2$ value with the critical value at the 5% level of significance and 6 degrees of freedom
    The critical (table) value is 12.59. The calculated chi-square (5.875) is less than the table value (12.59), so the null hypothesis is accepted. Therefore, the defective items are uniformly distributed across the days.
  • 171. Chi-Square Test for Independence
    • Here, two attributes/variables are tested to determine whether they are associated.
    • Example: whether introducing a training program increases the efficiency of employees. The aim is to establish a relationship between training and the efficiency of employees.
    • It allows you to draw conclusions about a population based on a sample. Specifically, it allows you to conclude whether two variables are related in the population.
    • The test can be used and interpreted in two different ways:
      1. Testing hypotheses about the relationship between two variables in a population, or
      2. Testing hypotheses about differences between proportions for two or more populations.
  • 172. Chi-Square Test for Independence - Problems
    Example 1: The researcher has data on the preference of men and women regarding joint and nuclear families, as shown in the table. The researcher wants to know whether the preference of men and women about the type of family is the same or not, at the 5% level of significance.
    | | Joint Family | Nuclear Family | Total |
    |---|---|---|---|
    | Men | 96 | 35 | 131 |
    | Women | 170 | 360 | 530 |
    | Total | 266 | 395 | 661 |
  • 173. Solution:
    Step 1: Formulate the hypotheses
    H0: The opinion of men and women about the type of family is the same
    H1: The opinion of men and women about the type of family is different
    Step 2: Level of significance: given in the problem as 5%. Degrees of freedom = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
    Step 3: Calculate the $\chi^2$ value
    Expected Value = Row Total × Column Total / Grand Total
  • 174. Solution:
    Step 3: Calculate the $\chi^2$ value
    The expected value is calculated as E = (Row Total × Column Total) / Grand Total. Example: the expected value for "men towards joint family" is E = (131 × 266)/661 = 52.72.
    | Items | Number of Preferences (O) | Expected Value (E) | (O − E) | (O − E)² | (O − E)²/E |
    |---|---|---|---|---|---|
    | Men towards joint family | 96 | 52.72 | 43.28 | 1873.41 | 35.54 |
    | Women towards joint family | 170 | 213.28 | −43.28 | 1873.41 | 8.78 |
    | Men towards nuclear family | 35 | 78.28 | −43.28 | 1873.41 | 23.93 |
    | Women towards nuclear family | 360 | 316.72 | 43.28 | 1873.41 | 5.92 |
    | Total | | | | | Σ (O − E)²/E = 74.17 |
    $\chi^2$ = 74.17
    Step 4: Compare the $\chi^2$ value with the critical value at the 5% level of significance and 1 degree of freedom
    The critical (table) value is 3.84. The calculated chi-square (74.17) is greater than the table value (3.84), so the null hypothesis is rejected. Therefore, the opinion of men and women about the type of family is different.
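The test for independence follows the same pattern once the expected values are derived from the row and column totals. The sketch below is plain Python and not part of the original slides; the variable names are illustrative assumptions, and the critical value 3.84 is copied from the slide rather than computed.

```python
# Chi-square test for independence on the 2 x 2 family-preference table.
# Rows: Men, Women. Columns: Joint Family, Nuclear Family.
table = [
    [96, 35],
    [170, 360],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected value for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
CRITICAL_VALUE = 3.84  # 5% table value for 1 d.f., taken from the slide

print(f"chi-square = {chi2:.2f} with {degrees_of_freedom} d.f.")
print("Reject H0" if chi2 > CRITICAL_VALUE else "Do not reject H0")
```

The code works for any number of rows and columns, since the degrees of freedom and expected values are derived from the shape and totals of `table`.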