QUANTITATIVE METHODS I
Ashok K Mittal
Department of IME
IIT Kanpur
Statistics
Q1 I want to invest Rs 1000. In which company
should I invest?
Growth
Returns
Risk
Statistics
• Q2 How do I know which Company will
give me
• High Return
• Or
• Growth
• But Risk should be Low
Statistics
• Collect information from the past
• Qualitative
• Quantitative (data)
• Analyze information (data) to find patterns
of past performance (Descriptive Statistics)
• Project these patterns to answer your
questions (Inference)
Types of data
 Primary: You do a survey to find out
the percentage of people living below
the poverty line in Allahabad
 Secondary: You are interested in
studying the performance of banks
and for that you study the RBI
published documents
Descriptive Statistics
Presentation of data
 Non frequency data
 Frequency data
Non frequency data
Time series representation of data
BSE(30) Close
[Figure: Time series plot of BSE(30) daily closing values (0 to 5000) from 3-Jan-94 to 31-Mar-94; x-axis: Date, y-axis: BSE(30) Close]
Non frequency data
Spatial series representation of
data
Fertiliser Consumption for a few Indian states for 1999-2000 (in tonnes)
[Figure: Bar chart of fertiliser consumption (0 to 3,500,000 tonnes) for Andhra Pradesh, Karnataka, Kerala, Tamil Nadu, Gujarat, Madhya Pradesh, Maharashtra, Rajasthan, Haryana, Punjab, Uttar Pradesh, Bihar, Orissa, West Bengal and Assam; x-axis: States, y-axis: Fertiliser consumption (in tonnes)]
Frequency data:
Tabular representation
India at a glance (% of GDP)
                  1983   1993   2002   2003
Agriculture       36.6   31.0   22.7   22.2
Industry          25.8   26.3   26.6   26.6
  Mfg             16.3   16.1   15.6   15.8
Services          37.6   42.8   50.7   51.2
Pvt consump       71.8   37.4   65.0   64.9
GOI consump       10.6   11.4   12.5   12.8
Import             8.1   10.0   15.6   16.0
Domes save        17.6   22.5   24.2   22.2
Interests paid     0.4    1.3    0.7   18.3
Note: 2003 refers to 2003-2004; data are preliminary. Gross domestic
savings figures are taken directly from India′s Central Statistical
Organization.
Frequency data
Line diagram representation
Fertiliser Consumption for a few Indian states for 1999-2000 (in tonnes)
[Figure: Line diagram of fertiliser consumption (0 to 3,500,000 tonnes) for the same states as above; x-axis: States, y-axis: Fertiliser consumption (in tonnes)]
Frequency data
Bar diagram (histogram)
representation
World Population (projected mid 2004)
[Figure: Bar chart of world population (0 to 7 billion) for 1950, 1960, 1970, 1980, 1990 and 2000; x-axis: Year, y-axis: Population]
Frequency data
Bar diagram (histogram)
representation
Height and Weight of individuals
[Figure: Bar chart of height (in cms) and weight (in kgs) for Ram, Shyam, Rahim, Praveen, Saikat, Govind and Alan; x-axis: Individual, y-axis: Height/Weight]
Frequency data
Pie diagram/chart representation
Median marks in JMET (2003)
[Figure: Pie chart over the sections Verbal, Quantitative, Analytical and Data Interpretation]
Frequency data
Box plot representation
The box plot is also called the box-and-whisker
plot. A box plot is a set of five summary
measures of the distribution of the data, which
are
 median
 lower quartile
 upper quartile
 smallest observation
 largest observation.
Frequency data
Box plot representation
[Figure: Box plot showing the median, lower quartile (LQ), upper quartile (UQ), and whiskers]
Frequency data
Box plot representation
Here:
 UQ – LQ = Inter quartile range
(IQR)
 X = Smallest observation within
certain percentage of LQ
 Y = Largest observation within
certain percentage of UQ
Important note
A cumulative frequency diagram is called an
ogive. The abscissa of the point of intersection
of the 'less than' and 'more than' ogives
represents the median of the data.
Example
[Figure: Histogram of frequencies (0 to 10) over the class intervals 500-550, 550-600, ….., 1250-1300; x-axis: Class Interval, y-axis: Frequency]
Analysis
Measurements of uncertainty
Concept of uncertainty and different measures of
uncertainty; probability as a measure of
uncertainty; description of qualitative as well
as quantitative probability; assessment of
probability; concepts of decision trees; random
variables; distributions; expectations;
probability plots; etc.
Definitions
Quantitative variable: It can be described by a number
for which arithmetic operations such as averaging
make sense.
Qualitative (or categorical) variable: It simply records a
qualitative attribute, e.g., good, bad, right, wrong, etc.
We know statistics deals with measurements, some
being qualitative others being quantitative. The
measurements are the actual numerical values of a
variable. Qualitative variables could be described by
numbers, although such a description might be
arbitrary, e.g., good = 1, bad = 0, right = 1, wrong = 0,
etc.
Scales of measurement
Nominal scale: In this scale numbers are used
simply as labels for groups or classes. If we
are dealing with a data set which consists of
colours blue, red, green and yellow, then we
can designate blue = 3, red = 4, green = 5 and
yellow = 6. We can state that the numbers
stand for the category to which a data point
belongs. It must be remembered that nothing
is sacrosanct regarding the numbering against
each category. This scale is used for
qualitative data rather than quantitative data.
Scales of measurement
Ordinal Scale: In this scale of
measurement, data elements may be
ordered according to relative size or
quality. For example, a customer or a
buyer can rank a particular
characteristic of a car as good,
average or bad, and while doing so
he/she can assign numeric values,
e.g., good = 10, average = 5 and
bad = 0.
Scales of measurement
Interval Scale: For the interval scale we specify intervals
in a way so as to note a particular characteristic, which
we are measuring and assign that item or data point
under a particular interval depending on the data point.
Consider we are measuring the age of school-going
students between classes 5 and 12 in the city of Kanpur.
We may form intervals 10-12 years, 12-14 years,....., 18-20
years. Now when we have one data point, i.e., the
age of a student, we put that data under one
particular interval; e.g., if the student's age is 11 years,
we immediately put it under the interval 10-12 years.
Scales of measurement
Ratio Scale: If two measurements are in
ratio scale, then we can take ratios of
measurements. The ratio scale
represents the reading for each recorded
data in a way which enables us to take a
ratio of the readings in order to depict it
either pictorially or in figures. Examples
of ratio scale are measurements of
weight, height, area, length etc.
Definitions: different measures
1) Measure of central tendency
 Mean (Arithmetic mean (AM), Geometric
mean (GM), Harmonic mean (HM))
 Median
 Mode
2) Measure of dispersion and shape
 Variance or Standard deviation
 Skewness
 Kurtosis
Definition: Mean
Given N observations x1, x2, ….., xN we define the following:

$AM = \frac{1}{N}\left(x_1 + x_2 + \dots + x_N\right)$

$GM = \left(x_1 \cdot x_2 \cdots x_N\right)^{1/N}$

$HM = \left[\frac{1}{N}\left(\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_N}\right)\right]^{-1}$
Definition: Median and Mode
 Median(µe) : The median of a data set is
the value below which lies half of the data
points. To find the median we use F (µe) =
0.5.
 Mode (µo): The mode of a data set is the
value that occurs most frequently. Hence
f(µo) ≥ f(x); ∀ x.
Definition: Variance, Standard
deviation, Skewness, Kurtosis
 Variance: $V[X] = \sigma^2 = E\left[(X - E[X])^2\right]$
 Standard deviation: $SD = \sigma$
 Skewness: $\gamma_1 = \sqrt{\beta_1} = \mu_3/\mu_2^{3/2} = \mu_3/\sigma^3$
 Kurtosis: $\gamma_2 = \beta_2 - 3 = \mu_4/\sigma^4 - 3$
(here $\mu_r$ denotes the r-th central moment)
Example
Consider we have the following data
points:
5, 7, 10, 7, 10, 11, 3, 5, 5
For these data points we have
µ = 7; µe = 7 (sorted data: 3, 5, 5, 5, 7, 7, 10, 10, 11);
µo = 5; σ² ≈ 6.89
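As a quick check of these summary measures, a sketch using Python's statistics module (sorting the data gives 3, 5, 5, 5, 7, 7, 10, 10, 11, so the middle value is 7):

```python
# Summary measures for the slide's data set: 5, 7, 10, 7, 10, 11, 3, 5, 5
from statistics import mean, median, mode, pvariance

data = [5, 7, 10, 7, 10, 11, 3, 5, 5]
mu = mean(data)        # arithmetic mean: 63/9 = 7
me = median(data)      # middle value of the sorted data
mo = mode(data)        # most frequent value (5 occurs three times)
var = pvariance(data)  # population variance: divide by N, not N - 1
```

Note that pvariance divides by N, matching the σ² ≈ 6.89 (= 62/9) quoted above; variance would divide by N − 1 instead.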
Descriptive statistics
Suppose the data are available in the form of a
frequency distribution. Assume there are k
classes and the mid-points of the corresponding
class intervals being x1, x2,…., xk. While the
corresponding frequencies are f1, f2,….., fk, such
that n = f1+f2+…..+fk
Then:

$\mu = \frac{1}{n}\sum_{i=1}^{k} f_i x_i \qquad \sigma = \left\{\frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \mu)^2\right\}^{1/2}$
Consider m groups of observations with
respective means µ1, µ2,….., µm and standard
deviations σ1, σ2,….., σm. Let the group sizes be
n1, n2,….., nm such that n = n1 + n2 + ….. + nm.
Then:
Descriptive statistics
$\mu_{OVERALL} = \frac{1}{n}\sum_{i=1}^{m} n_i \mu_i$

$\sigma_{OVERALL} = \left[\frac{1}{n}\left\{\sum_{i=1}^{m} n_i \sigma_i^2 + \sum_{i=1}^{m} n_i (\mu_i - \mu_{OVERALL})^2\right\}\right]^{1/2}$
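The combined-group formulas above can be sketched directly; the group sizes, means and standard deviations below are assumed for illustration:

```python
# Combined (pooled) mean and SD for m groups, given each group's
# size n_i, mean mu_i, and standard deviation sigma_i.
from math import sqrt

sizes = [10, 20, 30]      # n_1, ..., n_m (assumed for illustration)
means = [5.0, 6.0, 7.0]   # mu_1, ..., mu_m
sds = [1.0, 1.5, 2.0]     # sigma_1, ..., sigma_m
n = sum(sizes)

mu_overall = sum(ni * mi for ni, mi in zip(sizes, means)) / n
# within-group variance plus between-group variance
var_overall = (sum(ni * si**2 for ni, si in zip(sizes, sds)) / n
               + sum(ni * (mi - mu_overall)**2 for ni, mi in zip(sizes, means)) / n)
sd_overall = sqrt(var_overall)
```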
Probability
Random event
Random experiment: An experiment whose
outcome cannot be predicted with certainty.
 Sample space (Ω) : The set of all possible
outcomes of a random experiment
 Sample point (ωi): The elements of the
sample space
 Event (A): Is a subset of the sample space
such that it is a collection of sample point(s).
Probability
Probability (P(A)): Of an event is defined as a quantitative
measure of uncertainty of the occurrence of the event
 Objective probability: Based on games of chance; it can
be mathematically proved or verified. If the experiment
is the same for two different persons, the value of the
objective probability remains the same. It is the
limiting value of the relative frequency. Example: the
probability of getting the number 5 when we roll a fair die.
 Subjective probability: Based on personal judgment,
intuition and subjective criteria. Its value will change from
person to person. Example: one person sees the chance of
India winning the test series against Australia as high while
another person sees it as low.
Random event
For a random experiment, we denote
P(ωi) = pi
Where:
 P(ωi) = pi = Probability of occurrence of the sample
point ωi
 P(A) = Probability of occurrence of the event
 $P(A) = \sum_{\omega_i \in A} p_i$
 $P(\Omega) = \sum_{\omega_i \in \Omega} p_i = 1$
Example 1
Suppose there are two dice each with faces 1, 2,....., 6
and they are rolled simultaneously. This rolling of the
two dice would constitute our random experiment
Then we have:
 Ω = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1),.…..,
(5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}.
 ωi = (1,1), (1,2),…., (6,5), (6,6)
 We define the event A such that the outcomes for
the two dice are equal in one simultaneous throw; then A
= {(1, 1), (2, 2),….., (6, 6)}
 P(ωi): p1 = p2 = ….. = p36 = 1/36
 P(A) = p1 + p8 + p15 + p22 + p29 + p36 = 6/36
Example 2
Suppose a coin is tossed repeatedly till the first
head is obtained.
Then we have:
 Ω = {(H), (T,H), (T,T,H),………}
 ωi = (H), (T,H), (T,T,H),…..
 We define the event such that at most 3 tosses
are needed to obtain the first head, then A =
{(H), (T,H), (T,T,H)}
 P(ωi): p1 = ½, p2 = (½)², p3 = (½)³, p4 = (½)⁴, ..…
 P(A) = p1 + p2 + p3 = 7/8
Example 3
In a club there 10 members of whom 5 are
Asians and the rest are Americans. A committee
of 3 members has to be formed and these
members are to be chosen randomly. Find the
probability that there will be at least 1 Asian and
at least 1 American in the committee
Total number of cases = 10
C2 and the number of
cases favouring the formation of the committee
is 5
C2*5
C1 + 5
C1*5
C2
Hence P(A) = 100/120
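A counting check of this committee calculation, using math.comb for the binomial coefficients:

```python
# Committee of 3 from 10 members (5 Asians, 5 Americans),
# with at least one member from each group.
from math import comb

total = comb(10, 3)  # all possible committees of 3
favourable = comb(5, 2) * comb(5, 1) + comb(5, 1) * comb(5, 2)
p = favourable / total
```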
Example 4
Suppose we continue with example 2
which we have just discussed, and we
define the event B that at least 5 tosses
are needed to produce the first head
 Ω = {(H), (T,H), (T,T,H),………}
 ωi = (H), (T,H), (T,T,H),…..
 P(ωi): p1 = ½, p2 = (½)², p3 = (½)³, p4 = (½)⁴, ..…
 P(B) = p5 + p6 + p7 + ….. = 1 – (p1 + p2 + p3 + p4) = 1/16
Theorem in probability
For any events A, B ⊆ Ω
 0 ≤ P(A) ≤ 1
 If A ⊂ B, then P(A) ≤ P(B)
 P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
 P(Aᶜ) = 1 – P(A)
 P(Ω) = 1
 P(φ) = 0
Definitions
 Mutually exclusive: Consider n events A1,
A2,….., An. They are mutually exclusive if no
two of them can occur together, i.e., P(Ai ∩
Aj) = 0 ∀ i ≠ j
 Mutually exhaustive: Consider n events A1,
A2,….., An. They are mutually exhaustive if at
least one of them must occur and
P(A1UA2U…..UAn) = 1
Example 5
Suppose a fair die with faces 1, 2,….., 6 is rolled.
Then Ω = {1, 2, 3, 4, 5, 6}. Let us define the events
A1 = {1, 2}, A2 = {3, 4, 5, 6} and A3 = {3, 5}
 The events A2 and A3 are neither mutually exclusive
nor exhaustive
 A1 and A3 are mutually exclusive but not exhaustive
 A1, A2 and A3 are not mutually exclusive but are
exhaustive
 A1 and A2 are mutually exclusive and exhaustive
Conditional probability
Let A and B be two events such that P(B) > 0.
Then the conditional probability of A given B is
Assume Ω = {1, 2, 3, 4, 5, 6}, A = {2}, B = {2, 4, 6}.
Then A ∩ B = {2} and
$P(A|B) = \frac{P(A \cap B)}{P(B)}$

$P(A|B) = \frac{1/6}{3/6} = \frac{1}{3} \qquad P(B|A) = \frac{1/6}{1/6} = 1$
Bayes′ Theorem
Let B1, B2,….., Bn be mutually exclusive and
exhaustive events such that P(Bi) > 0 for
every i = 1, 2,…., n, and let A be any event
with P(A) > 0. Then we have

$P(A) = \sum_{i=1}^{n} P(A|B_i)P(B_i)$

$P(B_j|A) = \frac{P(A|B_j)P(B_j)}{\sum_{i=1}^{n} P(A|B_i)P(B_i)}$
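A minimal numeric illustration of the theorem; the priors P(Bi) and likelihoods P(A|Bi) below are assumed for illustration, not taken from the slides:

```python
# Bayes' theorem for three mutually exclusive, exhaustive events B1, B2, B3.
priors = [0.5, 0.3, 0.2]       # P(B_i), assumed values
likelihoods = [0.1, 0.4, 0.5]  # P(A | B_i), assumed values

# total probability: P(A) = sum_i P(A|B_i) P(B_i)
p_a = sum(l * p for l, p in zip(likelihoods, priors))
# posterior: P(B_j|A) = P(A|B_j) P(B_j) / P(A)
posteriors = [l * p / p_a for l, p in zip(likelihoods, priors)]
```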
Independence of events
Two events A and B are called
independent if P(A∩B) = P(A)*P(B)
Distribution
Depending on the outcomes of an experiment,
a random variable (r.v.) is used to denote the
outcome of the experiment. We usually denote
the r.v. by X, Y or Z and the corresponding
probability distribution by f(x), f(y) or f(z)
 Discrete: probability mass function (pmf)
 Continuous: probability density function
(pdf)
Discrete distribution
1) Uniform discrete distribution
2) Binomial distribution
3) Negative binomial distribution
4) Geometric distribution
5) Hypergeometric distribution
6) Poisson distribution
7) Log distribution
Bernoulli Trials
1) Each trial has two possible outcomes,
say a success and a failure.
2) The trials are independent
3) The probability of success remains the
same and so does the probability of
failure from one trial to another
Uniform discrete distribution
[X ~ UD (a, b)]
f(x) = 1/n, x = a, a+k, a+2k,….., b (n values in all)
 a and b are the parameters where a, b ∈ R
 E[X] = $a + \frac{k}{2}(n-1)$
 V[X] = $\frac{k^2}{12}(n^2 - 1)$
 Example: Generating the random
numbers 1, 2, 3,…, 10. Hence
X ~ UD(1,10) where a = 1, k = 1, b = 10. Hence
n = 10.
Uniform discrete distribution
[Figure: Bar plot of the uniform discrete pmf; f(x) is constant over the support]
Binomial distribution
[X ~ B (p, n)]
f(x) = $\binom{n}{x} p^x q^{n-x}$, x = 0, 1, 2,….., n
 n and p are the parameters where p ∈ [0, 1] and n ∈ Z⁺
 E[X] = np
 V[X] = npq
 Example: Consider you are checking the quality of the
product coming out of the shop floor. A product can
either pass (with probability p = 0.8) or fail (with
probability q = 0.2), and for checking you take 50 such
products (n = 50). Then if X is the random variable
denoting the number of successes in these 50 inspections,
we have
f(x) = $\binom{50}{x} (0.8)^x (0.2)^{50-x}$
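The inspection example can be checked numerically; a sketch of the B(p = 0.8, n = 50) pmf using only the standard library:

```python
# Binomial pmf for the inspection example: n = 50 products, pass prob p = 0.8.
from math import comb

def binom_pmf(x, n=50, p=0.8):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# the probabilities sum to 1, and the mean equals n*p = 40
total = sum(binom_pmf(x) for x in range(51))
mean_x = sum(x * binom_pmf(x) for x in range(51))
```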
Binomial distribution
[Figure: Bar plot of the binomial pmf for n = 50, p = 0.8]
Negative binomial distribution
[X ~ NB (p, r)]
f(x) = $\binom{r+x-1}{r-1} p^r q^x$, x = 0, 1, 2,…..
 p and r are the parameters where p ∈ [0, 1] and r ∈ Z⁺
 E[X] = rq/p
 V[X] = rq/p²
 Example: Consider the example above where you are
still inspecting items from the production line. But now
you are interested in finding the probability distribution
of the number of failures preceding the 5th success of
getting the right product. Then, considering
p = 0.8, q = 0.2, we have
f(x) = $\binom{5+x-1}{5-1} (0.8)^5 (0.2)^x$
Negative binomial distribution
[Figure: Bar plot of the negative binomial pmf for p = 0.8, r = 5]
Geometric distribution
[X ~ G (p)]
f(x) = pqˣ, x = 0, 1, 2,…..
 p is the parameter where p ∈ [0, 1]
 E[X] = q/p (r = 1 in the negative binomial case)
 V[X] = q/p² (r = 1 in the negative binomial case)
 Example: Consider the example above. But now you
are interested in finding the probability distribution of
the number of failures preceding the 1st success of
getting the right product. Then, considering
p = 0.8, q = 0.2, we have
f(x) = (0.8)(0.2)ˣ
Geometric distribution
[Figure: Bar plot of the geometric pmf for p = 0.8]
Hypergeometric distribution
[X ~ HG (N, n, p)]
f(x) = $\binom{Np}{x}\binom{Nq}{n-x} \Big/ \binom{N}{n}$, 0 ≤ x ≤ Np and 0 ≤ (n – x) ≤ Nq
 N, n and p are the parameters
 E[X] = np
 V[X] = npq{(N – n)/(N – 1)}
 Example: Consider the example above. But now you are interested in
finding the probability distribution of the number of failures (successes) of
getting the wrong (right) product when we choose n products
for inspection out of the total population N. If the population is 100 and
we choose 10 out of those, then the probability distribution of getting the
right product, denoted by X, is given by
f(x) = $\binom{85}{x}\binom{15}{10-x} \Big/ \binom{100}{10}$
 Remember:
 p (0.85) and q (0.15) are the proportions of good items and
bad items respectively.
 In this distribution the choosing is done without
replacement.
Hypergeometric distribution
[Figure: Bar plot of the hypergeometric pmf for N = 100, n = 10, p = 0.85]
Poisson distribution
[X ~ P (λ)]
f(x) = $e^{-\lambda}\lambda^x / x!$, x = 0, 1, 2,…..
 λ is the parameter where λ > 0
 E[X] = λ
 V[X] = λ
 Example: Consider the arrival of customers
at the bank teller counter. If we are
interested in finding the probability distribution of the
number of customers arriving at the counter in specific
intervals of time and we know that the average
number of customers arriving is 5, then we have
f(x) = $e^{-5}\, 5^x / x!$
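A sketch of the teller-counter example with λ = 5, again using only the standard library:

```python
# Poisson pmf for the teller example with lambda = 5 arrivals per interval.
from math import exp, factorial

def pois_pmf(x, lam=5):
    return exp(-lam) * lam**x / factorial(x)

p_le_3 = sum(pois_pmf(x) for x in range(4))      # P(X <= 3)
mean_x = sum(x * pois_pmf(x) for x in range(100))  # ~ lambda = 5
```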
Poisson distribution
[Figure: Bar plot of the Poisson pmf for λ = 5]
Log distribution
[X ~ L (p)]
f(x) = $-(\log_e p)^{-1}\, x^{-1} (1-p)^x$, x = 1, 2, 3,…..
 p is the parameter where p ∈ (0, 1)
 E[X] = $-(1-p)/(p \log_e p)$
 V[X] = $-(1-p)\left[1 + (1-p)/\log_e p\right]/(p^2 \log_e p)$
 Examples:
1) Emission of gases from engines against fuel type
2) Used to represent the distribution of the number of
items of a product purchased by a buyer in a
specified period of time
Log distribution
[Figure: Bar plot of the logarithmic (log-series) pmf]
Continuous distribution
1) Uniform distribution
2) Normal distribution
3) Exponential distribution
4) Chi-Square distribution
5) Gamma distribution
6) Beta distribution
7) Cauchy distribution
Continuous distribution
8) t-distribution
9) F-distribution
10)Log-normal distribution
11)Weibull distribution
12)Double exponential distribution
13)Pareto distribution
14)Logistic distribution
Uniform distribution
[X ~ U (a, b)]
f(x) = 1/(b – a) a ≤ x ≤ b
 a and b are the parameters where a, b ∈
R and a < b
 E[X] = (a+b)/2
 V[X] = (b – a)²/12
 Example: Choosing any number between
1 and 10, both inclusive, from the real line
Uniform distribution
[Figure: Plot of the uniform density, constant between a = 1 and b = 10]
Normal distribution
[X ~ N (µ, σ²)]
$f(x) = \frac{1}{\sigma_X\sqrt{2\pi}}\, e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}$, -∞ < x < ∞
 µX, σ²X are the parameters where µX ∈ R and σ²X > 0
 E[X] = µX
 V[X] = σ²X
 Example: Consider the average age of a student
between class VII and VIII selected at random
from all the schools in the city of Kanpur
Normal distribution
[Figure: Bell-shaped normal density curve over x from -10 to 10]
Log-normal distribution
[X ~ LN (µ, σ²)]
$f(x) = \frac{1}{x\,\sigma_X\sqrt{2\pi}}\, e^{-\frac{(\log_e x-\mu_X)^2}{2\sigma_X^2}}$, 0 < x < ∞
 µX, σ²X are the parameters where µX ∈ R and σ²X > 0
 E[X] = $\exp(\mu_X + \sigma_X^2/2)$
 V[X] = $\exp(2\mu_X + \sigma_X^2)\{\exp(\sigma_X^2) - 1\}$
 Example: Stock price return distribution
Log-normal distribution
[Figure: Right-skewed log-normal density curve]
Relationship between Poisson and
Exponential distribution
If the intervals between successive
events of a process are independent,
identically and exponentially
distributed, then the number of
events in a specified time interval
follows a Poisson distribution.
Normal distribution results
[Figure: Normal density curve with the area between points a and b shaded]
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
[Figure: Standard normal density f(z) with the area from 0 to z = 1.56 marked]
Finding probabilities using the Standard Normal
Distribution: P(0 < Z < 1.56)
Look in the row labeled 1.5 and the column
labeled .06 to find P(0 ≤ z ≤ 1.56) = .4406
Standard Normal distribution
Z ~ N(0,1), given by the equation

$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

The area within an interval (a, b) is given by

$F(a \le Z \le b) = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$

which is not integrable algebraically. The Taylor expansion of the above assists in speeding up the calculation:

$F(Z \le z) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{k=0}^{\infty} \frac{(-1)^k\, z^{2k+1}}{(2k+1)\, 2^k\, k!}$
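The series above can be checked against math.erf, since Φ(z) = ½(1 + erf(z/√2)); a sketch:

```python
# Taylor-series evaluation of the standard normal cdf F(z),
# checked against the closed form via the error function.
from math import erf, factorial, pi, sqrt

def phi_series(z, terms=40):
    s = sum((-1)**k * z**(2*k + 1) / ((2*k + 1) * 2**k * factorial(k))
            for k in range(terms))
    return 0.5 + s / sqrt(2 * pi)

# at z = 1.56 this reproduces the table value: P(0 <= Z <= 1.56) = 0.4406
p_0_to_156 = phi_series(1.56) - 0.5
```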
Cumulative distribution function
(cdf) or the distribution function
We denote the distribution function by F(x).
For a discrete r.v.:
$F(x) = P(X \le x) = \sum_{x_i \le x} f(x_i)$
For a continuous r.v.:
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\, dt = \int_{-\infty}^{x} dF(t)$
Properties of distribution function
1) F(x) is non-decreasing in x, i.e., if x1 ≤
x2, then F(x1) ≤ F(x2)
2) lim F(x) = 0 as x → - ∞
3) lim F(x) = 1 as x → + ∞
4) F(x) is right continuous
Standard normal distribution
Putting Z = (X - µX)/σX in the normal distribution we
have the standard normal distribution,
where µZ = 0 and σZ = 1
Remember
• F(x) = P(X ≤ x) corresponds to F(z) = P(Z ≤ z), where z = (x - µX)/σX
• f(z) = φ(z)
• $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
• $F(z) = P(Z \le z) = \int_{-\infty}^{z} f(t)\, dt = \Phi(z)$
Standard normal distribution
[Figure: Standard normal density f(z) over z from -10 to 10]
Exponential distribution
[X ~ E (a, θ)]
$f(x) = \frac{1}{\theta}\, e^{-(x-a)/\theta}$, a < x < ∞
 a and θ are the parameters where a ∈ R
and θ > 0
 E[X] = a + θ
 V[X] = θ²
 Example: The life distribution of the
number of hours an electric bulb survives.
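A sketch of this density and its cdf F(x) = 1 − e^(−(x−a)/θ); the values a = 0 and θ = 2 are assumed for illustration:

```python
# Two-parameter exponential density and cdf (a = location, theta = scale).
from math import exp

def exp_pdf(x, a=0.0, theta=2.0):  # a, theta chosen for illustration
    return exp(-(x - a) / theta) / theta if x >= a else 0.0

def exp_cdf(x, a=0.0, theta=2.0):
    return 1.0 - exp(-(x - a) / theta) if x >= a else 0.0
```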
Exponential distribution
[Figure: Decaying exponential density curve over x from 0 to 20]
Normal CDF Plot
Normal F(x)
[Figure: Normal F(x) (0 to 1) plotted against ordered X values from 1.83 to 5.16]
Exponential CDF Plot
Exponential F(x)
[Figure: Exponential F(x) plotted against the same ordered X values from 1.83 to 5.16]
Arrival time problem # 1
In a factory shop floor, for a certain CNC machine (machine
marked # 1), the number of jobs arriving per unit time is given
below
# of Arrivals Frequency
0 2
1 4
2 4
3 1
4 2
5 1
6 4
7 6
8 4
9 1
Arrival time problem # 1
Frequency distribution of arrivals
[Figure: Bar chart of frequency (0 to 7) against number of arrivals (0 to 9)]
Arrival time problem # 1
Relative Frequency of Number of Arrivals
[Figure: Bar chart of relative frequency (0 to 0.25) against number of arrivals (0 to 9)]
Arrival time problem # 1
Cumulative Relative Frequency of Number of Arrivals
[Figure: Step plot of cumulative relative frequency (0 to 1) against number of arrivals (0 to 10)]
Arrival time problem # 1
1) The probability of the number of arrivals of
jobs being equal to or more than 7 is
about 0.38 (11/29).
2) The average number of arrivals of jobs is
about 4.7.
3) The probability of the number of arrivals of
jobs being less than or equal to 4 is
about 0.45 (13/29).
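These statements can be recomputed directly from the frequency table (the counts sum to 29 observations):

```python
# Relative and cumulative relative frequencies from the arrival table above.
arrivals = {0: 2, 1: 4, 2: 4, 3: 1, 4: 2, 5: 1, 6: 4, 7: 6, 8: 4, 9: 1}
n = sum(arrivals.values())  # 29 observations

mean_arrivals = sum(x * f for x, f in arrivals.items()) / n
p_le_4 = sum(f for x, f in arrivals.items() if x <= 4) / n
p_ge_7 = sum(f for x, f in arrivals.items() if x >= 7) / n
```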
Covariance
SAT problem
This data set [ ] includes eight variables:
1) STATE: Name of state
2) COST: Current expenditure per pupil (measured in thousands of
dollars per average daily attendance in public elementary and
secondary schools)
3) RATIO: Average pupil/teacher ratio in public elementary and
secondary schools during Fall 1994
4) SALARY: Estimated average annual salary of teachers in public
elementary and secondary schools during 1994-95 (in thousands of
dollars)
5) PERCENT: percentage of all eligible students taking the SAT in
1994-95
6) VERBAL: Average verbal SAT score in 1994-95
7) MATH: Average math SAT score in 1994-95
8) TOTAL: Average total score on the SAT in 1994-95
SAT problem
Histogram for Cost
[Figure: COST plotted by state]
SAT problem
Histograms for Cost and Ratio
[Figure: COST and RATIO plotted by state]
SAT problem
Histogram of Cost, Ratio and Salary
[Figure: COST, RATIO and SALARY plotted by state]
SAT problem
Average value is given by
$E(X) = \frac{1}{n}\sum_{i=1}^{n} X_i$
Variance is given by
$V(X) = \frac{1}{n}\sum_{i=1}^{n} \{X_i - E(X)\}^2$
Covariance is given by
$Cov(X, Y) = E\left[\{X - E(X)\}\{Y - E(Y)\}\right] = \rho_{X,Y}\sqrt{V(X)\,V(Y)}$
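A sketch of these formulas on a small assumed paired sample (illustrative values, not the actual SAT data set):

```python
# Covariance and correlation from paired data, following the formulas above.
from math import sqrt

x = [4.0, 5.0, 6.0, 7.0, 8.0]    # e.g. COST-like values, assumed
y = [980, 960, 950, 930, 920]    # e.g. TOTAL-like values, assumed
n = len(x)
mx, my = sum(x) / n, sum(y) / n

cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
vx = sum((a - mx) ** 2 for a in x) / n
vy = sum((b - my) ** 2 for b in y) / n
rho = cov / sqrt(vx * vy)  # correlation coefficient, always in [-1, 1]
```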
SAT problem
COST RATIO SALARY PERCENT VERBAL MATH TOTAL
Mean 5.90526 16.858 34.82892 35.24 457.14 508.78 965.92
Median 5.7675 16.6 33.2875 28 448 497.5 945.5
Maximum 9.774 24.3 50.045 81 516 592 1107
Minimum 3.656 13.8 25.994 4 401 443 844
Standard Deviation(1) 1.362807 2.266355 5.941265 26.76242 35.17595 40.20473 74.82056
Standard Deviation(2) 1.34911 2.243577 5.881552 26.49344 34.82241 39.80065 74.06857
SAT problem
COST RATIO SALARY PERCENT VERBAL MATH TOTAL
COST 1.820097 -1.12303 6.901753 21.18202 -19.2638 -18.7619 -38.0258
RATIO -1.12303 5.033636 -0.01512 -12.6639 4.98188 8.52076 13.50264
SALARY 6.901753 -0.01512 34.59266 96.10822 -97.6868 -93.9432 -191.63
PERCENT 21.18202 -12.6639 96.10822 701.9024 -824.094 -916.727 -1740.82
VERBAL -19.2638 4.98188 -97.6868 -824.094 1212.6 1344.731 2557.331
MATHS -18.7619 8.52076 -93.9432 -916.727 1344.731 1584.092 2928.822
TOTAL -38.0258 13.50264 -191.63 -1740.82 2557.331 2928.822 5486.154
SAT problem
COST RATIO SALARY PERCENT VERBAL MATH TOTAL
COST 1 -0.37103 0.869802 0.592627 -0.41005 -0.34941 -0.38054
RATIO -0.37103 1 -0.00115 -0.21305 0.063767 0.095422 0.081254
SALARY 0.869802 -0.00115 1 0.61678 -0.47696 -0.40131 -0.43988
PERCENT 0.592627 -0.21305 0.61678 1 -0.89326 -0.86938 -0.88712
VERBAL -0.41005 0.063767 -0.47696 -0.89326 1 0.970256 0.991503
MATHS -0.34941 0.095422 -0.40131 -0.86938 0.970256 1 0.993502
TOTAL -0.38054 0.081254 -0.43988 -0.88712 0.991503 0.993502 1
Inference
1) Point estimation
2) Interval estimation
3) Hypothesis testing
Sampling
• Population: N
–Population distribution
–Parameter (θ)
• Sample: n
–Sampling distribution
–Statistic (tn)
Types of sampling
• Probability Sampling
– Simple Random Sampling
– Stratified Random Sampling
– Cluster Sampling
– Multistage Sampling
– Systematic Sampling
• Judgement Sampling
– Quota Sampling
– Purposive Sampling
Simple Random Sampling
A simple random sample of size n
from a finite population (of size N) is a
sample selected such that each
possible sample of size n has the same
probability of being selected. This
would be akin to SRSWOR (simple random
sampling without replacement).
Simple Random Sampling
A simple random sample of size n from
an infinite population is a sample
selected such that the following conditions
hold
 Each element selected comes from the
population.
 Each element is selected independently.
This would be akin to SRSWR (simple random
sampling with replacement).
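A sketch of the two schemes with Python's random module; the population of N = 100 units and the fixed seed are assumed for illustration:

```python
# Simple random sampling without replacement (SRSWOR) vs
# with replacement (SRSWR) from a small finite population.
import random

population = list(range(1, 101))  # N = 100 units, assumed
random.seed(42)                   # fixed seed so the draw is repeatable

srswor = random.sample(population, 10)    # no unit can appear twice
srswr = random.choices(population, k=10)  # units may repeat
```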
Some Special Distribution
Used
in Inference
Chi-square distribution
Suppose Z1, Z2,….., Zn are n
independent observations from
N(0, 1); then
$Z_1^2 + Z_2^2 + Z_3^2 + \dots + Z_n^2 \sim \chi_n^2$
Chi-square distribution
[X ~ χ²ₙ]
$f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{\frac{n}{2}-1}\, e^{-x/2}$, 0 ≤ x < ∞
 n is the parameter (degrees of freedom) where n ∈ Z⁺
 E[X] = n
 V[X] = 2n
Chi-square distribution
[Figure: Chi-square density curves]
t-distribution
Suppose Z ~ N(0, 1), Y ~ χ²ₙ and they are
independent; then
$\frac{Z}{\sqrt{Y/n}} \sim t_n$
t-distribution
[X ~ tₙ]
$f(x) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left[1 + \frac{x^2}{n}\right]^{-\frac{n+1}{2}}$
• n is the parameter where n ∈ Z⁺
• E[X] = 0 (n > 1)
• V[X] = n/(n – 2), (n > 2)
t-distribution
[Figure: t-density curves]
F-distribution
Suppose X ~ χ²ₙ, Y ~ χ²ₘ and they are
independent; then
$\frac{X/n}{Y/m} \sim F_{n,m}$
F-distribution
[X ~ F_{n,m}]
$f(x) = \frac{\Gamma\!\left(\frac{n+m}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{m}{2}\right)} \left(\frac{n}{m}\right)^{n/2} x^{\frac{n}{2}-1} \left(1 + \frac{n}{m}x\right)^{-\frac{n+m}{2}}$, 0 < x < ∞
 n, m are the parameters (degrees of freedom)
where n, m ∈ Z⁺
 E[X] = m/(m - 2), (m > 2)
 V[X] = 2m²(n + m - 2)/[n(m – 2)²(m – 4)], (m > 4)
F-distribution
[Figure: F-density curves]
Some results
If X1, X2,….., Xn are n observations from
X ~ N(µX, σ²X) and
$\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}$
then
$\frac{\sqrt{n}\,(\bar{X}_n - \mu_X)}{\sigma_X} \sim N(0, 1)$
Some results
If
$S_{n,X}^2 = \frac{1}{n-1}\sum_{j=1}^{n}\{X_j - \bar{X}_n\}^2$ and $S_{n,X}^{\prime\,2} = \frac{1}{n}\sum_{j=1}^{n}\{X_j - \mu_X\}^2$
then
$\frac{(n-1)\,S_{n,X}^2}{\sigma_X^2} \sim \chi_{n-1}^2$, $\frac{n\,S_{n,X}^{\prime\,2}}{\sigma_X^2} \sim \chi_n^2$, and $\frac{\sqrt{n}\,(\bar{X}_n - \mu_X)}{S_{n,X}} \sim t_{n-1}$
Some results
If X1, X2,….., X_{mX} are mX observations from X ~
N(µX, σ²X) and Y1, Y2,….., Y_{mY} are mY observations
from Y ~ N(µY, σ²Y), and moreover these samples
are independent, then
$\left(\frac{S_X^2}{\sigma_X^2}\right)\Big/\left(\frac{S_Y^2}{\sigma_Y^2}\right) \sim F_{m_X-1,\,m_Y-1}$
Estimation
Estimators and their properties
Estimator: Any statistic (a random
function of the sample) which is used to estimate the
population parameter
 Unbiasedness
Eθ (tn) = θ
 Consistency
P[|tn - θ| < ε] → 1 as n → ∞
Estimators (Discrete distribution)
1) X ~ UD (a, b): $\hat{a} = \min(X_1,\dots,X_n)$ and $\hat{b} = \max(X_1,\dots,X_n)$
2) X ~ P (λ): $\hat{\lambda} = \bar{X}_n$
3) X ~ B (n, p): $\hat{p} = \frac{\#\,\text{favouring}}{n}$
Estimators (Continuous distribution)
1) X ~ N(µ, σ²): $\hat{\mu} = \bar{X}_n$
2) X ~ N(µ, σ²), µ known: $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\{X_i - \mu\}^2$
3) X ~ N(µ, σ²), µ unknown: $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\{X_i - \bar{X}_n\}^2$
4) X ~ E(θ): $\hat{\theta} = \bar{X}_n$
Examples (Estimation)
Number of jobs arriving in a unit time for a CNC
machine: consider we choose from a discrete
distribution whose population distribution [X ~
UD (a, b)] is not known. We select values
from the distribution and the numbers sampled
are 22, 34, 33, 21, 29, 29, 30. Then the best
estimate for a is $\hat{a} = 21$ and the best estimate
for b is $\hat{b} = 34$.
Examples (Estimation)
You are testing the components coming
out of the shop floor and find that 9 out of
30 components fail. Then the estimated
value of p (proportion of bad items in the
population) = 9/30.
Examples (Estimation)
At a particular teller machine in one of the
bank branches it was found that the
number of customers arriving in an unit
time span were 4, 6, 7, 4, 3, 5, 6, and 5.
Then for this Poisson process the
estimated value of λ is 5.
Examples (Estimation)
Suppose it is known that the survival time
of a particular type of bulb has the
exponential distribution. You test 10 such
bulbs and find their respective survival
times as 150, 225, 275, 300, 95, 155, 325,
75, 20 and 400 hours respectively. Then
the estimated value of θ = 202.
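The point estimates in the four examples above can be reproduced directly; a Python sketch using the sample data quoted in the slides:

```python
# Point estimates for the worked examples, computed from the quoted samples.

# X ~ UD(a, b): a-hat = min, b-hat = max of the sample
jobs = [22, 34, 33, 21, 29, 29, 30]
a_hat, b_hat = min(jobs), max(jobs)

# X ~ B(n, p): p-hat = (# failures)/(# tested)
p_hat = 9 / 30

# X ~ P(lambda): lambda-hat = sample mean of the arrival counts
arrivals = [4, 6, 7, 4, 3, 5, 6, 5]
lam_hat = sum(arrivals) / len(arrivals)

# X ~ E(theta): theta-hat = sample mean of the survival times
lifetimes = [150, 225, 275, 300, 95, 155, 325, 75, 20, 400]
theta_hat = sum(lifetimes) / len(lifetimes)

print(a_hat, b_hat, p_hat, lam_hat, theta_hat)   # 21 34 0.3 5.0 202.0
```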
Prediction
Multiple Linear Regression
Given ′k′ independent variables X1, X2,….., Xk
and one dependent variable Y, we predict the
value of Y, denoted Ŷ or y, using the values of
the Xi′s. We need ′n′ (n ≥ k + 1) data points, and
the multiple linear regression (MLR) equation is
as follows:
Yj = β1x1,j + β2x2,j +…..+ βkxk,j + εj
∀ j = 1, 2,….., n
Multiple Linear Regression
Note
 There is no randomness in measuring Xi
 The relationship is linear and not non-
linear. By non-linear we mean that at least
one derivative of Y wrt βi′s is a function of
at least one of the parameters. By
parameters we mean the βi′s.
Multiple Linear Regression
Assumptions for the MLR
 Xi, Y are normally distributed
 Xi are all non-stochastic
 εj ~ N(0, σ²I)
 n ≥ k
 No linear dependence between the Xj′s, i.e., the
rank of the matrix X is k (full column rank)
 E(εjεl) = 0 ∀ j ≠ l, j, l = 1, 2,….., n
 Cov(Xi, εj) = 0 ∀ i, j
Multiple Linear Regression
 Find β1, β2,….., βk using the concept of
minimizing the sum of squares of errors. This
is also known as the least squares method or
method of ordinary least squares. The
estimates found, β̂1, β̂2,….., β̂k, are the
estimates of β1, β2,….., βk respectively.
 Utilize these estimates to find the forecasted
value of Y (i.e., Ŷ or y) and compare those
with actual values of Y obtained in future.
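A sketch of this least-squares step on made-up data; numpy's `lstsq` stands in for any OLS solver, and the X matrix and betas below are illustrative only:

```python
import numpy as np

# n = 6 points, k = 2 regressors, responses generated from known betas
# (noise-free), so the least-squares fit recovers them exactly.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 2.0],
              [5.0, 7.0],
              [6.0, 1.0]])
beta_true = np.array([2.0, -1.0])
y = X @ beta_true

# Ordinary least squares: minimize the sum of squared errors ||y - X b||^2
beta_hat, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat          # the forecasted (fitted) values of Y
```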
To check for normality of data
We need to check for the normality of Xi′s and Y
1) List the observation number in column # 1; call it i.
2) List the data in column # 2.
3) Sort the data from the smallest to the largest and place in
column # 3.
4) For each ith
of the n observations, calculate the
corresponding tail area of the standard normal distribution
(Z) as follows, A = (i – 0.375)/(n + 0.25). Put the values in
column # 4.
5) Use NORMSINV(A) function in MS-EXCEL to produce a
column of normal scores. Put these values in column # 5.
6) Make a copy of the sorted data (be sure to use paste
special and paste only the values) in column # 6.
7) Make a scatter plot of the data in columns # 5 and # 6.
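Steps 1–7 can be sketched in Python, with the standard library's inverse normal CDF playing the role of NORMSINV; the sample data here are made up:

```python
from statistics import NormalDist

# Illustrative sample to be checked for normality
data = [12.0, 9.5, 14.1, 10.2, 11.7, 8.9, 13.3, 10.8]
n = len(data)

sorted_data = sorted(data)                               # step 3

# Step 4: tail areas A = (i - 0.375)/(n + 0.25), i = 1..n
areas = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]

# Step 5: normal scores = inverse standard-normal CDF of the areas
scores = [NormalDist().inv_cdf(a) for a in areas]

# Step 7: plot scores against sorted_data; a roughly straight
# line suggests the data are close to normal.
pairs = list(zip(scores, sorted_data))
```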
To check for normality of data
[Scatter plot “Checking normality of data”: normal scores (x-axis, about –2.5 to 3.0) against the sorted data (y-axis, 0 to 450); an approximately straight line indicates normality]
Basic transformation for MLR
Some transformations to convert to MLR
 X → X^(p) = {X^p – 1}/p; as p → 0, X^(p) → logeX
 If the variability in Y increases with
increasing values of Y, then we use
logeY = β1logeX1 + β2logeX2 +….. + βklogeXk + ε
Simple linear regression
In simple linear regression we have
Yj = α + βXj + εj ∀ j = 1, 2,….., n
The question is how do we find α and β, provided we have ′n′
observations which constitute the sample.
We minimize the sum of squares of the error wrt α and β:
Δ = Σj=1..n {Yj – (α̂ + β̂Xj)}²
Finally:
E(Y) = α̂ + β̂E(X)
cov(X, Y) + E(X)E(Y) = α̂E(X) + β̂{V(X) + (E(X))²}
Simple linear regression
After we have found the estimators of α and β,
we use these values to predict/forecast the
subsequent future values of Y, i.e., we find
yk = α̂ + β̂Xk
and compare these with the corresponding
values of Yk, for k = n+1, n+2, …….
Simple linear regression
[Plot of Y, Ŷ and the error against X]
Non-linear regression
• y = (β + γX)/(1 + αX)
• y = α(X – β)^γ
• y = α – βloge(X + γ)
• y = α[1 – exp(–βX)]^γ
NOTE: For all these and other models we
minimize the sum of squares and find the
parameters α, β and γ.
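As a rough sketch of this sum-of-squares minimization, here is a crude grid search for y = α[1 – exp(–βX)] on noise-free synthetic data; in practice one would use a proper nonlinear optimizer, and the true parameter values below are made up:

```python
import math

# Synthetic data from known parameters, so the grid recovers them exactly.
alpha_true, beta_true = 2.0, 0.5
xs = [float(i) for i in range(10)]
ys = [alpha_true * (1 - math.exp(-beta_true * x)) for x in xs]

def sse(alpha, beta):
    # Sum of squared errors for the candidate parameter pair
    return sum((y - alpha * (1 - math.exp(-beta * x))) ** 2
               for x, y in zip(xs, ys))

# Coarse grid over alpha in [1, 3] and beta in [0.1, 1.0]
candidates = [(1.0 + 0.1 * i, 0.1 + 0.05 * j)
              for i in range(21) for j in range(19)]
alpha_hat, beta_hat = min(candidates, key=lambda ab: sse(*ab))
```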
Simple Moving Average
3MA = 3 month Moving Average
5MA = 5 month Moving Average
Month Actual 3MA 5MA
Jan 266.00
Feb 145.90 198.33
Mar 183.10 149.43 178.92
Apr 119.30 160.90 159.42
May 180.30 156.03 176.60
Jun 168.50 193.53 184.88
Jul 231.80 208.27 199.58
Aug 224.50 216.37 188.10
Sep 192.80 180.07 221.70
Oct 122.90 217.40 212.52
Nov 336.50 215.10 206.48
Dec 185.90 238.90 197.82
Jan 194.30 176.57 215.26
Feb 149.50 184.63 202.62
Mar 210.10 210.97 203.72
Apr 273.30 224.93 222.26
May 191.40 250.57 237.56
Jun 287.00 234.80 256.26
Month Actual 3MA 5MA
Jul 226.00 272.20 259.58
Aug 303.60 273.17 305.62
Sep 289.90 338.37 301.12
Oct 421.60 325.33 324.38
Nov 264.50 342.80 331.60
Dec 342.30 315.50 361.70
Jan 339.7 374.13 340.56
Feb 440.4 365.33 375.52
Mar 315.9 398.53 387.32
Apr 439.3 385.50 406.86
May 401.3 426.00 433.88
Jun 437.4 471.40 452.22
Jul 575.5 473.50 500.76
Aug 407.6 555.03 515.56
Sep 682 521.63 544.34
Oct 475.3 579.53 558.62
Nov 581.3 567.83
Dec 646.9
Simple Moving Averages
[Plot of Actual, 3MA and 5MA values by month (Yellow is actual; Blue is 3MA; Red is 5MA)]
Centred Moving Average
4MA(1) = 4 month Moving Average, 4MA(2) = 4 month Moving Average
2X4MA=Averages of 2MA(1) and 4MA(2)
Mth Actual 4MA(1) 4MA(2) 2X4MA
Jan 266.00
Feb 145.90
Mar 183.10 178.58 157.15 167.86
Apr 119.30 157.15 162.80 159.98
May 180.30 162.80 174.98 168.89
Jun 168.50 174.98 201.28 188.13
Jul 231.80 201.28 204.40 202.84
Aug 224.50 204.40 193.00 198.70
Sep 192.80 193.00 219.18 206.09
Oct 122.90 219.18 209.53 214.35
Nov 336.50 209.53 209.90 209.71
Dec 185.90 209.90 216.55 213.23
Jan 194.30 216.55 184.95 200.75
Feb 149.50 184.95 206.80 195.88
Mar 210.10 206.80 206.08 206.44
Apr 273.30 206.08 240.45 223.26
May 191.40 240.45 244.43 242.44
Jun 287.00 244.43 252.00 248.21
Mth Actual 4MA(1) 4MA(2) 2X4MA
Jul 226.00 252.00 276.63 264.31
Aug 303.60 276.63 310.28 293.45
Sep 289.90 310.28 319.90 315.09
Oct 421.60 319.90 329.58 324.74
Nov 264.50 329.58 342.03 335.80
Dec 342.30 342.03 346.73 344.38
Jan 339.7 346.73 359.58 353.15
Feb 440.4 359.58 383.83 371.70
Mar 315.9 383.83 399.23 391.53
Apr 439.3 399.23 398.48 398.85
May 401.3 398.48 463.38 430.93
Jun 437.4 463.38 455.45 459.41
Jul 575.5 455.45 525.63 490.54
Aug 407.6 525.63 535.10 530.36
Sep 682 535.10 536.55 535.83
Oct 475.3 536.55 596.38 566.46
Nov 581.3 596.38
Dec 646.9
Centred Moving Average
[Plot of Actual, 4MA(1), 4MA(2) and 2X4MA values by month (Yellow is Actual; Red is 4MA(1); Blue is 4MA(2); Green is 2X4MA)]
Centred Moving Average
Note:
 4MA(1)=(Y1+Y2+Y3+Y4)/4
 4MA(2)=(Y2+Y3+Y4+Y5)/4
 2X4MA=(Y1+2*Y2+2*Y3+2*Y4+Y5)/8
Similarly we can have 2X6MA, 2X8MA,
2X12MA etc.
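The 2X4MA recipe above, checked against the first five Actual values (Jan–May) of the table:

```python
# Centred 2X4MA: average of two successive 4-point moving averages,
# equivalently a 5-point weighted average with weights (1, 2, 2, 2, 1)/8.
ys = [266.00, 145.90, 183.10, 119.30, 180.30]   # Jan..May Actual

ma4_1 = sum(ys[0:4]) / 4                   # 4MA(1) = 178.575
ma4_2 = sum(ys[1:5]) / 4                   # 4MA(2) = 157.15
ma_2x4 = (ma4_1 + ma4_2) / 2               # 2X4MA = 167.8625 (167.86 in the table)

# Same value via the weighted form (Y1 + 2*Y2 + 2*Y3 + 2*Y4 + Y5)/8
ma_2x4_w = (ys[0] + 2 * ys[1] + 2 * ys[2] + 2 * ys[3] + ys[4]) / 8
```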
Centred Moving Average
Note:
 3MA(1)=(Y1+Y2+Y3)/3
 3MA(2)=(Y2+Y3+Y4)/3
 3MA(3)=(Y3+Y4+Y5)/3
 3X3MA=(Y1+2*Y2+3*Y3+2*Y4+Y5)/9
Weighted Moving Averages
In general a weighted k-point moving
average can be written as
Tt = Σj=–m..m aj Yt+j
Note:
 The total of the weights is equal to 1
 Weights are symmetric, i.e., aj = a–j
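A small helper implementing this weighted k-point average; equal weights reproduce the plain 3MA, and the (1, 2, 2, 2, 1)/8 weights give the 2X4MA in one pass:

```python
# T_t = sum_{j=-m..m} a_j * Y_{t+j}, with k = 2m + 1 weights totalling 1.
def weighted_ma(ys, weights):
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must total 1"
    m = len(weights) // 2
    return [sum(a * ys[t + j] for j, a in zip(range(-m, m + 1), weights))
            for t in range(m, len(ys) - m)]

ys = [266.00, 145.90, 183.10, 119.30, 180.30]        # Jan..May Actual
ma3 = weighted_ma(ys, [1/3, 1/3, 1/3])               # the plain 3MA
ma2x4 = weighted_ma(ys, [1/8, 2/8, 2/8, 2/8, 1/8])   # the 2X4MA
```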
Weighted Moving Averages
Steps are:
1) 4MA(1)=(Y1+Y2+Y3+Y4)/4
2) 4MA(2)=(Y2+Y3+Y4+Y5)/4
3) 4MA(3)=(Y3+Y4+Y5+Y6)/4
4) 4MA(4)=(Y4+Y5+Y6+Y7)/4
5) 4X4MA=(Y1+2*Y2+3*Y3+4*Y4+3*Y5+2*Y6+Y7)/16
6) 5X4X4MA = a-2*4X4MA(1) + a-1*4X4MA(2) +
a0*4X4MA(3) + a1*4X4MA(4) + a2*4X4MA(5)
where a-2 = -3/4, a-1 = 3/4, a0 = 1, a1 = 3/4, a2 = -3/4
Exponential Smoothing Methods
1) Single Exponential Smoothing (one
parameter, adaptive parameter)
2) Holt′s linear method (suitable for trends)
3) Holt-Winter′s method (suitable for trends
and seasonality)
Single Exponential Smoothing
The general equation is:
Ft+1 = Ft + α(Yt – Ft) = αYt + (1 - α)Ft,
Note:
 Error term: Et = Yt - Ft
 Forecast value: Ft
 Actual value: Yt
 Weight: α ∈ (0,1)
 α is such that sum of square of errors is minimized
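A sketch of the recursion, initialized with F2 = Y1 and run on the Actual column of the table that follows:

```python
# Single exponential smoothing: F_{t+1} = alpha*Y_t + (1 - alpha)*F_t
def ses(ys, alpha):
    forecasts = [ys[0]]                    # F_2 = Y_1
    for y in ys[1:]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts                       # [F_2, F_3, ..., F_{n+1}]

# Actual values Jan..Nov from the table
ys = [200.0, 135.0, 195.0, 197.5, 310.0, 175.0, 155.0, 130.0,
      220.0, 277.5, 235.0]
f_01 = ses(ys, 0.1)   # the F(t,0.1) column; f_01[-1] is the Dec forecast (~205.6)
```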
Single Exponential Smoothing
Month Y(t) F(t,0.1) F(t,0.5) F(t,0.9)
Jan 200.0 200.0 200.0 200.0
Feb 135.0 200.0 200.0 200.0
Mar 195.0 193.5 167.5 141.5
Apr 197.5 193.7 181.3 189.7
May 310.0 194.0 189.4 196.7
Jun 175.0 205.6 249.7 298.7
Jul 155.0 202.6 212.3 187.4
Aug 130.0 197.8 183.7 158.2
Sep 220.0 191.0 156.8 132.8
Oct 277.5 193.9 188.4 211.3
Nov 235.0 202.3 233.0 270.9
Dec ------- 205.6 234.0 238.6
Single Exponential Smoothing
[Plot of Y(t), F(t,0.1), F(t,0.5) and F(t,0.9) by month]
Extension of Exponential Smoothing
The general equation is:
Ft+1 = α1Yt + α2Ft + α3Ft-1
Note:
 Error term: Et = Yt – Ft
 Forecast value: Ft
 Actual value: Yt
 Weights: αi ∈ (0,1) ∀ i = 1, 2 and 3
 α1 + α2 + α3 = 1
 αi′s are such that the sum of square of errors is
minimized
Extension of Exponential Smoothing
[Plot of Y(t) and F(t) by month]
Adaptive Exponential Smoothing
The general equation is:
Ft+1 = αtYt + (1 - αt)Ft
Note:
 Error term: Et = Yt – Ft
 Forecast value: Ft
 Actual value: Yt
 Smoothed Error: At = βEt + (1 - β)At-1
 Absolute Smoothed Error: Mt = β|Et| + (1 - β)Mt-1
 Weight: αt+1 = |At/Mt|
 α and β are such that sum of square of errors is
minimized
Adaptive Exponential Smoothing
Starting values:
 F2 = Y1
 α2 = β = 0.2
 A1 = M1 = 0
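A sketch of the adaptive recursion with these starting values. (The worked table on the next slide appears to apply α with a slightly different timing, so only its first rows are reproduced exactly here.)

```python
# Adaptive-response smoothing per the equations above, with starting values
# F_2 = Y_1, alpha_2 = beta = 0.2, A_1 = M_1 = 0.
def adaptive_ses(ys, beta=0.2):
    f, alpha, A, M = ys[0], beta, 0.0, 0.0
    forecasts, alphas = [f], [alpha]
    for y in ys[1:]:
        e = y - f                              # E_t = Y_t - F_t
        A = beta * e + (1 - beta) * A          # smoothed error A_t
        M = beta * abs(e) + (1 - beta) * M     # absolute smoothed error M_t
        f = alpha * y + (1 - alpha) * f        # F_{t+1}
        alpha = abs(A / M) if M > 0 else beta  # alpha_{t+1} = |A_t / M_t|
        forecasts.append(f)
        alphas.append(alpha)
    return forecasts, alphas

fs, als = adaptive_ses([200.0, 135.0, 195.0, 197.5])
# fs[1] = 187.0 and als[1] = 1.0, matching the Mar row of the table
```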
Adaptive Exponential Smoothing
Month Y(t) F(t) E(t) A(t) M(t) α β
Jan 200.0 0.0 0.0 0.2
Feb 135.0 200.0 -65.0 -13.0 13.0 0.2
Mar 195.0 187.0 8.0 -8.8 12.0 1.0
Apr 197.5 188.6 8.9 -5.3 11.4 0.7
May 310.0 190.4 119.6 19.7 33.0 0.5
Jun 175.0 214.3 -39.3 7.9 34.3 0.6
Jul 155.0 206.4 -51.4 -4.0 37.7 0.2
Aug 130.0 196.2 -66.2 -16.4 43.4 0.1
Sep 220.0 182.9 37.1 -5.7 42.1 0.4
Oct 277.5 190.3 87.2 12.9 51.1 0.1
Nov 235.0 207.8 27.2 15.7 46.4 0.3
Dec 213.2 0.3
Adaptive Exponential Smoothing
[Plot of Y(t) and F(t) by month]
Adaptive Exponential Smoothing
[Plot of Y(t), F(t), E(t), A(t) and M(t) over time]
Holt-Winter′s Method
The general equations are:
1) Lt = αYt/St-s + (1-α)(Lt-1 + bt-1)
2) bt = β(Lt – Lt-1) + (1 - β)bt-1
3) St = γYt/Lt + (1 - γ )St-s for t > s
4) Ft+m = (Lt + btm)St-s+m
5) Si = Yi/Ls where Ls = (Y1+…+Ys)/s for i ≤ s
Note:
 Forecast value: Ft
 Actual value: Yt
 Trend: bt
 Seasonal component: St
 Length of seasonality: s
 α, β and γ are chosen such that the sum of square of errors is minimized
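Equations (1)–(5) can be sketched as follows; the initialization bs = 0 is an assumption (a common choice), since the slides fix only Ls and the first seasonal indices:

```python
# Multiplicative Holt-Winters per equations (1)-(5) above.
def holt_winters_forecast(ys, s, alpha, beta, gamma, m=1):
    L = sum(ys[:s]) / s                 # L_s = mean of the first season
    b = 0.0                             # b_s (assumed starting trend)
    S = [y / L for y in ys[:s]]         # S_1..S_s      (equation 5)
    for t in range(s, len(ys)):
        L_prev = L
        L = alpha * ys[t] / S[t - s] + (1 - alpha) * (L + b)   # (1)
        b = beta * (L - L_prev) + (1 - beta) * b               # (2)
        S.append(gamma * ys[t] / L + (1 - gamma) * S[t - s])   # (3)
    return (L + b * m) * S[len(ys) - s + m - 1]                # (4)

# A perfectly seasonal series (period 2) is forecast without error:
f = holt_winters_forecast([10.0, 20.0, 10.0, 20.0, 10.0, 20.0],
                          s=2, alpha=0.5, beta=0.1, gamma=0.3, m=1)
# f is 10.0, the next value in the pattern
```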
Statistical methods

  • 1. 1 QUANTITATIVE METHODS I Ashok K Mittal Department of IME IIT Kanpur
  • 2. Statistics Q1 I want to invest Rs 1000 , in which company should I invest Growth Returns Risk
  • 3. Statistics • Q2 How do I know which Company will give me • High Return • Or • Growth • But Risk should be Low
  • 4. Statistics • Collect Information from the Past • Qualitative • Quantitative ( Data) • Analyze Information ( Data) to provide patterns of past performance ( Descriptive Statistics) • Project these patterns to answer your questions ( Inference)
  • 5. Types of data  Primary: You do a survey to find out the percentage of people living below the poverty line in Allahabad  Secondary: You are interested in studying the performance of banks and for that you study the RBI published documents
  • 6. Descriptive Statistics Presentation of data  Non frequency data  Frequency data
  • 7. Non frequency data Time series representation of data BSE(30) Close 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 3-Jan-94 5-Jan-94 7-Jan-94 11-Jan-94 13-Jan-94 17-Jan-94 19-Jan-94 24-Jan-94 27-Jan-94 31-Jan-94 2-Feb-94 4-Feb-94 8-Feb-94 10-Feb-94 14-Feb-94 16-Feb-94 18-Feb-94 22-Feb-94 24-Feb-94 28-Feb-94 1-Mar-94 3-Mar-94 7-Mar-94 9-Mar-94 15-Mar-94 17-Mar-94 21-Mar-94 23-Mar-94 25-Mar-94 29-Mar-94 31-Mar-94 Date BSE(30)Close
  • 8. Non frequency data Spatial series representation of data Fertiliser Consumption for few Indian states for 1999-2000 (in tonnes) 0 500000 1000000 1500000 2000000 2500000 3000000 3500000 A nd hra P radeshK arnata ka K eralaT am ilN adu G ujarat M adh ya P rad esh M aha rashtraR ajasthan H aryan a P un jab U ttar P radesh B ihar O rissa W e stB e ngal A ssam States Fertiliserconsumption(intonnes) Fertiliser Consumption
  • 9. Frequency data: Tabular representation India at a glance Year (% of GDP) 1983 1993 2002 2003 Agriculture 36.6 31.0 22.7 22.2 Industry 25.8 26.3 26.6 26.6 Mfg 16.3 16.1 15.6 15.8 Services 37.6 42.8 50.7 51.2 Pvt Consump 71.8 37.4 65.0 64.9 GOI consump 10.6 11.4 12.5 12.8 Import 8.1 10.0 15.6 16.0 Domes save 17.6 22.5 24.2 22.2 Interests paid 0.4 1.3 0.7 18.3 Note: 2003 refers to 2003-2004; data are preliminary. Gross domestic savings figures are taken directly from India′s central statistical organization.
  • 10. Frequency data Line diagram representation Fertiliser Consumption for few Indian states for 1999-2000 (in tonnes) 0 500000 1000000 1500000 2000000 2500000 3000000 3500000 Andhra PradeshKarnataka KeralaTam ilN adu G ujarat M adhya P radesh M aharashtraR ajasthan H aryana Punjab U ttarPradesh Bihar O rissa W estBengal Assam States Fertiliserconsumption(intonnes) Fertiliser Consumption
  • 11. Frequency data Bar diagram (histogram) representation World Population (projected mid 2004) 0 1000000000 2000000000 3000000000 4000000000 5000000000 6000000000 7000000000 1950 1960 1970 1980 1990 2000 Year Population World Population
  • 12. Frequency data Bar diagram (histogram) representation Height and Weight of individuals 0 20 40 60 80 100 120 140 160 180 200 Ram Shyam Rahim Praveen Saikat Govind Alan Individual Height/Weight Height (in cms) Weight (in kgs)
  • 13. Frequency data Pie diagram/chart representation Median marks in JMET (2003) Verbal Quantitative Analytical Data Interpresentation
  • 14. Frequency data Box plot representation The box plot is also called the box whisker plot. A box plot is a set of five summary measures of distribution of the data which are  median  lower quartile  upper quartile  smallest observation  largest observation.
  • 15. Frequency data Box plot representation Median LQ UQ Whisker
  • 16. Frequency data Box plot representation Here:  UQ – LQ = Inter quartile range (IQR)  X = Smallest observation within certain percentage of LQ  Y = Largest observation within certain percentage of UQ
  • 17. Important note A cumulative frequency diagram is called the ogive. The abscissa of the point of intersection represents the median of the data.
  • 20. Measurements of uncertainty Concept of uncertainty and different measures of uncertainty, Probability as a measure of uncertainty, Description of qualitative as well as quantitative probability, Assessment of probability, Concepts of Decision trees, Random Variables, Distributions, Expectations, Probability plots, etc.
  • 21. Definitions Quantitative variable: It can be described by a number for which arithmetic operations such as averaging make sense. Qualitative (or categorical) variable: It simply records a qualitative, e.g., good, bad, right, wrong, etc. We know statistics deals with measurements, some being qualitative others being quantitative. The measurements are the actual numerical values of a variable. Qualitative variables could be described by numbers, although such a description might be arbitrary, e.g., good = 1, bad = 0, right = 1, wrong = 0, etc.
  • 22. Scales of measurement Nominal scale: In this scale numbers are used simply as labels for groups or classes. If we are dealing with a data set which consists of colours blue, red, green and yellow, then we can designate blue = 3, red = 4, green = 5 and yellow = 6. We can state that the numbers stand for the category to which a data point belongs. It must be remembered that nothing is sacrosanct regarding the numbering against each category. This scale is used for qualitative data rather than quantitative data.
  • 23. Scales of measurement Ordinal Scale: In this scale of measurement, data elements may be ordered according to relative size or quality. For example, a customer or a buyer can rank a particular characteristic of a car as good, average or bad, and while doing so he/she can assign some numeric value, which may be as follows: good = 10, average = 5 and bad = 0.
  • 24. Scales of measurement Interval Scale: For the interval scale we specify intervals in a way so as to note a particular characteristic, which we are measuring and assign that item or data point under a particular interval depending on the data point. Consider we are measuring the age of school going students between classes 5 to 12 in the city of Kanpur. We may form intervals 10-12 years, 12-14 years,....., 18- 20 years. Now when we have one data point, i.e., the age of a student we put that data under any one particular interval, e.g. if the student's age is 11 years, we immediately put that under the interval 10-12 years.
  • 25. Scales of measurement Ratio Scale: If two measurements are in ratio scale, then we can take ratios of measurements. The ratio scale represents the reading for each recorded data in a way which enables us to take a ratio of the readings in order to depict it either pictorially or in figures. Examples of ratio scale are measurements of weight, height, area, length etc.
  • 26. Definitions: different measures 1) Measure of central tendency  Mean (Arithmetic mean (AM), Geometric mean (GM), Harmonic mean (HM))  Median  Mode 2) Measure of dispersion and shape  Variance or Standard deviation  Skewness  Kurtosis
  • 27. Definition: Mean Given N observations x1, x2, ….., xN we define:
AM = (x1 + x2 + ….. + xN)/N
GM = (x1 * x2 * ….. * xN)^(1/N)
HM = N/(1/x1 + 1/x2 + ….. + 1/xN)
  • 28. Definition: Median and Mode  Median(µe) : The median of a data set is the value below which lies half of the data points. To find the median we use F (µe) = 0.5.  Mode (µo): The mode of a data set is the value that occurs most frequently. Hence f(µo) ≥ f(x); ∀ x.
  • 29. Definition: Variance, Standard deviation, Skewness, Kurtosis
 Variance: V[X] = σ2 = E[(X – E[X])2]
 Standard deviation (SD) = σ
 Skewness = γ1 = √β1 = µ3/σ3
 Kurtosis = γ2 = β2 – 3 = µ4/σ4 – 3
  • 30. Example Consider we have the following data points: 5, 7, 10, 7, 10, 11, 3, 5, 5 For these data points we have µ = 7; µe = 7; µo = 5; σ2 = 6.89
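The summary measures for this small data set can be checked with Python's standard library (a sketch; the data are the nine points above, and pvariance gives the population variance):

```python
# Mean, median, mode and population variance for the slide's example data.
import statistics

data = [5, 7, 10, 7, 10, 11, 3, 5, 5]

mean = statistics.mean(data)            # 7
median = statistics.median(data)        # sorted: 3,5,5,5,7,7,10,10,11 -> middle value 7
mode = statistics.mode(data)            # 5 occurs three times
variance = statistics.pvariance(data)   # population variance, about 6.89

print(mean, median, mode, round(variance, 2))
```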
  • 31. Descriptive statistics Suppose the data are available in the form of a frequency distribution. Assume there are k classes, the mid-points of the corresponding class intervals being x1, x2,…., xk, while the corresponding frequencies are f1, f2,….., fk, such that n = f1+f2+…..+fk Then:
µ = (1/n) ∑ xi fi, i = 1 to k
σ2 = (1/n) ∑ (xi – µ)2 fi, i = 1 to k
  • 32. Descriptive statistics Consider m groups of observations with respective means µ1, µ2,….., µm and standard deviations σ1, σ2,….., σm. Let the group sizes be n1, n2,….., nm such that n = n1 + n2 + ….. + nm. Then:
µOVERALL = (1/n) ∑ ni µi, i = 1 to m
σ2 OVERALL = (1/n) [∑ ni σi2 + ∑ ni (µi – µOVERALL)2]
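The pooled-mean and pooled-variance formulas above translate directly into code; the group summaries below are made-up numbers, not from the slides:

```python
# Overall mean and variance from per-group means, SDs and sizes
# (illustrative group summaries, not data from the slides).
def pooled_mean(means, sizes):
    n = sum(sizes)
    return sum(m * k for m, k in zip(means, sizes)) / n

def pooled_variance(means, sds, sizes):
    n = sum(sizes)
    mu = pooled_mean(means, sizes)
    within = sum(k * s**2 for s, k in zip(sds, sizes))           # sum ni*sigma_i^2
    between = sum(k * (m - mu)**2 for m, k in zip(means, sizes)) # sum ni*(mu_i - mu)^2
    return (within + between) / n

means, sds, sizes = [10.0, 20.0], [2.0, 3.0], [40, 60]
mu = pooled_mean(means, sizes)              # 16.0
var = pooled_variance(means, sds, sizes)    # 31.0
```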
  • 34. Random event Random experiment: Is an experiment whose outcome cannot be predicted with certainty.  Sample space (Ω) : The set of all possible outcomes of a random experiment  Sample point (ωi): The elements of the sample space  Event (A): Is a subset of the sample space such that it is a collection of sample point(s).
  • 35. Probability Probability (P(A)): Of an event is defined as a quantitative measure of uncertainty of the occurrence of the event  Objective probability: Based on games of chance and can be mathematically proved or verified. If the experiment is the same for two different persons, then the value of objective probability remains the same. It is the limiting definition of relative frequency. Example: the probability of getting the number 5 when we roll a fair die.  Subjective probability: Based on personal judgment, intuition and subjective criteria. Its value will change from person to person. Example: one person sees the chance of India winning the test series with Australia as high while another person sees it as low.
  • 36. Random event For a random experiment, we denote P(ωi) = pi Where:
 P(ωi) = pi = Probability of occurrence of the sample point ωi
 P(A) = Probability of occurrence of the event A = ∑ pi over all ωi ∈ A
 P(Ω) = ∑ pi over all ωi ∈ Ω = 1
  • 37. Example 1 Suppose there are two dice, each with faces 1, 2,....., 6, and they are rolled simultaneously. This rolling of the two dice constitutes our random experiment. Then we have:  Ω = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1),.….., (5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}.  ωi = (1,1), (1,2),…., (6,5), (6,6)  We define the event A such that the outcomes for the two dice are equal in one simultaneous throw, then A = {(1, 1), (2, 2),….., (6, 6)}  P(ωi): p1 = p2 = ….. = p36 = 1/36  P(A) = p1 + p8 + p15 + p22 + p29 + p36 = 6/36 = 1/6
  • 38. Example 2 Suppose a coin is tossed repeatedly till the first head is obtained. Then we have:  Ω = {(H), (T,H), (T,T,H),………}  ωi = (H), (T,H), (T,T,H),…..  We define the event such that at most 3 tosses are needed to obtain the first head, then A = {(H), (T,H), (T,T,H)}  P(ωi): p1 = ½, p2 = (½)2 , p3 = (½)3 , p4 = (½)4 ,..…  P(A) = p1 + p2 + p3 = 7/8
  • 39. Example 3 In a club there are 10 members, of whom 5 are Asians and the rest are Americans. A committee of 3 members has to be formed and these members are to be chosen randomly. Find the probability that there will be at least 1 Asian and at least 1 American in the committee. Total number of cases = 10 C3 = 120 and the number of cases favouring the formation of the committee is 5 C2*5 C1 + 5 C1*5 C2 = 100 Hence P(A) = 100/120 = 5/6
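The committee counts can be verified by direct counting with math.comb:

```python
# Committees of 3 drawn from 5 Asians and 5 Americans,
# with at least one member from each group.
from math import comb

total = comb(10, 3)                                              # 120 committees
favourable = comb(5, 2) * comb(5, 1) + comb(5, 1) * comb(5, 2)   # 100 committees
p = favourable / total                                           # 100/120 = 5/6
```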
  • 40. Example 4 Suppose we continue with example 2 which we have just discussed, and we define the event B that at least 5 tosses are needed to produce the first head  Ω = {(H), (T,H), (T,T,H),………}  ωi = (H), (T,H), (T,T,H),…..  P(ωi): p1 = ½, p2 = (½)2 , p3 = (½)3 , p4 = (½)4 ,..…  P(B) = p5 + p6 + p7 + ….. = 1 – (p1 + p2 + p3 + p4) = 1 – 15/16 = 1/16
  • 41. Theorem in probability For any event A, B ∈ Ω  0 ≤ P(A) ≤ 1  If A ⊂ B, then P(A) ≤ P(B)  P(A U B) = P(A) + P(B) – P(A ∩ B)  P(AC ) = 1 – P(A)  P(Ω) = 1  P(φ) = 0
  • 42. Definitions  Mutually exclusive: Consider n events A1, A2,….., An. They are mutually exclusive if no two of them can occur together, i.e., P(Ai ∩ Aj) = 0. ∀ i, j (i ≠ j) ∈ n  Mutually exhaustive: Consider n events A1, A2,….., An. They are mutually exhaustive if at least one of them must occur and P(A1UA2U…..UAn) = 1
  • 43. Example 5 Suppose a fair die with faces 1, 2,….., 6 is rolled. Then Ω = {1, 2, 3, 4, 5, 6}. Let us define the events A1 = {1, 2}, A2 = {3, 4, 5, 6} and A3 = {3, 5}  The events A2 and A3 are neither mutually exclusive nor exhaustive  A1 and A3 are mutually exclusive but not exhaustive  A1, A2 and A3 are not mutually exclusive but are exhaustive  A1 and A2 are mutually exclusive and exhaustive
  • 44. Conditional probability Let A and B be two events such that P(B) > 0. Then the conditional probability of A given B is
P(A|B) = P(A ∩ B)/P(B)
Assume Ω = {1, 2, 3, 4, 5, 6}, A = {2}, B = {2, 4, 6}. Then A ∩ B = {2} and
P(A|B) = (1/6)/(3/6) = 1/3
P(B|A) = (1/6)/(1/6) = 1
  • 45. Bayes' Theorem Let B1, B2,….., Bn be mutually exclusive and exhaustive events such that P(Bi) > 0 for every i = 1, 2,…., n, and let A be any event. Then we have:
P(A) = ∑ P(A|Bi)P(Bi), i = 1 to n
P(Bj|A) = P(A|Bj)P(Bj) / ∑ P(A|Bi)P(Bi)
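As an illustrative sketch of Bayes' theorem (the three machines and defect rates below are assumed numbers, not from the slides):

```python
# Three machines B1..B3 produce 50%, 30%, 20% of output with defect
# rates 1%, 2%, 3%; find P(machine | defective item).
prior = [0.5, 0.3, 0.2]           # P(Bi)
likelihood = [0.01, 0.02, 0.03]   # P(A | Bi), A = "item is defective"

p_a = sum(l * p for l, p in zip(likelihood, prior))           # P(A) = 0.017
posterior = [l * p / p_a for l, p in zip(likelihood, prior)]  # P(Bi | A)
```

The posterior probabilities necessarily sum to 1, which is a handy sanity check.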
  • 46. Independence of events Two events A and B are called independent if P(A∩B) = P(A)*P(B)
  • 48. Distribution Depending on the outcomes of an experiment, a random variable (r.v.) is used to denote the outcome of the experiment. We usually denote the r.v. by X, Y or Z and the corresponding probability distribution by f(x), f(y) or f(z)  Discrete: probability mass function (pmf)  Continuous: probability density function (pdf)
  • 49. Discrete distribution 1) Uniform discrete distribution 2) Binomial distribution 3) Negative binomial distribution 4) Geometric distribution 5) Hypergeometric distribution 6) Poisson distribution 7) Log distribution
  • 50. Bernoulli Trials 1) Each trial has two possible outcomes, say a success and a failure. 2) The trials are independent 3) The probability of success remains the same, and so does the probability of failure, from one trial to another
  • 51. Uniform discrete distribution [X ~ UD (a , b) ] f(x) = 1/n x = a, a+k, a+2k,….., b, where n = (b – a)/k + 1
 a and b are the parameters where a, b ∈ R
 E[X] = a + k(n – 1)/2
 V[X] = k2(n2 – 1)/12
 Example: Generating the random numbers 1, 2, 3,…, 10. Hence X ~ UD(1, 10) where a = 1, k = 1, b = 10. Hence n = 10.
  • 52. Uniform discrete distribution (figure: plot of the pmf f(x) against x)
  • 53. Binomial distribution [X ~ B (p , n)] f(x) = n Cxpx qn-x x = 0, 1, 2,…..,n  n and p are the parameters where p ∈ [0, 1] and n ∈ Z+  E[X] = np  V[X] = npq  Example: Consider you are checking the quality of the product coming out of the shop floor. A product can either pass (with probability p = 0.8) or fail (with probability q = 0.2) and for checking you take such 50 products (n = 50). Then if X is the random variable denoting the number of success in these 50 inspections, we have X~50 Cx(0.8)x (0.2)50-x
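The binomial pmf of the inspection example can be evaluated directly; a quick check that it sums to 1 and has mean np:

```python
# Binomial pmf for the slide's inspection example: n = 50, p = 0.8.
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 50, 0.8
total = sum(binom_pmf(x, n, p) for x in range(n + 1))      # pmf sums to 1
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))   # np = 40
```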
  • 54. Binomial distribution (figure: plot of the pmf f(x) against x)
  • 55. Negative binomial distribution [X ~ NB (p , r)] f(x) = r+x-1 Cr-1pr qx x = 0, 1, 2,…..  p and r are the parameters where p ∈ [0, 1] and r ∈ Z+  E[X] = rq/p  V[X] = rq/p2  Example: Consider the example above where you are still inspecting items from the production line. But now you are interested in finding the probability distribution of the number of failures preceding the 5th success of getting the right product. Then, with p = 0.8 and q = 0.2, we have X ~ 5+x-1 C5-1(0.8)5 (0.2)x
  • 56. Negative binomial distribution (figure: plot of the pmf f(x) against x)
  • 57. Geometric distribution [X ~ G (p)] f(x) = pqx x = 0,1,2,…..  p is the parameter where p ∈ [0, 1]  E[X] = q/p (r = 1 in the Negative Binomial distribution case)  V[X] = q/p2 (r = 1 in the Negative Binomial distribution case)  Example: Consider the example above. But now you are interested in finding the probability distribution of the number of failures preceding the 1st success of getting the right product. Then, we have considering p=0.8, q=0.2 X~ (0.8)(0.2)x
  • 58. Geometric distribution (figure: plot of the pmf f(x) against x)
  • 59. Hypergeometric distribution [X ~ HG (N, n, p)] f(x) = Np Cx Nq Cn-x/N Cn 0 ≤ x ≤ Np and 0 ≤ (n – x) ≤ Nq  N, n and p are the parameters  E[X] = np  V[X] = npq{(N – n)/(N – 1)}  Example: Consider the example above. But now you are interested in finding the probability distribution of the number of right (wrong) products when we choose n products for inspection out of the total population N. If the population is 100 and we choose 10 of those, then the probability distribution of the number of right products, denoted by X, is given by X ~ 85 Cx 15 C10-x/100 C10  Remember  p (0.85) and q (0.15) are the proportions of good items and bad items respectively  In this distribution the choosing is done without replacement
  • 61. Poisson distribution [X ~ P (λ)] f(x) = e-λ λx /x! x = 0,1,2,…..  λ is the parameter where λ > 0  E[X] = λ  V[X] = λ  Example: Consider the arrival of the number of customers at the bank teller counter. If we are interested in finding the probability distribution of the number of customers arriving at the counter in specific intervals of time and we know that the average number of customers arriving is 5, then, we have X~ e-5 5x /x!
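A sketch of the Poisson pmf for the teller example (λ = 5), using only the standard library:

```python
# Poisson pmf with lambda = 5, as in the slide's teller-counter example.
from math import exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

lam = 5.0
p_at_most_3 = sum(poisson_pmf(x, lam) for x in range(4))   # P(X <= 3), about 0.265
mean = sum(x * poisson_pmf(x, lam) for x in range(100))    # approximately lambda
```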
  • 63. Log distribution [X ~ L (p)] f(x) = -(loge p)-1 x-1 (1 – p)x x = 1,2,3,…..  p is the parameter where p ∈ (0, 1)  E[X] = -(1-p)/(plogep)  V[X] = -(1-p)[1 + (1 - p)/logep]/(p2 logep)  Example 1) Emission of gases from engines against fuel type 2) Used to represent the distribution of the number of items of a product purchased by a buyer in a specified period of time
  • 64. Log distribution (figure: plot of the pmf f(x) against x)
  • 65. Continuous distribution 1) Uniform distribution 2) Normal distribution 3) Exponential distribution 4) Chi-Square distribution 5) Gamma distribution 6) Beta distribution 7) Cauchy distribution
  • 66. Continuous distribution 8) t-distribution 9) F-distribution 10)Log-normal distribution 11)Weibull distribution 12)Double exponential distribution 13)Pareto distribution 14)Logistic distribution
  • 67. Uniform distribution [X ~ U (a, b)] f(x) = 1/(b – a) a ≤ x ≤ b  a and b are the parameters where a, b ∈ R and a < b  E[X] = (a+b)/2  V[X] = (b-a)2 /12  Example: Choosing any number between 1 and 10, both inclusive, from the real line
  • 69. Normal distribution [X ~ N ( µ, σ2 )]
f(x) = {1/(σX√(2π))} exp{–(x – µX)2/(2σX2)}, -∞ < x < ∞
 µX, σ2X are the parameters where µX ∈ R and σ2X > 0
 E[X] = µX
 V[X] = σ2X
 Example: Consider the average age of a student between class VII and VIII selected at random from all the schools in the city of Kanpur
  • 70. Normal distribution (figure: plot of the pdf f(x) against x)
  • 71. Log-normal distribution [X ~ LN ( µ, σ2 )]
f(x) = {1/(xσX√(2π))} exp{–(loge x – µX)2/(2σX2)}, 0 < x < ∞
 µX, σ2X are the parameters where µX ∈ R and σ2X > 0
 E[X] = exp(µX+σ2X/2)
 V[X] = exp(2µX+σ2X){exp(σ2X) - 1}
 Example: Stock price return distribution
  • 72. Log-normal distribution (figure: plot of the pdf f(x) against x)
  • 73. Relationship between Poisson and Exponential distribution If the intervals between successive events of a process are independent, identically distributed and exponentially distributed, then the number of events in a specified time interval follows a Poisson distribution.
  • 74. Normal distribution results (figure: plot of the normal pdf f(x) with points a and b marked on the x-axis)
  • 75. z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09 0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359 0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753 0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141 0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517 0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879 0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224 0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549 0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852 0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133 0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389 1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621 1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830 1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015 1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177 1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319 1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441 1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545 1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633 1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706 1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767 2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817 2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857 2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890 2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916 2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936 2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952 2.6 0.4953 0.4955 0.4956 
0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964 2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974 2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981 2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986 3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Standard Normal Probabilities: to find P(0 ≤ Z ≤ 1.56), look in the row labeled 1.5 and the column labeled .06, giving P(0 ≤ Z ≤ 1.56) = .4406
  • 76. Standard Normal distribution Z ~ N(0,1) given by the equation
f(z) = {1/√(2π)} exp(–z2/2)
The area within an interval (a, b) is given by
F(a ≤ Z ≤ b) = ∫ab {1/√(2π)} exp(–z2/2) dz
which is not integrable algebraically. The Taylor expansion of the above assists in speeding up the calculation:
F(Z ≤ z) = 1/2 + {1/√(2π)} ∑k=0 to ∞ (–1)k z2k+1 / {(2k + 1) 2k k!}
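The series above can be checked against math.erf, since F(z) = ½[1 + erf(z/√2)]; a sketch:

```python
# Standard normal CDF via the Taylor series from the slide,
# compared with the closed form based on math.erf.
from math import erf, factorial, pi, sqrt

def std_normal_cdf_series(z, terms=60):
    s = sum((-1)**k * z**(2*k + 1) / ((2*k + 1) * 2**k * factorial(k))
            for k in range(terms))
    return 0.5 + s / sqrt(2 * pi)

exact = 0.5 * (1 + erf(1.56 / sqrt(2)))     # P(Z <= 1.56), about 0.9406
approx = std_normal_cdf_series(1.56)        # series agrees to high precision
```

This reproduces the table lookup on the previous slide: P(0 ≤ Z ≤ 1.56) = F(1.56) − 0.5 ≈ 0.4406.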
  • 77. Cumulative distribution function (cdf) or the distribution function We denote the distribution function by F(x)
F(x) = P(X ≤ x) = ∑ f(xi) over all xi ≤ x (discrete case)
F(x) = P(X ≤ x) = ∫-∞ to x f(t) dt (continuous case)
  • 78. Properties of distribution function 1) F(x) is non-decreasing in x, i.e., if x1 ≤ x2, then F(x1) ≤ F(x2) 2) Lt F(x) = 0 as x → - ∞ 3) Lt F(x) = 1 as x → + ∞ 4) F(x) is right continuous
  • 79. Standard normal distribution Putting Z = (X – µX)/σX in the normal distribution we have the standard normal distribution
f(z) = {1/√(2π)} exp(–z2/2)
Where: µZ = 0 and σZ = 1 Remember
• F(x) = P(X ≤ x) = F(z) = P(Z ≤ z)
• f(z) = φ(z)
• F(z) = P(Z ≤ z) = ∫-∞ to z f(t) dt = Φ(z)
  • 80. Standard normal distribution (figure: plot of the pdf f(z) against z)
  • 81. Exponential distribution [ X ~ E (a, θ)]
f(x) = (1/θ) exp{–(x – a)/θ}, a < x < ∞
 a and θ are the parameters where a ∈ R and θ > 0
 E[X] = a + θ
 V[X] = θ2
 Example: The distribution of the number of hours an electric bulb survives.
  • 82. Exponential distribution (figure: plot of the pdf f(x) against x)
  • 83. Normal CDF Plot (figure: plot of the normal F(x) against X)
  • 84. Exponential CDF Plot (figure: plot of the exponential F(x) against X)
  • 85. Arrival time problem # 1 In a factory shop floor for a certain CNC machine (machine marked # 1) the number of jobs arriving per unit time are given below
# of Arrivals: 0 1 2 3 4 5 6 7 8 9
Frequency:     2 4 4 1 2 1 4 6 4 1
  • 86. Arrival time problem # 1 (figure: bar chart of the frequency distribution of the number of arrivals)
  • 87. Arrival time problem # 1 (figure: bar chart of the relative frequency of the number of arrivals)
  • 88. Arrival time problem # 1 (figure: chart of the cumulative relative frequency of the number of arrivals)
  • 89. Arrival time problem # 1 1) The probability of the number of arrivals of jobs being equal to or more than 7 is about 0.38. 2) The average number of arrivals of jobs is about 5. 3) The probability of the number of arrivals of jobs being less than or equal to 4 is about 0.45.
  • 91. SAT problem This data set [ ] includes eight variables: 1) STATE: Name of state 2) COST: Current expenditure per pupil (measured in thousands of dollars per average daily attendance in public elementary and secondary schools) 3) RATIO: Average pupil/teacher ratio in public elementary and secondary schools during Fall 1994 4) SALARY: Estimated average annual salary of teachers in public elementary and secondary schools during 1994-95 (in thousands of dollars) 5) PERCENT: Percentage of all eligible students taking the SAT in 1994-95 6) VERBAL: Average verbal SAT score in 1994-95 7) MATH: Average math SAT score in 1994-95 8) TOTAL: Average total score on the SAT in 1994-95
  • 92. SAT problem (figure: histogram of COST by state)
  • 93. SAT problem (figure: histograms of COST and RATIO by state)
  • 94. SAT problem (figure: histograms of COST, RATIO and SALARY by state)
  • 95. SAT problem
Average value is given by E(X) = (1/n) ∑ Xi, i = 1 to n
Variance is given by V(X) = (1/n) ∑ {Xi – E(X)}2, i = 1 to n
Covariance is given by Cov(X, Y) = E[{X – E(X)}{Y – E(Y)}] = ρX,Y √{V(X) V(Y)}
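These definitions translate directly into code; the (x, y) sample below is illustrative, not SAT data (y = 2x, so the correlation should be exactly 1):

```python
# Population mean, variance, covariance and correlation as defined above.
def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m)**2 for x in v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def corr(x, y):
    return cov(x, y) / (var(x) * var(y)) ** 0.5

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y = 2x, a perfectly linear relationship
```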
  • 96. SAT problem
                 COST     RATIO    SALARY   PERCENT  VERBAL   MATH     TOTAL
Mean             5.90526  16.858   34.82892 35.24    457.14   508.78   965.92
Median           5.7675   16.6     33.2875  28       448      497.5    945.5
Maximum          9.774    24.3     50.045   81       516      592      1107
Minimum          3.656    13.8     25.994   4        401      443      844
Std Deviation(1) 1.362807 2.266355 5.941265 26.76242 35.17595 40.20473 74.82056
Std Deviation(2) 1.34911  2.243577 5.881552 26.49344 34.82241 39.80065 74.06857
  • 97. SAT problem (covariance matrix)
        COST     RATIO    SALARY   PERCENT  VERBAL   MATH     TOTAL
COST    1.820097 -1.12303 6.901753 21.18202 -19.2638 -18.7619 -38.0258
RATIO   -1.12303 5.033636 -0.01512 -12.6639 4.98188  8.52076  13.50264
SALARY  6.901753 -0.01512 34.59266 96.10822 -97.6868 -93.9432 -191.63
PERCENT 21.18202 -12.6639 96.10822 701.9024 -824.094 -916.727 -1740.82
VERBAL  -19.2638 4.98188  -97.6868 -824.094 1212.6   1344.731 2557.331
MATHS   -18.7619 8.52076  -93.9432 -916.727 1344.731 1584.092 2928.822
TOTAL   -38.0258 13.50264 -191.63  -1740.82 2557.331 2928.822 5486.154
  • 98. SAT problem (correlation matrix)
        COST     RATIO    SALARY   PERCENT  VERBAL   MATH     TOTAL
COST    1        -0.37103 0.869802 0.592627 -0.41005 -0.34941 -0.38054
RATIO   -0.37103 1        -0.00115 -0.21305 0.063767 0.095422 0.081254
SALARY  0.869802 -0.00115 1        0.61678  -0.47696 -0.40131 -0.43988
PERCENT 0.592627 -0.21305 0.61678  1        -0.89326 -0.86938 -0.88712
VERBAL  -0.41005 0.063767 -0.47696 -0.89326 1        0.970256 0.991503
MATHS   -0.34941 0.095422 -0.40131 -0.86938 0.970256 1        0.993502
TOTAL   -0.38054 0.081254 -0.43988 -0.88712 0.991503 0.993502 1
  • 99. Inference 1) Point estimation 2) Interval estimation 3) Hypothesis testing
  • 100. Sampling • Population: N –Population distribution –Parameter (θ) • Sample: n –Sampling distribution –Statistic (tn)
  • 101. Types of sampling • Probability Sampling – Simple Random Sampling – Stratified Random Sampling – Cluster Sampling – Multistage Sampling – Systematic Sampling • Judgement Sampling – Quota Sampling – Purposive Sampling
  • 102. Simple Random Sampling A simple random sample of size (n) from a finite population (N) is a sample selected such that each possible sample of size n has the same probability of being selected. This would be akin to SRSWOR (simple random sampling without replacement)
  • 103. Simple Random Sampling A simple random sample of size (n) from an infinite population is a sample selected such that the following conditions hold  Each element selected comes from the same population  Each element is selected independently This would be akin to SRSWR (simple random sampling with replacement)
  • 105. Chi-square distribution Suppose Z1, Z2,….., Zn are ′n′ independent observations from N(0, 1), then Z2 1 + Z2 2 + Z2 3 +….. + Z2 n ~ χ2 n
  • 106. Chi-square distribution [X ~ χ2n]
f(x) = {1/(2n/2 Γ(n/2))} x(n/2 – 1) exp(–x/2), 0 ≤ x < ∞
 n is the parameter (degree of freedom) where n ∈ Z+
 E[X] = n
 V[X] = 2n
  • 108. t-distribution Suppose Z ~ N(0, 1), Y ~ χ2n and they are independent, then
Z/√(Y/n) ~ tn
  • 109. t-distribution [X ~ tn]
f(x) = [Γ{(n + 1)/2} / {√(nπ) Γ(n/2)}] (1 + x2/n)–(n+1)/2
• n is the parameter where n ∈ Z+
• E[X] = 0 (n > 1)
• V[X] = n/(n – 2), (n > 2)
  • 111. F-distribution Suppose X ~ χ2n, Y ~ χ2m and they are independent, then
(X/n)/(Y/m) ~ Fn,m
  • 112. F-distribution [X ~ Fn,m]
f(x) = [Γ{(n + m)/2} / {Γ(n/2) Γ(m/2)}] (n/m)n/2 x(n/2 – 1) {1 + (n/m)x}–(n+m)/2, 0 < x < ∞
 n, m are the parameters (degrees of freedom) where n, m ∈ Z+
 E[X] = m/(m - 2), (m > 2)
 V[X] = 2m2(n + m - 2)/[n(m – 2)2(m – 4)], (m > 4)
  • 114. Some results If X1, X2,….., Xn are ′n′ observations from X ~ N(µX, σ2X) and
X̄n = (X1 + X2 + ….. + Xn)/n
then
√n (X̄n – µX)/σX ~ N(0, 1)
  • 115. Some results
If S2n,X = {1/(n – 1)} ∑ (Xj – X̄n)2, j = 1 to n, then (n – 1)S2n,X/σ2X ~ χ2n–1
If S′2n,X = (1/n) ∑ (Xj – µX)2, j = 1 to n, then nS′2n,X/σ2X ~ χ2n
Also √n (X̄n – µX)/Sn,X ~ tn–1
  • 116. Some results If X1, X2,….., XmX are ′mX′ observations from X ~ N(µX, σ2X) and Y1, Y2,….., YmY are ′mY′ observations from Y ~ N(µY, σ2Y), and moreover these samples are independent, then
(S2X/σ2X)/(S2Y/σ2Y) ~ FmX–1, mY–1
  • 118. Estimators and their properties Estimator: Any statistic (a random function) which is used to estimate the population parameter  Unbiasedness: Eθ(tn) = θ  Consistency: P[|tn - θ| < ε] → 1 as n → ∞
  • 119. Estimators (Discrete distribution)
1) X ~ UD (a , b): â = min(X1,…, Xn) and b̂ = max(X1,…, Xn)
2) X ~ P (λ): λ̂ = X̄n
3) X ~ B (n, p): p̂ = (# favouring)/n
  • 120. Estimators (Continuous distribution)
1) X ~ N(µ, σ2): µ̂ = X̄n
2) X ~ N(µ, σ2), µ known: σ̂2 = (1/n) ∑ (Xi – µ)2
3) X ~ N(µ, σ2), µ unknown: σ̂2 = {1/(n – 1)} ∑ (Xi – X̄n)2
4) X ~ E(θ): θ̂ = X̄n
  • 121. Examples (Estimation) Number of jobs arriving in a unit time for a CNC machine Consider we choose from a discrete distribution whose population distribution [X ~ UD (20, 35)] is not known. We select the values from the distribution and the numbers sampled are 22, 34, 33, 21, 29, 29, 30. Then the best estimate for a is â = 21 and the best estimate for b is b̂ = 34
  • 122. Examples (Estimation) You are testing the components coming out of the shop floor and find that 9 out of 30 components fail. Then the estimated value of p (proportion of bad items in the population) = 9/30.
  • 123. Examples (Estimation) At a particular teller machine in one of the bank branches it was found that the number of customers arriving in an unit time span were 4, 6, 7, 4, 3, 5, 6, and 5. Then for this Poisson process the estimated value of λ is 5.
  • 124. Examples (Estimation) Suppose it is known that the survival time of a particular type of bulb has the exponential distribution. You test 10 such bulbs and find their respective survival times as 150, 225, 275, 300, 95, 155, 325, 75, 20 and 400 hours respectively. Then the estimated value of θ = 202.
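A one-line check of the bulb example (θ̂ is the sample mean, taking a = 0):

```python
# Estimate theta for an exponential lifetime model from the
# ten observed survival times on the slide.
times = [150, 225, 275, 300, 95, 155, 325, 75, 20, 400]
theta_hat = sum(times) / len(times)   # 2020 / 10 = 202.0 hours
```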
  • 126. Multiple Linear Regression Given ′k′ independent variables X1, X2,….., Xk and one dependent variable Y, we predict the value of Y (denoted Ŷ or y) using the values of the Xi′s. We need ′n′ (n ≥ k+1) data points and the multiple linear regression (MLR) equation is as follows: Yj = β1x1,j + β2x2,j +…..+ βkxk,j + εj ∀ j = 1, 2,….., n
  • 127. Multiple Linear Regression Note  There is no randomness in measuring the Xi  The relationship is linear and not non-linear. By non-linear we mean that at least one derivative of Y wrt the βi′s is a function of at least one of the parameters, the parameters being the βi′s.
  • 128. Multiple Linear Regression Assumptions for the MLR  Xi, Y are normally distributed  Xi are all non-stochastic  εj ~ N(0, σ2I)  n ≥ k + 1  No linear dependence between the Xi′s, i.e., rank(X) = k  E(εjεl) = 0 ∀ j ≠ l, j, l = 1, 2,….., n  Cov(Xi, εj) = 0 ∀ i, j
  • 129. Multiple Linear Regression  Find β1, β2,….., βk using the concept of minimizing the sum of squares of the errors. This is also known as the least squares method or the method of ordinary least squares. The estimates found, β̂1, β̂2,….., β̂k, are the estimates of β1, β2,…., βk respectively.  Utilize these estimates to find the forecasted value of Y (i.e., Ŷ or y) and compare it with the actual values of Y obtained in future.
  • 130. To check for normality of data We need to check for the normality of Xi′s and Y 1) List the observation number in the column # 1,call it i. 2) List the data in column # 2. 3) Sort the data from the smallest to the largest and place in column # 3. 4) For each ith of the n observations, calculate the corresponding tail area of the standard normal distribution (Z) as follows, A = (i – 0.375)/(n + 0.25). Put the values in column # 4. 5) Use NORMSINV(A) function in MS-EXCEL to produce a column of normal scores. Put these values in column # 5. 6) Make a copy of the sorted data (be sure to use paste special and paste only the values) in column # 6. 7) Make a scatter plot of the data in columns # 5 and # 6.
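The steps above can be sketched in Python; statistics.NormalDist().inv_cdf stands in for Excel's NORMSINV, and the data below are made-up readings:

```python
# Normal scores for a normal probability plot, following the slide's steps:
# sort the data, compute tail areas A = (i - 0.375)/(n + 0.25), invert the
# standard normal CDF, then pair scores with the sorted data.
from statistics import NormalDist

def normal_scores(data):
    n = len(data)
    sorted_data = sorted(data)
    areas = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    scores = [NormalDist().inv_cdf(a) for a in areas]
    return list(zip(scores, sorted_data))   # points for the scatter plot

pairs = normal_scores([4.1, 3.9, 5.0, 4.4, 4.7, 3.5, 4.2])
```

If the scatter of sorted data against normal scores is roughly a straight line, the normality assumption is plausible.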
  • 131. To check for normality of data (figure: scatter plot of the sorted data against the normal scores)
  • 132. Basic transformation for MLR Some transformations to convert to MLR  X → X(p) = (Xp – 1)/p; as p → 0, X(p) → logeX  If the variability in Y increases with increasing values of Y, then we use logeY = β1logeX1 + β2logeX2 +….. + βklogeXk + ε
  • 133. Simple linear regression In the simple linear regression we have Yj = α + βXj + εj ∀ j = 1,2,…..,n The question is how do we find α and β, provided we have ′n′ observations which constitute the sample. We minimize the sum of squares of the errors wrt α and β:
∆ = ∑ {Yj – (α̂ + β̂Xj)}2, j = 1 to n
Finally, the normal equations are:
E(Y) = α̂ + β̂E(X)
cov(X, Y) + E(X)E(Y) = α̂E(X) + β̂{V(X) + (E(X))2}
so that β̂ = cov(X, Y)/V(X) and α̂ = E(Y) – β̂E(X)
  • 134. Simple linear regression After we have found the estimators of α and β, we use these values to predict/forecast the subsequent future values of Y, i.e., we find ŷk = α̂ + β̂Xk and compare these with the corresponding observed values Yk, for k = n+1, n+2, …….
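A minimal least-squares sketch of the α̂, β̂ estimators; the data are generated from y = 1 + 2x with no noise, so the fit recovers the coefficients exactly:

```python
# Ordinary least squares for Y = alpha + beta*X + eps:
# beta-hat = cov(X, Y) / V(X), alpha-hat = mean(Y) - beta-hat * mean(X).
def ols_fit(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # n * cov(X, Y)
    sxx = sum((a - mx)**2 for a in x)                      # n * V(X)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]   # illustrative data from y = 1 + 2x
alpha, beta = ols_fit(x, y)
```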
  • 135. Simple linear regression (figure: plot of Y, Y-hat and the errors against X)
  • 136. Non-linear regression • y = (β + γX)/(1 + αX) • y = α(X - β)γ • y = α - βloge(X + γ) • y = α[1 – exp(-βX)]γ NOTE: For all these and other models we minimize the sum of squares and find the parameters α, β and γ.
  • 137. Simple Moving Average 3MA = 3 month Moving Average, 5MA = 5 month Moving Average
Month Actual 3MA 5MA
Jan 266.00
Feb 145.90 198.33
Mar 183.10 149.43 178.92
Apr 119.30 160.90 159.42
May 180.30 156.03 176.60
Jun 168.50 193.53 184.88
Jul 231.80 208.27 199.58
Aug 224.50 216.37 188.10
Sep 192.80 180.07 221.70
Oct 122.90 217.40 212.52
Nov 336.50 215.10 206.48
Dec 185.90 238.90 197.82
Jan 194.30 176.57 215.26
Feb 149.50 184.63 202.62
Mar 210.10 210.97 203.72
Apr 273.30 224.93 222.26
May 191.40 250.57 237.56
Jun 287.00 234.80 256.26
Jul 226.00 272.20 259.58
Aug 303.60 273.17 305.62
Sep 289.90 338.37 301.12
Oct 421.60 325.33 324.38
Nov 264.50 342.80 331.60
Dec 342.30 315.50 361.70
Jan 339.7 374.13 340.56
Feb 440.4 365.33 375.52
Mar 315.9 398.53 387.32
Apr 439.3 385.50 406.86
May 401.3 426.00 433.88
Jun 437.4 471.40 452.22
Jul 575.5 473.50 500.76
Aug 407.6 555.03 515.56
Sep 682 521.63 544.34
Oct 475.3 579.53 558.62
Nov 581.3 567.83
Dec 646.9
• 138. Simple Moving Averages
[Figure: Actual (yellow), 3MA (blue) and 5MA (red) plotted by month over the three years]
• 139. Centred Moving Average
4MA(1), 4MA(2) = 4-month Moving Averages; 2X4MA = average of 4MA(1) and 4MA(2)

Mth     Actual   4MA(1)   4MA(2)   2X4MA
Jan     266.00   ---      ---      ---
Feb     145.90   ---      ---      ---
Mar     183.10   178.58   157.15   167.86
Apr     119.30   157.15   162.80   159.98
May     180.30   162.80   174.98   168.89
Jun     168.50   174.98   201.28   188.13
Jul     231.80   201.28   204.40   202.84
Aug     224.50   204.40   193.00   198.70
Sep     192.80   193.00   219.18   206.09
Oct     122.90   219.18   209.53   214.35
Nov     336.50   209.53   209.90   209.71
Dec     185.90   209.90   216.55   213.23
Jan     194.30   216.55   184.95   200.75
Feb     149.50   184.95   206.80   195.88
Mar     210.10   206.80   206.08   206.44
Apr     273.30   206.08   240.45   223.26
May     191.40   240.45   244.43   242.44
Jun     287.00   244.43   252.00   248.21
Jul     226.00   252.00   276.63   264.31
Aug     303.60   276.63   310.28   293.45
Sep     289.90   310.28   319.90   315.09
Oct     421.60   319.90   329.58   324.74
Nov     264.50   329.58   342.03   335.80
Dec     342.30   342.03   346.73   344.38
Jan     339.70   346.73   359.58   353.15
Feb     440.40   359.58   383.83   371.70
Mar     315.90   383.83   399.23   391.53
Apr     439.30   399.23   398.48   398.85
May     401.30   398.48   463.38   430.93
Jun     437.40   463.38   455.45   459.41
Jul     575.50   455.45   525.63   490.54
Aug     407.60   525.63   535.10   530.36
Sep     682.00   535.10   536.55   535.83
Oct     475.30   536.55   596.38   566.46
Nov     581.30   596.38   ---      ---
Dec     646.90   ---      ---      ---
• 140. Centred Moving Average
[Figure: Actual (yellow), 4MA(1) (red), 4MA(2) (blue) and 2X4MA (green) plotted by month]
  • 141. Centred Moving Average Note:  4MA(1)=(Y1+Y2+Y3+Y4)/4  4MA(2)=(Y2+Y3+Y4+Y5)/4  2X4MA=(Y1+2*Y2+2*Y3+2*Y4+Y5)/8 Similarly we can have 2X6MA, 2X8MA, 2X12MA etc.
  • 142. Centred Moving Average Note:  3MA(1)=(Y1+Y2+Y3)/3  3MA(2)=(Y2+Y3+Y4)/3  3MA(3)=(Y3+Y4+Y5)/3  3X3MA=(Y1+2*Y2+3*Y3+2*Y4+Y5)/9
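The formulas on slides 141–142 can be sketched as one generic weighted-average routine in Python (values below are checked against the opening months of the tables on slides 137 and 139):

```python
import numpy as np

def centred_ma(y, weights):
    """Weighted moving average centred on each admissible position.
    weights must be symmetric and sum to 1 (e.g. [1/3]*3 for a 3MA)."""
    y = np.asarray(y, float)
    w = np.asarray(weights, float)
    m = len(w) // 2
    return np.array([np.dot(w, y[i - m:i + m + 1]) for i in range(m, len(y) - m)])

y = [266.0, 145.9, 183.1, 119.3, 180.3]          # Jan-May actuals from the slides
ma3  = centred_ma(y, [1/3, 1/3, 1/3])            # 3MA
ma5  = centred_ma(y, [1/5] * 5)                  # 5MA
ma24 = centred_ma(y, [1/8, 2/8, 2/8, 2/8, 1/8])  # 2X4MA, per the slide-141 weights
```

The first 3MA lands on Feb, while the 5MA and 2X4MA both first land on Mar, exactly as in the tables.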
• 143. Weighted Moving Averages
In general a weighted k-point moving average (k = 2m + 1) can be written as
Tt = Σj=−m,…,m aj Yt+j
Note:
 The weights sum to 1
 Weights are symmetric, i.e., aj = a−j
  • 144. Weighted Moving Averages Steps are: 1) 4MA(1)=(Y1+Y2+Y3+Y4)/4 2) 4MA(2)=(Y2+Y3+Y4+Y5)/4 3) 4MA(3)=(Y3+Y4+Y5+Y6)/4 4) 4MA(4)=(Y4+Y5+Y6+Y7)/4 5) 4X4MA=(Y1+2*Y2+3*Y3+4*Y4+3*Y5+2*Y6+Y7)/16 6) 5X4X4MA = a-2*4X4MA(1) + a-1*4X4MA(2) + a0*4X4MA(3) + a1*4X4MA(4) + a2*4X4MA(5) where a-2 = -3/4, a-1 = 3/4, a0 = 1, a1 = 3/4, a2 = -3/4
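Steps 1)–5) above say that 4X4MA is a 4MA applied twice, which collapses to the single 7-term weighted form in step 5. A quick numerical check of that equivalence (a sketch; the data are the Jan–Jul actuals from the earlier table):

```python
import numpy as np

def ma(y, k):
    """Plain k-term moving average (length len(y) - k + 1)."""
    y = np.asarray(y, float)
    return np.convolve(y, np.ones(k) / k, mode='valid')

y = np.array([266.0, 145.9, 183.1, 119.3, 180.3, 168.5, 231.8])

# 4X4MA two ways: 4MA applied twice, vs the direct 7-term weighted form of step 5
twice  = ma(ma(y, 4), 4)
w      = np.array([1, 2, 3, 4, 3, 2, 1]) / 16.0
direct = np.convolve(y, w, mode='valid')
```

Both routes give the same smoothed value, which is why the composite can be written as one weighted k-point average with weights summing to 1.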
• 145. Exponential Smoothing Methods 1) Single Exponential Smoothing (one parameter, adaptive parameter) 2) Holt's linear method (suitable for trends) 3) Holt-Winters' method (suitable for trends and seasonality)
• 146. Single Exponential Smoothing The general equation is: Ft+1 = Ft + α(Yt – Ft) = αYt + (1 − α)Ft Note:  Error term: Et = Yt − Ft  Forecast value: Ft  Actual value: Yt  Weight: α ∈ (0,1)  α is chosen such that the sum of squared errors is minimized
• 147. Single Exponential Smoothing

Month   Y(t)     F(t,0.1)  F(t,0.5)  F(t,0.9)
Jan     200.0    200.0     200.0     200.0
Feb     135.0    200.0     200.0     200.0
Mar     195.0    193.5     167.5     141.5
Apr     197.5    193.7     181.3     189.7
May     310.0    194.0     189.4     196.7
Jun     175.0    205.6     249.7     298.7
Jul     155.0    202.6     212.3     187.4
Aug     130.0    197.8     183.7     158.2
Sep     220.0    191.0     156.8     132.8
Oct     277.5    193.9     188.4     211.3
Nov     235.0    202.3     233.0     270.9
Dec     ---      205.6     234.0     238.6
• 148. Single Exponential Smoothing
[Figure: Y(t) with the forecasts F(t,0.1), F(t,0.5) and F(t,0.9), Jan–Dec]
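The recursion on slide 146 is a few lines of Python; run on the slide's data with α = 0.1 it reproduces the F(t,0.1) column of the table:

```python
def ses(y, alpha):
    """Single exponential smoothing: F[t+1] = alpha*Y[t] + (1-alpha)*F[t].
    Returns the forecasts F2..F(n+1), with F2 = Y1 (the slides' convention)."""
    f = [y[0]]
    for obs in y[1:]:
        f.append(alpha * obs + (1 - alpha) * f[-1])
    return f

y = [200.0, 135.0, 195.0, 197.5, 310.0, 175.0, 155.0,
     130.0, 220.0, 277.5, 235.0]       # Jan-Nov actuals from the table
f = ses(y, alpha=0.1)                  # forecasts for Feb-Dec
```

In practice one would repeat this for a grid of α values and keep the one minimizing the sum of squared errors, as the slide prescribes.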
• 149. Extension of Exponential Smoothing The general equation is: Ft+1 = α1Yt + α2Ft + α3Ft−1 Note:  Error term: Et = Yt − Ft  Forecast value: Ft  Actual value: Yt  Weights: αi ∈ (0,1) ∀ i = 1, 2 and 3  α1 + α2 + α3 = 1  The αi's are such that the sum of squared errors is minimized
• 150. Extension of Exponential Smoothing
[Figure: Y(t) and F(t) plotted by month over several years]
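A sketch of the extended recursion in Python (the start-up convention F1 = F2 = Y1 is my assumption; the slide does not specify one):

```python
def extended_es(y, a1, a2, a3):
    """F[t+1] = a1*Y[t] + a2*F[t] + a3*F[t-1], with a1 + a2 + a3 = 1.
    Start-up: F1 = F2 = Y1 (assumed convention, not given on the slide)."""
    assert abs(a1 + a2 + a3 - 1.0) < 1e-12
    f = [y[0], y[0]]                      # F1, F2
    for t in range(1, len(y)):
        f.append(a1 * y[t] + a2 * f[-1] + a3 * f[-2])
    return f[2:]                          # forecasts F3 .. F(n+1)

y = [200.0, 135.0, 195.0, 197.5, 310.0]
# With (a1, a2, a3) = (1, 0, 0) the method reduces to the naive forecast F[t+1] = Y[t]
naive = extended_es(y, 1.0, 0.0, 0.0)
```

The naive-forecast special case is a handy sanity check; the weights themselves would again be chosen to minimize the sum of squared errors.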
• 151. Adaptive Exponential Smoothing The general equation is: Ft+1 = αtYt + (1 − αt)Ft Note:  Error term: Et = Yt − Ft  Forecast value: Ft  Actual value: Yt  Smoothed Error: At = βEt + (1 − β)At−1  Absolute Smoothed Error: Mt = β|Et| + (1 − β)Mt−1  Weight: αt+1 = |At/Mt|  α and β are such that the sum of squared errors is minimized
  • 152. Adaptive Exponential Smoothing Starting values:  F2 = Y1  α2 = β = 0.2  A1 = M1 = 0
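Transcribing the recursions of slide 151 with these starting values into Python (a sketch; the tuple layout of the output is my own choice):

```python
def adaptive_es(y, beta=0.2):
    """Adaptive exponential smoothing as on the slides:
    F[t+1] = a[t]*Y[t] + (1-a[t])*F[t],  a[t+1] = |A[t]/M[t]|,
    A[t] = beta*E[t] + (1-beta)*A[t-1],  M[t] = beta*|E[t]| + (1-beta)*M[t-1].
    Starting values per slide 152: F2 = Y1, a2 = beta, A1 = M1 = 0.
    Returns one (F, E, A, M, a) tuple per period from the second onward."""
    F, a, A, M = y[0], beta, 0.0, 0.0
    out = []
    for t in range(1, len(y)):
        E = y[t] - F
        A = beta * E + (1 - beta) * A
        M = beta * abs(E) + (1 - beta) * M
        out.append((F, E, A, M, a))
        F = a * y[t] + (1 - a) * F        # F[t+1]
        a = abs(A / M) if M else beta     # a[t+1]
    return out

y = [200.0, 135.0, 195.0]                # Jan-Mar actuals
rows = adaptive_es(y)                     # (F, E, A, M, a) for Feb and Mar
```

The Feb row comes out as F = 200, E = −65, A = −13, M = 13, α = 0.2, and Mar as F = 187, E = 8, A = −8.8, M = 12, α = 1.0.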
• 153. Adaptive Exponential Smoothing (β = 0.2)

Month   Y(t)     F(t)     E(t)     A(t)     M(t)     α(t)
Jan     200.0    ---      ---      0.0      0.0      0.2
Feb     135.0    200.0    -65.0    -13.0    13.0     0.2
Mar     195.0    187.0    8.0      -8.8     12.0     1.0
Apr     197.5    188.6    8.9      -5.3     11.4     0.7
May     310.0    190.4    119.6    19.7     33.0     0.5
Jun     175.0    214.3    -39.3    7.9      34.3     0.6
Jul     155.0    206.4    -51.4    -4.0     37.7     0.2
Aug     130.0    196.2    -66.2    -16.4    43.4     0.1
Sep     220.0    182.9    37.1     -5.7     42.1     0.4
Oct     277.5    190.3    87.2     12.9     51.1     0.1
Nov     235.0    207.8    27.2     15.7     46.4     0.3
Dec     ---      213.2    ---      ---      ---      0.3
• 154. Adaptive Exponential Smoothing
[Figure: Y(t) and F(t) plotted by month, Jan–Dec]
• 155. Adaptive Exponential Smoothing
[Figure: Y(t), F(t), E(t), A(t) and M(t) plotted by month, Jan–Dec]
• 156. Holt-Winters' Method The general equations are: 1) Lt = αYt/St−s + (1 − α)(Lt−1 + bt−1) 2) bt = β(Lt − Lt−1) + (1 − β)bt−1 3) St = γYt/Lt + (1 − γ)St−s for t > s 4) Ft+m = (Lt + btm)St−s+m 5) Si = Yi/Ls where Ls = (Y1+…+Ys)/s for i ≤ s Note:  Forecast value: Ft  Actual value: Yt  Level: Lt  Trend: bt  Seasonal component: St  Length of seasonality: s  α, β and γ are chosen such that the sum of squared errors is minimized
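Equations 1)–5) transcribe directly into Python. A sketch (the slides do not specify the initial trend, so b_s = 0 is assumed here, and the smoothing constants in the example are arbitrary):

```python
def holt_winters(y, s, alpha, beta, gamma, m=1):
    """Multiplicative Holt-Winters per equations 1)-5). Returns the m-step
    forecast made at the last observation. Initialisation: L_s = mean(Y1..Ys),
    S_i = Y_i/L_s for i <= s (eq 5), and b_s = 0 (assumed, not on the slide)."""
    L = sum(y[:s]) / s
    S = [v / L for v in y[:s]]            # seasonal indices S_1..S_s (eq 5)
    b = 0.0
    for t in range(s, len(y)):
        L_prev = L
        L = alpha * y[t] / S[t % s] + (1 - alpha) * (L_prev + b)   # eq 1
        b = beta * (L - L_prev) + (1 - beta) * b                   # eq 2
        S[t % s] = gamma * y[t] / L + (1 - gamma) * S[t % s]       # eq 3
    return (L + b * m) * S[(len(y) + m - 1) % s]                   # eq 4

# Purely seasonal series with no trend: the forecast should reproduce the pattern
y = [10.0, 20.0, 30.0, 40.0] * 3
f = holt_winters(y, s=4, alpha=0.5, beta=0.1, gamma=0.3)
```

On this noise-free seasonal series the level stays at 25, the trend at 0, and the one-step forecast returns to the start of the pattern; in practice α, β and γ would be tuned to minimize the sum of squared errors, as the slide states.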