1. Noida Institute of Engineering and
Technology, Greater Noida
Statistics & Probability
AAS0303
Dr. Ritika Saini
Assistant Professor
Dept. of Mathematics
3/23/2023
1
Unit: I
Descriptive Measures
B.Tech.-3rd Sem(DS)
Dr. Anil Agarwal Unit I
Dr. Ritika Saini Unit-I
2. Sequence of Content
(1) Name of Subject with code, Course and Subject teacher.
(2) Brief Introduction of Faculty.
(3) Evaluation Scheme.
(4) Subject Syllabus.
(5) Branch Wise Application.
(6) Course Objective.
(7) Course Outcomes(COs).
(8) Program Outcomes(POs).
(9) COs and POs Mapping.
(10) Program Specific Outcomes (PSOs).
(11) COs and PSOs Mapping.
(12) Program Educational Objectives (PEOs).
(13)Result Analysis.
(14) End Semester Question Paper Templates.
(15) Prerequisite/Recap.
(16) Brief Introduction about the Subject.
(17) Unit Content.
(18) Unit Objective.
(19) Topic Objective/Topic Outcome.
(20) Lecture related to topic.
(21) Daily Quiz.
(22) Weekly Assignment.
(23) Topic Links.
(24) MCQ(End of Unit).
(25) Glossary questions.
(26) Old question Papers(Sessional + University).
(27) Expected Questions For External Examination.
(28) Recap of Unit.
5. Evaluation Scheme
Column groups: Periods (L, T, P); Evaluation Scheme (CT, TA, TOTAL, PS); End Semester (TE, PE).
Sl. No. | Subject Code | Subject Name | L | T | P | CT | TA | TOTAL | PS | TE | PE | Total | Credit
WEEKS COMPULSORY INDUCTION PROGRAM
1 | AAS0303 | Statistics and Probability | 3 | 1 | 0 | 30 | 20 | 50 | - | 100 | - | 150 | 4
2 | ACSE0306 | Discrete Structures | 3 | 0 | 0 | 30 | 20 | 50 | - | 100 | - | 150 | 3
3 | ACSE0305 | Computer Organization & Architecture | 3 | 0 | 0 | 30 | 20 | 50 | - | 100 | - | 150 | 3
4 | ACSE0302 | Object Oriented Techniques using Java | 3 | 0 | 0 | 30 | 20 | 50 | - | 100 | - | 150 | 3
5 | ACSE0301 | Data Structures | 3 | 1 | 0 | 30 | 20 | 50 | - | 100 | - | 150 | 4
6 | ACSAI0301 | Introduction to Artificial Intelligence | 3 | 0 | 0 | 30 | 20 | 50 | - | 100 | - | 150 | 3
7 | ACSE0352 | Object Oriented Techniques using Java Lab | 0 | 0 | 2 | - | - | - | 25 | - | 25 | 50 | 1
8 | ACSE0351 | Data Structures Lab | 0 | 0 | 2 | - | - | - | 25 | - | 25 | 50 | 1
9 | ACSAI0351 | Introduction to Artificial Intelligence Lab | 0 | 0 | 2 | - | - | - | 25 | - | 25 | 50 | 1
10 | ACSE0359 | Internship Assessment-I | 0 | 0 | 2 | - | - | - | 50 | - | - | 50 | 1
11 | ANC0301/ANC0302 | Cyber Security*/Environmental Science* (Non Credit) | 2 | 0 | 0 | 30 | 20 | 50 | - | 50 | - | 100 | 0
12 | | MOOCs** (For B.Tech. Hons. Degree) | | | | | | | | | | |
GRAND TOTAL | | | | | | | | | | | | 1100 | 24
6. Syllabus of AAS0303
UNIT-I Descriptive measures 8 Hours
Measures of central tendency (mean, median, mode), measures of dispersion (mean deviation, standard
deviation, quartile deviation, variance), Moment, Skewness and kurtosis, least squares principles of curve
fitting, Covariance, Correlation and Regression analysis, Correlation coefficient: Karl Pearson coefficient,
rank correlation coefficient, uni-variate and multivariate linear regression, application of regression
analysis, Logistic Regression, time series analysis - Trend analysis (Least square method).
UNIT-II Probability and Random variable 8 Hours
Probability Definition, The Law of Addition, Multiplication and Conditional Probability, Bayes' Theorem,
Random variables: discrete and continuous, probability mass function, density function, distribution
function, Mathematical expectation, mean, variance. Moment generating function, characteristic function,
Two dimensional random variables: probability mass function, density function,
UNIT-III Probability distribution 8 Hours
Probability Distribution (Continuous and discrete- Normal, Exponential, Binomial, Poisson distribution),
Central Limit theorem
UNIT-IV Test of Hypothesis & Statistical Inference 8 Hours
Sampling and population, uni-variate and bi-variate sampling, re-sampling, errors in sampling, Sampling
distributions, Hypothesis testing- p value, z test, t test (For mean), Confidence intervals, F test; Chi-square
test, ANOVA: One way ANOVA,
Statistical Inference, Parameter estimation, Least square estimation method, Maximum Likelihood
estimation.
UNIT-V Aptitude-III 8 Hours
Time & Work, Pipe & Cistern, Time, Speed & Distance, Boat & Stream, Sitting Arrangement, Clock &
Calendar.
7. Branch Wise Application
• Data Analysis
• Artificial intelligence
• Digital Communication: Information theory and coding.
8. Course Objective
• The objective of this course is to familiarize engineers with the concepts of
statistical techniques, probability distributions, hypothesis testing, ANOVA
and numerical aptitude. It aims to equip students with standard concepts and
tools, from the B.Tech. level onward, to deal with the advanced mathematics
and applications that are essential for their disciplines.
The student will be able to understand:
• The concept of descriptive measures.
• The concept of probability & random variables.
• Probability distributions.
• The concept of hypothesis testing & statistical inference.
• The concept of numerical aptitude.
9. Course Outcome
• CO1: Understand the concept of moments, skewness,
kurtosis, correlation, curve fitting and regression
analysis, time-series analysis etc.
• CO2: Understand the concept of probability and random variables.
• CO3: Remember the concept of probability to evaluate
probability distributions.
• CO4: Apply the concept of hypothesis testing and estimation
of parameters.
• CO5: Solve the problems of Time & Work, Pipe & Cistern,
Time, Speed & Distance, Boat & Stream, Sitting arrangement,
Clock & Calendar etc.
13. CO-PSO Mapping(CO1)
*1= Low *2= Medium *3= High
CO PSO 1 PSO 2 PSO 3
CO1 3 2 1
CO2 1 2 1
CO3 2 2 2
CO4 3 2 1
CO5 3 2 2
14. Program Educational Objectives(PEOs)
PEO-1: To have an excellent scientific and engineering breadth so as to
comprehend, analyze, design and provide sustainable solutions for real-
life problems using state-of-the-art technologies.
PEO-2: To have a successful career in industry, to pursue higher studies
or to support entrepreneurial endeavors, and to face global
challenges.
PEO-3: To have effective communication skills, a professional attitude,
ethical values and a desire to learn specific knowledge in emerging
trends and technologies for research, innovation,
product development and contribution to society.
PEO-4: To have life-long learning for up-skilling and re-skilling for a
successful professional career as an engineer, scientist, entrepreneur or
bureaucrat for the betterment of society.
15. Result Analysis
Branch | Semester | Sections | No. of enrolled students | No. of passed students | % Passed
AIML | III | A, B, C | 199 | 199 | 100%
17. Prerequisite and Recap(CO1)
• Knowledge of Maths-I of B.Tech.
• Knowledge of Maths-II of B.Tech.
• Knowledge of Basic Statistics.
18. Brief Introduction about the subject
• In the first four modules, we will discuss statistics and probability.
• In the fifth module, we will discuss the aptitude part.
19. Unit Content
• Introduction
• Measures of central tendency: mean, median, mode
• Measures of dispersion: mean deviation, standard deviation,
quartile deviation, variance
• Moment
• Skewness and kurtosis
• Least squares principles of curve fitting, Covariance
• Correlation and Regression analysis
• Correlation coefficient: Karl Pearson coefficient, rank correlation
coefficient
• Uni-variate and multivariate linear regression
• Application of regression analysis, Logistic Regression
• Time series analysis - Trend analysis (Least square method).
20. Unit Objective(CO1)
• The objective of this unit is to familiarize engineers with the
concept of "descriptive measures" in statistical
techniques.
• It aims to equip students with standard concepts and
tools, from the B.Tech. level onward, to deal with the advanced mathematics and
applications that are essential for their disciplines.
21. Topic objective (CO1)
Measures of central tendency:
• To present a brief picture of data: they give a brief
description of the main feature of the entire data.
• Essential for comparison: they reduce the data to a single
value which is used for comparative studies.
• Help in decision making: most companies use measures of
central tendency to plan and develop their business.
• Formulation of policies: many governments rely on these measures
while forming policies.
22. Measures of Central Tendency (CO1)
• Measures of Central Tendency or Averages:
Definition: According to Prof. Bowley, averages are "statistical
constants which enable us to comprehend in a single effort the
significance of the whole."
Types of Measures of Central Tendency: There are five types
of measures of central tendency:
• Arithmetic Mean or Simple Mean
• Median
• Mode
• Geometric Mean
• Harmonic Mean
23. Measures of Central Tendency (CO1)
Requisites for an Ideal Measure of Central Tendency:
According to Prof. Yule, an ideal measure of central tendency should satisfy
the following characteristics. It should be:
• rigidly defined.
• readily comprehensible and easy to calculate.
• based on all the observations.
• suitable for further mathematical treatment.
• affected as little as possible by fluctuations of sampling.
• not affected much by extreme values (this last requirement is not due to Prof. Yule).
24. Arithmetic Mean(CO1)
• Arithmetic Mean:
Definition
The arithmetic mean of a set of observations is their sum divided by the
number of observations. For example, the arithmetic mean $\bar{x}$ of $n$ observations
$x_1, x_2, \dots, x_n$ is given by:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• In case of the frequency distribution $x_i \mid f_i$, $i = 1, 2, \dots, n$, where
$f_i$ is the frequency of the variable $x_i$,
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \quad \text{where } \sum_{i=1}^{n} f_i = N$$
25. Arithmetic Mean(CO1)
In case of a grouped or continuous frequency distribution, x is taken as
the mid-value of the corresponding class.
Example: Find the arithmetic mean of the following frequency
distribution:
Solution:
Computation of mean
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \quad \text{where } \sum_{i=1}^{n} f_i = N$$
26. Arithmetic Mean(CO1)
Using the formula with $\sum_{i=1}^{n} f_i = N = 73$ and $\sum_{i=1}^{n} f_i x_i = 299$:
$$\text{Mean} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{299}{73} \approx 4.10$$
27. Daily Quiz (CO1)
Example: Calculate the arithmetic mean of the marks from the
following table:
x | f | fx | d = x - A | fd
5 | 12 | 60 | 5 - 35 = -30 | -360
15 | 18 | 270 | 15 - 35 = -20 | -360
25 | 27 | 675 | 25 - 35 = -10 | -270
35 | 20 | 700 | 35 - 35 = 0 | 0
45 | 17 | 765 | 45 - 35 = 10 | 170
55 | 6 | 330 | 55 - 35 = 20 | 120
Total | 100 | 2800 | | -700
Solution: Take class width i = 10, assumed mean A = 35 and d = x - A. Then
$$\bar{x} = A + \frac{\sum f d}{\sum f} = 35 + \frac{-700}{100} = 35 - 7 = 28$$
28. Arithmetic Mean(CO1)
• When the values of x or (and) f are large:
The calculation of the mean by the above formula is time-consuming and
tedious. Therefore the deviations of the given values from an
arbitrary point 'A' are taken, as follows:
Let $d_i = x_i - A$.
Then $f_i d_i = f_i (x_i - A) = f_i x_i - A f_i$.
Summing both sides over i from 1 to n, we get
$$\sum_{i=1}^{n} f_i d_i = \sum_{i=1}^{n} f_i x_i - A\sum_{i=1}^{n} f_i = \sum_{i=1}^{n} f_i x_i - A N$$
$$\Rightarrow \frac{1}{N}\sum_{i=1}^{n} f_i d_i = \frac{1}{N}\sum_{i=1}^{n} f_i x_i - A \cdot \frac{1}{N}\sum_{i=1}^{n} f_i = \frac{1}{N}\sum_{i=1}^{n} f_i x_i - A = \bar{x} - A$$
29. Arithmetic Mean(CO1)
Properties of Arithmetic Mean:
1. Property: The algebraic sum of the deviations of all the variates
from their arithmetic mean is zero.
2. Property: The sum of the squares of the deviations of a set of values
is minimum when taken about the mean.
3. Property (mean of the composite series): If $\bar{x}_i$ (i = 1, 2, ..., k) are the
means of k component series of sizes $n_i$, i = 1, 2, ..., k respectively,
then the mean $\bar{x}$ of the composite series obtained on combining the
component series is given as:
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_i n_i \bar{x}_i}{\sum_i n_i}$$
Example data: $n_1 = 60$, $\bar{x}_1 = 25$, $n_2 = 66$, $\bar{x}_2 = 35$.
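As a quick check of Property 3, the following minimal Python sketch pools the two component series quoted on this slide ($n_1 = 60$, $\bar{x}_1 = 25$, $n_2 = 66$, $\bar{x}_2 = 35$). The function name `combined_mean` is illustrative and does not come from the slides.

```python
def combined_mean(sizes, means):
    """Mean of the composite series formed by pooling component series."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    total = sum(sizes)
    # Weighted sum of component means, weighted by component size.
    return sum(n * m for n, m in zip(sizes, means)) / total


result = combined_mean([60, 66], [25, 35])
print(round(result, 3))  # 30.238
```

The combined mean lies between the two component means and is pulled toward the larger series.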
30. Arithmetic Mean(CO1)
where $\bar{x}$ is the arithmetic mean of the distribution.
$$\therefore \bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i$$
This formula is much more convenient to apply than the previous formula.
Any number can serve as the arbitrary point 'A' but, usually,
the value of x corresponding to the middle part of the distribution is
the most convenient.
• Grouped or Continuous Frequency Distribution:
The arithmetic is reduced to a great extent by taking
$$d_i = \frac{x_i - A}{h}$$
where A is an arbitrary point and h is the common magnitude of
the class interval.
$\therefore$ We have $h d_i = x_i - A$ and, proceeding exactly as on the previous slide, we
get
$$\bar{x} = A + \frac{h}{N}\sum_{i=1}^{n} f_i d_i$$
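The two shortcut formulas can be compared with the direct formula in a few lines of Python. This is a minimal sketch using the marks table from the earlier Daily Quiz (x = 5, 15, ..., 55 with A = 35 and h = 10); the function names are illustrative only.

```python
def mean_direct(x, f):
    """Direct formula: sum(f*x) / sum(f)."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)


def mean_step_deviation(x, f, A, h=1):
    """Step-deviation formula: A + (h/N) * sum(f*d), with d = (x - A) / h.

    With h = 1 this reduces to the assumed-mean formula A + (1/N) * sum(f*d).
    """
    N = sum(f)
    d = [(xi - A) / h for xi in x]
    return A + h * sum(fi * di for fi, di in zip(f, d)) / N


marks = [5, 15, 25, 35, 45, 55]
freq = [12, 18, 27, 20, 17, 6]

print(mean_direct(marks, freq))                       # 28.0
print(mean_step_deviation(marks, freq, A=35, h=10))   # 28.0
```

Both routes give the same mean; the choice of A and h only changes how small the intermediate numbers are.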
31. Daily Quiz(CO1)
Example: Calculate the mean for the following frequency distribution:
Class interval | 0-8 | 8-16 | 16-24 | 24-32 | 32-40 | 40-48
Frequency | 8 | 7 | 16 | 24 | 15 | 7
Solution: Arithmetic mean = 1956/77 ≈ 25.403
Example: The average salary of male employees in a firm was Rs. 5,200
and that of females was Rs. 4,200. The mean salary of all the
employees was Rs. 5,000. Find the percentage of male and female
employees.
Solution: Let x be the percentage of female employees, so that 100 - x is the
percentage of male employees. Then
$$5000 = \frac{(100 - x)\,5200 + x \cdot 4200}{100} \Rightarrow 5000 \times 100 = 5200 \times 100 - 5200x + 4200x \Rightarrow 1000x = 20000 \Rightarrow x = 20\%, \quad 100 - x = 80\%$$
The percentages of male and female employees are 80 and 20 respectively.
32. Median(CO1)
• Median:
Definition: The median of a distribution is the value of the variable which
divides it into two equal parts.
It is the value such that the number of observations above it is equal to
the number of observations below it. The median is thus a positional
average.
• Ungrouped Data:
If the number of observations is odd, the median is the middle value
after the values have been arranged in ascending or descending order
of magnitude.
• In case of an even number of observations, there are two middle
terms and the median is obtained by taking the arithmetic mean of the
middle terms.
33. Median(CO1)
Example
1. Median of the values 25, 20, 15, 35, 18. Median: 20
2. Median of the values 8, 20, 50, 25, 15, 30. Median: 22.5
• Discrete Frequency Distribution
In this case the median is obtained by considering
the cumulative frequencies. The steps involved are:
i. Find $\frac{N}{2}$, where $N = \sum_{i=1}^{n} f_i$.
ii. See the cumulative frequency (c.f.) just greater than $\frac{N}{2}$.
iii. The corresponding value of x is the median.
34. Median(CO1)
Example: Obtain the median for the following frequency distribution:
Solution:
i. Find $\frac{N}{2} = \frac{8+10+11+16+20+25+15+9+6}{2} = \frac{120}{2} = 60$, where $N = \sum_{i=1}^{n} f_i$.
ii. See the cumulative frequency (c.f.) just greater than $\frac{N}{2}$.
iii. The corresponding value of x is the median.
35. Median(CO1)
Here N = 120. The cumulative frequency just greater than $\frac{N}{2} = 60$ is 65, and
the value of x corresponding to 65 is 5. Therefore, the median is 5.
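The positional rules above can be sketched in Python as follows. The frequencies are the ones summed in the worked example; the x values 1 to 9 are an assumption (the original table is not reproduced in the text), chosen because they are consistent with the stated answer. The function names are illustrative.

```python
from itertools import accumulate


def median_ungrouped(values):
    """Middle value (odd n) or mean of the two middle values (even n)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def median_discrete(x, f):
    """Value of x whose cumulative frequency is just greater than N/2.

    x must be in ascending order, with f the matching frequencies.
    """
    half = sum(f) / 2
    for xi, cf in zip(x, accumulate(f)):
        if cf > half:
            return xi


print(median_ungrouped([25, 20, 15, 35, 18]))      # 20
print(median_ungrouped([8, 20, 50, 25, 15, 30]))   # 22.5

x_vals = list(range(1, 10))                # assumed x values 1..9
freqs = [8, 10, 11, 16, 20, 25, 15, 9, 6]  # frequencies from the example, N = 120
print(median_discrete(x_vals, freqs))      # 5
```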
36. Median(CO1)
• Continuous Frequency Distribution
In this case, the class corresponding to the c.f. just greater than $\frac{N}{2}$
is called the median class and the value of the median is
obtained by the formula:
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$$
where
• l is the lower limit of the median class,
• f is the frequency of the median class,
• h is the magnitude of the median class,
• c is the c.f. of the class preceding the median class,
• $N = \sum_{i=1}^{n} f_i$
37. Daily Quiz(CO1)
Example: Find the median wage of the following distribution.
Wages | No. of workers
2000-3000 | 3
3000-4000 | 5
4000-5000 | 20
5000-6000 | 10
6000-7000 | 5
Solution: The median wage is Rs. 4,675.
38. Daily Quiz(CO1)
Wages | No. of workers f | C.F.
2000-3000 | 3 | 3
3000-4000 | 5 | 8
4000-5000 | 20 | 28
5000-6000 | 10 | 38
6000-7000 | 5 | 43
N = 43
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 4000 + \frac{1000}{20}(21.5 - 8) = 4000 + 675 = 4675$$
• l is the lower limit of the median class = 4000
• f is the frequency of the median class = 20
• h is the magnitude of the median class = 1000
• c is the c.f. of the class preceding the median class = 8
• $N = \sum_{i=1}^{n} f_i = 43$
The median wage is Rs. 4,675.
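The grouped-median formula can be checked against the wages example with a short Python sketch. It assumes equal class widths and classes listed in ascending order; `grouped_median` is an illustrative name.

```python
def grouped_median(lower_limits, h, f):
    """Median = l + (h / f_med) * (N/2 - c) for a continuous distribution.

    lower_limits: lower limit of each class, ascending
    h: common class width
    f: class frequencies
    """
    N = sum(f)
    c = 0  # cumulative frequency of the classes before the current one
    for l, fi in zip(lower_limits, f):
        if c + fi > N / 2:  # first class whose c.f. exceeds N/2
            return l + (h / fi) * (N / 2 - c)
        c += fi
    raise ValueError("empty distribution")


wages = [2000, 3000, 4000, 5000, 6000]
workers = [3, 5, 20, 10, 5]
print(grouped_median(wages, 1000, workers))  # 4675.0
```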
39. Median(CO1)
Uses:
• The median is the only average to be used while dealing with qualitative
data which cannot be measured quantitatively but can still be
arranged in ascending or descending order of magnitude, e.g., to
find the average intelligence or average honesty among a group of
people.
• It is to be used for determining the typical value in problems
concerning wages, distribution of wealth, etc.
40. Mode(CO1)
• Mode:
• The mode is the value which occurs most frequently in a set of
observations and around which the other items of the set cluster
densely.
• It is the point of maximum frequency or the point of greatest
density.
• In other words, the mode or modal value of the distribution is that
value of the variate for which the frequency is maximum.
Calculation of Mode
• In case of a discrete distribution: the mode is the value of x
corresponding to the maximum frequency, but in any one (or more) of
the following cases:
41. Mode(CO1)
i. If the maximum frequency is repeated.
ii. If the maximum frequency occurs at the very beginning or at the
end of the distribution.
iii. If there are irregularities in the distribution.
the value of the mode is determined by the method of grouping.
• In case of a continuous frequency distribution: the mode is given by the
formula
$$\text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$$
where $l$ is the lower limit, $h$ the width and $f_m$ the frequency of the
modal class, and $f_1$ and $f_2$ are the frequencies of the classes preceding and
succeeding the modal class respectively. While applying the above
formula it is necessary to see that the class intervals are of the same
size.
42. Mode(CO1)
• For a symmetrical distribution, mean, median and mode coincide.
When the mode is ill-defined and the method of grouping also fails,
its value can be ascertained by the formula
Mode = 3 Median - 2 Mean
This measure is called the empirical mode.
Q. Calculate the mode from the following frequency distribution.
Size (x) | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
Frequency (f) | 2 | 5 | 8 | 9 | 12 | 14 | 14 | 15 | 11 | 13
Solution: Method of Grouping:
44. Mode(CO1)
Column | Maximum frequency | Size of item having max. frequency
1 | 15 | 11
2 | 29 | 10, 11
3 | 28 | 9, 10
4 | 40 | 10, 11, 12
5 | 40 | 8, 9, 10
6 | 43 | 9, 10, 11
Since the item 10 occurs the maximum number of times, i.e. 5 times,
the mode is 10.
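The analysis table can be reproduced programmatically. The sketch below assumes the usual six-column grouping scheme (single frequencies; pairs starting at the first and second item; triples starting at the first, second and third item), which matches the maxima listed in the table. `mode_by_grouping` is an illustrative name.

```python
def mode_by_grouping(sizes, freqs):
    """Return (mode, tally) using the six-column method of grouping."""
    n = len(freqs)
    tally = {s: 0 for s in sizes}
    # (group width, index of the first item used) for columns 1 to 6
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for width, start in columns:
        groups = [
            (sum(freqs[i:i + width]), sizes[i:i + width])
            for i in range(start, n - width + 1, width)
        ]
        best = max(total for total, _ in groups)
        # Every size inside a group with the column maximum gets one mark.
        for total, members in groups:
            if total == best:
                for s in members:
                    tally[s] += 1
    mode = max(tally, key=tally.get)
    return mode, tally


sizes = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
freqs = [2, 5, 8, 9, 12, 14, 14, 15, 11, 13]
mode, tally = mode_by_grouping(sizes, freqs)
print(mode, tally[mode])  # 10 5
```

Note that the plain maximum frequency (15, at size 11) would give a different answer; the grouping method smooths out that irregularity.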
45. Mode(CO1)
Q. Find the mode of the following:
Marks | 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31-35 | 36-40 | 41-45
No. of candidates | 7 | 10 | 16 | 32 | 24 | 18 | 10 | 5 | 1
Solution: Here the greatest frequency, 32, lies in the class 16-20. Hence the
modal class is 16-20. But the actual limits of this class are 15.5-20.5.
$l = 15.5$, $f_m = 32$, $f_1 = 16$, $f_2 = 24$, $h = 5$
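Substituting these values into the modal-class formula Mode = l + (f_m - f_1) / (2 f_m - f_1 - f_2) × h can be sketched as below. The function name `grouped_mode` is illustrative; the numerical result is computed by the sketch and is not quoted from the slides.

```python
def grouped_mode(l, h, f_m, f_1, f_2):
    """Mode of a continuous distribution with equal class widths."""
    return l + (f_m - f_1) / (2 * f_m - f_1 - f_2) * h


mode_marks = grouped_mode(l=15.5, h=5, f_m=32, f_1=16, f_2=24)
print(round(mode_marks, 2))  # 18.83
```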
47. Daily Quiz(CO3)
Q.1 Calculate the mean, median and mode of the following data:
Wages (in Rs) | 0-20 | 20-40 | 40-60 | 60-80 | 80-100 | 100-120 | 120-140
No. of Workers | 6 | 8 | 10 | 12 | 6 | 5 | 3
48. Recap(CO1)
• Measures of central tendency
• Mean
• Mode
• Median
49. Topic Objective (CO1)
Measuring Dispersion:
We will measure the dispersion of given data by calculating:
• Range
• Inter quartile range
• Mean deviation
• Standard deviation
• Variance
• Coefficient of Variation
50. Measures of Dispersion(CO1)
Definition
• Measures of dispersion are descriptive statistics that describe how
similar a set of scores are to each other
- The more similar the scores are to each other, the lower the
measure of dispersion will be
- The less similar the scores are to each other, the higher the
measure of dispersion will be
- In general, the more spread out a distribution is, the larger the
measure of dispersion will be
51. Measures of Dispersion(CO-1)
• Which of the distributions of scores has the larger dispersion?
[Figure: two bar charts of scores, items 1 to 10 on the horizontal axis, vertical axis from 0 to 125]
• The upper distribution has more dispersion because the scores are more
spread out.
• That is, they are less similar to each other.
52. PROPERTIES OF A GOOD MEASURE OF DISPERSION(CO1)
• Easy to understand
• Simple to calculate
• Uniquely defined
• Based on all observations
• Not affected by extreme observations
• Capable of further algebraic treatment
53. MEASURES OF DISPERSION(CO1)
Absolute: Expressed in the same units in which the data is expressed.
Ex: Rupees, Kgs, Ltr, Km etc.
Relative: In the form of a ratio or percentage, so it is independent of
units. It is also called Coefficient of Dispersion.
54. Measures of Dispersion(CO1)
• There are some measures of dispersion:
- Range
- Inter quartile range
- Mean deviation
- Standard deviation
- Variance
- Coefficient of Variation
55. 1. RANGE (R) (CO1)
• It is the simplest measure of dispersion.
• It is defined as the difference between the largest and smallest
values in the series:
R = L - S
where R = Range, L = Largest Value, S = Smallest Value.
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$
56. PRACTICE PROBLEMS - RANGE(CO1)
Individual Series:
Q1: Find the range & coefficient of range for the following data: 20, 35,
25, 30, 15
Solution:
L = Largest Value = 35
S = Smallest Value = 15
Range R = L - S = 35 - 15 = 20
$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{35 - 15}{35 + 15} = \frac{20}{50} = 0.4$$
57. PRACTICE PROBLEMS - RANGE(CO1)
Continuous Frequency Distribution:
Q3: Find the range & coefficient of range:
Size | 5-10 | 10-15 | 15-20 | 20-25 | 25-30
F | 4 | 9 | 15 | 30 | 40
Solution: L = Upper limit of the largest class = 30
S = Lower limit of the smallest class = 5
Range R = L - S = 30 - 5 = 25
$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{30 - 5}{30 + 5} = \frac{25}{35} = \frac{5}{7} = 0.714$$
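Both range problems reduce to the same two-line computation. A minimal Python sketch (the function name is illustrative); for the continuous distribution only the extreme class limits, 5 and 30, are needed:

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) = (L - S, (L - S) / (L + S))."""
    L, S = max(values), min(values)
    return L - S, (L - S) / (L + S)


print(range_and_coefficient([20, 35, 25, 30, 15]))  # (20, 0.4)

r, coeff = range_and_coefficient([5, 30])  # extreme class limits
print(r, round(coeff, 3))                  # 25 0.714
```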
58. Daily Quiz - RANGE(CO1)
Q1: Find the range & coefficient of range for the following data:
25, 38, 45, 30, 15
Ans: 30, 0.5
Q2: Find the range & coefficient of range.
Q3: Find the range & coefficient of range.
59. RANGE(CO1)
MERITS
• Simple to understand
• Easy to calculate
• Widely used in statistical quality control
DEMERITS
• Can't be calculated in open-ended distributions
• Not based on all the observations
• Affected by sampling fluctuations
• Affected by extreme values
60. 2. INTERQUARTILE RANGE & QUARTILE DEVIATION(CO1)
• Interquartile Range is the difference between the
upper quartile ($Q_3$) and the lower quartile ($Q_1$).
• It covers the dispersion of the middle 50% of the items of the series.
• Symbolically, Interquartile Range = $Q_3 - Q_1$
• Quartile Deviation is half of the interquartile range. It is also called
Semi Interquartile Range.
• Symbolically, Quartile Deviation = $\frac{Q_3 - Q_1}{2}$
• Coefficient of Quartile Deviation: It is the relative
measure of quartile deviation.
• Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
65. PRACTICE PROBLEMS - IQR & QD(CO1)
X F C.F.
0-20 4 4
20-40 10 14
40-60 15 29
60-80 20 49
80-100 11 60
N=60
66. Daily Quiz - IQR & QD(CO1)
Q1: Find the quartile deviation and coefficient of quartile
deviation of the following:
4, 8, 10, 7, 15, 11, 18, 14, 12, 16
Ans: 3.75, 0.32
Q2:
X | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 60
F | 2 | 8 | 20 | 35 | 42 | 20
Ans: 10, 5, 0.11
Q3:
Age | 0-20 | 20-40 | 40-60 | 60-80 | 80-100
Persons | 4 | 10 | 15 | 20 | 11
Ans: 14.33, 0.19
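For the individual series in Q1, the quartile deviation can be sketched in Python as below. The sketch assumes the positional convention $Q_k$ = value of the $k(n+1)/4$-th item with linear interpolation between neighbours; that convention is not stated in the surviving slides, but it reproduces the quoted answer of 3.75. The function name `quartile` is illustrative.

```python
def quartile(sorted_vals, k):
    """k-th quartile at 1-based position k(n+1)/4, linearly interpolated."""
    n = len(sorted_vals)
    pos = k * (n + 1) / 4
    lo = int(pos)       # whole part of the position
    frac = pos - lo     # fractional part used for interpolation
    if lo < 1:
        return sorted_vals[0]
    if lo >= n:
        return sorted_vals[-1]
    return sorted_vals[lo - 1] + frac * (sorted_vals[lo] - sorted_vals[lo - 1])


data = sorted([4, 8, 10, 7, 15, 11, 18, 14, 12, 16])
q1, q3 = quartile(data, 1), quartile(data, 3)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, qd, round(coeff_qd, 3))  # 7.75 15.25 3.75 0.326
```

Other quartile conventions exist and can give slightly different values on small samples, so the convention should be stated alongside any reported quartile.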
67. 3. MEAN DEVIATION (M.D.) (CO1)
• It is also called Average Deviation.
• It is defined as the arithmetic average of the deviations of the
various items of a series computed from a measure of central
tendency like the mean or median.
There are some formulas to calculate mean deviation.
68. PRACTICE PROBLEMS - M.D.(CO1)
Q1: Calculate the M.D. from mean & median & the coefficient of mean
deviation from the following data: 20, 22, 25, 38, 40, 50, 65, 70, 75.
Solution:
$$\text{Mean } \bar{x} = \frac{\sum x}{N} = \frac{20+22+25+38+40+50+65+70+75}{9} = \frac{405}{9} = 45$$
$$\text{Median} = \text{value of the } \left(\frac{N+1}{2}\right)\text{th term} = \text{value of the } \left(\frac{9+1}{2}\right)\text{th term} = 40$$
Table of deviations from the mean and from the median: see the next slide.
69. PRACTICE PROBLEMS - M.D.(CO1)
Marks X | Deviation from mean 45, $\lvert D\rvert = \lvert X - \bar{X}\rvert$ | Deviation from median 40, $\lvert D\rvert = \lvert X - M_e\rvert$
20 | 25 | 20
22 | 23 | 18
25 | 20 | 15
38 | 7 | 2
40 | 5 | 0
50 | 5 | 10
65 | 20 | 25
70 | 25 | 30
75 | 30 | 35
N = 9, $\sum X = 405$ | $\sum \lvert D\rvert = 160$ | $\sum \lvert D\rvert = 155$
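70. PRACTICE PROBLEMS - M.D.(CO1)
$$\text{M.D. from mean: } M.D._{\bar{x}} = \frac{\sum \lvert D\rvert}{N} = \frac{160}{9} = 17.78$$
$$\text{Coefficient of } M.D._{\bar{x}} = \frac{M.D._{\bar{x}}}{\bar{x}} = \frac{17.78}{45} = 0.39$$
$$\text{M.D. from median: } M.D._{M} = \frac{\sum \lvert D\rvert}{N} = \frac{155}{9} = 17.22$$
$$\text{Coefficient of } M.D._{M} = \frac{M.D._{M}}{M} = \frac{17.22}{40} = 0.43$$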
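The four results above can be reproduced with a short Python sketch; `mean_deviation` is an illustrative name, and `statistics.median` is the standard-library median.

```python
from statistics import median


def mean_deviation(values, center):
    """Average absolute deviation of the values from the given center."""
    return sum(abs(v - center) for v in values) / len(values)


marks = [20, 22, 25, 38, 40, 50, 65, 70, 75]
m = sum(marks) / len(marks)   # arithmetic mean, 45.0
med = median(marks)           # middle value, 40

md_mean = mean_deviation(marks, m)
md_median = mean_deviation(marks, med)
print(round(md_mean, 2), round(md_median, 2))              # 17.78 17.22
print(round(md_mean / m, 3), round(md_median / med, 3))    # 0.395 0.431
```

The mean deviation taken about the median (17.22) is smaller than the one taken about the mean (17.78), which is the general behaviour of absolute deviations.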
71. PRACTICE PROBLEMS - M.D.(CO1)
Q2: Calculate the M.D. from mean & median & the coefficient of mean
deviation from the following data:
Solution:
x | f | c.f. | $\lvert D\rvert = \lvert X - M_e\rvert$ | $f\lvert D\rvert$ | fx | $\lvert D\rvert = \lvert X - \bar{X}\rvert$ | $f\lvert D\rvert$
20 | 8 | 8 | 20 | 160 | 160 | 21 | 168
30 | 12 | 20 | 10 | 120 | 360 | 11 | 132
40 | 20 | 40 | 0 | 0 | 800 | 1 | 20
50 | 10 | 50 | 10 | 100 | 500 | 9 | 90
60 | 6 | 56 | 20 | 120 | 360 | 19 | 114
70 | 4 | 60 | 30 | 120 | 280 | 29 | 116
Total | N = 60 | | | $\sum f\lvert D\rvert = 620$ | 2460 | | $\sum f\lvert D\rvert = 640$
75. MEAN DEVIATION(CO-1)
Merits
• Simple to understand
• Easy to compute
• Less affected by extreme items
• Useful in fields like Economics, Commerce etc.
• Comparisons about the formation of different series can be easily made,
as deviations are taken from a central value
Demerits
• Ignoring '±' signs is not appropriate
• Not accurate for the mode
• Difficult to calculate if the value of the mean or median
comes in fractions
• Not capable of further algebraic treatment
• Not used in statistical conclusions.
77. 5. Standard Deviation(CO1)
• When the deviate scores are squared in variance, their unit of
measure is squared as well
- E.g. if people's weights are measured in pounds, then the
variance of the weights would be expressed in pounds² (or
squared pounds)
• Since squared units of measure are often awkward to deal
with, the square root of variance is often used instead
- The standard deviation is the square root of variance
• Standard deviation $\sigma = \sqrt{\text{variance}}$
• Variance = (Standard deviation)$^2$
81. Variance (CO1)
• For an Individual Series: If $x_1, x_2, \dots, x_n$ are the values of the
variable under consideration, the variance $\sigma^2$ about the mean $\bar{x}$ is defined as
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
• For a Frequency Distribution: If $x_1, x_2, \dots, x_n$ are the values of a
variable $x$ with the corresponding frequencies $f_1, f_2, \dots, f_n$
respectively, the variance $\sigma^2$ about the mean $\bar{x} = \frac{\sum f x}{N}$ (with $N = \sum f$) is defined as
82. Variance (CO1)
$$\sigma^2 = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{N}$$
where $N = \sum_{i=1}^{n} f_i$
Note. In case of a frequency distribution with class intervals, the values
of $x$ are the midpoints of the intervals.
Example 1. Find the variance and standard deviation for the following
individual series.
x | 3 | 6 | 8 | 10 | 18
Solution:
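The individual-series definition translates directly into code. The following minimal Python sketch works Example 1 (x = 3, 6, 8, 10, 18); the numerical results are computed by the sketch rather than quoted from the slides, and `variance` is an illustrative name.

```python
from math import sqrt


def variance(values):
    """Population variance: mean of squared deviations from the mean."""
    n = len(values)
    xbar = sum(values) / n
    return sum((v - xbar) ** 2 for v in values) / n


x = [3, 6, 8, 10, 18]
var = variance(x)
print(var, round(sqrt(var), 3))  # 25.6 5.06
```

This is the divisor-n definition used on the slides, which corresponds to `statistics.pvariance` in the Python standard library; `statistics.variance` divides by n - 1 instead and would give a larger value.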
84. Variance (CO1)
• Example: Find the variance and standard deviation for the
following frequency distribution.
Marks | 5-15 | 15-25 | 25-35 | 35-45 | 45-55 | 55-65
No. of students | 10 | 20 | 25 | 20 | 15 | 10
• Sol.
87. Daily Quiz(CO1)
Q1. Find the mean of the following data:
14, 20, 30, 22, 25, 18, 40, 50, 55 and 65
Q2. Find the mode of the following distribution:
6, 4, 3, 5, 6, 3, 3, 2, 4, 3, 4, 3, 3, 4, 4, 2, 3
88. Weekly Assignment(CO1)
Q1. Discuss the scope of Statistics.
Q2. State the objectives and essentials of an ideal average.
Q3. Find the mean of the following data:
15, 20, 30, 22, 25, 18, 40, 50, 55 and 65
Q4. Find the mode of the following distribution:
7, 4, 3, 5, 6, 3, 3, 2, 4, 3, 4, 3, 3, 4, 4, 2, 3
89. Topic Objective (CO1)
Moments:
• In mathematical statistics, moments involve a basic calculation. These
calculations can be used to find a probability distribution's
mean, variance, and skewness.
90. Moments (CO1)
• Moments: The moments of a distribution are the arithmetic
means of the various powers of the deviations of items from
some given number.
• Moments about mean (central moment)
• Moments about any arbitrary number (Raw Moment)
• Moments about origin
91. Summary (CO1)
• Moment about mean:
Individual data: $\mu_r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n}$; r = 0, 1, 2, ...
Frequency distribution: $\mu_r = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^r}{N}$; r = 0, 1, 2, ...
• Moment about any value A:
Individual data: $\mu'_r = \frac{\sum_{i=1}^{n} (x_i - A)^r}{n}$; r = 0, 1, 2, ...
Frequency distribution: $\mu'_r = \frac{\sum_{i=1}^{n} f_i (x_i - A)^r}{N}$; r = 0, 1, 2, ...
• Moment about origin:
Individual data: $v_r = \frac{\sum_{i=1}^{n} x_i^r}{n}$; r = 0, 1, 2, ...
Frequency distribution: $v_r = \frac{\sum_{i=1}^{n} f_i x_i^r}{N}$; r = 0, 1, 2, ...
92. Central Moments (CO1)
• Moment about mean (central moment):
• For an Individual Series: If $x_1, x_2, \dots, x_n$ are the values of the variable
under consideration, the $r$th moment $\mu_r$ about the mean $\bar{x}$ is defined
as
$$\mu_r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n}; \quad r = 0, 1, 2, \dots$$
• For a Frequency Distribution: If $x_1, x_2, \dots, x_n$ are the values of a
variable $x$ with the corresponding frequencies $f_1, f_2, \dots, f_n$
respectively, then the $r$th moment $\mu_r$ about the mean $\bar{x}$ is defined as
93. Central Moments (CO1)
$$\mu_r = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^r}{N}; \quad r = 0, 1, 2, \dots$$
where $N = \sum_{i=1}^{n} f_i$
In particular,
$$\mu_0 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^0 = \frac{1}{N}\sum_{i=1}^{n} f_i = \frac{N}{N} = 1$$
Note. In case of a frequency distribution with class intervals, the values
of $x$ are the midpoints of the intervals.
Example 1. Find the first four moments for the following individual
series.
x | 3 | 6 | 8 | 10 | 18
Solution: Calculation of Moments
97. Central Moments (CO1)
For any distribution, $\mu_0 = 1$. For r = 1,
$$\mu_1 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i x_i - \bar{x} \cdot \frac{1}{N}\sum_{i=1}^{n} f_i = \bar{x} - \bar{x} = 0$$
For any distribution, $\mu_1 = 0$. For r = 2,
$$\mu_2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = (S.D.)^2 = \text{variance}$$
Therefore, for any distribution, $\mu_2$ coincides with the variance of the
distribution.
Similarly, $\mu_3 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^3$,
$\mu_4 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^4$ and so on.
98. Central Moments (CO1)
• Example: Find $\mu_1, \mu_2, \mu_3, \mu_4$ for the following frequency
distribution.
Marks | 5-15 | 15-25 | 25-35 | 35-45 | 45-55 | 55-65
No. of students | 10 | 20 | 25 | 20 | 15 | 10
• Sol. Calculation of Moments:
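The calculation for this example can be sketched in Python by taking the class midpoints as the x values, as the note on grouped data prescribes. `central_moment` is an illustrative name; the resulting values agree with the moments $\mu_2 = 214$, $\mu_3 = 468$, $\mu_4 = 96712$ used in the Sheppard's correction example for a distribution of class width 10.

```python
def central_moment(midpoints, freqs, r):
    """r-th moment about the mean for a frequency distribution."""
    N = sum(freqs)
    xbar = sum(f * x for x, f in zip(midpoints, freqs)) / N
    return sum(f * (x - xbar) ** r for x, f in zip(midpoints, freqs)) / N


mid = [10, 20, 30, 40, 50, 60]      # midpoints of 5-15, 15-25, ..., 55-65
f = [10, 20, 25, 20, 15, 10]

moments = [central_moment(mid, f, r) for r in range(1, 5)]
print(moments)  # [0.0, 214.0, 468.0, 96712.0]
```

The first central moment is zero, as it must be for any distribution, and the second is the variance.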
101. Central Moments (CO1)
SHEPPARD'S CORRECTIONS FOR MOMENTS: While
computing moments for a frequency distribution with class intervals, we
take the variable $x$ as the midpoint of the class intervals, which means that we
have assumed the frequencies to be concentrated at the midpoints of the class
intervals. The above assumption is true when the distribution is
symmetrical and the no. of class intervals is not greater than $\frac{1}{20}$ of the
range; otherwise the computation of moments will have an error called
grouping error.
This error is corrected by the following formula given by
W.F. Sheppard:
$$\mu_2(\text{corrected}) = \mu_2 - \frac{h^2}{12}$$
102. Central Moments (CO1)
$$\mu_4(\text{corrected}) = \mu_4 - \frac{1}{2}h^2\mu_2 + \frac{7}{240}h^4$$
where h is the width of the class interval, while $\mu_1$ and $\mu_3$ require no
correction. These formulae are known as Sheppard's corrections.
Example: Find the corrected values of the following moments using
Sheppard's correction. The width of classes in the distribution is 10.
$\mu_2 = 214$, $\mu_3 = 468$, $\mu_4 = 96712$
Sol. We have $\mu_2 = 214$, $\mu_3 = 468$, $\mu_4 = 96712$, $h = 10$
$$\mu_2(\text{corrected}) = \mu_2 - \frac{h^2}{12} = 214 - \frac{(10)^2}{12} = 214 - 8.333 = 205.667$$
$$\mu_3(\text{corrected}) = \mu_3 = 468$$
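Sheppard's corrections for this example can be evaluated with a minimal Python sketch. The function name is illustrative, and the corrected fourth moment printed below is computed by the sketch from the stated formula; it is not quoted from the slides.

```python
def sheppard_corrected(mu2, mu3, mu4, h):
    """Apply Sheppard's corrections for class width h.

    The odd moments (mu1, mu3) need no correction.
    """
    mu2_c = mu2 - h ** 2 / 12
    mu3_c = mu3
    mu4_c = mu4 - 0.5 * h ** 2 * mu2 + 7 / 240 * h ** 4
    return mu2_c, mu3_c, mu4_c


m2, m3, m4 = sheppard_corrected(214, 468, 96712, h=10)
print(round(m2, 3), m3, round(m4, 3))  # 205.667 468 86303.667
```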
104. Raw Moments (CO1)
• MOMENTS ABOUT AN ARBITRARY NUMBER (Raw Moments):
• If $x_1, x_2, x_3, \dots, x_n$ are the values of a variable $x$ with the corresponding
frequencies $f_1, f_2, f_3, \dots, f_n$ respectively, then the $r$th
moment $\mu'_r$ about the number $x = A$ is defined as
$$\mu'_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^r; \quad r = 0, 1, 2, \dots$$
where $N = \sum_{i=1}^{n} f_i$
For $r = 0$, $\mu'_0 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^0 = 1$
105. Raw Moments (CO1)
For $r = 1$, $\mu'_1 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A) = \frac{1}{N}\sum_{i=1}^{n} f_i x_i - \frac{A}{N}\sum_{i=1}^{n} f_i = \bar{x} - A$
For $r = 2$, $\mu'_2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2$
For $r = 3$, $\mu'_3 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^3$ and so on.
In calculation work, if we find that there is some common factor $h$ (> 1)
in the values of $x - A$, we can ease the calculation by defining $u = \frac{x - A}{h}$.
In that case, we have
$$\mu'_r = \left(\frac{1}{N}\sum_{i=1}^{n} f_i u_i^r\right) h^r; \quad r = 0, 1, 2, \dots$$
107. MOMENTS ABOUT THE ORIGIN:
If $x_1, x_2, \ldots, x_n$ are the values of a variable $x$ with corresponding frequencies $f_1, f_2, \ldots, f_n$ respectively, then the $r$th moment about the origin, $\nu_r$, is defined as
$$\nu_r = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^r; \quad r = 0, 1, 2, \ldots$$
where $N = \sum_{i=1}^{n} f_i$.
For $r = 0$, $\nu_0 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^0 = \frac{N}{N} = 1$
For $r = 1$, $\nu_1 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \bar{x}$
For $r = 2$, $\nu_2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2$, and so on.
Moments about the origin (CO1)
111. KARL PEARSON'S $\beta$ AND $\gamma$ COEFFICIENTS:
Karl Pearson defined the following four coefficients based upon the first four moments of a frequency distribution about its mean:
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} \quad (\beta\text{-coefficients})$$
$$\gamma_1 = +\sqrt{\beta_1}, \qquad \gamma_2 = \beta_2 - 3 \quad (\gamma\text{-coefficients})$$
The practical use of these coefficients is to measure the skewness and kurtosis of a frequency distribution. These coefficients are pure numbers independent of units of measurement.
KARL PEARSON'S COEFFICIENTS (CO1)
112. Example 1: The first three moments of a distribution about the value 2 of the variable are 1, 16 and $-40$. Show that the mean is 3, the variance is 15 and $\mu_3 = -86$.
Solution: We have $A = 2$, $\mu'_1 = 1$, $\mu'_2 = 16$ and $\mu'_3 = -40$.
We have $\mu'_1 = \bar{x} - A \Rightarrow \bar{x} = \mu'_1 + A = 1 + 2 = 3$
Variance $= \mu_2 = \mu'_2 - \mu_1'^2 = 16 - (1)^2 = 15$
$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3 = -40 - 3(16)(1) + 2(1)^3 = -40 - 48 + 2 = -86.$$
KARL PEARSON'S COEFFICIENTS (CO1)
113. Example 2: The first four moments of a distribution about the value 35 are $-1.8$, 240, $-1020$ and 144000. Find the values of $\mu_1, \mu_2, \mu_3, \mu_4$.
Solution: $\mu_1 = 0$
$\mu_2 = \mu'_2 - \mu_1'^2 = 240 - (-1.8)^2 = 236.76$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3 = -1020 - 3(240)(-1.8) + 2(-1.8)^3 = 264.336$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^2 - 3\mu_1'^4 = 144000 - 4(-1020)(-1.8) + 6(240)(-1.8)^2 - 3(-1.8)^4 = 141290.11.$
KARL PEARSON'S COEFFICIENTS (CO1)
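The relations between raw moments about A and central moments used in Examples 1 and 2 can be checked with a short sketch. The function name `raw_to_central` is an assumption made here for illustration.

```python
def raw_to_central(m1, m2, m3, m4):
    """Convert raw moments about an arbitrary point A into central moments.

    m1..m4 are mu'_1..mu'_4; the result is (mu_1, mu_2, mu_3, mu_4).
    """
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return 0.0, mu2, mu3, mu4


# Example 2: moments about 35 are -1.8, 240, -1020, 144000
mu1, mu2, mu3, mu4 = raw_to_central(-1.8, 240, -1020, 144000)
print(round(mu2, 2), round(mu3, 3), round(mu4, 2))
```

The mean itself is recovered separately as $\bar{x} = A + \mu'_1$, since the conversion above does not depend on A.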
114. Example 3: Calculate the variance and third central moment from the following data.

$x_i$   0   1   2    3    4    5    6   7   8
$f_i$   1   9   26   59   72   52   29  7   1

Solution: Calculation of Moments, with $u = \frac{x - A}{h}$, $A = 4$, $h = 1$:

$x$   $f$   $u$   $fu$   $fu^2$   $fu^3$
0     1     -4    -4     16       -64
1     9     -3    -27    81       -243
2     26    -2    -52    104      -208
3     59    -1    -59    59       -59
4     72    0     0      0        0

115.
$$\mu'_1 = \frac{\sum fu}{N}\,h = \frac{-7}{256} = -0.02734$$
$$\mu'_2 = \frac{\sum fu^2}{N}\,h^2 = \frac{507}{256} = 1.9805$$
KARL PEARSON'S COEFFICIENTS (CO1)
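The step-deviation method of Example 3 can be sketched as follows. The function name `step_deviation_moments` is hypothetical; the data, A = 4 and h = 1 are those stated in the example.

```python
def step_deviation_moments(xs, fs, A, h):
    """Return (mean, variance, mu_3) using u = (x - A) / h."""
    N = sum(fs)
    us = [(x - A) / h for x in xs]
    # Raw moments about A: mu'_r = h^r * sum(f * u^r) / N
    m1, m2, m3 = (
        h ** r * sum(f * u ** r for u, f in zip(us, fs)) / N for r in (1, 2, 3)
    )
    mean = A + m1
    variance = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    return mean, variance, mu3


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
fs = [1, 9, 26, 59, 72, 52, 29, 7, 1]
mean, variance, mu3 = step_deviation_moments(xs, fs, A=4, h=1)
print(mean, variance, mu3)
```

Here N = 256, and the intermediate raw moments agree with $\mu'_1 = -7/256$ and $\mu'_2 = 507/256$ shown above.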
117. Example 4: The first four moments of a distribution about the value 4 of the variable are $-1.5$, 17, $-30$ and 108. Find the moments about the mean and about the origin, and $\beta_1$ and $\beta_2$. Also find the moments about the point $x = 2$.
Solution: We have $A = 4$, $\mu'_1 = -1.5$, $\mu'_2 = 17$, $\mu'_3 = -30$, $\mu'_4 = 108$.
Moments about mean
$\mu_1 = 0$
$\mu_2 = \mu'_2 - \mu_1'^2 = 14.75$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3 = 39.75$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^2 - 3\mu_1'^4 = 142.3125$
$\bar{x} = \mu'_1 + A = -1.5 + 4 = 2.5$
KARL PEARSON'S COEFFICIENTS (CO1)
119. Daily Quiz (CO1)
Q1. The first four moments of a distribution are 3, 10.5, 40.5, 168. Comment upon the nature of the distribution.
Q2. For a distribution, the mean is 10, the variance is 16, $\gamma_1$ is 1 and $\beta_2$ is 4. Find the first four moments about the origin.
120. Skewness
• It tells us whether the distribution is normal or not.
• It gives us an idea about the nature and degree of concentration of observations about the mean.
• The empirical relations between mean, median and mode are based on a moderately skewed distribution.
Topic objective (CO1)
121. Skewness:
• It means lack of symmetry.
• It gives us an idea about the shape of the curve which we can draw with the help of the given data.
• A distribution is said to be skewed if:
• Mean, median and mode fall at different points, i.e., Mean ≠ Median ≠ Mode;
• Quartiles are not equidistant from the median; and
• The curve drawn with the help of the given data is not symmetrical but stretched more to one side than to the other.
Skewness(CO1)
122. Symmetrical Distribution:
A symmetric distribution is a type of distribution where the left side of the distribution mirrors the right side. In a symmetric distribution, the mean, mode and median all fall at the same point.
123. Measures of Skewness:
The measures of skewness are:
• Sk = M − Md,
• Sk = M − Mo,
• Sk = (Q3 − Md) − (Md − Q1),
where M is the mean, Md the median, Mo the mode, Q1 the first quartile and Q3 the third quartile of the distribution.
These are the absolute measures of skewness.
• Coefficients of Skewness: For comparing two series we do not calculate these absolute measures; instead we calculate the relative measures called the coefficients of skewness, which are pure numbers independent of units of measurement.
Skewness(CO1)
124. The following are the coefficients of skewness:
• Prof. Karl Pearson's Coefficient of Skewness,
• Prof. Bowley's Coefficient of Skewness,
• Coefficient of Skewness based upon Moments.
Prof. Karl Pearson's Coefficient of Skewness:
Definition
• It is defined as:
$$S_k = \frac{\text{A.M.} - \text{Mode}}{\text{S.D.}} = \frac{M - M_o}{\sigma},$$
where $\sigma$ is the standard deviation of the distribution.
125. If the mode is ill-defined, then using the empirical relation Mo = 3Md − 2M (Mode = 3 Median − 2 Mean) for a moderately asymmetrical distribution, we have
$$S_k = \frac{3(M - M_d)}{\sigma}.$$
• From the above two formulas, we observe that Sk = 0 if M = Mo = Md.
• Hence for a symmetrical distribution, mean, median and mode coincide.
• Skewness is positive if M > Mo or M > Md, and negative if M < Mo or M < Md.
• Limits are: |Sk| ≤ 3, or −3 ≤ Sk ≤ 3.
• However, in practice, these limits are rarely attained.
Skewness(CO1)
126. Coefficient of Skewness based on Moments
Definition:
It is defined as:
$$\gamma_1 = \frac{\mu_3}{\sqrt{\mu_2^3}} = \pm\sqrt{\beta_1},$$
where $\beta_1$ and $\gamma_1$ are Pearson's coefficients.
The skewness is zero if and only if $\beta_1 = 0$ (the alternative $\beta_2 = -3$ cannot occur, since $\beta_2 = \mu_4/\mu_2^2$ is never negative).
Thus for a symmetrical distribution $\beta_1 = 0$.
In this respect $\beta_1$ is taken as a measure of skewness.
Skewness(CO1)
127. • The coefficient of skewness based upon moments is to be regarded as without sign.
• Pearson's and Bowley's coefficients of skewness can be positive as well as negative.
Positively Skewed Distribution: The skewness is positive if the larger tail of the distribution lies towards the higher values of the variate (the right), i.e., if the curve drawn with the help of the given data is stretched more to the right than to the left.
128. Negatively Skewed Distribution:
The skewness is negative if the larger tail of the distribution lies towards the lower values of the variate (the left), i.e., if the curve drawn with the help of the given data is stretched more to the left than to the right.
Skewness(CO1)
129. Pearson's $\beta$ and $\gamma$ Coefficients:
$$\gamma_1 = \sqrt{\beta_1} = \pm\frac{\mu_3}{\sqrt{\mu_2^3}}$$
Q1. Karl Pearson's coefficient of skewness of a distribution is 0.32, its standard deviation is 6.5 and its mean is 29.6. Find the mode of the distribution.
Solution: Given that $S_k = 0.32$, $\sigma = 6.5$, mean $= 29.6$.
$$S_k = \frac{\text{A.M.} - \text{Mode}}{\text{S.D.}} = \frac{3(M - M_d)}{\sigma}$$
$$0.32 = \frac{29.6 - \text{Mode}}{6.5} \Rightarrow \text{Mode} = 29.6 - 0.32 \times 6.5 = 27.52$$
Skewness(CO1)
130. Kurtosis:
• Describe the concepts of kurtosis.
• Explain the different measures of kurtosis.
• Explain how kurtosis describes the shape of a distribution.
Topic objective (CO1)
131. Kurtosis
• If we know the measures of central tendency, dispersion and skewness, we still cannot form a complete idea about the distribution. Let us consider the figure in which all three curves A, B, and C are symmetrical about the mean and have the same range.
Kurtosis (CO1)
132. Definition: Kurtosis is also known as Convexity of the Frequency Curve, due to Prof. Karl Pearson.
• It enables us to have an idea about the flatness or peakedness of the frequency curve.
• It is measured by the coefficient $\beta_2$ or its derived quantity $\gamma_2 = \beta_2 - 3$, where
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
• A curve of type A, which is neither flat nor peaked, is called the normal curve or mesokurtic curve, and for such a curve $\beta_2 = 3$, i.e., $\gamma_2 = 0$.
• A curve of type B, which is flatter than the normal curve, is known as a platykurtic curve, and for such a curve $\beta_2 < 3$, i.e., $\gamma_2 < 0$.
133. A curve of type C, which is more peaked than the normal curve, is called a leptokurtic curve, and for such a curve $\beta_2 > 3$, i.e., $\gamma_2 > 0$.
Q2. For a distribution, the mean is 10, the variance is 16, $\gamma_1$ is +1 and $\beta_2$ is 4. Comment on the nature of the distribution. Also find the third central moment.
Solution: With $\mu_2 = 16$,
$$\gamma_1 = \frac{\mu_3}{\sqrt{\mu_2^3}}: \quad 1 = \frac{\mu_3}{\sqrt{16^3}} = \frac{\mu_3}{64} \Rightarrow \mu_3 = 64,$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2}: \quad 4 = \frac{\mu_4}{256} \Rightarrow \mu_4 = 1024.$$
Since $\gamma_1 = +1$, the distribution is moderately positively skewed, i.e., if we draw the curve of the given distribution, it will have a longer tail towards the right.
Further, since $\beta_2 = 4 > 3$, the distribution is leptokurtic, i.e., it will be slightly more peaked than the normal curve.
Kurtosis (CO1)
134. Example 3: The first four moments about the working mean 28.5 of a distribution are 0.294, 7.144, 42.409 and 454.98. Calculate the first four moments about the mean. Also evaluate $\beta_1$ and $\beta_2$ and comment upon the skewness and kurtosis of the distribution.
Solution: $\mu'_1 = 0.294$, $\mu'_2 = 7.144$, $\mu'_3 = 42.409$, $\mu'_4 = 454.98$
Moments about mean
$\mu_1 = 0$,
$\mu_2 = \mu'_2 - \mu_1'^2 = 7.0576$,
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3 = 36.1588$,
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^2 - 3\mu_1'^4 = 408.7896$
135.
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 3.7193, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = 8.207$$
Skewness: $\mu_3$ is positive, so $\gamma_1 = +\sqrt{\beta_1} = 1.9285$ and the distribution is positively skewed.
Kurtosis: $\beta_2 = 8.207 > 3$, so the distribution is leptokurtic.
Kurtosis (CO1)
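The classification by $\beta_1$, $\beta_2$, $\gamma_1$ and $\gamma_2$ can be sketched in code. The function names below are assumptions of this note, and the sign of $\gamma_1$ is taken from $\mu_3$, which is one common convention.

```python
import math


def pearson_coefficients(mu2, mu3, mu4):
    """Return (beta1, beta2, gamma1, gamma2) from central moments."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    gamma1 = math.copysign(math.sqrt(beta1), mu3)  # sign follows mu3
    gamma2 = beta2 - 3
    return beta1, beta2, gamma1, gamma2


def describe_shape(mu2, mu3, mu4, tol=1e-12):
    """Label skewness and kurtosis using gamma1 and gamma2."""
    _, _, gamma1, gamma2 = pearson_coefficients(mu2, mu3, mu4)
    skew = "symmetrical" if abs(gamma1) < tol else (
        "positively skewed" if gamma1 > 0 else "negatively skewed")
    kurt = "mesokurtic" if abs(gamma2) < tol else (
        "leptokurtic" if gamma2 > 0 else "platykurtic")
    return skew, kurt


# Central moments from Example 3 (working mean 28.5)
print(pearson_coefficients(7.0576, 36.1588, 408.7896))
print(describe_shape(7.0576, 36.1588, 408.7896))
```

An exact comparison with 3 is rarely meaningful for sample data, so the tolerance `tol` is only a placeholder for whatever threshold suits the application.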
136. Daily Quiz (CO1)
Q1. Find all four central moments and discuss skewness and kurtosis for the following distribution:

Range of expenditures   2-4   4-6   6-8   8-10   10-12
No. of families         38    292   389   212    69
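138.
$$\bar{x} = \frac{\sum fx}{N} = \frac{6964}{1000} = 6.964 \approx 7$$
Moments about mean (taking $\bar{x} \approx 7$)
$$\mu_1 = \frac{\sum f(x - \bar{x})}{N} = 0, \qquad \mu_2 = \frac{\sum f(x - \bar{x})^2}{N} = 3.728,$$
$$\mu_3 = \frac{\sum f(x - \bar{x})^3}{N} = 1.344, \qquad \mu_4 = \frac{\sum f(x - \bar{x})^4}{N} = 35.456$$
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(1.344)^2}{(3.728)^3} = 0.035, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{35.456}{(3.728)^2} = 2.551$$
Skewness: $\mu_3$ is positive, so $\gamma_1 = +\sqrt{\beta_1} = 0.187$ and the distribution is positively skewed.
Kurtosis: $\beta_2 = 2.551 < 3$, so the distribution is platykurtic.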
Kurtosis (CO1)
139. Example: The first four moments of a distribution about $x = 4$ are 1, 4, 10 and 45. Find the first four moments about the mean. Discuss the skewness and kurtosis and also comment upon the nature of the distribution.
Solution: Here we have $A = 4$, $\mu'_1 = 1$, $\mu'_2 = 4$, $\mu'_3 = 10$, $\mu'_4 = 45$.
Moments about mean
$\mu_1 = 0$
$\mu_2 = \mu'_2 - \mu_1'^2 = 4 - (1)^2 = 3$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3 = 10 - 3(4)(1) + 2(1)^3 = 0$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^2 - 3\mu_1'^4 = 45 - 4(10)(1) + 6(4)(1)^2 - 3(1)^4 = 26$
140. Skewness: The coefficient of skewness is
$$\gamma_1 = \frac{\mu_3}{\sqrt{\mu_2^3}} = \frac{0}{\sqrt{3^3}} = 0.$$
Hence the distribution is symmetrical.
Kurtosis: Since
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{26}{(3)^2} = 2.89 < 3,$$
the distribution is platykurtic.
Skewness& Kurtosis (CO1)
141. Example: Calculate the first four moments about the mean from the following data. Also find the measures of skewness and kurtosis.

$x_i$   2   2.5   3    3.5   4    4.5   5
$f_i$   5   38    65   92    70   40    10

Solution: Calculation of Moments
Since mean $\bar{x} = \dfrac{\sum fx}{N}$
Skewness& Kurtosis (CO1)
144. Third central moment $= 0.009840$.
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^2 - 3\mu_1'^4$
$= 0.5074 - 4(0.0609)(0.0375) + 6(0.4548)(0.0375)^2 - 3(0.0375)^4$
$= 0.5021.$
Fourth central moment $= 0.5021$.
Skewness: The coefficient of skewness is
$$\gamma_1 = \frac{\mu_3}{\sqrt{\mu_2^3}} = \frac{0.009840}{\sqrt{(0.4533)^3}} = 0.03224.$$
Hence the distribution is positively skewed.
Kurtosis: Since
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{0.5021}{(0.4533)^2} = 2.4437 < 3,$$
the distribution is platykurtic.
Skewness& Kurtosis (CO1)
145. ✓ Moments
✓ Relation between $\nu_r$ and $\mu_r$
✓ Relation between $\mu_r$ and $\mu'_r$
✓ Moment generating function
✓ Skewness
✓ Kurtosis
Recap(CO1)
146. Curve Fitting:
• The objective of curve fitting is to find the parameters of a mathematical model that describes a set of data in a way that minimizes the difference between the model and the data.
Topic objectives (CO1)
147. Curve Fitting: Curve fitting means expressing the relationship between two variables by an algebraic equation. It enables us to represent the relationship between two variables by simple algebraic expressions, e.g. polynomials, exponential or logarithmic functions. It is also used to estimate the values of one variable corresponding to specified values of the other variable.
METHOD OF LEAST SQUARES: The method of least squares provides a unique set of values for the constants and hence suggests a curve of best fit to the given data.
Curve Fitting (CO1)
148. • FITTING A STRAIGHT LINE: Let $(x_i, y_i)$, $i = 1, 2, \ldots, n$ be $n$ sets of observations of related data and
$$y = a \cdot 1 + b \cdot x \qquad (1)$$
Normal equations
$$\sum y = na + b\sum x \qquad (2)$$
$$\sum xy = a\sum x + b\sum x^2 \qquad (3)$$
If $n$ is odd then $u = \dfrac{x - (\text{middle term})}{\text{interval } (h)}$
If $n$ is even then $u = \dfrac{x - (\text{mean of two middle terms})}{\frac{1}{2}(\text{interval})}$
Curve Fitting (CO1)
149. Q. Fit a straight line to the following data by the least squares method.

$x$   0   1     2     3     4
$y$   1   1.8   3.3   4.5   6.3

Sol. Let the straight line obtained from the given data be
$$y = a \cdot 1 + bx \qquad (1)$$
then the normal equations are
$$\sum y = ma + b\sum x \qquad (2)$$
$$\sum xy = a\sum x + b\sum x^2 \qquad (3)$$
where $m = 5$ is the number of points.
Curve Fitting (CO1)
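The two normal equations for a straight line can be solved directly, as in the following sketch. The function name `fit_line` is an assumption of this note; the data are those of the question above.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x by solving the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


xs = [0, 1, 2, 3, 4]
ys = [1, 1.8, 3.3, 4.5, 6.3]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))
```

For this data the sums are $\sum x = 10$, $\sum y = 16.9$, $\sum x^2 = 30$ and $\sum xy = 47.1$, which leads to the line $y = 0.72 + 1.33x$.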
151. FITTING OF AN EXPONENTIAL CURVE
Let $y = ae^{bx}$
Taking logarithm on both sides, we get
$$\log_{10} y = \log_{10} a + bx\log_{10} e$$
$$Y = A + BX \qquad (1)$$
where $Y = \log_{10} y$, $A = \log_{10} a$, $B = b\log_{10} e$, $X = x$.
The normal equations for (1) are
$$\sum Y = nA + B\sum X \quad \text{and} \quad \sum XY = A\sum X + B\sum X^2$$
Solving these, we get $A$ and $B$.
Then $a = \text{antilog } A$ and $b = \dfrac{B}{\log_{10} e}$.
Curve Fitting (CO1)
152. FITTING OF THE CURVE $y = ax^b$
Let $y = ax^b$
Taking logarithm on both sides, we get
$$\log_{10} y = \log_{10} a + b\log_{10} x$$
$$Y = A + BX \qquad (1)$$
where $Y = \log_{10} y$, $A = \log_{10} a$, $B = b$, $X = \log_{10} x$.
The normal equations for (1) are
$$\sum Y = nA + B\sum X \quad \text{and} \quad \sum XY = A\sum X + B\sum X^2$$
which give $A$ and $B$ on solving, and then $a = \text{antilog } A$, $b = B$.
Curve Fitting (CO1)
153. FITTING OF THE CURVE $y = ab^x$
Taking logarithm on both sides, we get
$$\log y = \log a + x\log b$$
$$Y = A + BX$$
where $Y = \log y$, $A = \log a$, $B = \log b$, $X = x$.
This is a linear equation in $Y$ and $X$.
For estimating $A$ and $B$, the normal equations are
$$\sum Y = nA + B\sum X \quad \text{and} \quad \sum XY = A\sum X + B\sum X^2$$
where $n$ is the number of pairs of values of $x$ and $y$.
Ultimately, $a = \text{antilog}(A)$ and $b = \text{antilog}(B)$.
Example 2. Obtain a relation of the form $y = ab^x$ for the following data by the method of least squares:
Curve Fitting (CO1)
155. The normal equations are
$$\sum Y = 5A + B\sum x, \qquad \sum xY = A\sum x + B\sum x^2$$
Substituting the above values, we get
$7.5455 = 5A + 20B$ and $33.1812 = 20A + 90B$
On solving, $A = 0.31$ and $B = 0.3$.
$a = \text{antilog } A = 2.04$ and $b = \text{antilog } B = 1.995$
Hence the required curve is $y = 2.04(1.995)^x$.
Curve Fitting (CO1)
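The log-linear procedure for $y = ab^x$ can be sketched as below. Both function names are assumptions of this note. The first call solves the normal equations using the sums quoted in the solution; the second uses made-up data, purely for illustration, that lies exactly on $y = 3 \cdot 2^x$.

```python
import math


def solve_normal_equations(n, sx, sxx, sY, sxY):
    """Solve sum(Y) = n*A + B*sum(x) and sum(xY) = A*sum(x) + B*sum(x^2)."""
    B = (n * sxY - sx * sY) / (n * sxx - sx ** 2)
    A = (sY - B * sx) / n
    return A, B


def fit_ab_power_x(xs, ys):
    """Fit y = a * b**x by least squares on Y = log10(y)."""
    Ys = [math.log10(y) for y in ys]
    A, B = solve_normal_equations(
        len(xs), sum(xs), sum(x * x for x in xs),
        sum(Ys), sum(x * Y for x, Y in zip(xs, Ys)),
    )
    return 10 ** A, 10 ** B  # a = antilog A, b = antilog B


# Sums quoted in the solution: 7.5455 = 5A + 20B and 33.1812 = 20A + 90B
A, B = solve_normal_equations(5, 20, 90, 7.5455, 33.1812)
print(round(A, 2), round(B, 2))

# Hypothetical data lying exactly on y = 3 * 2**x
a_fit, b_fit = fit_ab_power_x([0, 1, 2, 3], [3, 6, 12, 24])
print(a_fit, b_fit)
```

Note that this minimises squared errors in $\log y$, not in $y$ itself, which is the usual consequence of linearising by logarithms.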
156. FITTING OF THE CURVE $xy = b + ax$
$$xy = b + ax \Rightarrow y = \frac{b}{x} + a$$
$$Y = bX + a, \quad \text{where } X = \frac{1}{x}$$
Normal equations are $\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$.
FITTING OF THE CURVE $y = ax^2 + \dfrac{b}{x}$
Normal equations are
$$\sum x^2 y = a\sum x^4 + b\sum x \quad \text{and} \quad \sum \frac{y}{x} = a\sum x + b\sum \frac{1}{x^2}$$
Curve Fitting (CO1)
157. FITTING OF THE CURVE $y = ax + bx^2$
Normal equations are
$$\sum xy = a\sum x^2 + b\sum x^3 \quad \text{and} \quad \sum x^2 y = a\sum x^3 + b\sum x^4$$
FITTING OF THE CURVE $y = ax + \dfrac{b}{x}$
Normal equations are
$$\sum xy = a\sum x^2 + nb \quad \text{and} \quad \sum \frac{y}{x} = na + b\sum \frac{1}{x^2}$$
where $n$ is the number of pairs of values of $x$ and $y$.
Curve Fitting (CO1)
158. FITTING OF THE CURVE $2^x = ax^2 + bx + c$
Normal equations are
$$\sum 2^x x^2 = a\sum x^4 + b\sum x^3 + c\sum x^2$$
$$\sum 2^x x = a\sum x^3 + b\sum x^2 + c\sum x$$
$$\sum 2^x = a\sum x^2 + b\sum x + mc$$
where $m$ is the number of points $(x_i, y_i)$.
FITTING OF THE CURVE $y = ae^{-3x} + be^{-2x}$
Normal equations are
$$\sum ye^{-3x} = a\sum e^{-6x} + b\sum e^{-5x}$$
$$\sum ye^{-2x} = a\sum e^{-5x} + b\sum e^{-4x}$$
Curve Fitting (CO1)
159. Example 3. By the method of least squares, find the curve $y = ax + bx^2$ that best fits the following data:

$x$   1     2     3     4      5
$y$   1.8   5.1   8.9   14.1   19.8

Sol. Normal equations are
$$\sum xy = a\sum x^2 + b\sum x^3 \qquad (1)$$
$$\sum x^2 y = a\sum x^3 + b\sum x^4 \qquad (2)$$
Let us form a table as below:
Curve Fitting (CO1)
161. Substituting these values in equations (1) and (2), we get
$194.1 = 55a + 225b$
$822.9 = 225a + 979b$
Solving, $b = \dfrac{317.4}{644} \approx 0.49$ and, using this rounded value, $a = \dfrac{83.85}{55} \approx 1.52$.
Hence the required parabolic curve is $y = 1.52x + 0.49x^2$.
FITTING OF THE CURVE $pv^{\gamma} = k \Rightarrow v = k^{1/\gamma}\,p^{-1/\gamma}$
Taking logarithm on both sides, we get
$$\log v = \frac{1}{\gamma}\log k - \frac{1}{\gamma}\log p$$
Curve Fitting (CO1)
162. $Y = A + BX$
where $Y = \log v$, $A = \dfrac{1}{\gamma}\log k$, $B = -\dfrac{1}{\gamma}$ and $X = \log p$.
$\gamma$ and $k$ are determined by the above equations. The normal equations are obtained as for the straight line.
Example 4. Fit the curve $pv^{\gamma} = k$ to the following data:
Curve Fitting (CO1)
163. Solution: $pv^{\gamma} = k$
$$v = \left(\frac{k}{p}\right)^{1/\gamma} = k^{1/\gamma}\,p^{-1/\gamma}$$
$$\log v = \frac{1}{\gamma}\log k - \frac{1}{\gamma}\log p$$
which is of the form
$$Y = A + BX$$
where $Y = \log v$, $X = \log p$, $A = \dfrac{1}{\gamma}\log k$, $B = -\dfrac{1}{\gamma}$.
Curve Fitting (CO1)
166. Example 5. Use the method of least squares to fit the curve $y = \dfrac{c_0}{x} + c_1\sqrt{x}$ to the following table of values:

$x$   0.1   0.2   0.4   0.5   1   2
$y$   21    11    7     6     5   6

Solution: Let the given curve be $y = \dfrac{c_0}{x} + c_1\sqrt{x}$.
Normal equations are
$$\sum \frac{y}{x} = c_0\sum \frac{1}{x^2} + c_1\sum \frac{1}{\sqrt{x}}$$
$$\sum y\sqrt{x} = c_0\sum \frac{1}{\sqrt{x}} + c_1\sum x.$$
Curve Fitting (CO1)
168. $33.71524 = 10.10081c_0 + 4.2c_1$
so we have
$c_0 = 1.97327$, $c_1 = 3.28182$
Hence the curve is
$$y = \frac{1.97327}{x} + 3.28182\sqrt{x}$$
Curve Fitting (CO1)
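Several of the curves above have the common form $y = c_1 g_1(x) + c_2 g_2(x)$, so their normal equations can be built and solved in the same way. The sketch below assumes that form; the function name `fit_two_terms` and its arguments are hypothetical names chosen for illustration.

```python
import math


def fit_two_terms(xs, ys, g1, g2):
    """Least-squares fit of y = c1*g1(x) + c2*g2(x) via its 2x2 normal equations."""
    s11 = sum(g1(x) ** 2 for x in xs)
    s12 = sum(g1(x) * g2(x) for x in xs)
    s22 = sum(g2(x) ** 2 for x in xs)
    t1 = sum(y * g1(x) for x, y in zip(xs, ys))
    t2 = sum(y * g2(x) for x, y in zip(xs, ys))
    det = s11 * s22 - s12 ** 2  # Cramer's rule on the normal equations
    c1 = (t1 * s22 - t2 * s12) / det
    c2 = (s11 * t2 - s12 * t1) / det
    return c1, c2


# Example 5: y = c0/x + c1*sqrt(x)
c0, c1 = fit_two_terms(
    [0.1, 0.2, 0.4, 0.5, 1, 2], [21, 11, 7, 6, 5, 6],
    lambda x: 1 / x, math.sqrt,
)
print(round(c0, 5), round(c1, 5))

# Example 3: y = a*x + b*x^2
a, b = fit_two_terms(
    [1, 2, 3, 4, 5], [1.8, 5.1, 8.9, 14.1, 19.8],
    lambda x: x, lambda x: x * x,
)
print(round(a, 2), round(b, 2))
```

Solving Example 3 without intermediate rounding gives $a \approx 1.51$ rather than the 1.52 obtained by hand from the rounded value of $b$; the difference is only rounding.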
169. Daily Quiz (CO1)
Q. Fit a second degree parabola to the following data:

$x$   0   1   2   3    4
$y$   1   0   3   10   21
170. ✓ Moments
✓ Relation between $\nu_r$ and $\mu_r$
✓ Relation between $\mu_r$ and $\mu'_r$
✓ Moment generating function
✓ Skewness & kurtosis
✓ Curve fitting
Recap(CO1)
171. Correlation
• Identify the direction and strength of a correlation between two factors.
• Compute and interpret the Pearson correlation coefficient and test for significance.
• Compute and interpret the coefficient of determination.
• Compute and interpret the Spearman correlation coefficient and test for significance.
Topic objective (CO1)
172. Correlation: In a bivariate distribution we are interested to find out if there is any correlation between the two variables under study.
• If the change in one variable affects a change in the other variable, the variables are said to be correlated.
Positive Correlation
• If the two variables deviate in the same direction, i.e., if the increase (or decrease) in one results in a corresponding increase (or decrease) in the other, correlation is said to be direct or positive.
• For example, the correlation between (i) the heights and weights of a group of persons, and (ii) income and expenditure, is positive.
Correlation(CO1)
173. Negative Correlation:
• If the two variables deviate in opposite directions, i.e., if an increase (or decrease) in one results in a corresponding decrease (or increase) in the other, correlation is said to be diverse or negative.
• For example, the correlation between (i) the price and demand of a commodity, and (ii) the volume and pressure of a perfect gas, is negative.
Perfect Correlation
• Correlation is said to be perfect if the deviation in one variable is followed by a corresponding and proportional deviation in the other.
Correlation (CO1)
174. Scatter Diagram:
• For the bivariate distribution $(x_i, y_i)$; $i = 1, 2, \ldots, n$, if the values of the variables X and Y are plotted along the x-axis and y-axis respectively in the x-y plane, the diagram of dots so obtained is known as a scatter diagram.
• It is the simplest way of representing bivariate data diagrammatically.
• From the scatter diagram, we can form an idea whether the variables are correlated or not.
• For example, if the points are very dense, i.e., very close to each other, a correlation is expected.
• If the points are widely scattered, a poor correlation is expected.
• This method, however, is not suitable if the number of observations is fairly large.
Correlation(CO1)
175. Correlation Coefficient:
• The correlation coefficient due to Karl Pearson is defined as a measure of the intensity or degree of linear relationship between two variables.
• Karl Pearson's Correlation Coefficient
• Karl Pearson's correlation coefficient between two variables X and Y, denoted by $r(X, Y)$ or $r_{XY}$, is a measure of the linear relationship between them and is defined as:
$$r(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
• If $(x_i, y_i)$; $i = 1, 2, \ldots, n$ is the bivariate distribution, then
• $\text{Cov}(X, Y) = E[\{X - E(X)\}\{Y - E(Y)\}]$
176. KARL PEARSON'S COEFFICIENT OF CORRELATION (OR PRODUCT MOMENT CORRELATION COEFFICIENT)
The correlation coefficient between two variables $x$ and $y$, usually denoted by $r(x, y)$ or $r_{xy}$, is a numerical measure of the linear relationship between them and is defined as
$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2 \cdot \frac{1}{n}\sum (y_i - \bar{y})^2}}$$
177.
$$= \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y}$$
$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\sigma_x \sigma_y}$$
or
$$r(x, y) = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
Here $n$ is the number of pairs of values of $x$ and $y$.
Note: The correlation coefficient is independent of change of origin and scale.
Let us define two new variables $u$ and $v$ as
$$u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k},$$
where $a, b, h, k$ are constants; then $r_{xy} = r_{uv}$.
Then
$$r(u, v) = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - \left(\sum u\right)^2}\,\sqrt{n\sum v^2 - \left(\sum v\right)^2}}$$
Correlation(CO1)
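The computational formula for $r$ and its invariance under change of origin and scale can be illustrated with a small sketch. The function name `pearson_r` and the data are assumptions made purely for illustration; the constants $h$ and $k$ are taken positive.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from the sums n, sum(x), sum(y), sum(xy), sum(x^2), sum(y^2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )


# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Change of origin and scale: u = (x - a)/h, v = (y - b)/k with h, k > 0
us = [(x - 3) / 2 for x in xs]
vs = [(y - 4) / 0.5 for y in ys]

r_xy = pearson_r(xs, ys)
r_uv = pearson_r(us, vs)
print(r_xy, r_uv)
```

Both calls return the same value, which is the practical reason for shifting and scaling the data before computing by hand.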
181. Hence, $n = 6$, $\bar{u} = \dfrac{1}{n}\sum u = \dfrac{1}{6}(-3) = -\dfrac{1}{2}$; $\bar{v} = \dfrac{1}{n}\sum v = \dfrac{1}{6}(-3) = -\dfrac{1}{2}$
Then
$$r_{uv} = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - \left(\sum u\right)^2}\,\sqrt{n\sum v^2 - \left(\sum v\right)^2}} = \frac{6 \times 12 - (-3)(-3)}{\sqrt{6 \times 19 - (-3)^2}\,\sqrt{6 \times 19 - (-3)^2}} = \frac{63}{\sqrt{105}\,\sqrt{105}} = 0.6$$
Calculation of the coefficient of correlation for a bivariate frequency distribution:
• If the bivariate data on $x$ and $y$ are presented in a two-way correlation table and $f$ is the frequency of a particular rectangle in the correlation table, then
Correlation(CO1)
182.
$$r_{xy} = \frac{\sum fxy - \frac{1}{n}\sum fx \sum fy}{\sqrt{\left[\sum fx^2 - \frac{1}{n}\left(\sum fx\right)^2\right]\left[\sum fy^2 - \frac{1}{n}\left(\sum fy\right)^2\right]}}$$
Since change of origin and scale do not affect the coefficient of correlation, $r_{xy} = r_{uv}$, where the new variables $u, v$ are properly chosen.
Q. The following table gives, according to age, the frequency of marks obtained by 100 students in an intelligence test:

Marks \ Age   18   19   20   21   Total
10-20         4    2    2    -    8
20-30         5    4    6    4    19
30-40         6    8    10   11   35
40-50         4    4    6    8    22
50-60         -    2    4    4    10
60-70         -    2    3    1    6
Total         19   22   31   28   100

183. Calculate the coefficient of correlation between age and intelligence.
Solution: Let age and intelligence be denoted by $x$ and $y$ respectively.
Correlation(CO1)
186. RANK CORRELATION:
Definition: Assuming that no two individuals are bracketed equal in either classification, each of the variables X and Y takes the values 1, 2, ..., n.
The rank correlation coefficient between the two characteristics A and B is denoted by $r$ and is given as:
$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)},$$
where $D$ is the difference between the two ranks of an individual.
Rank Correlation(CO1)
187. Question. Compute the rank correlation coefficient for the following data.

Person            A   B    C   D   E   F   G   H   I   J
Rank in maths     9   10   6   5   7   2   4   8   1   3
Rank in physics   1   2    3   4   5   6   7   8   9   10

Sol. Here the ranks are given and $n = 10$.
Rank Correlation (CO1)
188.
Person   $R_1$   $R_2$   $D = R_1 - R_2$   $D^2$
A        9       1       8                 64
B        10      2       8                 64
C        6       3       3                 9
D        5       4       1                 1
E        7       5       2                 4
F        2       6       -4                16
G        4       7       -3                9
H        8       8       0                 0
I        1       9       -8                64
J        3       10      -7                49
                                           $\sum D^2 = 280$
Rank Correlation(CO1)
189.
$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 280}{10(100 - 1)} = 1 - 1.697 = -0.697$$
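The same computation can be written as a short sketch. The function name `spearman_rho` is an assumption of this note, and the formula used is valid only when there are no tied ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for untied ranks: 1 - 6*sum(D^2) / (n(n^2 - 1))."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


maths = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]
physics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rho = spearman_rho(maths, physics)
print(round(rho, 3))
```

The negative value indicates that persons ranked high in maths tend to be ranked low in physics in this data.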
Uses:
• It is used for finding the correlation coefficient when we are dealing with qualitative characteristics which cannot be measured quantitatively but can be arranged serially.
• It can also be used where actual data are given.
• In case of extreme observations, Spearman's formula is preferred to Pearson's formula.
Limitations
• It is not applicable in the case of a bivariate frequency distribution.
190. • For n > 30, this formula should not be used unless the ranks are given, since otherwise the calculations are quite time-consuming.
TIED RANKS: If some of the individuals receive the same rank in a ranking of merit, they are said to be tied.
• Let us suppose that $m$ of the individuals, say the $(k+1)$th, $(k+2)$th, ..., $(k+m)$th, are tied.
• Then each of these $m$ individuals is assigned a common rank, which is the arithmetic mean of the ranks $k+1, k+2, \ldots, k+m$.
$$r = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}m_1(m_1^2 - 1) + \frac{1}{12}m_2(m_2^2 - 1) + \cdots\right]}{n(n^2 - 1)}$$
where $m_1, m_2, \ldots$ are the numbers of individuals in each group of tied ranks.
Tied Correlation(CO1)
191. Question: Obtain the rank correlation coefficient for the following data:

$x$   68   64   75   50   64   80   75   40   55   64
$y$   62   58   68   45   81   60   68   48   50   70

Solution: Here marks are given, so we first write down the ranks.
Tied Correlation(CO1)
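The ranking step and the tie correction can be sketched as follows. The helper names are hypothetical, and the sketch assumes rank 1 is given to the largest value, with tied values sharing the mean of the positions they occupy.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_tied(xs, ys):
    """Rank correlation with the correction factor for tied ranks."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (d_squared + correction) / (n * (n * n - 1))


xs = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
ys = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
rho = spearman_tied(xs, ys)
print(round(rho, 3))
```

For this data $\sum D^2 = 72$ and the total correction is 3 (75 repeated twice and 64 three times in $x$, 68 twice in $y$), so $r = 1 - \frac{6 \times 75}{990} \approx 0.545$.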
194. Daily Quiz (CO1)
Q1. Find the rank correlation coefficient for the following data:

$x$   23   27   28   28   29   30   31   33   35   36
$y$   18   20   22   27   21   29   27   29   28   29
195. ✓ Correlation
✓ Karl Pearson coefficient of correlation
✓ Rank Correlation
✓ Tied Rank
Recap(CO1)
196. Regression:
• Explain the variation in the dependent variable based on the variation in the independent variables, and predict the values of the dependent variable.
Topic objectives (CO1)
197. REGRESSION ANALYSIS:
• Regression measures the nature and extent of correlation. Regression is the estimation or prediction of unknown values of one variable from known values of another variable.
Difference between curve fitting and regression analysis: The only fundamental difference, if any, between problems of curve fitting and regression is that in regression, any of the variables may be considered as independent or dependent, while in curve fitting, one variable cannot be dependent.
Curve of regression and regression equation:
• If two variates $x$ and $y$ are correlated, i.e., there exists an association or relationship between them, then the scatter diagram
Regression Analysis(CO1)
198. will be more or less concentrated round a curve. This curve is called the curve of regression and the relationship is said to be expressed by means of curvilinear regression.
• The mathematical equation of the regression curve is called the regression equation.
The following types of regression are discussed here:
• Linear Regression
• Non-linear Regression
• Multiple linear Regression
Regression Analysis (CO1)
199. LINEAR REGRESSION:
• When the points of the scatter diagram are concentrated round a straight line, the regression is called linear and this straight line is known as the line of regression.
• Regression is called non-linear if there exists a relationship other than a straight line between the variables under consideration.
Linear Regression(CO1)
200. LINES OF REGRESSION: A line of regression is the straight line which gives the best fit in the least squares sense to the given data.
Let
$$y = a + bx \qquad (1)$$
be the equation of the regression line of $y$ on $x$. The normal equations are
$$\sum y = na + b\sum x \qquad (2)$$
$$\sum xy = a\sum x + b\sum x^2 \qquad (3)$$
Solving (2) and (3) for $a$ and $b$, we get
$$b = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \qquad (4)$$
Linear Regression(CO1)
201.
$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \bar{y} - b\bar{x} \qquad (5)$$
Eq. (5) gives $\bar{y} = a + b\bar{x}$.
Hence the line $y = a + bx$ passes through the point $(\bar{x}, \bar{y})$.
Putting $a = \bar{y} - b\bar{x}$ in the equation $y = a + bx$, we get
$$y - \bar{y} = b(x - \bar{x}) \qquad (6)$$
Eq. (6) is called the regression line of $y$ on $x$. Here $b$ is called the regression coefficient of $y$ on $x$ and is usually denoted by $b_{yx}$.
$$y - \bar{y} = b_{yx}(x - \bar{x}), \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
Linear Regression(CO1)
202. Similarly, the regression line of $x$ on $y$, $x = a + by$, is
$$x - \bar{x} = b_{xy}(y - \bar{y}),$$
where $b_{xy}$ is the regression coefficient of $x$ on $y$ and is given by
$$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} \quad \text{or} \quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y},$$
where the terms have their usual meanings.
USE OF REGRESSION ANALYSIS:
A) In the field of business this tool of statistical analysis is widely used. Businessmen are interested in predicting future production, consumption, investment, prices, profits, sales, etc.
B) In the field of economic planning and sociological studies, projections of population, birth rates, death rates and other similar variables are of great use.
Linear Regression(CO1)
203. where $\bar{x}$ and $\bar{y}$ are the mean values, while
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
In eq. (3), shifting the origin to $(\bar{x}, \bar{y})$, we get
$$\sum (x - \bar{x})(y - \bar{y}) = a\sum (x - \bar{x}) + b\sum (x - \bar{x})^2$$
$$\Rightarrow nr\sigma_x\sigma_y = a(0) + bn\sigma_x^2 \Rightarrow b = r\,\frac{\sigma_y}{\sigma_x}$$
where $r$ is the coefficient of correlation and $\sigma_x$ and $\sigma_y$ are the standard deviations of the $x$ and $y$ series respectively.
Linear Regression(CO1)
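The two regression coefficients and their relation to $r$ can be checked numerically. The function name `regression_summary` and the data are assumptions made for illustration only.

```python
import math


def regression_summary(xs, ys):
    """Return (b_yx, b_xy, r, x_mean, y_mean) from the usual sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    b_yx = num / (n * sxx - sx ** 2)  # slope of the line of y on x
    b_xy = num / (n * syy - sy ** 2)  # slope of the line of x on y
    r = num / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return b_yx, b_xy, r, sx / n, sy / n


# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_yx, b_xy, r, x_mean, y_mean = regression_summary(xs, ys)

# Line of y on x: y - y_mean = b_yx * (x - x_mean)
predict_y = lambda x: y_mean + b_yx * (x - x_mean)
print(b_yx, b_xy, r, predict_y(6))
```

In this sketch the product $b_{yx} b_{xy}$ equals $r^2$, and both lines pass through $(\bar{x}, \bar{y})$.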
204. PROPERTIES OF REGRESSION COEFFICIENTS:
Property 1. The correlation coefficient is the geometric mean between the regression coefficients.
Proof: The coefficients of regression are $\dfrac{r\sigma_y}{\sigma_x}$ and $\dfrac{r\sigma_x}{\sigma_y}$.
G.M. between them $= \sqrt{\dfrac{r\sigma_y}{\sigma_x} \times \dfrac{r\sigma_x}{\sigma_y}} = \sqrt{r^2} = r =$ coefficient of correlation.
Property 2. If one of the regression coefficients is greater than unity, the other must be less than unity.
Proof. The two regression coefficients are $b_{yx} = \dfrac{r\sigma_y}{\sigma_x}$ and $b_{xy} = \dfrac{r\sigma_x}{\sigma_y}$.
Regression Analysis Properties (CO1)
205. Let $b_{yx} > 1$, then $\dfrac{1}{b_{yx}} < 1$.
Since $b_{yx} \cdot b_{xy} = r^2 \le 1$,
$$b_{xy} \le \frac{1}{b_{yx}} < 1.$$
Similarly, if $b_{xy} > 1$, then $b_{yx} < 1$.
Property 3. The arithmetic mean of the regression coefficients is greater than the correlation coefficient.
Proof. We have to prove that
$$\frac{b_{yx} + b_{xy}}{2} > r$$
$$\Rightarrow r\,\frac{\sigma_y}{\sigma_x} + r\,\frac{\sigma_x}{\sigma_y} > 2r$$
Dividing by $r$ (taken positive) and multiplying by $\sigma_x\sigma_y$,
Regression Analysis Properties(CO1)
206.
$$\sigma_x^2 + \sigma_y^2 > 2\sigma_x\sigma_y$$
$$\Rightarrow (\sigma_x - \sigma_y)^2 > 0, \text{ which is true.}$$
Property 4: Regression coefficients are independent of the origin but not of scale.
Proof. Let $u = \dfrac{x - a}{h}$, $v = \dfrac{y - b}{k}$, where $a$, $b$, $h$ and $k$ are constants.
$$b_{yx} = \frac{r\sigma_y}{\sigma_x} = r \cdot \frac{k\sigma_v}{h\sigma_u} = \frac{k}{h} \cdot \frac{r\sigma_v}{\sigma_u} = \frac{k}{h}\,b_{vu}$$
Similarly, $b_{xy} = \dfrac{h}{k}\,b_{uv}$.
Thus $b_{yx}$ and $b_{xy}$ are both independent of $a$ and $b$ but not of $h$ and $k$.
Regression Analysis Properties(CO1)
207. Property 5: The correlation coefficient and the two regression
coefficients have the same sign.
Proof: Regression coefficient of $y$ on $x$ $= b_{yx} = r\frac{\sigma_y}{\sigma_x}$
Regression coefficient of $x$ on $y$ $= b_{xy} = r\frac{\sigma_x}{\sigma_y}$
Since $\sigma_x$ and $\sigma_y$ are both positive, $b_{yx}$, $b_{xy}$ and $r$ have the same sign.
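The properties of the regression coefficients can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper `regression_summary`, the sample data, and the shift and scale constants are all hypothetical, and population variances (divide by $n$) are used.

```python
import math

def regression_summary(xs, ys):
    """Return (r, b_yx, b_xy) using population variances and covariance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / math.sqrt(var_x * var_y)
    return r, cov / var_x, cov / var_y

# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 8, 10]
r, b_yx, b_xy = regression_summary(xs, ys)

# Property 1: r is the geometric mean of b_yx and b_xy (with their common sign).
gm = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Property 3: the arithmetic mean of the coefficients is not less than r.
am = (b_yx + b_xy) / 2

# Property 4: change of origin (shift) and scale, u = (x - a)/h, v = (y - b)/k.
shift_x, shift_y, scale_h, scale_k = 10, 20, 2, 5
us = [(x - shift_x) / scale_h for x in xs]
vs = [(y - shift_y) / scale_k for y in ys]
_, b_vu, b_uv = regression_summary(us, vs)

print("r =", r, "G.M. =", gm, "A.M. =", am)
print("b_yx =", b_yx, "(k/h) * b_vu =", scale_k / scale_h * b_vu)
print("b_xy =", b_xy, "(h/k) * b_uv =", scale_h / scale_k * b_uv)
```

Changing `shift_x` or `shift_y` leaves `b_vu` and `b_uv` unchanged, while changing `scale_h` or `scale_k` rescales them, which is the content of Property 4.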
⢠ANGLE BETWEEN TWO LINES OF REGRESSION:
If ð is the acute angle between the two regression lines in the case of
two variables ð¥ ððð ðŠ ,show that
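208. $$\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2},$$
where $r$, $\sigma_x$, $\sigma_y$ have their usual meanings.
Explain the significance of the formula when $r = 0$ and $r = \pm 1$.
Proof: The equations of the lines of regression of $y$ on $x$ and $x$ on $y$ are
$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \quad \text{and} \quad x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$
The slopes are $m_1 = r\frac{\sigma_y}{\sigma_x}$ and $m_2 = \frac{\sigma_y}{r\sigma_x}$.
$$\tan\theta = \pm\frac{m_2 - m_1}{1 + m_2 m_1} = \pm\frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}}$$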
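209. $$= \pm\frac{1 - r^2}{r} \cdot \frac{\sigma_y}{\sigma_x} \cdot \frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2} = \pm\frac{1 - r^2}{r} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
Since $r^2 \le 1$ and $\sigma_x$, $\sigma_y$ are positive, the sign is chosen so that $\tan\theta$ is positive for the acute angle (the positive sign when $r > 0$):
$$\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
When $r = 0$, $\theta = \frac{\pi}{2}$ and the two lines of regression
are perpendicular to each other. Hence the estimated value of $y$ is the
same for all values of $x$ and vice versa.
When $r = \pm 1$, $\tan\theta = 0$ so that $\theta = 0$ or $\pi$.
Hence the lines of regression coincide and there is perfect correlation
between the two variates $x$ and $y$.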
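The closed-form expression for $\tan\theta$ can be compared with the angle computed directly from the two slopes. The following is a minimal sketch with hypothetical function names and input values; as an assumption beyond the slide, it uses $|r|$ in the formula so that the acute angle is also returned for negative $r$, and it does not handle $r = 0$ (where the lines are perpendicular and $\tan\theta$ is undefined).

```python
import math

def tan_angle_formula(r, sigma_x, sigma_y):
    """tan(theta) from the closed form; |r| keeps the result non-negative."""
    return (1 - r ** 2) / abs(r) * (sigma_x * sigma_y) / (sigma_x ** 2 + sigma_y ** 2)

def tan_angle_from_slopes(r, sigma_x, sigma_y):
    """tan(theta) from the slopes of the two regression lines."""
    m1 = r * sigma_y / sigma_x        # line of y on x
    m2 = sigma_y / (r * sigma_x)      # line of x on y, written as y against x
    return abs((m2 - m1) / (1 + m1 * m2))

# Hypothetical values, chosen only for illustration.
for r in (0.6, -0.6, 1.0):
    t_formula = tan_angle_formula(r, 2.0, 3.0)
    t_slopes = tan_angle_from_slopes(r, 2.0, 3.0)
    print(r, t_formula, t_slopes, math.degrees(math.atan(t_formula)))
```

For $r = \pm 1$ both functions return zero, matching the statement that the two lines coincide.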
210. Q. The equations of two regression lines, obtained in a correlation
analysis of 60 observations, are:
$5x = 6y + 24$ and $1000y = 768x - 3608$. What is the correlation
coefficient? Show that the ratio of the coefficient of variability of
$x$ to that of $y$ is $\frac{5}{24}$. What is the ratio of the variances of $x$ and $y$?
Solution: The regression line of $x$ on $y$ is
$$5x = 6y + 24$$
$$x = \frac{6}{5}y + \frac{24}{5}$$
$$b_{xy} = \frac{6}{5}$$
The regression line of $y$ on $x$ is
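211. $$1000y = 768x - 3608$$
$$y = 0.768x - 3.608$$
$$b_{yx} = 0.768$$
$$r\frac{\sigma_x}{\sigma_y} = \frac{6}{5} \quad \ldots(3)$$
$$r\frac{\sigma_y}{\sigma_x} = 0.768 \quad \ldots(4)$$
Multiplying equations (3) and (4), we get
$$r^2 = 0.9216 \Rightarrow r = 0.96$$
(the positive root, since both regression coefficients are positive).
Dividing (3) by (4), we get
$$\frac{\sigma_x^2}{\sigma_y^2} = \frac{6}{5} \times \frac{1}{0.768} = 1.5625,$$
which is the required ratio of the variances of $x$ and $y$.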
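212. Taking the square root, we get
$$\frac{\sigma_x}{\sigma_y} = 1.25 = \frac{5}{4}$$
Since the regression lines pass through the point $(\bar{x}, \bar{y})$, we have
$$5\bar{x} = 6\bar{y} + 24$$
$$1000\bar{y} = 768\bar{x} - 3608$$
Solving the above equations for $\bar{x}$ and $\bar{y}$, we get $\bar{x} = 6$, $\bar{y} = 1$.
Coefficient of variability of $x$ $= \frac{\sigma_x}{\bar{x}}$
Coefficient of variability of $y$ $= \frac{\sigma_y}{\bar{y}}$
Required ratio $= \frac{\sigma_x}{\bar{x}} \times \frac{\bar{y}}{\sigma_y} = \frac{\bar{y}}{\bar{x}} \cdot \frac{\sigma_x}{\sigma_y} = \frac{1}{6} \times \frac{5}{4} = \frac{5}{24}$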
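The arithmetic of this worked example can be reproduced with exact fractions. The sketch below is illustrative only; the variable names are hypothetical, and the means are obtained by Cramer's rule from the two regression lines rewritten as $5\bar{x} - 6\bar{y} = 24$ and $-768\bar{x} + 1000\bar{y} = -3608$.

```python
import math
from fractions import Fraction

# Regression coefficients read off the two lines of the example.
b_xy = Fraction(6, 5)        # from 5x = 6y + 24
b_yx = Fraction(768, 1000)   # from 1000y = 768x - 3608

r_squared = b_xy * b_yx      # product of the regression coefficients
r = math.sqrt(r_squared)     # positive root, since both coefficients are positive
var_ratio = b_xy / b_yx      # sigma_x^2 / sigma_y^2

# The variance ratio here is a ratio of perfect squares, so an exact root exists.
sd_ratio = Fraction(math.isqrt(var_ratio.numerator), math.isqrt(var_ratio.denominator))

# Means: solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule.
a1, b1, c1 = 5, -6, 24
a2, b2, c2 = -768, 1000, -3608
det = a1 * b2 - a2 * b1
mean_x = Fraction(c1 * b2 - b1 * c2, det)
mean_y = Fraction(a1 * c2 - a2 * c1, det)

# Ratio of the coefficients of variability of x and y.
cv_ratio = sd_ratio * mean_y / mean_x

print("r^2 =", r_squared, " r =", r)
print("variance ratio =", var_ratio, " s.d. ratio =", sd_ratio)
print("means =", mean_x, mean_y, " C.V. ratio =", cv_ratio)
```

Working in `Fraction` avoids rounding, so the ratios come out as exact rationals that can be compared directly with the values on the slide.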
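213. NON-LINEAR REGRESSION:
Let $y = a \cdot 1 + bx + cx^2$
be a second-degree parabolic curve of regression of $y$ on $x$. The normal equations for the least-squares fit are
$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$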
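The three normal equations form a linear system in $a$, $b$ and $c$. The sketch below builds that system from the power sums and solves it by Gauss-Jordan elimination in exact fractions. The helper names and the data are hypothetical; the data are generated from $y = 1 + 2x + 3x^2$ so that the recovered coefficients can be checked.

```python
from fractions import Fraction

def solve_linear(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(A, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for i in range(n):
            if i != col:
                factor = M[i][col]
                M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[col])]
    return [row[n] for row in M]

def fit_parabola(xs, ys):
    """Return [a, b, c] of y = a + b*x + c*x^2 from the normal equations."""
    n = len(xs)
    s = lambda p: sum(x ** p for x in xs)                      # sum of x^p
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    A = [[n,    s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    return solve_linear(A, [sy, sxy, sx2y])

# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x * x for x in xs]
coeffs = fit_parabola(xs, ys)
print(coeffs)
```

With noisy data the same function returns the least-squares parabola; the system is solvable as long as there are at least three distinct $x$ values.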
Non-Linear Regression(CO1)
214. MULTIPLE LINEAR REGRESSION:
Here the dependent variable is a function of two or more linear or
non-linear independent variables. Consider such a linear function as
$$y = a + bx + cz$$
The normal equations are
$$\sum y = na + b\sum x + c\sum z$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum xz$$
$$\sum yz = a\sum z + b\sum xz + c\sum z^2$$
Solving the above equations gives the values of $a$, $b$ and $c$; the resulting
linear function $y = a + bx + cz$ is called the regression plane.
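One way to solve the three normal equations for the regression plane is Cramer's rule. The sketch below is illustrative only: `det3`, `fit_plane` and the data are hypothetical, with the data generated from $y = 1 + 2x + 3z$ so the result can be checked. The function raises an error when the determinant is zero, which happens when $x$ and $z$ are exactly linearly related and the plane is not unique.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_plane(xs, zs, ys):
    """Return (a, b, c) of y = a + b*x + c*z from the normal equations."""
    n = len(xs)
    sx, sz, sy = sum(xs), sum(zs), sum(ys)
    sxx = sum(x * x for x in xs)
    szz = sum(z * z for z in zs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    sxy = sum(x * y for x, y in zip(xs, ys))
    szy = sum(z * y for z, y in zip(zs, ys))
    A = [[n,  sx,  sz],
         [sx, sxx, sxz],
         [sz, sxz, szz]]
    rhs = [sy, sxy, szy]
    d = det3(A)
    if d == 0:
        raise ValueError("normal equations are singular; the plane is not unique")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column of A by the right-hand side.
        replaced = [row[:col] + [rhs[k]] + row[col + 1:] for k, row in enumerate(A)]
        coeffs.append(Fraction(det3(replaced), d))
    return tuple(coeffs)

# Hypothetical data lying exactly on y = 1 + 2x + 3z.
xs = [1, 2, 3, 4]
zs = [0, 1, 0, 2]
ys = [1 + 2 * x + 3 * z for x, z in zip(xs, zs)]
plane = fit_plane(xs, zs, ys)
print(plane)
```

For more than two independent variables the same normal-equation pattern applies, but a general linear solver is usually more convenient than Cramer's rule.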
Multiple Linear Regression(CO1)
215. Q. Obtain a regression plane by using multiple linear regression
to fit the data given below.
x    1    2    3    4
y    12   18   24   30
z    0    1    2    3
Sol. Let $y = a + bx + cz$ be the required regression plane, where
$a$, $b$, $c$ are the constants to be determined by the following equations.
$$\sum y = na + b\sum x + c\sum z$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum xz$$
$$\sum yz = a\sum z + b\sum xz + c\sum z^2$$
218. Q1. Two lines of regression are given by $7x - 16y + 9 = 0$ and
$-4x + 5y - 3 = 0$, and $\mathrm{var}(x) = 16$. Calculate
(i) the mean of $x$ and $y$
(ii) the variance of $y$
(iii) the correlation coefficient.
Daily Quiz(CO1)
219. Q1. Fit a straight line trend by the method of least squares to the following
data:
Year         1979   1980   1981   1982   1983   1984
Production   5      7      9      10     12     17
Q2. From the following data calculate Karl Pearson's coefficient of skewness:
Marks less than    10   20   30   40    50    60    70
No. of students    10   30   60   110   150   180   200
Q3. Write the regression equations of X on Y and of Y on X for the following data:
X   1   2   3   4   5
Y   2   4   5   3   6
Weekly Assignment(CO1)
220. Q4. Fit a straight line trend by the method of least squares to the
following data:
Year                           2012   2013   2014   2015   2016   2017
Sales of T.V. sets (in '000)   7      10     12     14     17     24
221. Suggested Youtube/other Video Links:
https://youtu.be/wWenULjri40
https://youtu.be/mL9-WX7wLAo
https://youtu.be/nPsfqz9EljY
https://youtu.be/nqPS29IvnHk
https://youtu.be/aaQXMbpbNKw
https://youtu.be/wDXMYRPup0Y
https://youtu.be/m9a6rg0tNSM
https://youtu.be/Qy1YAKZDA7k
https://youtu.be/s94k4H6AE54
https://youtu.be/lBB4stn3exM
https://youtu.be/0WejW9MiTGg
https://youtu.be/QAEZOhE13Wg
https://youtu.be/ddYNq1TxtM0
https://youtu.be/YciBHHeswBM
https://youtu.be/VCJdg7YBbAQ
https://youtu.be/yhzJxftDgms
Topic Video Links, Youtube & NPTEL Video