Central Tendency and Correlation Coefficient

Published in: Education

QTBD 2013
UNIT-1
K.V. Ramesh Babu, M.Sc. Statistics, Assistant Professor

Measures of Central Tendency

Definition:
  • An average is a measure that represents a large volume of data by a single numerical value.
  • An average gives an idea of the concentration of the values in the central part of the distribution.
  • Averages are the typical values around which the other values of the distribution concentrate.

Types of Measures
1) Arithmetic Mean (or) Average
2) Median
3) Mode
4) Geometric Mean
5) Harmonic Mean

Characteristics of a Good Measure of Central Tendency
  • It should be easy to understand and easy to calculate.
  • It should be based on all items.
  • It should be capable of further algebraic treatment.
  • It should be rigidly defined.
  • It should not be affected by extreme observations.
  • It should not be affected by fluctuations of sampling.

Demerits of the Arithmetic Mean
  • It cannot be determined by inspection, nor can it be located graphically.
  • It cannot be used for qualitative characteristics that cannot be measured quantitatively. Ex. honesty, intelligence, beauty, etc.
  • It cannot be computed for open-ended class intervals. Ex. "below 90" and "above 100".
  • It is affected by extreme values.
  • It may lead to wrong conclusions if the details of the data from which it is computed are not given.
  • It cannot be obtained if even a single observation is missing or lost.
  • It is not a suitable measure for extremely asymmetric distributions.

Methods to Calculate the Average
1) Direct method.
2) In-direct method (or) Deviation method.
3) Step-Deviation method.

In the formulas below, $X_i$ are the observed values, $f_i$ their frequencies, $m_i$ the class midpoints, $A$ an assumed mean, $d_i = X_i - A$ (or $m_i - A$), $C$ the class width and $\bar{d}_i = d_i / C$.

1) Direct Method:
  • Raw Data: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
  • Discrete Data: $\bar{X} = \dfrac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$
  • Continuous Data: $\bar{X} = \dfrac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i}$

2) Deviation Method:
  • Raw Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} d_i}{n}$
  • Discrete Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$
  • Continuous Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$, with $d_i = m_i - A$

3) Step-Deviation Method:
  • Raw Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} \bar{d}_i}{n} \times C$
  • Discrete Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i \bar{d}_i}{\sum_{i=1}^{n} f_i} \times C$
  • Continuous Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i \bar{d}_i}{\sum_{i=1}^{n} f_i} \times C$

2) Median:
The median is the "middle-most" or "central" value of a set of observations when the observations are arranged in ascending or descending order of magnitude. It divides the arranged series into two equal parts. The median is also known as a 'positional average', whereas the mean is a 'calculated average'. When a series has an even number of terms, the median is the arithmetic mean of the two central items. It is denoted by $M_d$.

Formulas:
  • Raw Data: Arrange the data in ascending or descending order, where n = number of observations.
    Case i) If n is odd, $M_d = \left(\dfrac{n+1}{2}\right)^{th}$ term.
    Case ii) If n is even, $M_d = \dfrac{1}{2}\left[\left(\dfrac{n}{2}\right)^{th} \text{term} + \left(\dfrac{n}{2}+1\right)^{th} \text{term}\right]$.
  • Discrete Data:
    STEP-1: Find the cumulative frequencies of the given data.
    STEP-2: Find $N = \sum_{i=1}^{n} f_i$.
    STEP-3: Find the cumulative frequency equal to or just greater than $N/2$; the corresponding value of X is the median.
  • Continuous Data:
    STEP-1: Find the cumulative frequencies of the given data.
    STEP-2: Find $N = \sum_{i=1}^{n} f_i$.
    STEP-3: The median is given by $M_d = L + \left\{\dfrac{N/2 - m}{f}\right\} \times C$
    where L = lower limit of the median class, f = frequency of the median class, m = cumulative frequency preceding the median class, C = width of the class interval and $N = \sum_{i=1}^{n} f_i$ = sum of the frequencies.

3) Mode:
The mode is the value in a series that occurs most frequently. In a frequency distribution, the mode is the value with the maximum frequency; in other words, it is the value with the greatest frequency density in its neighbourhood. The mode is also known as the most frequent value, the predominant value or the norm.

Formulas:
  • Raw Data: The value that occurs with maximum frequency is the mode.
  • Discrete Data: The mode is the value of X corresponding to the maximum frequency.
  • Continuous Data:
    STEP-1: Identify the modal class, i.e. the class with the maximum frequency.
    STEP-2: Note $f_1$ = frequency of the modal class, $f_0$ = frequency of the preceding class, $f_2$ = frequency of the following class, L = lower limit of the modal class and C = class width.
    STEP-3: The mode is given by $M_O = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times C$

4) Geometric Mean:
The geometric mean of n observations is the n-th root of the product of the observations. If $X_1, X_2, \dots, X_n$ are n observations, the geometric mean is
$G.M. = \sqrt[n]{X_1 X_2 \cdots X_n} = (X_1 X_2 \cdots X_n)^{1/n}$
If n = 2, the geometric mean is the square root of the product of the two observations.
Example: The geometric mean of 4 and 16 is $G.M. = \sqrt{(4)(16)} = \sqrt{64} = 8$.
If there are more than two observations, computing the n-th root directly is inconvenient, so we take logarithms:
$\log(G.M.) = \dfrac{1}{n}\log(X_1 X_2 \cdots X_n) = \dfrac{1}{n}\{\log X_1 + \log X_2 + \dots + \log X_n\}$

Formulas:
  • Raw Data: $G.M. = \text{Antilog}\left\{\dfrac{1}{n}\sum_{i=1}^{n} \log X_i\right\}$
  • Discrete Data: $G.M. = \text{Antilog}\left\{\dfrac{1}{N}\sum_{i=1}^{n} f_i \log X_i\right\}$
  • Continuous Data: $G.M. = \text{Antilog}\left\{\dfrac{1}{N}\sum_{i=1}^{n} f_i \log m_i\right\}$

5) Harmonic Mean:
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations. If $X_1, X_2, \dots, X_n$ are n observations, the harmonic mean is
$H.M. = \dfrac{1}{\frac{1}{n}\sum_{i=1}^{n} \frac{1}{X_i}}$

Formulas:
  • Raw Data: $H.M. = \dfrac{1}{\frac{1}{n}\sum_{i=1}^{n} \frac{1}{X_i}}$
  • Discrete Data: $H.M. = \dfrac{1}{\frac{1}{N}\sum_{i=1}^{n} \frac{f_i}{X_i}}$
  • Continuous Data: $H.M. = \dfrac{1}{\frac{1}{N}\sum_{i=1}^{n} \frac{f_i}{m_i}}$, where $N = \sum_{i=1}^{n} f_i$

Measures of Dispersion

Definition: Dispersion means 'scatteredness'. A measure of the scatter of the given data about the average is called a measure of dispersion.

Characteristics of a Good Measure of Dispersion
  • It should be easy to understand.
  • It should be based on all items.
  • It should be readily comprehensible.
  • Its procedure should be simple.
  • It should be rigidly defined.
  • It should be capable of further algebraic treatment.
  • It should not be affected by extreme observations.
  • It should not be affected by fluctuations of sampling.

Types of Measures
1) Range.
2) Quartile Deviation.
3) Standard Deviation.
4) Mean Deviation.
The first two measures are known as 'positional' measures and the remaining two as 'calculated' measures.

Formulas:
1) Range: The range is the difference between the two extreme values. It is denoted by R.
  • Raw, Discrete and Continuous Data: Range = R = (Largest value - Smallest value) = L - S
  Coefficient of Range $= \dfrac{L - S}{L + S}$

2) Quartile Deviation: The quartile deviation is denoted by Q.D. If $Q_1$ is the first quartile and $Q_3$ is the third quartile, then
  • Raw, Discrete and Continuous Data: $Q.D. = \dfrac{Q_3 - Q_1}{2}$

3) Mean Deviation: If $X_1, X_2, \dots, X_n$ are n observations and $d_i = X_i - \bar{X}$, where $\bar{X}$ is the mean, the mean deviation is denoted by M.D.
  • Raw Data: $M.D. = \dfrac{\sum_{i=1}^{n} |d_i|}{n}$, where $d_i = X_i - \bar{X}$
  • Discrete Data: $M.D. = \dfrac{\sum_{i=1}^{n} f_i |d_i|}{\sum_{i=1}^{n} f_i}$, where $d_i = X_i - \bar{X}$
  • Continuous Data: $M.D. = \dfrac{\sum_{i=1}^{n} f_i |d_i|}{\sum_{i=1}^{n} f_i}$, where $d_i = m_i - \bar{X}$
  Coefficient of Mean Deviation $= \dfrac{\text{Mean Deviation}}{\text{Mean}}$

4) Standard Deviation: If $X_1, X_2, \dots, X_n$ are n observations and $d_i = X_i - A$, where A is the mean or any assumed mean, the standard deviation is denoted by S.D. (or $\sigma$).
  • Raw Data: $S.D. = \sqrt{\dfrac{\sum_{i=1}^{n} d_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} d_i}{n}\right)^2}$
  • Discrete Data: $S.D. = \sqrt{\dfrac{\sum_{i=1}^{n} f_i d_i^2}{N} - \left(\dfrac{\sum_{i=1}^{n} f_i d_i}{N}\right)^2}$
  • Continuous Data: $S.D. = \sqrt{\dfrac{\sum_{i=1}^{n} f_i d_i^2}{N} - \left(\dfrac{\sum_{i=1}^{n} f_i d_i}{N}\right)^2}$, with $d_i = m_i - A$
  Coefficient of Variation: $C.V. = 100 \times \dfrac{\sigma}{\bar{X}}$
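The raw-data dispersion formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `quartile` and `dispersion_summary` are hypothetical, the quartiles are assumed to follow the $(n+1)/4$-th and $3(n+1)/4$-th term rule with linear interpolation for fractional positions, and the standard deviation is the population form (divisor n).

```python
import math
import statistics


def quartile(sorted_values, p):
    """Value at position p*(n+1) (1-based) in sorted data, interpolating if fractional."""
    n = len(sorted_values)
    if n < 3:
        raise ValueError("need at least 3 observations for this position rule")
    pos = p * (n + 1)
    k = int(math.floor(pos))          # whole part of the position
    frac = pos - k                    # fractional part of the position
    lower = sorted_values[k - 1]
    upper = sorted_values[min(k, n - 1)]
    return lower + frac * (upper - lower)


def dispersion_summary(values):
    """Range, quartile deviation, mean deviation, S.D. and C.V. for raw data."""
    data = sorted(values)
    largest, smallest = data[-1], data[0]
    mean = statistics.mean(data)
    q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
    mean_dev = sum(abs(x - mean) for x in data) / len(data)
    std_dev = statistics.pstdev(data)  # population S.D. (divisor n)
    return {
        "range": largest - smallest,
        "coefficient_of_range": (largest - smallest) / (largest + smallest),
        "quartile_deviation": (q3 - q1) / 2,
        "mean_deviation": mean_dev,
        "std_dev": std_dev,
        "coefficient_of_variation": 100 * std_dev / mean,
    }


# Illustrative marks of seven students
marks = [25, 35, 45, 17, 35, 20, 55]
summary = dispersion_summary(marks)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

For grouped (discrete or continuous) data the same ideas apply with each term weighted by its frequency $f_i$.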
PROBLEMS ON MEASURES OF CENTRAL TENDENCY:

1) PROBLEMS ON ARITHMETIC MEAN:

a) Direct Method:

  • Raw Data:
Problem-1: Find the average for the following data.
Solution: $\bar{X} = \dfrac{\sum X}{n} = \dfrac{620}{10} = 62$

  • Discrete Data:
Problem-1: Find the arithmetic mean for the following data.
  X    10   20   30   40   50   60
  f     5   15   25   20   10    5
Solution:
  X       f      Xf
  10      5      50
  20     15     300
  30     25     750
  40     20     800
  50     10     500
  60      5     300
  Total  80    2700
$\bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{2700}{80} = 33.75$

b) In-Direct Method or Deviation Method:

  • Raw Data:
Problem-1: Calculate the average for the following data.
  Family   A    B    C    D    E    F    G    H    I    J
  Income   90   75   60   100  125  50   80   120  500  400
Solution: Take the assumed mean A = 100.
  Family   Income   $d_i = X_i - A$
  A          90       -10
  B          75       -25
  C          60       -40
  D         100         0
  E         125        25
  F          50       -50
  G          80       -20
  H         120        20
  I         500       400
  J         400       300
  Total               $\sum d_i = 600$
$\bar{X} = A + \dfrac{\sum d_i}{n} = 100 + \dfrac{600}{10} = 100 + 60 = 160$

  • Discrete Data:
Problem-1: Calculate the average for the following data.
  X    10   20   30   40   50   60
  f     5   15   25   20   10    5
Solution: Take the assumed mean A = 40.
  X      $f_i$   $d_i = X_i - A$   $f_i d_i$
  10       5       -30              -150
  20      15       -20              -300
  30      25       -10              -250
  40      20         0                 0
  50      10        10               100
  60       5        20               100
  Total   80                        -500
$\bar{X} = A + \dfrac{\sum f_i d_i}{\sum f_i} = 40 + \dfrac{-500}{80} = 40 - 6.25 = 33.75$

  • Continuous Data:
Problem-1: Find the arithmetic mean for the following data.
  C.I   0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
  f      1     4      10     22     30     35     10     7      1
Solution: Take A = 55 and c = 10.
  C.I      f     $m_i$   $f_i m_i$   $d_i = m_i - A$   $f_i d_i$   $\bar{d}_i = d_i/c$   $f_i \bar{d}_i$
  0-10      1      5        5          -50              -50          -5                   -5
  10-20     4     15       60          -40             -160          -4                  -16
  20-30    10     25      250          -30             -300          -3                  -30
  30-40    22     35      770          -20             -440          -2                  -44
  40-50    30     45     1350          -10             -300          -1                  -30
  50-60    35     55     1925            0                0           0                    0
  60-70    10     65      650           10              100           1                   10
  70-80     7     75      525           20              140           2                   14
  80-90     1     85       85           30               30           3                    3
  Total   120            5620                          -980                              -98
Direct method: $\bar{X} = \dfrac{5620}{120} = 46.83$
Deviation method: $\bar{X} = 55 + \dfrac{-980}{120} = 55 - 8.17 = 46.83$
Step-deviation method: $\bar{X} = 55 + \dfrac{-98}{120} \times 10 = 46.83$

2) PROBLEMS ON MEDIAN:

  • Raw Data:
Problem-1: Find the median for the following data and also calculate the $Q_1$ and $Q_3$ values.
  X    120  170  100  110  180  220  160
Solution: Arrange the data in ascending order: 100, 110, 120, 160, 170, 180, 220. Here n = 7.
$Q_2$ or $M_d = \left(\dfrac{n+1}{2}\right)^{th}$ term $= \left(\dfrac{7+1}{2}\right)^{th}$ term = 4th term = 160 ⟹ $M_d = 160$
$Q_1 = \left(\dfrac{n+1}{4}\right)^{th}$ term $= \left(\dfrac{7+1}{4}\right)^{th}$ term = 2nd term = 110 ⟹ $Q_1 = 110$
$Q_3 = \left(\dfrac{3(n+1)}{4}\right)^{th}$ term $= \left(\dfrac{3(7+1)}{4}\right)^{th}$ term = 6th term = 180 ⟹ $Q_3 = 180$

  • Discrete Data:
Problem-1: Find the median for the following data and also calculate the $Q_1$ and $Q_3$ values.
  X    10   20   30   40   50   60
  f     5   15   25   20   10    5
Solution:
  X     f     C.f
  10     5      5
  20    15     20   → $Q_1$
  30    25     45   → $M_d$ or $Q_2$
  40    20     65   → $Q_3$
  50    10     75
  60     5     80
N = 80
$\dfrac{N}{4} = \dfrac{80}{4} = 20$ ⟹ $Q_1 = 20$
$\dfrac{N}{2} = \dfrac{80}{2} = 40$ ⟹ $M_d$ or $Q_2 = 30$
$\dfrac{3N}{4} = \dfrac{3(80)}{4} = 60$ ⟹ $Q_3 = 40$

  • Continuous Data:
Problem-1: Find the median for the following data and also calculate the $Q_1$ and $Q_3$ values.
  C.I   0-10  10-20  20-30  30-40  40-50  50-60
  f      4     6      10     15     8      7
Solution:
  C.I      f              C.f
  0-10      4              4
  10-20     6             10 → $m_1$
  20-30    10 → $f_1$     20 → $m_2$
  30-40    15 → $f_2$     35 → $m_3$
  40-50     8 → $f_3$     43
  50-60     7             50
N = 50, so $\dfrac{N}{4} = 12.5$, $\dfrac{N}{2} = 25$ and $\dfrac{3N}{4} = 37.5$. Here $f_1, f_2, f_3$ are the frequencies of the $Q_1$, $Q_2$ and $Q_3$ classes and $m_1, m_2, m_3$ are the cumulative frequencies preceding those classes.
$Q_1 = L_1 + \left[\dfrac{N/4 - m_1}{f_1}\right] \times c = 20 + \left[\dfrac{12.5 - 10}{10}\right] \times 10 = 20 + 2.5 = 22.5$
$Q_2 = L_2 + \left[\dfrac{N/2 - m_2}{f_2}\right] \times c = 30 + \left[\dfrac{25 - 20}{15}\right] \times 10 = 30 + 3.33 = 33.33$
$Q_3 = L_3 + \left[\dfrac{3N/4 - m_3}{f_3}\right] \times c = 40 + \left[\dfrac{37.5 - 35}{8}\right] \times 10 = 40 + 3.125 = 43.125$

3) PROBLEMS ON MODE:

  • Raw Data:
Problem-1: Find the mode for the following data.
0, 6, 1, 7, 2, 3, 7, 6, 6, 2, 6, 6, 5, 6, 0
Solution:
  X    Tally       f
  0    II          2
  1    I           1
  2    II          2
  3    I           1
  5    I           1
  6    IIII I      6   → $M_O$
  7    II          2
∴ Mode = 6

  • Discrete Data:
Problem-1: Find the mode for the following data.
  Height (in inches)   57   59   61   62   63   64   65   66   67   69
  f                     3    5    7   10   20   22   24    5    2    2
Solution:
  Height (in inches)    f
  57                    3
  59                    5
  61                    7
  62                   10
  63                   20
  64                   22
  65                   24   → $M_O$
  66                    5
  67                    2
  69                    2
The maximum frequency is 24, so Mode = 65 inches.

  • Continuous Data:
Problem-1: Find the mode for the following data.
  C.I   0-400  400-800  800-1200  1200-1600  1600-2000  2000-2400  2400-2800  2800-3200
  f      4      12       40        41         27         13         9          4
Solution:
  C.I              f
  0-400             4
  400-800          12
  800-1200         40 → $f_0$
  L → 1200-1600    41 → $f_1$
  1600-2000        27 → $f_2$
  2000-2400        13
  2400-2800         9
  2800-3200         4
$M_O = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times C = 1200 + \left[\dfrac{41 - 40}{2(41) - 40 - 27}\right] \times 400 = 1200 + 26.67 = 1226.67$

4) Problems on Geometric Mean:

  • Raw Data:
Problem-1: Find the geometric mean for the following data.
  X    2000   200   20   12   8
Solution:
  X       $\log X_i$
  2000     3.3010
  200      2.3010
  20       1.3010
  12       1.0792
  8        0.9030
  Total    $\sum \log X_i = 8.8852$
$G.M. = \text{Antilog}\left[\dfrac{\sum \log X_i}{n}\right] = \text{Antilog}\left[\dfrac{8.8852}{5}\right] = \text{Antilog}[1.7770] = 59.84$

  • Discrete Data:
Problem-1: Find the geometric mean for the following data.
  X    10   20   30   40   50   60
  f    15   18   22   16   12    7
Solution:
  X      f     $\log X_i$   $f \log X_i$
  10     15     1.0000       15.0000
  20     18     1.3010       23.4180
  30     22     1.4771       32.4962
  40     16     1.6021       25.6336
  50     12     1.6989       20.3868
  60      7     1.7781       12.4467
  Total  90                 129.3813
$G.M. = \text{Antilog}\left[\dfrac{\sum f_i \log X_i}{N}\right] = \text{Antilog}\left[\dfrac{129.3813}{90}\right] = \text{Antilog}[1.4376] = 27.39$

  • Continuous Data:
Problem-1: Find the geometric mean for the following data.
  C.I   15-20  20-25  25-30  30-35  35-40  40-45
  f      4      20     38     24     10     4
Solution:
  C.I      f     $m_i$    $\log m_i$   $f \log m_i$
  15-20     4    17.5      1.2430        4.9720
  20-25    20    22.5      1.3521       27.0420
  25-30    38    27.5      1.4390       54.6820
  30-35    24    32.5      1.5118       36.2832
  35-40    10    37.5      1.5740       15.7400
  40-45     4    42.5      1.6283        6.5132
  Total   100                          145.2324
$G.M. = \text{Antilog}\left[\dfrac{\sum f_i \log m_i}{N}\right] = \text{Antilog}\left[\dfrac{145.2324}{100}\right] = \text{Antilog}[1.4523] = 28.33$

5) Problems on Harmonic Mean:

  • Raw Data:
Problem-1: Calculate the harmonic mean for the following data.
  X    200   300   20   12   8   0.8
Solution:
  X       $1/X_i$
  200      0.0050
  300      0.0033
  20       0.0500
  12       0.0833
  8        0.1250
  0.8      1.2500
  Total    $\sum 1/X_i = 1.5167$
$H.M. = \dfrac{n}{\sum (1/X_i)} = \dfrac{6}{1.5167} = 3.96$

  • Discrete Data:
Problem-1: Calculate the harmonic mean for the following data.
  X    24   26   30   42   17   11
  f     2    9    7   14   24    5
Solution:
  X      $f_i$   $f_i / X_i$
  24       2       0.083
  26       9       0.346
  30       7       0.233
  42      14       0.333
  17      24       1.411
  11       5       0.454
  Total   61       2.860
$H.M. = \dfrac{\sum f_i}{\sum (f_i / X_i)} = \dfrac{61}{2.86} = 21.33$

  • Continuous Data:
Problem-1: Calculate the harmonic mean for the following data.
  C.I   100-110  110-120  120-130  130-140  140-150
  f      12       18       25       22       18
Solution:
  C.I       $f_i$   $m_i$   $f_i / m_i$
  100-110    12      105     0.1142
  110-120    18      115     0.1565
  120-130    25      125     0.2000
  130-140    22      135     0.1629
  140-150    18      145     0.1241
  Total      95              0.7577
$H.M. = \dfrac{\sum f_i}{\sum (f_i / m_i)} = \dfrac{95}{0.7577} = 125.38$

Problems on Measures of Dispersion:

1) Problems on Range
  • Discrete Data:
Problem-1: Find the range for the following data.
  X    12   13   14   15   16   17
  f     6   14   10    7    5    3
Solution: Range = L - S = 17 - 12 = 5
Coefficient of Range $= \dfrac{L - S}{L + S} = \dfrac{17 - 12}{17 + 12} = \dfrac{5}{29} = 0.1724$

  • Continuous Data:
Problem-1: Find the range for the following data.
  C.I   0-10  10-20  20-30  30-40  40-50  50-60  60-70
  f      5     8      12     20     15     7      3
Solution: Range = L - S = 70 - 0 = 70
Coefficient of Range $= \dfrac{L - S}{L + S} = \dfrac{70 - 0}{70 + 0} = \dfrac{70}{70} = 1$

2) Problems on Quartile Deviation:

  • Raw Data:
Problem-1: Find the quartile deviation for the following data.
  S.NO.   1    2    3    4    5    6    7
  Marks   25   35   45   17   35   20   55
Solution:
  S.NO.   Marks ($X_i$)   Ascending order
  1        25              17
  2        35              20 → $Q_1$
  3        45              25
  4        17              35
  5        35              35
  6        20              45 → $Q_3$
  7        55              55
$Q_1 = \left(\dfrac{n+1}{4}\right)^{th}$ term $= \left(\dfrac{7+1}{4}\right)^{th}$ term = 2nd term = 20
$Q_3 = \left(\dfrac{3(n+1)}{4}\right)^{th}$ term $= \left(\dfrac{3(7+1)}{4}\right)^{th}$ term = 6th term = 45
$Q.D. = \dfrac{Q_3 - Q_1}{2} = \dfrac{45 - 20}{2} = 12.5$
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{45 - 20}{45 + 20} = \dfrac{25}{65} = 0.3846$

  • Discrete Data:
Problem-1: Find the quartile deviation for the following data.
  X    30   20   40   50   10   60
  f    15    7    8    7    4    2
Solution:
  X (ascending order)   f     Cumulative frequency (c.f.)
  10                     4      4
  20 → $Q_1$             7     11
  30                    15     26
  40 → $Q_3$             8     34
  50                     7     41
  60                     2     43
$\dfrac{N}{4} = \dfrac{43}{4} = 10.75 \cong 11$ ⟹ $Q_1 = 20$
$\dfrac{3N}{4} = \dfrac{3(43)}{4} = 32.25$, and the first cumulative frequency above this is 34 ⟹ $Q_3 = 40$
$Q.D. = \dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 20}{2} = 10$
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 20}{40 + 20} = \dfrac{20}{60} = 0.3333$

  • Continuous Data:
Problem-1: Find the quartile deviation for the following data.
  C.I   0-10  10-20  20-30  30-40  40-50  50-60  60-70
  f      4     8      10     16     11     7      3
Solution:
  C.I              f              Cumulative frequency (c.f.)
  0-10              4              4
  10-20             8             12 → $m_1$
  $L_1$ → 20-30    10 → $f_1$     22
  30-40            16             38 → $m_3$
  $L_3$ → 40-50    11 → $f_3$     49
  50-60             7             56
  60-70             3             59
N = 59, so $\dfrac{N}{4} = 14.75$ and $\dfrac{3N}{4} = 44.25$.
$Q_1 = L_1 + \left[\dfrac{N/4 - m_1}{f_1}\right] \times C = 20 + \left[\dfrac{14.75 - 12}{10}\right] \times 10 = 20 + 2.75 = 22.75$
$Q_3 = L_3 + \left[\dfrac{3N/4 - m_3}{f_3}\right] \times C = 40 + \left[\dfrac{44.25 - 38}{11}\right] \times 10 = 40 + 5.68 = 45.68$
$Q.D. = \dfrac{Q_3 - Q_1}{2} = \dfrac{45.68 - 22.75}{2} = 11.465$
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{45.68 - 22.75}{45.68 + 22.75} = \dfrac{22.93}{68.43} = 0.3351$

3) Problems on Mean Deviation:

  • Raw Data:
Problem-1: Find the mean deviation for the following data.
  X    7   4   10   9   15   12   7   9   7
Solution:
  X     Ascending order ($X_i$)   $|d_i| = |X_i - \bar{X}|$
  7      4                         4.9
  4      7                         1.9
  10     7                         1.9
  9      7                         1.9
  15     9                         0.1
  12     9                         0.1
  7     10                         1.1
  9     12                         3.1
  7     15                         6.1
  Total  $\sum X_i = 80$           $\sum |d_i| = 21.1$
$\bar{X} = \dfrac{\sum X_i}{n} = \dfrac{80}{9} = 8.9$
$M.D. = \dfrac{\sum |d_i|}{n} = \dfrac{21.1}{9} = 2.344$
Coefficient of M.D. $= \dfrac{M.D.}{\text{Mean}} = \dfrac{2.34}{8.9} = 0.26$

  • Discrete Data:
Problem-1: Find the mean deviation for the following data.
  X    10   15   20   30   40   50
  f     8   12   15   10    3    2
Solution:
  X      f     $X_i f_i$   $|d_i| = |X_i - \bar{X}|$   $f_i |d_i|$
  10      8      80         11.6                         92.8
  15     12     180          6.6                         79.2
  20     15     300          1.6                         24.0
  30     10     300          8.4                         84.0
  40      3     120         18.4                         55.2
  50      2     100         28.4                         56.8
  Total  50    1080                                     392.0
$\bar{X} = \dfrac{\sum f_i X_i}{N} = \dfrac{1080}{50} = 21.6$
$M.D. = \dfrac{\sum f_i |d_i|}{N} = \dfrac{392}{50} = 7.84$
Coefficient of M.D. $= \dfrac{M.D.}{\text{Mean}} = \dfrac{7.84}{21.6} = 0.363$

  • Continuous Data:
Problem-1: Find the mean deviation for the following data.
  C.I   0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
  f      5     8      7      12     28     20     10     10
Solution:
  C.I      f     $m_i$   $f_i m_i$   $|d_i| = |m_i - \bar{X}|$   $f_i |d_i|$
  0-10      5      5       25         40                          200
  10-20     8     15      120         30                          240
  20-30     7     25      175         20                          140
  30-40    12     35      420         10                          120
  40-50    28     45     1260          0                            0
  50-60    20     55     1100         10                          200
  60-70    10     65      650         20                          200
  70-80    10     75      750         30                          300
  Total   100            4500                                    1400
$\bar{X} = \dfrac{\sum f_i m_i}{N} = \dfrac{4500}{100} = 45$
$M.D. = \dfrac{\sum f_i |d_i|}{N} = \dfrac{1400}{100} = 14$
Coefficient of M.D. $= \dfrac{M.D.}{\text{Mean}} = \dfrac{14}{45} = 0.3111$

4) Problems on Standard Deviation:

  • Raw Data:
Problem-1: Find the standard deviation for the following data.
  X    8   10   12   14   16   18   20   22   24   26
Solution: Take the assumed mean A = 16.
  X          $d_i = X_i - A$   $d_i^2$
  8           -8                64
  10          -6                36
  12          -4                16
  14          -2                 4
  16 → A       0                 0
  18           2                 4
  20           4                16
  22           6                36
  24           8                64
  26          10               100
  Total       $\sum d_i = 10$   $\sum d_i^2 = 340$
$\bar{X} = A + \dfrac{\sum d_i}{n} = 16 + \dfrac{10}{10} = 17$
$S.D.(\sigma) = \sqrt{\dfrac{\sum d_i^2}{n} - \left(\dfrac{\sum d_i}{n}\right)^2} = \sqrt{\dfrac{340}{10} - (1)^2} = \sqrt{34 - 1} = \sqrt{33} = 5.7446$
$C.V. = \dfrac{\sigma}{\bar{X}} \times 100 = \dfrac{5.7446}{17} \times 100 = 33.79$

  • Discrete Data:
Problem-1: Find the standard deviation for the following data.
  X    5   15   25   35   45   55   65
  f    3   10   20   30   15   12   10
Solution: Take the assumed mean A = 35.
  X          f     $d_i = X_i - A$   $f_i d_i$   $f_i d_i^2$
  5           3     -30               -90          2700
  15         10     -20              -200          4000
  25         20     -10              -200          2000
  35 → A     30       0                 0             0
  45         15      10               150          1500
  55         12      20               240          4800
  65         10      30               300          9000
  Total     100                       200        24,000
$\bar{X} = A + \dfrac{\sum f_i d_i}{N} = 35 + \dfrac{200}{100} = 37$
$S.D.(\sigma) = \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2} = \sqrt{\dfrac{24000}{100} - (2)^2} = \sqrt{240 - 4} = \sqrt{236} = 15.3623$

  • Continuous Data:
Problem-1: Find the standard deviation for the following data.
  C.I   0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
  f      5     8      7      12     28     20     10     10
Solution: Take the assumed mean A = 45.
  C.I      f     $m_i$   $d_i = m_i - A$   $f_i d_i$   $f_i d_i^2$
  0-10      5      5      -40              -200          8000
  10-20     8     15      -30              -240          7200
  20-30     7     25      -20              -140          2800
  30-40    12     35      -10              -120          1200
  40-50    28     45        0                 0             0
  50-60    20     55       10               200          2000
  60-70    10     65       20               200          4000
  70-80    10     75       30               300          9000
  Total   100                                 0        34,200
$\bar{X} = A + \dfrac{\sum f_i d_i}{N} = 45 + \dfrac{0}{100} = 45$
$S.D.(\sigma) = \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2} = \sqrt{\dfrac{34200}{100} - (0)^2} = \sqrt{342} = 18.4932$
$C.V. = \dfrac{\sigma}{\bar{X}} \times 100 = \dfrac{18.4932}{45} \times 100 = 41.10$
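As a cross-check on the grouped-data calculation, the step-deviation form of the mean and standard deviation can be sketched in Python. This is a minimal sketch, not part of the original notes: the function name `grouped_mean_sd` is hypothetical, the data are the class intervals 0-10 to 70-80 with frequencies 5, 8, 7, 12, 28, 20, 10, 10 from the continuous-data problem, and the assumed mean A = 45 and class width c = 10 are choices made for the illustration.

```python
import math


def grouped_mean_sd(midpoints, freqs, assumed_mean, class_width):
    """Mean and population S.D. of grouped data by the step-deviation method."""
    total = sum(freqs)                                    # N = sum of f_i
    steps = [(m - assumed_mean) / class_width for m in midpoints]  # d_i / c
    sum_fd = sum(f * d for f, d in zip(freqs, steps))
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))
    mean = assumed_mean + class_width * sum_fd / total
    sd = class_width * math.sqrt(sum_fd2 / total - (sum_fd / total) ** 2)
    return mean, sd


midpoints = [5, 15, 25, 35, 45, 55, 65, 75]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]

mean, sd = grouped_mean_sd(midpoints, freqs, assumed_mean=45, class_width=10)
cv = 100 * sd / mean

print(f"mean = {mean:.2f}")
print(f"S.D. = {sd:.4f}")
print(f"C.V. = {cv:.2f}")
```

The result does not depend on the choice of assumed mean: any value of A gives the same mean and standard deviation, which makes the function a convenient check on hand calculations.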
PERMUTATIONS:

Definition: Each arrangement made by choosing r objects from n objects is called a 'permutation'. The total number of such arrangements is $^nP_r$, also written P(n, r).
$^nP_r = n(n-1)(n-2)\cdots(n-r+1) = \dfrac{n(n-1)(n-2)\cdots 1}{(n-r)(n-r-1)\cdots 1} = \dfrac{n!}{(n-r)!}$
NOTE:
i) P(n, n) = n!
ii) P(n, n-1) = P(n, n)

PERMUTATIONS WITH REPETITIONS:
Suppose there are n objects. If repetitions are allowed, the number of permutations taking r at a time is $n^r$.
i. The number of permutations of n objects, of which $r_1$ are of type 1, $r_2$ are of type 2 and the rest are all different, is $\dfrac{n!}{(r_1)!\,(r_2)!}$
ii. The number of permutations of n objects, of which $r_1$ are of type 1, $r_2$ are of type 2, $r_3$ are of type 3 and the rest are all different, is $\dfrac{n!}{(r_1)!\,(r_2)!\,(r_3)!}$

RESTRICTED PERMUTATIONS:
1. Suppose there are n objects and we have to select r such that s particular objects are never selected. Then the number of permutations is $^{(n-s)}P_r$.
2. Suppose there are n objects and we have to select r such that s particular objects are always selected. Then the number of permutations is $^{(n-s)}P_{(r-s)} \cdot {}^rP_s$.

CIRCULAR PERMUTATIONS:
The number of ways of seating n people in circular seats is (n - 1)!

COMBINATIONS:

Definition: A selection of r different objects from n objects, where the order is not important, is called a 'combination'. The number of possible ways of selecting r objects is
$^nC_r = C(n, r) = \dfrac{n!}{r!\,(n-r)!}$
NOTE:
i) If the order is important and repetitions are not allowed, we can select r objects from n objects in $\dfrac{n!}{(n-r)!}$ ways.
ii) The number of ways of arranging n stones in r boxes such that there is at least one stone in each box is $C(n-1, r-1) = C(n-1, n-r) = {}^{(n-1)}C_{(n-r)}$.
iii) Suppose the set A = $(a_1, a_2, \dots, a_n)$ and $r_1, r_2, \dots, r_n$ are given. The number of permutations of A in which each element $a_i$ is repeated $r_i$ times is $\dfrac{(r_1 + r_2 + \dots + r_n)!}{(r_1)!\,(r_2)! \cdots (r_n)!}$

REPETITIONS ARE ALLOWED:
1) The number of combinations of r objects from n objects, if repetitions are allowed and the order is not important, is $C(n+r-1, n-1) = \dfrac{(n+r-1)!}{r!\,(n-1)!}$
2) The number of ways of distributing n chocolates to r children so that each child gets at least one is C(n-1, n-r).
3) The number of integer solutions of $X_1 + X_2 + \dots + X_r = n$ such that every $X_i > 0$ is C(n-1, n-r).

PROBLEMS ON PERMUTATIONS:

PROBLEM-1: In how many ways can you arrange 9 different books such that a special book is in the 4th place?
SOLUTION: There are 9 books and one is fixed in the 4th place, so the remaining 8 books can be arranged in 8! ways, i.e. 40,320 ways.

PROBLEM-2: How many different eight-digit numbers can be formed by arranging the digits 1, 1, 1, 1, 2, 3, 3, 3?
SOLUTION: The number of digits = 8. The digit 1 occurs 4 times, the digit 2 occurs 1 time and the digit 3 occurs 3 times.
The number of ways $= \dfrac{n!}{(r_1)!\,(r_2)!\,(r_3)!} = \dfrac{8!}{(4)!\,(1)!\,(3)!} = 280$ ways.

PROBLEMS ON COMBINATIONS:

PROBLEM-1: Find the number of permutations of the letters of the word CALCULUS.
SOLUTION: There are 8 letters in the word. The letters C, L and U are each repeated twice, so the number of permutations is $\dfrac{8!}{(2)!\,(2)!\,(2)!} = 5040$

PROBLEM-2: How many possible committees of 6 people can be chosen from 15 men and 10 women, if at least 3 men and at least 2 women must be on each committee?
SOLUTION:
Three women and 3 men = C(15, 3) × C(10, 3) = 54,600.
Two women and 4 men = C(15, 4) × C(10, 2) = 61,425.
The total number of possible ways = 54,600 + 61,425 = 1,16,025

BAYES' THEOREM

Statement: Suppose an event A can occur only in combination with one of n mutually exclusive events $E_1, E_2, \dots, E_n$. If A has occurred, the probability that it was preceded by the particular event $E_i$ is
$P(E_i / A) = \dfrac{P(E_i)\,P(A / E_i)}{\sum_{i=1}^{n} P(E_i)\,P(A / E_i)}$

PROBLEMS ON BAYES' THEOREM

PROBLEM-1: In a bolt factory, machines A, B and C manufacture 20 %, 30 % and 50 % of the total output, and 6 %, 3 % and 2 % of their respective outputs are defective. A bolt is drawn at random and found to be defective. Find the probabilities that it was manufactured by i) Machine A ii) Machine B iii) Machine C.
SOLUTION: Let
A = the event that the bolt is manufactured by Machine A,
B = the event that the bolt is manufactured by Machine B,
C = the event that the bolt is manufactured by Machine C,
D = the event that the drawn bolt is defective.
P(A) = probability that the bolt is manufactured by Machine A = $\dfrac{20}{100}$
P(B) = probability that the bolt is manufactured by Machine B = $\dfrac{30}{100}$
P(C) = probability that the bolt is manufactured by Machine C = $\dfrac{50}{100}$
P(D/A) = probability that the bolt is defective given that it is manufactured by Machine A = $\dfrac{6}{100}$
P(D/B) = probability that the bolt is defective given that it is manufactured by Machine B = $\dfrac{3}{100}$
P(D/C) = probability that the bolt is defective given that it is manufactured by Machine C = $\dfrac{2}{100}$

i) If the drawn bolt is defective, the probability that it is from Machine A is
$P(A/D) = \dfrac{P(A)\,P(D/A)}{P(A)\,P(D/A) + P(B)\,P(D/B) + P(C)\,P(D/C)} = \dfrac{(0.20)(0.06)}{(0.20)(0.06) + (0.30)(0.03) + (0.50)(0.02)} = \dfrac{0.012}{0.012 + 0.009 + 0.010} = \dfrac{0.012}{0.031} = 0.3871$

ii) If the drawn bolt is defective, the probability that it is from Machine B is
$P(B/D) = \dfrac{P(B)\,P(D/B)}{P(A)\,P(D/A) + P(B)\,P(D/B) + P(C)\,P(D/C)} = \dfrac{0.009}{0.012 + 0.009 + 0.010} = \dfrac{0.009}{0.031} = 0.2903$

iii) If the drawn bolt is defective, the probability that it is from Machine C is
$P(C/D) = \dfrac{P(C)\,P(D/C)}{P(A)\,P(D/A) + P(B)\,P(D/B) + P(C)\,P(D/C)} = \dfrac{0.010}{0.012 + 0.009 + 0.010} = \dfrac{0.010}{0.031} = 0.3226$

PROBLEM-2: Urn A contains 3 red and 5 white marbles, Urn B contains 2 red and 1 white marble, and Urn C contains 2 red and 3 white marbles. An urn is selected at random and a marble is drawn from it. If the marble is red, what is the probability that it came from Urn A?
SOLUTION: Let
A = the event of choosing Urn A,
B = the event of choosing Urn B,
C = the event of choosing Urn C,
R = the event that the drawn marble is red.
P(A) = probability of selecting the 1st urn = 1/3
P(B) = probability of selecting the 2nd urn = 1/3
P(C) = probability of selecting the 3rd urn = 1/3
P(R/A) = probability of drawing 1 red marble from Urn A = $\dfrac{^3C_1}{^8C_1} = \dfrac{3}{8}$
P(R/B) = probability of drawing 1 red marble from Urn B = $\dfrac{^2C_1}{^3C_1} = \dfrac{2}{3}$
P(R/C) = probability of drawing 1 red marble from Urn C = $\dfrac{^2C_1}{^5C_1} = \dfrac{2}{5}$
From Bayes' theorem, the probability that the marble came from Urn A given that it is red is
$P(A/R) = \dfrac{P(A)\,P(R/A)}{P(A)\,P(R/A) + P(B)\,P(R/B) + P(C)\,P(R/C)} = \dfrac{(1/3)(3/8)}{(1/3)(3/8) + (1/3)(2/3) + (1/3)(2/5)} = \dfrac{1/8}{1/8 + 2/9 + 2/15} = \dfrac{0.125}{0.125 + 0.2222 + 0.1333} = \dfrac{0.125}{0.4806} = 0.2601$

BINOMIAL DISTRIBUTION:
Definition: A random variable X is said to follow the Binomial distribution if it assumes only non-negative integer values and its probability mass function (p.m.f.) is
$P(X = x) = {}^nC_x\, p^x q^{n-x}$ ; x = 0, 1, 2, 3, ..., n ; q = 1 - p
$= 0$ ; otherwise
Examples:
1) The number of heads obtained in 3 tosses of a coin
2) The number of defectives in a lot of 10 items
3) The number of boys in a family of 4 children

POISSON DISTRIBUTION:
Definition: A random variable X is said to follow the Poisson distribution if it assumes only non-negative integer values and its probability mass function (p.m.f.) is
$P(X = x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$ ; x = 0, 1, 2, ... ; $\lambda > 0$
$= 0$ ; otherwise
It is denoted by X ~ P($\lambda$).
Examples:
1) The number of typing mistakes per page in a book
2) The number of accidents on a road in a particular time
3) The number of telephone calls received by an operator

EXPONENTIAL DISTRIBUTION
Definition: A continuous random variable X is said to follow the exponential distribution with parameter $\theta$ if its probability density function is
$f(x) = \theta e^{-\theta x}$ ; $x \ge 0$ ; $\theta > 0$
$= 0$ ; otherwise

NORMAL DISTRIBUTION
Definition: A random variable X is said to have a Normal distribution with parameters µ and $\sigma$ if its probability density function is
$f(x; \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right\}$ ; $-\infty < x < \infty$ ; $-\infty < \mu < \infty$ ; $\sigma > 0$

STANDARD NORMAL VARIATE
If X ~ N(µ, $\sigma^2$) and we put $Z = \dfrac{X - \mu}{\sigma}$ in the p.d.f. of the normal distribution, we get
$\varphi(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$ ; $-\infty < z < \infty$
where $\varphi(z)$ is the p.d.f. of the standard normal variate.

PROBLEMS ON BINOMIAL DISTRIBUTION:
PROBLEM-1: The probability of a defective bolt is 0.2. Find i) the mean and ii) the standard deviation of the number of defective bolts in a total of 400 bolts.
SOLUTION: Given that
n = number of trials = 400
p = probability of success = probability of getting a defective bolt = 0.2
q = 1 - p = 1 - 0.2 = 0.8
i) Mean = np = 400(0.2) = 80
ii) Variance = npq, so standard deviation $= \sqrt{npq} = \sqrt{400(0.2)(0.8)} = \sqrt{64} = 8$

PROBLEMS ON POISSON DISTRIBUTION:
PROBLEM-1: The average number of accidents on any day on a national highway is 1.8. Determine the probability that the number of accidents is i) at least one ii) at most one.
Solution: Given that mean = λ = 1.8. The p.m.f. of the Poisson distribution is
$P(X = x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!} = \dfrac{e^{-1.8} (1.8)^{x}}{x!}$ → (1)
i) The probability that the number of accidents is at least one is
$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \dfrac{e^{-1.8}(1.8)^0}{0!} = 1 - e^{-1.8} = 1 - 0.1653 = 0.8347$
ii) The probability that the number of accidents is at most one is
$P(X \le 1) = P(X = 0) + P(X = 1) = \dfrac{e^{-1.8}(1.8)^0}{0!} + \dfrac{e^{-1.8}(1.8)^1}{1!} = e^{-1.8}(1 + 1.8) = (0.1653)(2.8) = 0.4628$

PROBLEMS ON EXPONENTIAL DISTRIBUTION:
PROBLEM-1: The time taken by a person speaking over a telephone follows an exponential distribution with mean 4 minutes. Find
i) the probability that he speaks for more than 6 minutes but less than 7 minutes,
ii) out of 6 calls he makes, the probability that exactly 2 calls take him more than 3 minutes,
iii) how many calls out of 100 are expected to take more than 3 minutes each.
Solution: Let X = the time taken (in minutes) per call. Given that X follows an exponential distribution with mean 4 minutes, $\theta = 1/4$ and
$f(x) = \dfrac{1}{4} e^{-x/4}$ ; $x \ge 0$ → (1)
i) P(the time taken for one call is between 6 and 7 minutes)
$= P(6 < X < 7) = \int_{6}^{7} f(x)\,dx = \dfrac{1}{4}\int_{6}^{7} e^{-x/4}\,dx = \left[-e^{-x/4}\right]_{6}^{7} = -e^{-7/4} + e^{-6/4} = 0.04936$
ii) First find the probability that a single call takes more than 3 minutes:
$P(X > 3) = \int_{3}^{\infty} f(x)\,dx = \dfrac{1}{4}\int_{3}^{\infty} e^{-x/4}\,dx = \left[-e^{-x/4}\right]_{3}^{\infty} = 0 + e^{-3/4} = 0.4724$
The number of calls out of 6 that take more than 3 minutes is binomial with n = 6 and p = 0.4724, so
P(exactly 2 of the 6 calls take more than 3 minutes) $= {}^6C_2 (0.4724)^2 (0.5276)^4 = 0.2594$
iii) Expected number of calls out of 100 that will be longer than 3 minutes each
$= 100 \times P(X > 3) = 100(0.4724) = 47.24$

PROBLEMS ON NORMAL DISTRIBUTION:
Problem-1: X is a Normal variate with mean 30 and standard deviation 5. Find the probabilities that i) $26 \le X \le 40$ ii) $X \ge 45$.
Solution: Given that mean = µ = 30 and S.D. = $\sigma$ = 5.
i) When X = 26 ⟹ $Z = \dfrac{X - \mu}{\sigma} = \dfrac{26 - 30}{5} = \dfrac{-4}{5} = -0.8 = -Z_1$
When X = 40 ⟹ $Z = \dfrac{X - \mu}{\sigma} = \dfrac{40 - 30}{5} = \dfrac{10}{5} = 2 = Z_2$
∴ $P(26 \le X \le 40) = P(-0.8 \le Z \le 2) = P(0 \le Z \le 2) + P(0 \le Z \le 0.8)$
From the normal table, $P(0 \le Z \le 2) = 0.4772$ and $P(0 \le Z \le 0.8) = 0.2881$
⟹ $P(26 \le X \le 40) = 0.4772 + 0.2881 = 0.7653$
ii) When X = 45 ⟹ $Z = \dfrac{X - \mu}{\sigma} = \dfrac{45 - 30}{5} = \dfrac{15}{5} = 3 = Z_1$
∴ $P(X \ge 45) = P(Z \ge 3) = 0.5 - P(0 \le Z \le 3) = 0.5 - 0.49865 = 0.00135$

JOINT PROBABILITY MASS FUNCTION:
Definition: Let X and Y be two random variables defined on the same probability space S, with image sets X(S) = $\{x_1, x_2, \dots, x_i, \dots, x_n\}$ and Y(S) = $\{y_1, y_2, \dots, y_j, \dots, y_m\}$. The product set is X(S) × Y(S) = $\{x_1, x_2, \dots, x_n\} \times \{y_1, y_2, \dots, y_m\}$. The probability of the ordered pair $(x_i, y_j)$ is defined as $P(X = x_i, Y = y_j)$, and it is written
$p_{ij} = P(X = x_i, Y = y_j) = P_{XY}(x, y) = P(x_i, y_j)$
Then $P(x_i, y_j)$ is known as the joint probability mass function of X and Y. Its values can be represented in the following table.
  X \ Y    $y_1$      $y_2$      $y_3$      ...  $y_j$      ...  $y_m$      Total
  $x_1$    $p_{11}$   $p_{12}$   $p_{13}$   ...  $p_{1j}$   ...  $p_{1m}$   $p_{1.}$
  $x_2$    $p_{21}$   $p_{22}$   $p_{23}$   ...  $p_{2j}$   ...  $p_{2m}$   $p_{2.}$
  $x_3$    $p_{31}$   $p_{32}$   $p_{33}$   ...  $p_{3j}$   ...  $p_{3m}$   $p_{3.}$
  ...
  $x_i$    $p_{i1}$   $p_{i2}$   $p_{i3}$   ...  $p_{ij}$   ...  $p_{im}$   $p_{i.}$
  ...
  $x_n$    $p_{n1}$   $p_{n2}$   $p_{n3}$   ...  $p_{nj}$   ...  $p_{nm}$   $p_{n.}$
  Total    $p_{.1}$   $p_{.2}$   $p_{.3}$   ...  $p_{.j}$   ...  $p_{.m}$   $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij} = 1$

Marginal probability mass function:
Definition: Let (X, Y) be a bivariate random variable and P(X, Y) be its probability mass function.
  25. 25. QTBD 2013 The marginal probability mass function of X is denoted by P(X) or $P_X(x)$ and is given by
$P_X(x_i) = P(X = x_i) = P(X = x_i \cap Y = y_1) + P(X = x_i \cap Y = y_2) + \dots + P(X = x_i \cap Y = y_j) + \dots + P(X = x_i \cap Y = y_m)$
$= P(x_i, y_1) + P(x_i, y_2) + \dots + P(x_i, y_m) = p_{i1} + p_{i2} + \dots + p_{ij} + \dots + p_{im} = \sum_{j=1}^{m} p_{ij} = p_{i\cdot}$
The marginal probability mass function of Y is denoted by P(Y) or $P_Y(y)$ and is given by
$P_Y(y_j) = P(Y = y_j) = P(X = x_1 \cap Y = y_j) + P(X = x_2 \cap Y = y_j) + \dots + P(X = x_i \cap Y = y_j) + \dots + P(X = x_n \cap Y = y_j)$
$= P(x_1, y_j) + P(x_2, y_j) + \dots + P(x_n, y_j) = p_{1j} + p_{2j} + \dots + p_{ij} + \dots + p_{nj} = \sum_{i=1}^{n} p_{ij} = p_{\cdot j}$

Matrix
Definition: A system of mn numbers (real or complex) arranged in the form of an ordered set of m rows, each row consisting of an ordered set of n numbers, enclosed between [ ] or ( ) or ‖ ‖, is called a matrix of order (or type) m × n. Each of the mn numbers constituting the m × n matrix is called an element of the matrix.
$A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix} = [a_{ij}]_{m \times n}$ where $1 \le i \le m$; $1 \le j \le n$
In relation to matrices we call the numbers scalars.

Operations of Matrices:
Equal matrices: Definition: Two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ are said to be equal if and only if i) A and B are of the same type ii) $a_{ij} = b_{ij}$ for every i & j
Multiplication of a matrix by a scalar: Definition: Let A be a matrix. The matrix obtained by multiplying every element of A by a scalar k is called the product of A by k and is denoted by kA or Ak. If $A = [a_{ij}]_{m \times n}$ then $kA = [k\,a_{ij}]_{m \times n} = k\,[a_{ij}]_{m \times n}$
Properties:
i) $0A = O$ (null matrix); $(-1)A = -A$, called the negative of A
ii) $k_1(k_2 A) = (k_1 k_2)A = k_2(k_1 A)$ where $k_1, k_2$ are scalars
iii) $kA = O \Rightarrow A = O$ if $k \ne 0$
iv) $k_1 A = k_2 A$ and A is not a null matrix $\Rightarrow k_1 = k_2$
Addition of matrices: Definition: Let $A = [a_{ij}]_{m \times n}$ and $B = [b_{ij}]_{m \times n}$ be 2 matrices.
The matrix $C = [c_{ij}]_{m \times n}$, where $c_{ij} = a_{ij} + b_{ij}$, is called the sum of the matrices A & B and is denoted by A + B. Thus $[a_{ij}]_{m \times n} + [b_{ij}]_{m \times n} = [a_{ij} + b_{ij}]_{m \times n}$
Differences of matrices: Definition: If A & B are matrices of the same type, then A + (−B) is written A − B.
Matrix Multiplication: Definition: Let $A = [a_{ik}]_{m \times n}$ and $B = [b_{kj}]_{n \times p}$ be 2 matrices. The matrix $C = [c_{ij}]_{m \times p}$, where $c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}$, is called the product of the matrices A & B in that order, and we write C = AB.
Types of Matrices:
  26. 26. QTBD 2013
1) Square Matrix: If $A = [a_{ij}]_{m \times n}$ and m = n, then A is called a square matrix. A square matrix A of order (n × n) is sometimes called an "n-rowed matrix A".
Example: $A = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}$ is a 2nd order matrix.
2) Rectangular Matrix: A matrix which is not a square matrix is called a rectangular matrix.
Example: $A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 3 & 4 \end{bmatrix}$ is a (2 × 3) matrix.
3) Row Matrix: A matrix of order (1 × n) is called a row matrix.
Example: $A = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}_{1 \times 3}$
4) Column Matrix: A matrix of order (m × 1) is called a column matrix.
Example: $A = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}_{3 \times 1}$
5) Unit Matrix: If $A = [a_{ij}]_{n \times n}$ is such that $a_{ij} = 1$ for i = j and $a_{ij} = 0$ for i ≠ j, then A is called a unit matrix. It is denoted by $I_n$.
Example: $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
6) Null Matrix (or) Zero Matrix: If $A = [a_{ij}]_{m \times n}$ is such that $a_{ij} = 0$ for all i & j, then A is called a zero matrix or a null matrix. It is denoted by O.
Example: $O = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}_{2 \times 3}$

Definitions:
1) Diagonal Elements: Definition: In a square matrix $A = [a_{ij}]_{n \times n}$, the elements $a_{ij}$ for which i = j (i.e. $a_{11}, a_{22}, \dots, a_{nn}$) are called the diagonal elements of A.
2) Principal Diagonal: Definition: The line along which the diagonal elements lie is called the principal diagonal of A.
3) Diagonal Matrix: Definition: A square matrix all of whose elements except those in the leading diagonal are zero is called a diagonal matrix. If $d_1, d_2, \dots, d_n$ are the diagonal elements of a diagonal matrix A, then A is written as $A = \mathrm{diag}(d_1, d_2, \dots, d_n)$
Example: $A = \mathrm{diag}(3, 1, -2) = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}$
4) Scalar Matrix: Definition: A diagonal matrix whose diagonal elements are all equal is called a "scalar matrix".
Example: $A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$
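The scalar multiple, sum and product of matrices defined in these notes can be sketched in plain Python, storing a matrix as a list of rows. The function names `scalar_mul`, `mat_add` and `mat_mul` are illustrative choices and do not come from the notes.

```python
def scalar_mul(k, A):
    """Return kA: every element of A multiplied by the scalar k."""
    return [[k * a for a in row] for row in A]


def mat_add(A, B):
    """Return A + B for two matrices of the same type (m x n)."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("A and B must be of the same type")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_mul(A, B):
    """Return AB for A (m x n) and B (n x p): c_ij = sum over k of a_ik * b_kj."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, -1, 2], [2, 3, 4]]               # the 2 x 3 rectangular example
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # unit matrix of order 3

print(mat_mul(A, I3) == A)                # multiplying by the unit matrix leaves A unchanged
print(mat_add(A, scalar_mul(-1, A)))      # A + (-A) gives the 2 x 3 null matrix
```

Note that `mat_mul(A, B)` returns a matrix with as many rows as A and as many columns as B, which is why the order of the factors matters.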
  27. 27. QTBD 2013 CURVE FITTING
Types of Curve Fitting:
• Fitting of Straight Line
• Fitting of Second degree parabola
• Fitting of Exponential Curve
• Fitting of Power Curve
1) Fitting of Straight Line: Let us consider the fitting of a straight line
$Y = a + bX$ → ①
to a set of n points $(x_i, y_i)$; i = 1, 2, ..., n. Equation ① represents a family of straight lines for different values of the arbitrary constants a and b. The problem is to determine a and b so that the line is the line of best fit. The best fit is obtained with Legendre's principle of least squares, which consists in minimising the sum of squares of the deviations of the actual values of y from their estimated values as given by the line of best fit.
Let $P_i(x_i, y_i)$ be any general point in the scatter diagram. Draw $P_iM$ ⊥ to the X axis, meeting the line in $H_i$. Since $H_i$ lies on the straight line, its ordinate is $a + b x_i$. Hence the co-ordinates of $H_i$ are $[x_i, (a + b x_i)]$.
$P_iH_i = P_iM - H_iM = y_i - (a + b x_i)$ ⟹ $e_i = y_i - (a + b x_i)$ → ②
Here $e_i$ is called the error of estimate or "residual" of $y_i$. According to the principle of least squares, we have to determine a & b so that
$E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$ is minimum → ③
From the principle of maxima and minima, we take the partial derivatives of E w.r.to a & b and equate them to zero, i.e. $\frac{\partial E}{\partial a} = 0$ and $\frac{\partial E}{\partial b} = 0$.
$\frac{\partial E}{\partial a} = 0$ ⟹ $\frac{\partial}{\partial a} \sum_{i=1}^{n} (y_i - a - b x_i)^2 = 0$ ⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i)(-1) = 0$
⟹ $\sum_{i=1}^{n} (y_i - a - b x_i) = 0$ ⟹ $\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a - b \sum_{i=1}^{n} x_i = 0$
⟹ $\sum_{i=1}^{n} y_i - na - b \sum_{i=1}^{n} x_i = 0$ ⟹ $\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i$ → ④
$\frac{\partial E}{\partial b} = 0$ ⟹ $\frac{\partial}{\partial b} \sum_{i=1}^{n} (y_i - a - b x_i)^2 = 0$ ⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i)(-x_i) = 0$
⟹ $\sum_{i=1}^{n} (y_i - a - b x_i)\,x_i = 0$ ⟹ $\sum_{i=1}^{n} x_i y_i - a \sum_{i=1}^{n} x_i - b \sum_{i=1}^{n} x_i^2 = 0$
⟹ $\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$ → ⑤
Normal Equations: The normal equations for the straight line are
$\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i$ → ④
$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$ → ⑤
Solving these normal equations gives the values of a & b. Substituting these values in equation ① gives the line of best fit to the given set of points $(x_i, y_i)$, i = 1, 2, ..., n:
$\hat{Y} = \hat{a} + \hat{b} X$
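As a minimal sketch of how normal equations ④ and ⑤ are solved in practice, the function below eliminates a between the two equations and returns both constants. The name `fit_line` and the four sample points are hypothetical and chosen only so that the answer is easy to check by hand.

```python
def fit_line(xs, ys):
    """Solve the normal equations of Y = a + bX for a and b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # zero only when all x values coincide
    if det == 0:
        raise ValueError("x values must not all be equal")
    b = (n * sxy - sx * sy) / det    # from eliminating a between (4) and (5)
    a = (sy - b * sx) / n            # back-substitution into (4)
    return a, b


# Hypothetical points lying exactly on Y = 1 + 2X
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the sample points lie exactly on a line, the residuals are all zero and the fitted constants reproduce that line.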
  28. 28. QTBD 2013 2. FITTING OF SECOND DEGREE PARABOLA:
Let $Y = a + bX + cX^2$ → ① be a 2nd degree parabola to be fitted to the given set of observations $(x_i, y_i)$, i = 1, 2, 3, ..., n. According to the principle of least squares, to determine the constants a, b, c consider the residual
$e_i = y_i - \hat{y}_i$ → ②
where $\hat{y}_i = a + b x_i + c x_i^2$ → ③, so that $e_i = y_i - (a + b x_i + c x_i^2)$.
Squaring the residuals and summing over all points,
$E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^2)^2$ → ④
Taking partial derivatives w.r.to the parameters a, b, c and equating them to 0, we get the "normal equations".
$\frac{\partial E}{\partial a} = 0$ ⟹ $\frac{\partial}{\partial a} \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^2)^2 = 0$ ⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^2)(-1) = 0$
⟹ $\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a - b \sum_{i=1}^{n} x_i - c \sum_{i=1}^{n} x_i^2 = 0$
⟹ $\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i + c \sum_{i=1}^{n} x_i^2$ → ⑤
$\frac{\partial E}{\partial b} = 0$ ⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^2)(-x_i) = 0$ ⟹ $\sum_{i=1}^{n} (y_i - a - b x_i - c x_i^2)\,x_i = 0$
⟹ $\sum_{i=1}^{n} x_i y_i - a \sum_{i=1}^{n} x_i - b \sum_{i=1}^{n} x_i^2 - c \sum_{i=1}^{n} x_i^3 = 0$
⟹ $\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 + c \sum_{i=1}^{n} x_i^3$ → ⑥
$\frac{\partial E}{\partial c} = 0$ ⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^2)(-x_i^2) = 0$
  29. 29. QTBD 2013
⟹ $\sum_{i=1}^{n} (y_i - a - b x_i - c x_i^2)\,x_i^2 = 0$
⟹ $\sum_{i=1}^{n} x_i^2 y_i - a \sum_{i=1}^{n} x_i^2 - b \sum_{i=1}^{n} x_i^3 - c \sum_{i=1}^{n} x_i^4 = 0$
⟹ $\sum_{i=1}^{n} x_i^2 y_i = a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i^3 + c \sum_{i=1}^{n} x_i^4$ → ⑦
NORMAL EQUATIONS OF SECOND DEGREE PARABOLA
$\sum y_i = na + b \sum x_i + c \sum x_i^2$
$\sum x_i y_i = a \sum x_i + b \sum x_i^2 + c \sum x_i^3$
$\sum x_i^2 y_i = a \sum x_i^2 + b \sum x_i^3 + c \sum x_i^4$
Solving these normal equations gives the estimated values of a, b, c. Substituting these estimates in eq ①, the resulting equation is called the "best fit" for the given set of data:
$\hat{Y} = \hat{a} + \hat{b} x + \hat{c} x^2$

3. FITTING OF EXPONENTIAL CURVE $Y = a b^x$
Let $Y = a b^x$ → ①
Taking logarithms on both sides we get
$\log y = \log(a b^x) = \log a + \log b^x = \log a + x \log b$ [∵ $\log x^m = m \log x$ and $\log(mn) = \log m + \log n$]
$U = A + Bx$ → ②
where U = log y, A = log a, B = log b. This is a linear equation in x and U. The normal equations for estimating A & B are
$\sum U = nA + B \sum x$ → ③
$\sum xU = A \sum x + B \sum x^2$ → ④
Solving these normal equations gives the values of A & B. Finally we get the a, b values as follows
a = Anti log (A)
  30. 30. QTBD 2013 b = Anti log (B)
Substituting these a & b values in eq ①, we get the "best fit" to the given set of n points. The best fit of the required equation is $\hat{y} = \hat{a}\,\hat{b}^{x}$

4. FITTING OF EXPONENTIAL CURVE $Y = a e^{bx}$
Let $Y = a e^{bx}$ → ①
Taking logarithms (to base 10) on both sides of eq ①, we get
$\log y = \log(a e^{bx}) = \log a + \log e^{bx} = \log a + bx \log e = \log a + x\,[b \log e]$
$U = A + Bx$ → ②
where U = log y, A = log a, B = b log e. This is a linear equation in x and U. The normal equations are
$\sum U = nA + B \sum x$ → ③
$\sum xU = A \sum x + B \sum x^2$ → ④
From these we find A and B, and consequently a = Anti log (A) and, since B = b log e, $b = \frac{B}{\log e} = \frac{B}{0.4343}$
The best fit to the given set of n points is $\hat{y} = \hat{a}\, e^{\hat{b} x}$

5. FITTING OF A POWER CURVE $Y = a x^b$
Let $y = a x^b$ → ①
Taking logarithms on both sides of eq ①, we get
$\log y = \log(a x^b) = \log a + \log x^b = \log a + b \log x$
$U = A + Bv$ → ②
where U = log y, A = log a, B = b, v = log x
  31. 31. QTBD 2013 This is a linear equation in v and U. The normal equations are
$\sum U = nA + B \sum v$ → ③
$\sum Uv = A \sum v + B \sum v^2$ → ④
From these we find A and B, and consequently a = Anti log (A), b = B
The best fit to the given set of n points is $\hat{y} = \hat{a}\, x^{\hat{b}}$

1. PROBLEMS ON FITTING OF STRAIGHT LINE:
Problem – 1 Fit a straight line to the following data.

| X | 1 | 2 | 3 | 4 | 6 | 8 |
|---|---|---|---|---|---|---|
| Y | 2.4 | 3 | 3.6 | 4 | 5 | 6 |

Solution:

| X | Y | $X^2$ | XY |
|---|---|---|---|
| 1 | 2.4 | 1 | 2.4 |
| 2 | 3.0 | 4 | 6.0 |
| 3 | 3.6 | 9 | 10.8 |
| 4 | 4.0 | 16 | 16.0 |
| 6 | 5.0 | 36 | 30.0 |
| 8 | 6.0 | 64 | 48.0 |
| ΣX = 24 | ΣY = 24 | Σ$X^2$ = 130 | ΣXY = 113.2 |

The straight line equation is $Y = a + bX$ → ①
The normal equations for the straight line are
$\sum y_i = na + b \sum x_i$ → ②
$\sum x_i y_i = a \sum x_i + b \sum x_i^2$ → ③
From the above table we have ΣX = 24, ΣY = 24, Σ$X^2$ = 130, ΣXY = 113.2 and n = 6, so
$24 = 6a + 24b$ → ④
$113.2 = 24a + 130b$ → ⑤
④ × 4 ⟹ $24a + 96b = 96$
⑤ ⟹ $24a + 130b = 113.2$
Subtracting, $34b = 17.2$ ⟹ $b = \frac{17.2}{34}$ ⟹ b = 0.5059
Substituting b in eq ④ ⟹ $6a + 24(0.5059) = 24$ ⟹ $6a + 12.1416 = 24$ ⟹ $6a = 24 - 12.1416 = 11.8584$ ⟹ $a = \frac{11.8584}{6}$ ⟹ a = 1.9764
∴ a = 1.9764 & b = 0.5059
  32. 32. QTBD 2013 Hence the required equation of the straight line is $Y = a + bX$ ⟹ $Y = 1.9764 + (0.5059)X$

Problems on second degree parabola:
Problem -1 Fit a parabola of second degree to the following data.

| X | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Y | 1 | 1.8 | 1.3 | 2.5 | 6.3 |

Solution:

| X | Y | $X^2$ | $X^3$ | $X^4$ | XY | $X^2Y$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1.8 | 1 | 1 | 1 | 1.8 | 1.8 |
| 2 | 1.3 | 4 | 8 | 16 | 2.6 | 5.2 |
| 3 | 2.5 | 9 | 27 | 81 | 7.5 | 22.5 |
| 4 | 6.3 | 16 | 64 | 256 | 25.2 | 100.8 |
| ΣX = 10 | ΣY = 12.9 | Σ$X^2$ = 30 | Σ$X^3$ = 100 | Σ$X^4$ = 354 | ΣXY = 37.1 | Σ$X^2Y$ = 130.3 |

From the table we have ΣX = 10, ΣY = 12.9, Σ$X^2$ = 30, Σ$X^3$ = 100, Σ$X^4$ = 354, ΣXY = 37.1, Σ$X^2Y$ = 130.3
The second degree parabola equation is $Y = a + bX + cX^2$ → ①
The normal equations for the 2nd degree parabola are
$\sum y_i = na + b \sum x_i + c \sum x_i^2$ → ②
$\sum x_i y_i = a \sum x_i + b \sum x_i^2 + c \sum x_i^3$ → ③
$\sum x_i^2 y_i = a \sum x_i^2 + b \sum x_i^3 + c \sum x_i^4$ → ④
⟹ $12.9 = 5a + 10b + 30c$ → ⑤
⟹ $37.1 = 10a + 30b + 100c$ → ⑥
⟹ $130.3 = 30a + 100b + 354c$ → ⑦
  33. 33. QTBD 2013 From ⑤ & ⑥ we have
⑤ × 2 ⟹ $10a + 20b + 60c = 25.8$
⑥ ⟹ $10a + 30b + 100c = 37.1$
Subtracting, $-10b - 40c = -11.3$ ⟹ $10b + 40c = 11.3$ → ⑧
From ⑥ & ⑦ we have
⑥ × 3 ⟹ $30a + 90b + 300c = 111.3$
⑦ ⟹ $30a + 100b + 354c = 130.3$
Subtracting, $-10b - 54c = -19$ ⟹ $10b + 54c = 19$ → ⑨
From ⑧ & ⑨, subtracting ⑧ from ⑨: $14c = 7.7$ ⟹ $c = \frac{7.7}{14} = 0.55$
Substituting c = 0.55 in eq ⑧ ⟹ $10b + 40(0.55) = 11.3$ ⟹ $10b + 22 = 11.3$ ⟹ $10b = 11.3 - 22 = -10.7$ ⟹ $b = -1.07$
Substituting b = −1.07 & c = 0.55 in eq ⑤ ⟹ $5a + 10(-1.07) + 30(0.55) = 12.9$ ⟹ $5a - 10.7 + 16.5 = 12.9$ ⟹ $5a = 12.9 + 10.7 - 16.5 = 7.1$ ⟹ $a = \frac{7.1}{5} = 1.42$
∴ a = 1.42, b = −1.07, c = 0.55
Thus the required equation of the second degree parabola is $\hat{Y} = \hat{a} + \hat{b}X + \hat{c}X^2$ ⟹ $\hat{Y} = 1.42 - 1.07X + 0.55X^2$

PROBLEMS ON POWER CURVE $Y = a x^b$:
Problem – 1 For the given data fit a power curve of the type $Y = a x^b$

| X | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Y | 6.2 | 8.3 | 15.4 | 33.1 | 65.2 | 127.4 |

Solution:

| X | Y | $U_i$ = log Y | $V_i$ = log X | $V_i^2$ | $U_iV_i$ |
|---|---|---|---|---|---|
| 1 | 6.2 | 0.7924 | 0 | 0 | 0 |
| 2 | 8.3 | 0.9191 | 0.3010 | 0.0906 | 0.2766 |
| 3 | 15.4 | 1.1875 | 0.4771 | 0.2276 | 0.5665 |
| 4 | 33.1 | 1.5198 | 0.6020 | 0.3624 | 0.9149 |
| 5 | 65.2 | 1.8142 | 0.6990 | 0.4886 | 1.2681 |
| 6 | 127.4 | 2.1052 | 0.7781 | 0.6054 | 1.6380 |
| Total | | Σ$U_i$ = 8.3382 | Σ$V_i$ = 2.8572 | Σ$V_i^2$ = 1.7746 | Σ$U_iV_i$ = 4.6641 |

  34. 34. QTBD 2013
Let the power curve be $Y = a x^b$ → ①
Taking logarithms on both sides, we get $\log y = \log(a x^b) = \log a + b \log x$
$U = A + Bv$ → ② where U = log y, A = log a, B = b, v = log x
The normal equations are
$\sum U = nA + B \sum v$ → ③
$\sum Uv = A \sum v + B \sum v^2$ → ④
⟹ $8.3382 = 6A + 2.8572B$ → ⑤
⟹ $4.6641 = 2.8572A + 1.7746B$ → ⑥
Writing these as $6A + 2.8572B - 8.3382 = 0$ and $2.8572A + 1.7746B - 4.6641 = 0$ and solving by the rule of cross-multiplication, we get
$\frac{A}{(2.8572)(-4.6641) - (-8.3382)(1.7746)} = \frac{B}{(-8.3382)(2.8572) - 6(-4.6641)} = \frac{1}{6(1.7746) - (2.8572)(2.8572)}$
⟹ $\frac{A}{-13.326 + 14.7970} = \frac{B}{-23.8239 + 27.9846} = \frac{1}{10.6476 - 8.1636}$
⟹ $\frac{A}{1.471} = \frac{B}{4.1607} = \frac{1}{2.484}$
⟹ $A = \frac{1.471}{2.484} = 0.5921$ and $B = \frac{4.1607}{2.484} = 1.675$
⟹ $\hat{a}$ = Anti log (A) = Anti log (0.5921) = 3.9093
⟹ $\hat{b} = B = 1.675$
Substituting $\hat{a}$ & $\hat{b}$ in equation ①, we get the best fit of the power curve. Hence for the given data, the fitted power curve is
$\hat{Y} = \hat{a} X^{\hat{b}}$ ⟹ $\hat{Y} = (3.9093)\,X^{1.675}$

PROBLEMS ON EXPONENTIAL CURVE $Y = a e^{bx}$
Problem -2 Fit an exponential curve of the form $Y = a e^{bx}$ for the following data

| X | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Y | 1.4 | 4.1 | 13.2 | 39.3 | 125 | 303 |
  35. 35. QTBD 2013 Solution:

| X | Y | U = log Y | $X^2$ | XU |
|---|---|---|---|---|
| 1 | 1.4 | 0.1461 | 1 | 0.1461 |
| 2 | 4.1 | 0.6128 | 4 | 1.2256 |
| 3 | 13.2 | 1.1206 | 9 | 3.3618 |
| 4 | 39.3 | 1.5944 | 16 | 6.3776 |
| 5 | 125 | 2.0969 | 25 | 10.4845 |
| 6 | 303 | 2.4814 | 36 | 14.8884 |
| ΣX = 21 | | ΣU = 8.0522 | Σ$X^2$ = 91 | ΣXU = 36.484 |

The exponential curve is $Y = a e^{bx}$ → ①
Taking logarithms on both sides, $\log y = \log(a e^{bx}) = \log a + \log e^{bx} = \log a + bx \log e = \log a + x\,[b \log e]$
$U = A + BX$ → ② where U = log y, A = log a, B = b log e
The normal equations are
$\sum U = nA + B \sum x$ → ③
$\sum xU = A \sum x + B \sum x^2$ → ④
From the table we have ΣU = 8.0522, ΣX = 21, Σ$X^2$ = 91, ΣXU = 36.484, so
$8.0522 = 6A + 21B$ → ⑤
$36.484 = 21A + 91B$ → ⑥
Writing these as $6A + 21B - 8.0522 = 0$ and $21A + 91B - 36.484 = 0$ and solving by cross-multiplication,
$\frac{A}{(21)(-36.484) - (-8.0522)(91)} = \frac{B}{(-8.0522)(21) - 6(-36.484)} = \frac{1}{6(91) - (21)(21)}$
⟹ $\frac{A}{-766.164 + 732.7502} = \frac{B}{-169.0962 + 218.904} = \frac{1}{546 - 441}$
⟹ $\frac{A}{-33.4138} = \frac{B}{49.8078} = \frac{1}{105}$
⟹ $A = \frac{-33.4138}{105} = -0.3182$ and $B = \frac{49.8078}{105} = 0.4744$
$\hat{a}$ = Anti log (A) = Anti log (−0.3182) = 0.4806
$\hat{b} = \frac{B}{\log_{10} e} = \frac{0.4744}{0.4343} = 1.0923$
  36. 36. QTBD 2013
Substituting $\hat{a} = 0.4806$ & $\hat{b} = 1.0923$ in equation ①, we get the best fit of the given curve. Hence for the given data the fitted exponential curve is
$\hat{Y} = \hat{a}\, e^{\hat{b} X}$ ⟹ $\hat{Y} = (0.4806)\, e^{(1.0923)X}$

PROBLEMS ON FITTING OF EXPONENTIAL CURVE $Y = a b^x$
Problem -1 Fit an exponential curve of the form $Y = a b^x$ for the following data

| X | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Y | 1.0 | 1.2 | 1.8 | 2.5 | 3.6 | 4.7 | 6.6 | 9.1 |

Solution: Let $Y = a b^x$ → ①
Taking logarithms on both sides we get $\log y = \log(a b^x) = \log a + \log b^x = \log a + x \log b$
$U = A + Bx$ → ② where U = log y, A = log a, B = log b
The normal equations for estimating A & B are
$\sum U = nA + B \sum x$ → ③
$\sum xU = A \sum x + B \sum x^2$ → ④

| X | Y | U = log Y | XU | $X^2$ |
|---|---|---|---|---|
| 1 | 1.0 | 0 | 0 | 1 |
| 2 | 1.2 | 0.0792 | 0.1584 | 4 |
| 3 | 1.8 | 0.2553 | 0.7659 | 9 |
| 4 | 2.5 | 0.3979 | 1.5916 | 16 |
| 5 | 3.6 | 0.5563 | 2.7815 | 25 |
| 6 | 4.7 | 0.6721 | 4.0326 | 36 |
| 7 | 6.6 | 0.8195 | 5.7365 | 49 |
| 8 | 9.1 | 0.9590 | 7.6720 | 64 |
| ΣX = 36 | ΣY = 30.5 | ΣU = 3.7393 | ΣXU = 22.7385 | Σ$X^2$ = 204 |

From the above table we have ΣX = 36, ΣY = 30.5, ΣU = 3.7393, ΣXU = 22.7385, Σ$X^2$ = 204
  37. 37. QTBD 2013
$3.7393 = 8A + 36B$ → ⑥; ⑥ × 36 ⟹ $288A + 1296B = 134.6148$
$22.7385 = 36A + 204B$ → ⑦; ⑦ × 8 ⟹ $288A + 1632B = 181.908$
Subtracting, $336B = 47.2932$ ⟹ $B = \frac{47.2932}{336}$ ⟹ B = 0.1408
Substituting B in equation ⑥ ⟹ $8A + 36(0.1408) = 3.7393$ ⟹ $8A + 5.0688 = 3.7393$ ⟹ $8A = 3.7393 - 5.0688 = -1.3295$ ⟹ $A = \frac{-1.3295}{8}$ ⟹ A = −0.1662
⟹ a = Anti log (A) = Anti log (−0.1662) = 0.6821 ⟹ a = 0.6821
⟹ b = Anti log (B) = Anti log (0.1408) = 1.383 ⟹ b = 1.383
Hence for the given data the fitted exponential curve is $\hat{Y} = (0.6821)(1.383)^X$
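The log-transform procedure used for $Y = a b^x$ can be checked numerically with a short sketch. It takes base-10 logarithms, solves the two normal equations for A and B, and takes antilogs. The function name `fit_exponential_ab` is a hypothetical label; the data are those of the problem above. Because the code does not round the logarithms to four places, its results agree with the hand calculation only to about three significant figures.

```python
import math


def fit_exponential_ab(xs, ys):
    """Fit Y = a * b**X by least squares on U = log10(Y) = A + B*X."""
    n = len(xs)
    us = [math.log10(y) for y in ys]
    sx, su = sum(xs), sum(us)
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    B = (n * sxu - sx * su) / (n * sxx - sx * sx)   # solve the normal equations
    A = (su - B * sx) / n
    return 10 ** A, 10 ** B                         # a = antilog(A), b = antilog(B)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 1.8, 2.5, 3.6, 4.7, 6.6, 9.1]
a, b = fit_exponential_ab(xs, ys)
print(round(a, 2), round(b, 2))
```

The same function fits $Y = a e^{bx}$ if the returned `b` is replaced by its natural logarithm, since $b^x = e^{x \ln b}$.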
  38. 38. QTBD 2013 CORRELATION
• Uni-variate Distribution
• Bi-variate Distribution
• Multi-variate Distribution
1. Uni-variate Distribution: A distribution involving only one variable is called a "uni-variate distribution". Example: The heights of a certain group of persons.
2. Bi-variate Distribution: A distribution involving exactly 2 variables is called a "bi-variate distribution". Example: The heights and weights of a certain group of persons.
3. Multi-variate Distribution: A distribution involving more than 2 variables is called a "multi-variate distribution".
• Correlation:
Definition 1: If a change in one variable brings about a change in the other variable, then the variables are said to be "correlated variables".
Definition 2: Correlation is an analysis of the 'co-variation' between 2 or more variables.
• Types of Correlation:
• Positive Correlation (or) Direct Correlation
• Negative Correlation (or) Inverse Correlation
• Perfect Correlation
1) Positive Correlation:
Definition 1: If the variables deviate in the same direction, then the variables are said to be "positively correlated".
Definition 2: In other words, if an increase in the value of one variable is accompanied by an increase in the value of the other variable, or a decrease in the value of one variable is accompanied by a decrease in the other variable, then the variables are said to be "directly correlated variables".
Examples: 1) Price & Supply of goods. 2) Income & Expenditure of a group of persons.
2) Negative Correlation:
Definition 1: If the variables deviate in opposite directions, then the variables are said to be "negatively correlated".
Definition 2: In other words, if an increase in the value of one variable is accompanied by a decrease in the value of the other variable, or a decrease in the value of one variable is accompanied by an increase in the other variable, then the variables are said to be "inversely correlated variables".
Examples: 1) Volume & pressure of a perfect gas.
2) Price & Demand of goods.
3) Perfect Correlation:
Definition: If the deviation in one variable is followed by a corresponding and proportional deviation in the other variable, then the variables are said to be "perfectly correlated variables".
• Linear Correlation:
Definition: If the 'ratio' of change between the variables is 'uniform', then there is "linear correlation" between the variables. If we plot such values on a graph, we get a 'straight line'.
Example: In the following data the ratio of change between the variables is the same throughout.
  39. 39. QTBD 2013

| A | 2 | 7 | 12 | 17 |
|---|---|---|---|---|
| B | 3 | 9 | 15 | 21 |

• Non-linear Correlation:
Definition: If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, then the correlation is called "non-linear correlation". Non-linear correlation is also called 'curvilinear correlation'.
• Uses (or) Applications of Correlation:
1) Correlation is a measure of the extent of relation between 2 variables.
2) The correlation coefficient helps in predicting future values.
3) Correlation analysis contributes to the understanding of economic behaviour.
4) Using the correlation coefficient, we can estimate the value of one variable when the value of the other variable is given.
• Perfect Linear Correlation: Definition: If all the points lie exactly on a straight line, then the correlation is said to be "perfect linear correlation".
• Perfect Positive Correlation: Definition: If the correlation is linear and the line runs from the lower left hand corner to the upper right hand corner, then the correlation is called "perfect positive correlation". It is denoted by r = +1.
• Perfect Negative Correlation: Definition: If the correlation is linear and the line runs from the upper left hand corner to the lower right hand corner, then the correlation is called "perfect negative correlation". It is denoted by r = −1.
• No Correlation: If the plotted points lie scattered all over the graph paper, then there is no correlation between the 2 variables, and the variables are said to be "uncorrelated" (r = 0). Note that r = 0 rules out only a linear relationship; by itself it does not imply that X & Y are statistically independent.
[Figure: scatter diagrams illustrating perfect +ve correlation, perfect −ve correlation and no correlation]
  40. 40. QTBD 2013
• Methods of Studying Correlation: There are 2 different methods for finding out the relationship between the variables. 1) Graphical Method 2) Mathematical Method
1) Graphical Method: a) Scatter Diagram b) Scatter gram
2) Mathematical Method: a) Karl Pearson's Correlation Coefficient. b) Spearman's Rank Correlation. c) Coefficient of Concurrent Deviation. d) Method of Least Squares.
• Mathematical Method:
a) Karl Pearson's Correlation Coefficient: As a measure of the 'intensity' or 'degree' of linear relationship between 2 variables, Karl Pearson, a British biometrician, developed a formula called the "correlation coefficient". The correlation coefficient between 2 variables X & Y, usually denoted by r(x, y) or $r_{XY}$, is given by
$r(x, y) = r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\, \sigma_Y}$ → ①
where
$\mathrm{Cov}(X, Y) = E\{(X - E(X))(Y - E(Y))\} = E\{(X - \bar{X})(Y - \bar{Y})\} = E(XY) - \bar{X}\bar{Y} = \frac{1}{n}\sum XY - \bar{X}\bar{Y}$
$V(X) = \sigma_X^2 = E\{(X - E(X))^2\} = E(X^2) - [E(X)]^2 = \frac{1}{n}\sum X^2 - \bar{X}^2$
$V(Y) = \sigma_Y^2 = E\{(Y - E(Y))^2\} = E(Y^2) - [E(Y)]^2 = \frac{1}{n}\sum Y^2 - \bar{Y}^2$
so that
$r(x, y) = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2}\;\sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2}}$
• Properties of Correlation Coefficient:
1) The correlation coefficient lies between −1 & +1, i.e. $-1 \le r(x, y) \le +1$.
2) The correlation coefficient is independent of change of origin & scale.
3) Two independent variables are uncorrelated. The converse need not be true.
• Regression:
Definition: "Regression Analysis" is a mathematical measure of the average relationship between 2 or more variables in terms of the original units of the data. In regression analysis there are 2 types of variables, the dependent variable & the independent variable. The variable whose value is 'influenced' or is to be 'predicted' is called the "dependent variable". The variable which 'influences' or is used for 'prediction' is called the "independent variable".
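Returning briefly to Karl Pearson's coefficient defined earlier on this slide, the sketch below computes r directly from the deviations about the means and illustrates property 2 (independence of change of origin and scale). The function name `pearson_r` and the five data pairs are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r = Cov(X, Y) / (sigma_X * sigma_Y), using divisor n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]          # hypothetical observations
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)

# Change of origin and scale: U = (X - 3) / 2, V = 10 * Y + 7
us = [(x - 3) / 2 for x in xs]
vs = [10 * y + 7 for y in ys]
r_shifted = pearson_r(us, vs)

print(r, r_shifted)           # the two values agree up to rounding error
```

The scale factors used here are both positive; a negative scale factor on one variable would reverse the sign of r while leaving its magnitude unchanged.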
• Lines of Regression: The line of regression is the line which gives the best estimate of the value of one variable for any specific value of the other variable. Thus the line of regression is the line of 'best fit', which can be obtained by using the "principle of least squares" technique.
• Linear Regression: If the points in the scatter diagram lie around a straight line, then it is called "linear regression".
• Non-Linear Regression:
  41. 41. QTBD 2013 If the points in the scatter diagram lie around a curve, then it is called "non-linear regression" or "curvilinear regression".
• Curve of Regression: If the variables in a bi-variate distribution are related, the points in the scatter diagram will cluster round some curve, called the "curve of regression".
Let us suppose that in the bi-variate distribution $(x_i, y_i)$, i = 1, 2, ..., n, X is the independent variable and Y is the dependent variable. Let the line of regression of Y on X be
$Y = a + bX$ → ①
According to the principle of least squares, the normal equations for estimating a & b are
$\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i$ → ②
$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$ → ③
• Regression Equations: 1) Regression Equation Y on X 2) Regression Equation X on Y
• Regression Equation Y on X: Since b is the 'slope' of the line of regression of Y on X, and since the line of regression passes through the point $(\bar{x}, \bar{y})$, its equation is
$Y - \bar{y} = b_{yx}(X - \bar{x})$ ⟹ $Y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{x})$
where $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$ = the regression coefficient of Y on X and r = correlation coefficient.
• Regression Equation X on Y: The regression equation of X on Y is given by
$X - \bar{x} = b_{xy}(Y - \bar{y})$ ⟹ $X - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{y})$
where $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$ = the regression coefficient of X on Y and r = correlation coefficient.
• Regression Coefficients: The slope of a regression line is called the "coefficient of regression". The coefficient of regression of Y on X indicates the change in the value of variable Y corresponding to a unit change in the value of variable X and is given by $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$. Similarly, the coefficient of regression of X on Y indicates the change in the value of variable X corresponding to a unit change in the value of variable Y and is given by $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$.
• Properties of Regression Coefficient:
1) The geometric mean (G.M.) of the regression coefficients is equal to the correlation coefficient: $r = \pm\sqrt{b_{xy}\cdot b_{yx}}$, the sign being that of the regression coefficients.
2) If one of the regression coefficients is greater than unity, then the other must be less than unity, i.e. $b_{yx} > 1 \Rightarrow b_{xy} < 1$.
3) The arithmetic mean (A.M.) of the regression coefficients is greater than or equal to the correlation coefficient: $\frac{1}{2}\,[b_{xy} + b_{yx}] \ge r$.
4) The regression coefficients are independent of change of origin but not of scale.
5) The angle between the 2 regression lines is
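  42. 42. QTBD 2013 $\theta = \tan^{-1}\left\{\frac{1 - r^2}{r}\cdot\frac{\sigma_x\,\sigma_y}{\sigma_x^2 + \sigma_y^2}\right\}$

PROBLEMS ON CORRELATION COEFFICIENT:
Problem -1 Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y)

| X | 65 | 66 | 67 | 67 | 68 | 69 | 70 | 72 |
|---|---|---|---|---|---|---|---|---|
| Y | 67 | 68 | 65 | 68 | 72 | 72 | 69 | 71 |

Solution:

| X | Y | $X^2$ | $Y^2$ | XY |
|---|---|---|---|---|
| 65 | 67 | 4225 | 4489 | 4355 |
| 66 | 68 | 4356 | 4624 | 4488 |
| 67 | 65 | 4489 | 4225 | 4355 |
| 67 | 68 | 4489 | 4624 | 4556 |
| 68 | 72 | 4624 | 5184 | 4896 |
| 69 | 72 | 4761 | 5184 | 4968 |
| 70 | 69 | 4900 | 4761 | 4830 |
| 72 | 71 | 5184 | 5041 | 5112 |
| ΣX = 544 | ΣY = 552 | Σ$X^2$ = 37028 | Σ$Y^2$ = 38132 | ΣXY = 37560 |

From the above table we have ΣX = 544, ΣY = 552, Σ$X^2$ = 37028, Σ$Y^2$ = 38132, ΣXY = 37560
$\bar{X} = \frac{\sum X}{n} = \frac{544}{8} = 68$, $\bar{Y} = \frac{\sum Y}{n} = \frac{552}{8} = 69$
The correlation coefficient is given by
$r(x, y) = \frac{\mathrm{Cov}(x, y)}{\sigma_x\,\sigma_y} = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2}\;\sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2}} = \frac{\frac{37560}{8} - (68)(69)}{\sqrt{\frac{37028}{8} - 68^2}\;\sqrt{\frac{38132}{8} - 69^2}}$
$= \frac{4695 - 4692}{\sqrt{(4628.5 - 4624)(4766.5 - 4761)}} = \frac{3}{\sqrt{(4.5)(5.5)}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$
∴ r(x, y) = 0.6030

Problem -2 Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y), using deviations from assumed origins

| X | 65 | 66 | 67 | 67 | 68 | 69 | 70 | 72 |
|---|---|---|---|---|---|---|---|---|
| Y | 67 | 68 | 65 | 68 | 72 | 72 | 69 | 71 |

Solution:

| X | Y | U = X − 68 | V = Y − 69 | $U^2$ | $V^2$ | UV |
|---|---|---|---|---|---|---|
| 65 | 67 | −3 | −2 | 9 | 4 | 6 |
| 66 | 68 | −2 | −1 | 4 | 1 | 2 |
| 67 | 65 | −1 | −4 | 1 | 16 | 4 |
| 67 | 68 | −1 | −1 | 1 | 1 | 1 |
| 68 | 72 | 0 | 3 | 0 | 9 | 0 |
| 69 | 72 | 1 | 3 | 1 | 9 | 3 |
| 70 | 69 | 2 | 0 | 4 | 0 | 0 |
| 72 | 71 | 4 | 2 | 16 | 4 | 8 |
| ΣX = 544 | ΣY = 552 | ΣU = 0 | ΣV = 0 | Σ$U^2$ = 36 | Σ$V^2$ = 44 | ΣUV = 24 |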
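  43. 43. QTBD 2013 The correlation coefficient is $r(U, V) = \frac{\mathrm{Cov}(U, V)}{\sigma_U\,\sigma_V}$ → ①
⟹ $\bar{U} = \frac{\sum U}{n} = \frac{0}{8} = 0$ and $\bar{V} = \frac{\sum V}{n} = \frac{0}{8} = 0$
⟹ $\mathrm{Cov}(U, V) = \frac{1}{n}\sum UV - \bar{U}\bar{V} = \frac{24}{8} - (0)(0) = 3$
⟹ $\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = \frac{36}{8} - 0 = 4.5$
⟹ $\sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = \frac{44}{8} - 0 = 5.5$
∴ $r(U, V) = \frac{3}{\sqrt{4.5}\,\sqrt{5.5}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$ ⟹ r(U, V) = 0.6030

PROBLEMS ON REGRESSION LINES
Problem -1 Price indices of cotton and wool are given below for the 12 months of a year. Obtain the equations of the lines of regression between the indices.

| Price index of cotton (X) | 78 | 77 | 85 | 88 | 87 | 82 | 81 | 77 | 76 | 83 | 97 | 93 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Price index of wool (Y) | 84 | 82 | 82 | 85 | 89 | 90 | 88 | 92 | 83 | 89 | 98 | 99 |

Solution:

| X | Y | U = X − 84 | V = Y − 88 | $U^2$ | $V^2$ | UV |
|---|---|---|---|---|---|---|
| 78 | 84 | −6 | −4 | 36 | 16 | 24 |
| 77 | 82 | −7 | −6 | 49 | 36 | 42 |
| 85 | 82 | +1 | −6 | 1 | 36 | −6 |
| 88 | 85 | +4 | −3 | 16 | 9 | −12 |
| 87 | 89 | +3 | +1 | 9 | 1 | 3 |
| 82 | 90 | −2 | +2 | 4 | 4 | −4 |
| 81 | 88 | −3 | 0 | 9 | 0 | 0 |
| 77 | 92 | −7 | +4 | 49 | 16 | −28 |
| 76 | 83 | −8 | −5 | 64 | 25 | 40 |
| 83 | 89 | −1 | +1 | 1 | 1 | −1 |
| 97 | 98 | +13 | +10 | 169 | 100 | 130 |
| 93 | 99 | +9 | +11 | 81 | 121 | 99 |
| ΣX = 1004 | ΣY = 1061 | ΣU = −4 | ΣV = +5 | Σ$U^2$ = 488 | Σ$V^2$ = 365 | ΣUV = 287 |

⟹ $\bar{X} = \frac{\sum X}{n} = \frac{1004}{12} = 83.67$ and $\bar{Y} = \frac{\sum Y}{n} = \frac{1061}{12} = 88.42$
⟹ $\bar{U} = \frac{\sum U}{n} = \frac{-4}{12} = -0.33$ and $\bar{V} = \frac{\sum V}{n} = \frac{5}{12} = 0.42$
$r(U, V) = \frac{\mathrm{Cov}(U, V)}{\sigma_U\,\sigma_V}$ → ①
$\mathrm{Cov}(U, V) = \frac{1}{n}\sum UV - \bar{U}\bar{V} = \frac{287}{12} - (-0.33)(0.42) = 23.92 + 0.14 = 24.06$
$\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = \frac{488}{12} - (-0.33)^2 = 40.67 - 0.11 = 40.56$ ⟹ $\sigma_U = 6.37$
$\sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = \frac{365}{12} - (0.42)^2 = 30.42 - 0.18 = 30.24$ ⟹ $\sigma_V = 5.50$
Since the correlation coefficient and the standard deviations are unaffected by a change of origin, $\sigma_x = \sigma_U = 6.37$, $\sigma_y = \sigma_V = 5.50$ and r(x, y) = r(U, V).
  44. 44. QTBD 2013
$r(U, V) = \frac{24.06}{(6.37)(5.50)} = \frac{24.06}{35.03} \approx 0.687 \approx 0.69$
The regression equation of Y on X is
$Y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{x})$ ⟹ $Y - 88.42 = 0.69\left(\frac{5.50}{6.37}\right)(X - 83.67)$ ⟹ $Y - 88.42 = 0.69\,(0.86)(X - 83.67)$ ⟹ $Y - 88.42 = 0.59\,(X - 83.67)$
The regression equation of X on Y is
$X - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{y})$ ⟹ $X - 83.67 = 0.69\left(\frac{6.37}{5.50}\right)(Y - 88.42)$ ⟹ $X - 83.67 = 0.69\,(1.16)(Y - 88.42)$ ⟹ $X - 83.67 = 0.80\,(Y - 88.42)$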
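The cotton and wool calculation can be reproduced without the change of origin by working from the raw sums. This is a minimal sketch with a hypothetical function name `regression_summary`; it uses the divisor n throughout, as the notes do, and obtains the regression coefficients as $b_{yx} = \mathrm{Cov}/\sigma_x^2$ and $b_{xy} = \mathrm{Cov}/\sigma_y^2$, which are algebraically the same as $r\,\sigma_y/\sigma_x$ and $r\,\sigma_x/\sigma_y$.

```python
import math


def regression_summary(xs, ys):
    """Return means, Pearson's r and both regression coefficients (divisor n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    var_x = sum(x * x for x in xs) / n - mx * mx
    var_y = sum(y * y for y in ys) / n - my * my
    r = cov / math.sqrt(var_x * var_y)
    b_yx = cov / var_x        # slope of the regression line of Y on X
    b_xy = cov / var_y        # slope of the regression line of X on Y
    return mx, my, r, b_yx, b_xy


cotton = [78, 77, 85, 88, 87, 82, 81, 77, 76, 83, 97, 93]
wool = [84, 82, 82, 85, 89, 90, 88, 92, 83, 89, 98, 99]
mx, my, r, b_yx, b_xy = regression_summary(cotton, wool)

print(f"Y - {my:.2f} = {b_yx:.2f} (X - {mx:.2f})")
print(f"X - {mx:.2f} = {b_xy:.2f} (Y - {my:.2f})")
print(f"r = {r:.3f}")
```

A useful check on any such calculation is that the product of the two regression coefficients equals $r^2$, which follows directly from their definitions.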
