# QT1 - 06 - Normal Distribution

Class notes used in Quantitative Techniques - I course at Praxis Business School, Calcutta

1. Normal Distributions
2. Continuous Random Variable
   - Discrete random variables: Bernoulli, Binomial, Poisson
     - The variable can take only separate, countable values, such as:
       - Integers
       - A range of integers
       - Any finite set of values (a Poisson variable can take any non-negative integer 0, 1, 2, ...)
     - For each value there is a probability of occurrence.
   - Continuous random variables: Exponential, Normal
     - The variable can take any value in a range, including fractional values.
     - There are infinitely many such values, even between 1 and 2:
       - 1.1, 1.2, 1.3, ..., 1.9, 2.0
       - 1.1, 1.11, 1.12, 1.13, ...
       - 1.111, 1.112, 1.113, ...
       - 1.1111, 1.1112, 1.1113, ...
3. Probability of a Continuous Variable
   - Since there are infinitely many possible values in ANY given range, the probability of any single value is ZERO!
     - $P(X = x) = 0$
   - However, the probability of the variable lying in a range is not zero:
     - $P(x < X \le x + dx) \approx f(x)\,dx$
     - where $dx$ is a small range and $f(x)$ is the density function
   - Example:
     - Suppose $Z$ is the rainfall in Calcutta, in mm
     - $P(Z = 144.5 \text{ mm}) = 0$
     - $P(144.49 < Z \le 144.51)$ is not zero
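The claim that a single point has zero probability while a small interval does not can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the standard normal $N(0, 1)$ and the point $x = 0.5$ are hypothetical choices, since the slide gives no parameters for the rainfall example.

```python
from statistics import NormalDist


def interval_prob(dist: NormalDist, x: float, dx: float) -> float:
    """P(x < X <= x + dx), computed from the distribution function."""
    return dist.cdf(x + dx) - dist.cdf(x)


# Hypothetical parameters: a standard normal N(0, 1).
dist = NormalDist(mu=0.0, sigma=1.0)
x = 0.5

for dx in (0.1, 0.01, 0.001):
    exact = interval_prob(dist, x, dx)
    approx = dist.pdf(x) * dx  # density times the width of the range
    # Both shrink towards 0 as dx -> 0, but their ratio tends to 1.
    print(f"dx={dx:<6} P={exact:.6f}  f(x)*dx={approx:.6f}  ratio={exact / approx:.4f}")

# A range of zero width has zero probability: P(X = x) = 0.
print(interval_prob(dist, x, 0.0))
```

The ratio column shows why $f(x)$ is read as "probability per unit width" rather than as a probability itself.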
4. Distribution Functions
   - Discrete random variable (Bernoulli, Binomial, Poisson)
     - $P(N = n)$ is defined for each possible value
     - $\sum_i P(N = n_i) = 1$
   - Continuous random variable (Exponential, Normal)
     - $P(X = x) = 0$
     - The density $f(x)$ is defined by $f(x)\,dx \approx P(x < X \le x + dx)$, where $dx$ is a very small number
     - $\int_{-\infty}^{\infty} f(x)\,dx = 1$
   - The sum of the probabilities of ALL possible values of the random variable MUST be equal to 1.
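The two "total probability is 1" rules above can be checked side by side: a sum for a discrete variable and an area for a continuous one. A minimal sketch, assuming a Poisson mean of 3 and a standard normal (both hypothetical); the integral is approximated with a hand-written trapezoidal rule over $\mu \pm 10\sigma$, which leaves a negligible area outside.

```python
import math
from statistics import NormalDist


def poisson_pmf(n: int, lam: float) -> float:
    """P(N = n) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Trapezoidal-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Discrete: sum P(N = n_i) over the possible values, truncated at n = 100
# (the remaining terms are negligible for a mean of 3).
discrete_total = sum(poisson_pmf(n, 3.0) for n in range(101))

# Continuous: area under the normal density between -10 and +10.
normal = NormalDist(mu=0.0, sigma=1.0)
continuous_total = integrate(normal.pdf, -10.0, 10.0)

print(discrete_total, continuous_total)
```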
5. Density Function
   - Exponential distribution
     - $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
   - Normal distribution
     - $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$
   - The equation looks very complex, but it is rarely used in this form.
   - We use a tool to calculate the values:
     - Spreadsheets calculate $f(x)$ from $x$, $\mu$ and $\sigma$
     - Printed tables of the values are also available
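Written out as code, the two density formulas are short. This sketch implements them directly and compares the normal one against `statistics.NormalDist.pdf` from Python's standard library, which plays the role of the spreadsheet here; the parameters $\mu = 30$, $\sigma = 8$ are borrowed from the shirt problem later in these notes.

```python
import math
from statistics import NormalDist


def exponential_pdf(x: float, lam: float) -> float:
    """Exponential density f(x) = lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density written out term by term from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# The hand-written formula agrees with the library's built-in density.
mu, sigma = 30.0, 8.0
for x in (14.0, 30.0, 41.5):
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```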
6. What do these graphs mean?
   - Each curve is the density $f(x)$, so $f(x)\,dx$ is the probability of $X$ lying in the range $x$ to $x + dx$, when the underlying distribution has the following parameters:
     - A: $\mu = 0.0$, $\sigma = 0.1$
     - B: $\mu = 0.0$, $\sigma = 0.4$
     - C: $\mu = 0.75$, $\sigma = 0.1$
     - D: $\mu = 0.75$, $\sigma = 0.4$
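The main features of the four curves can be computed from their parameters: $\mu$ fixes where the peak sits and $\sigma$ fixes how tall and narrow it is. A small sketch using Python's `statistics.NormalDist` with the slide's four parameter sets.

```python
from statistics import NormalDist

# The four parameter sets listed on the slide.
curves = {
    "A": NormalDist(mu=0.0, sigma=0.1),
    "B": NormalDist(mu=0.0, sigma=0.4),
    "C": NormalDist(mu=0.75, sigma=0.1),
    "D": NormalDist(mu=0.75, sigma=0.4),
}

for name, dist in curves.items():
    # The peak sits at x = mu. Its height, 1 / (sigma * sqrt(2*pi)),
    # grows as sigma shrinks, because the total area must stay 1.
    print(f"{name}: peak at x = {dist.mean}, height f(mu) = {dist.pdf(dist.mean):.3f}")
```

A curve with a quarter of the spread (A vs B) is four times as tall at its peak; shifting $\mu$ (A vs C) moves the curve without changing its shape.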
7. Characteristics of a Normal Distribution
   - The curve has a single peak.
     - It is unimodal
     - It has a bell shape
   - The mean of a normally distributed population lies at the centre of the normal curve.
     - Because the normal distribution is symmetric, the median and the mode are also at the centre.
     - The mean, median and mode are equal in value.
   - The two tails of the normal distribution extend indefinitely and never touch the horizontal axis (in theory, $f(x) > 0$ for every $x$).
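Each property above can be spot-checked numerically. A sketch with Python's `statistics.NormalDist`; the parameters $\mu = 1.5$, $\sigma = 0.5$ are hypothetical, and any other choice gives the same picture.

```python
from statistics import NormalDist

# Hypothetical parameters.
dist = NormalDist(mu=1.5, sigma=0.5)

# Symmetry: the density is equal at equal distances either side of mu.
for d in (0.25, 0.5, 1.0):
    print(d, dist.pdf(dist.mean - d), dist.pdf(dist.mean + d))

# Half of the area lies to the left of mu, so mu is also the median.
print(dist.cdf(dist.mean))

# The peak (mode) is at mu: the density just beside it is lower.
print(dist.pdf(dist.mean) > dist.pdf(dist.mean + 0.01))

# The tails never reach zero; six standard deviations out, f(x) is tiny but positive.
print(dist.pdf(dist.mean + 6 * dist.stdev))
```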
8. Three very similar terms!
   - Distribution
     - A name that describes the nature of the underlying population from which a random variable is selected
       - Bernoulli, Binomial, Poisson
       - Exponential, Normal
   - Density function
     - $f(x)\,dx \approx P(x < X \le x + dx)$
     - The probability that $X$ lies in the small range $x$ to $x + dx$, per unit width of the range
   - Distribution function
     - $F(x) = P(X \le x)$
     - The probability that $X$ is less than or equal to $x$
9. From Density to Distribution
   - The area under the density curve to the left of the red line (at some value of $x$)
   - IS EQUAL TO
   - the value of the distribution function at the same value of $x$: $F(x) = \int_{-\infty}^{x} f(t)\,dt$
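The equality between "area to the left" and the distribution function can be demonstrated by measuring the area directly. This sketch sums the area under `NormalDist.pdf` with a trapezoidal rule and compares it with `NormalDist.cdf`; the distribution $N(1.5, 1)$ is hypothetical, and $\mu - 10\sigma$ stands in for $-\infty$.

```python
from statistics import NormalDist


def area_left_of(f, x: float, lower: float, steps: int = 20_000) -> float:
    """Trapezoidal approximation of the area under f between `lower` and x."""
    h = (x - lower) / steps
    total = 0.5 * (f(lower) + f(x))
    for i in range(1, steps):
        total += f(lower + i * h)
    return total * h


# Hypothetical distribution.
dist = NormalDist(mu=1.5, sigma=1.0)
lower = dist.mean - 10 * dist.stdev  # far enough left to stand in for minus infinity

for x in (0.0, 1.5, 2.7):
    print(x, area_left_of(dist.pdf, x, lower), dist.cdf(x))
```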
10. Area under the Density curve
    - The area under the density curve to the left of a value represents the fraction of the observations that lie below (are less than) that value, i.e. the corresponding value of the distribution function.
    - The distribution function rises to a final value of 1
      - The probability that all values lie below a large enough value is 1
      - Certainty
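The "fraction of observations" reading can be checked by simulation: draw many values from a normal population and count how many fall at or below $x$. A sketch assuming a hypothetical population $N(30, 8)$; `NormalDist.samples` takes a `seed` so the run is reproducible.

```python
from statistics import NormalDist

# Hypothetical population N(30, 8); draw a reproducible sample from it.
dist = NormalDist(mu=30.0, sigma=8.0)
observations = dist.samples(100_000, seed=42)


def fraction_below(data, x: float) -> float:
    """Fraction of the observations that are less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)


# The observed fraction tracks F(x), and approaches 1 for large x.
for x in (20.0, 30.0, 46.0, 80.0):
    print(x, fraction_below(observations, x), dist.cdf(x))
```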
11. Area under the Density curve
    - Area to the left of A
      - Values less than $\mu$
      - 50% of the total area
      - 50% of all observations are below this value
    - Area to the left of B
      - Values less than $\mu - \sigma$
      - 16% of the total area
      - 16% of all observations are below this value
    - Area to the left of C
      - Values less than $\mu - 2\sigma$
      - About 2.3% of the total area
      - About 2.3% of all observations are below this value

    (Figure: normal density with $\mu = 1.5$; A is at $\mu$, B at $\mu - \sigma$ and C at $\mu - 2\sigma$, with left-tail areas of 16% and about 2.3% shaded.)
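The three percentages come straight from the distribution function evaluated at A, B and C. A sketch using Python's `statistics.NormalDist`; $\mu = 1.5$ follows the figure, while $\sigma = 0.5$ is an assumption (the percentages do not depend on it).

```python
from statistics import NormalDist

# mu = 1.5 as in the figure; sigma = 0.5 is an assumed value.
dist = NormalDist(mu=1.5, sigma=0.5)
mu, sigma = dist.mean, dist.stdev

for label, x in (("A", mu), ("B", mu - sigma), ("C", mu - 2 * sigma)):
    print(f"{label}: area to the left of x = {x}: {dist.cdf(x):.2%}")
```

Printed to two decimals these are 50.00%, 15.87% and 2.28%, which round to the 50%, 16% and about 2.3% used above.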
12. The $2\sigma$ limit
    - 95.5% of all values are expected to lie in the region around the mean $\mu$ between
      - Lower bound: $\mu - 2\sigma$
      - Upper bound: $\mu + 2\sigma$
    - Similarly
      - 68% lie between $\mu - \sigma$ and $\mu + \sigma$
      - 99.7% lie between $\mu - 3\sigma$ and $\mu + 3\sigma$

    (Figure: normal density with $\mu = 1.5$; tails of about 2.3% each are shaded below $\mu - 2\sigma$ (C1) and above $\mu + 2\sigma$ (C2).)
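The 68 / 95.5 / 99.7 rule is the probability of falling between two values, $F(\mu + k\sigma) - F(\mu - k\sigma)$, for $k = 1, 2, 3$. A sketch with Python's `statistics.NormalDist`; the parameters are hypothetical because the result is the same for every normal distribution.

```python
from statistics import NormalDist


def within(dist: NormalDist, k: float) -> float:
    """P(mu - k*sigma < X <= mu + k*sigma)."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


dist = NormalDist(mu=1.5, sigma=0.5)  # hypothetical; any mu, sigma give the same shares

for k in (1, 2, 3):
    print(f"within {k} sigma of the mean: {within(dist, k):.2%}")
```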
13. Problem Type A
    - Number of blue shirts sold at a store
      - Average 30
      - Standard deviation 8
    - Number of defects in a batch of 1000
      - Average 50
      - Standard deviation 10
    - What is the probability that
      - on a given day, the number of shirts sold will be
        - less than or equal to 20
        - more than 35
        - between 27 and 32
      - in any given batch, the number of defects will be
        - less than 40
        - more than 60
        - between 55 and 60
14. Solution to Shirt Problem
    - $P(X \le 20) = 0.11$
    - $P(X > 35) = 1 - 0.73 = 0.27$
    - $P(27 < X \le 32) = 0.60 - 0.35 \approx 0.25$ (0.245 to three decimal places)
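The same three patterns ("at most", "more than", "between") answer every Type A question. This sketch uses Python's `statistics.NormalDist` in place of the spreadsheet; the helper names are hypothetical. It reproduces the shirt answers (the "between" case is 0.245 before rounding) and applies the same helpers to the defect questions, which the slides pose without a worked answer. For a continuous model, "less than 40" and "at most 40" have the same probability.

```python
from statistics import NormalDist

shirts = NormalDist(mu=30.0, sigma=8.0)    # blue shirts sold per day
defects = NormalDist(mu=50.0, sigma=10.0)  # defects per batch of 1000


def p_at_most(dist: NormalDist, v: float) -> float:
    """P(X <= v), the spreadsheet's NORMDIST(v, mu, sigma, TRUE)."""
    return dist.cdf(v)


def p_more_than(dist: NormalDist, v: float) -> float:
    """P(X > v) = 1 - P(X <= v)."""
    return 1.0 - dist.cdf(v)


def p_between(dist: NormalDist, a: float, b: float) -> float:
    """P(a < X <= b) = F(b) - F(a)."""
    return dist.cdf(b) - dist.cdf(a)


print("Shirts <= 20:   ", round(p_at_most(shirts, 20), 3))
print("Shirts > 35:    ", round(p_more_than(shirts, 35), 3))
print("Shirts 27 to 32:", round(p_between(shirts, 27, 32), 3))

# Defect questions, using the same helpers.
print("Defects < 40:    ", round(p_at_most(defects, 40), 3))
print("Defects > 60:    ", round(p_more_than(defects, 60), 3))
print("Defects 55 to 60:", round(p_between(defects, 55, 60), 3))
```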
15. Problem Type B
    - Demand for blue shirts at a store
      - Average 30
      - Standard deviation 8
    - Number of defects in a batch of 1000
      - Average 50
      - Standard deviation 10
      - A batch is rejected if the number of defects is 52 or more
    - What is the probability of a "stock-out" on any day if the store stocks 40 shirts?
    - How many shirts should be stocked so that the probability of a stock-out is 5% or less?
    - What is the probability of a batch being rejected?
    - To what average level of defects should production be improved to ensure that the probability of rejection is 5% or less?
16. Solution to Shirt Problem
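The Type B questions combine both directions: value to probability, and probability to value. A minimal sketch with Python's `statistics.NormalDist`, under stated assumptions: the rejection rule "52 defects" is treated as a threshold (for a continuous model, "52 or more" and "more than 52" have the same probability), the stock level is rounded up to a whole shirt, and the standard deviation of defects is assumed to stay at 10 while the mean improves.

```python
import math
from statistics import NormalDist

demand = NormalDist(mu=30.0, sigma=8.0)
defects = NormalDist(mu=50.0, sigma=10.0)
REJECT_AT = 52.0  # rejection threshold for defects per batch
TARGET = 0.05     # acceptable probability of stock-out or rejection

# 1. Stock-out with 40 shirts in stock: demand exceeds 40.
p_stockout_40 = 1.0 - demand.cdf(40)

# 2. Smallest stock with P(stock-out) <= 5%: the 95th percentile of demand,
#    i.e. NORMINV(0.95, 30, 8), rounded up to a whole shirt.
stock_needed = math.ceil(demand.inv_cdf(1.0 - TARGET))

# 3. Probability that a batch is rejected.
p_reject = 1.0 - defects.cdf(REJECT_AT)

# 4. Highest mean defect level with P(rejection) <= 5%, ASSUMING sigma stays 10:
#    52 must sit at the 95th percentile, so mu = 52 - sigma * z_0.95.
z_95 = NormalDist().inv_cdf(1.0 - TARGET)
max_mean_defects = REJECT_AT - defects.stdev * z_95

print(f"P(stock-out with 40 shirts): {p_stockout_40:.3f}")
print(f"Shirts to stock for <= 5% stock-out: {stock_needed}")
print(f"P(batch rejected): {p_reject:.3f}")
print(f"Mean defects must fall to at most {max_mean_defects:.2f}")
```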
17. Mostly Two Categories of Problems
    - Normal distribution
      - Mean is known
      - Standard deviation is known
    - Given: a value
      - Find the probability of the variable being less than or equal to this value
    - Given: a probability
      - Find a value such that the probability of the variable being less than or equal to this value is equal to the given probability
    - Use a spreadsheet formula, since the mathematical formula is very complex:
      - P = NORMDIST(V, μ, σ, TRUE)
      - V = NORMINV(P, μ, σ)
    - The fourth argument of NORMDIST, TRUE, selects the cumulative distribution function $P(X \le V)$; FALSE returns the density instead.
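Outside a spreadsheet, the same two operations are the distribution function and its inverse. This sketch wraps Python's `statistics.NormalDist.cdf` and `inv_cdf` in helpers named after the spreadsheet functions; the names `normdist` and `norminv` are hypothetical, chosen only to mirror the slide.

```python
from statistics import NormalDist


def normdist(v: float, mu: float, sigma: float) -> float:
    """P(X <= v) for X ~ N(mu, sigma); like NORMDIST(v, mu, sigma, TRUE)."""
    return NormalDist(mu, sigma).cdf(v)


def norminv(p: float, mu: float, sigma: float) -> float:
    """The value v with P(X <= v) = p; like NORMINV(p, mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(p)


p = normdist(20, 30, 8)  # given a value, find a probability
v = norminv(p, 30, 8)    # given a probability, recover the value
print(p, v)
```

The two functions undo each other, which is why every problem in these notes reduces to one or the other.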
18. Plus the issue of “range”
    - What is the probability of the random variable falling between 1 and 4?
      - The area under the density curve between the two red lines
      - Equivalently, the difference between the corresponding values of the distribution function, shown as the gap between the two green lines: $P(1 < X \le 4) = F(4) - F(1)$
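A range probability is one subtraction of two distribution-function values. The sketch below computes it with Python's `statistics.NormalDist` and cross-checks it against a simulated sample; the parameters $N(2.5, 1)$ are hypothetical because the slide's figure does not state them.

```python
from statistics import NormalDist


def prob_between(dist: NormalDist, a: float, b: float) -> float:
    """P(a < X <= b): the gap between the distribution function at b and at a."""
    if a > b:
        a, b = b, a
    return dist.cdf(b) - dist.cdf(a)


# Hypothetical parameters.
dist = NormalDist(mu=2.5, sigma=1.0)
p = prob_between(dist, 1, 4)

# Cross-check: the share of simulated values that land in (1, 4].
sample = dist.samples(50_000, seed=1)
share = sum(1 for v in sample if 1 < v <= 4) / len(sample)
print(p, share)
```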
19. Before the Era of Spreadsheets
    - Given a normal distribution $N(\mu, \sigma)$ with mean $\mu$ and standard deviation $\sigma$
    - Calculating either
      - P = NORMDIST(X, μ, σ, TRUE) or
      - X = NORMINV(P, μ, σ)
    - can only be done on a computer
    - OR ...
    - by looking up printed tables that list values of P and X together.
    - But for each combination of $\mu$ and $\sigma$ there would be a different table!
    - How many tables would you need to keep at hand?
    - Strangely enough ... ONLY ONE!
20. Standard Normal Distribution
    - Suppose you have a variable $X$ that follows a normal distribution $N(\mu, \sigma)$
    - Define a second variable $Z$ such that
      - $Z = (X - \mu) / \sigma$
    - Then $Z$ follows a normal distribution $N(0, 1)$, with $\mu_Z = 0$ and $\sigma_Z = 1$
    - So we can calculate
      - P = NORMDIST(X, μ, σ, TRUE) = NORMDIST(Z, 0, 1, TRUE)
      - AND
      - Z = NORMINV(P, 0, 1)
      - and since $Z = (X - \mu) / \sigma$,
      - X = σ * NORMINV(P, 0, 1) + μ
    - So ONE table that lists P and Z values for the standard normal distribution $N(0, 1)$ is enough to handle any normal distribution.
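Standardisation can be verified by using only the standard normal $N(0, 1)$, the "one table", to answer questions about another normal distribution. A sketch with Python's `statistics.NormalDist`; the helper names are hypothetical and $N(30, 8)$ is borrowed from the shirt problem.

```python
from statistics import NormalDist

STANDARD = NormalDist(0.0, 1.0)  # N(0, 1): the single table


def prob_via_z(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for N(mu, sigma), using only the standard normal."""
    z = (x - mu) / sigma
    return STANDARD.cdf(z)


def value_via_z(p: float, mu: float, sigma: float) -> float:
    """The x with P(X <= x) = p, via x = sigma * z + mu."""
    return sigma * STANDARD.inv_cdf(p) + mu


mu, sigma = 30.0, 8.0
# Each pair agrees: the standardised route and the direct N(30, 8) route.
print(prob_via_z(20, mu, sigma), NormalDist(mu, sigma).cdf(20))
print(value_via_z(0.95, mu, sigma), NormalDist(mu, sigma).inv_cdf(0.95))
```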
21. Identical Distribution Functions