1.
Measures of Distribution Shape,Relative Location, and Detecting OutliersDistribution Shape z-Scores Empirical Rule Detecting Outliers 2
2.
Distribution Shape: Skewness An important measure of the shape of a distribution is called skewness. The formula for the skewness of sample data is 3 n xi − x Skewness = ∑ s (n − 1)( n − 2) Skewness can be easily computed using statistical software. 3
3.
Distribution Shape: Skewness Symmetric (not skewed) • Skewness is zero. • Mean and median are equal. .35 Skewness = 0 .30 Relative Frequency .25 .20 .15 .10 .05 0 4
4.
Distribution Shape: Skewness Moderately Skewed Left • Skewness is negative. • Mean will usually be less than the median. .35 Skewness = − .31 .30 Relative Frequency .25 .20 .15 .10 .05 0 5
5.
Distribution Shape: Skewness Moderately Skewed Right • Skewness is positive. • Mean will usually be more than the median. .35 Skewness = .31 .30 Relative Frequency .25 .20 .15 .10 .05 0 6
6.
Distribution Shape: Skewness Highly Skewed Right • Skewness is positive (often above 1.0). • Mean will usually be more than the median. .35 Skewness = 1.25 .30 Relative Frequency .25 .20 .15 .10 .05 0 7
9.
z-Scores The z-score is often called the standardized value. The z-score is often called the standardized value. It denotes the number of standard deviations a data It denotes the number of standard deviations a data value xii is from the mean. value x is from the mean. xi − x zi = s Excel’s STANDARDIZE function can be used to Excel’s STANDARDIZE function can be used to compute the z-score. compute the z-score. 10
10.
z-Scores An observation’s z-score is a measure of the relative location of the observation in a data set. A data value less than the sample mean will have a z-score less than zero. A data value greater than the sample mean will have a z-score greater than zero. A data value equal to the sample mean will have a z-score of zero. 11
12.
Empirical Rule When the data are believed to approximate a bell-shaped distribution with moderate skew … The empirical rule can be used to determine the The empirical rule can be used to determine the percentage of data values that must be within a percentage of data values that must be within a specified number of standard deviations of the specified number of standard deviations of the mean. mean. The empirical rule is based on the normal The empirical rule is based on the normal distribution, which we will discuss later. distribution, which we will discuss later. 13
13.
Empirical RuleFor data having a bell-shaped distribution, approximately 68.26% of the values are within of the values are within +/- 1 standard deviation of its mean. of its mean. 95.44% values are within of the values are within of the +/- 2 standard deviations of its mean. of its mean. 99.72% values are within of the values are within of the +/- 3 standard deviations its mean. of its mean. of 14
15.
Detecting Outliers An outlier is an unusually small or unusually large value in a data set. A data value with a z-score less than -3 or greater than +3 might be considered an outlier. It might be: • an incorrectly recorded data value • a data value that was incorrectly included in the data set • a data value that has occurred by chance 16
16.
Detecting Outliers Example: Apartment Rents • The most extreme z-scores are -1.20 and 2.27 • Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set. Standardized Values for Apartment Rents-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35 0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45 1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27 17
17.
Exploratory Data Analysis Exploratory data analysis is looking at methods Exploratory data analysis is looking at methodsto summarize data.to summarize data. For now we simply sort the data values into ascending For now we simply sort the data values into ascendingorder and identify the five-number summary and thenorder and identify the five-number summary and thenconstruct a box plot..construct a box plot 18
18.
Five-Number Summary1 Smallest Value2 First Quartile3 Median4 Third Quartile5 Largest Value 19
20.
Box Plot A box plot is a graphical summary of data that is A box plot is a graphical summary of data that is based on a five-number summary. based on a five-number summary. A key to the development of a box plot is the A key to the development of a box plot is the computation of the median and the quartiles Q11 and computation of the median and the quartiles Q and Q33.. Q Box plots provide another way to identify outliers. Box plots provide another way to identify outliers. They also tell us whether the data are skewed. They also tell us whether the data are skewed. 21
21.
Box Plot Example: Apartment Rents • A box is drawn with its ends located at the first and third quartiles (Q1 & Q3). • A vertical line is drawn in the box at the location of the median (second quartile). 400 425 450 475 500 525 550 575 600 625 Q1 = 445 Q3 = 525 Q2 = 475 22
22.
Box Plot Limits are located (not drawn) using the interquartile range (IQR = Q3-Q1): they are 1.5IQR below Q1 and 1.5 IQR above Q3. Data outside these limits are considered outliers. The locations of each outlier is shown with the symbol * . 23
23.
Box Plot Example: Apartment Rents • The lower limit is located 1.5(IQR) below Q1. Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325 • The upper limit is located 1.5(IQR) above Q3. Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645 • There are no outliers (values less than 325 or greater than 645) in the apartment rent data. 24
24.
Box Plot Example: Apartment Rents • Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits. 400 425 450 475 500 525 550 575 600 625 Smallest value Largest value inside limits = 425 inside limits = 615 25
25.
Box Plot An excellent graphical technique for making comparisons among two or more groups. 26
26.
Measures of AssociationBetween Two Variables Thus far we have examined numerical methods used Thus far we have examined numerical methods used to summarize the data for one variable at a time. to summarize the data for one variable at a time. Often a manager or decision maker is interested in Often a manager or decision maker is interested in the relationship between two variables.. the relationship between two variables Two descriptive measures of the relationship Two descriptive measures of the relationship between two variables are covariance and correlation between two variables are covariance and correlation coefficient.. coefficient 27
27.
Covariance The covariance is a measure of the linear association The covariance is a measure of the linear association between two variables. between two variables. Positive values indicate a positive relationship. Positive values indicate a positive relationship. Negative values indicate a negative relationship. Negative values indicate a negative relationship. 28
28.
CovarianceThe covariance is computed as follows:The covariance is computed as follows: (x1 − µ x )(y1 − µ y ) + L + (x N − µ x )(y N − µ y ) σ xy = for N populations (x1 − x)(y1 − y) + L + (x n − x)(y n − y) s xy = for n−1 samples 29
29.
Correlation Coefficient Correlation is a measure of linear association. Correlation is a measure of linear association. There are also other types of associations not captured There are also other types of associations not captured by correlation. by correlation. 30
30.
Correlation Coefficient The correlation coefficient is computed as follows: The correlation coefficient is computed as follows: sxy σ xy rxy = ρ xy = sx s y σ xσ y for for samples populations (x1 − µ x ) + L + (x N − µ x ) 2 2 (y1 − µ y )2 + L + (y N − µ y )2σ = 2 σ2= x N y N (x1 − x)2 + L + (x n − x) 2 (y1 − y) + L + (y n − y) 2 2s2 = x s = 2 n−1 y n−1 31
31.
Correlation Coefficient The coefficient can take on values between -1 and +1. The coefficient can take on values between -1 and +1. Values near -1 indicate a strong negative linear Values near -1 indicate a strong negative linear relationship.. relationship Values near +1 indicate a strong positive linear Values near +1 indicate a strong positive linear relationship.. relationship The closer the correlation is to zero, the weaker the The closer the correlation is to zero, the weaker the relationship. relationship. 32
32.
Correlation A Positive Relationship: correlation close to 1 y x 33
33.
Correlation A Negative Relationship: correlation close to -1 y x 34
34.
CorrelationNo Apparent Relationship: Correlation near 0 y x 35
35.
Covariance and Correlation Coefficient Example: Golfing Study A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score. Average Driving Average Distance (yds.) 18-Hole Score 277.6 69 259.5 71 269.1 70 267.0 70 255.6 71 272.9 69 36
36.
Covariance and Correlation Coefficient Example: Golfing Study x y ( xi − x ) ( yi − y ) ( xi − x )( yi − y ) 277.6 69 10.65 -1.0 -10.65 259.5 71 -7.45 1.0 -7.45 269.1 70 2.15 0 0 267.0 70 0.05 0 0 255.6 71 -11.35 1.0 -11.35 272.9 69 5.95 -1.0 -5.95Average 267.0 70.0 Total -35.40Std. Dev. 8.2192 .8944 37
37.
Covariance and Correlation Coefficient Example: Golfing Study • Sample Covariance sxy = ∑ (x − x )(y − y ) = − 35.40 = i i − 7.08 n− 1 6−1 • Sample Correlation Coefficient sxy −7.08 rxy = = = -.9631 sx sy (8.2192)(.8944) So, increasing driving distance decreases score, and the relation is really strong. 38
38.
Random Variables A random variable is a numerical description of the A random variable is a numerical description of the outcome of an experiment. outcome of an experiment. A discrete random variable may assume either a A discrete random variable may assume either a finite number of values or an infinite sequence of finite number of values or an infinite sequence of values. values. A continuous random variable may assume any A continuous random variable may assume any numerical value in an interval or collection of numerical value in an interval or collection of intervals. intervals. 39
39.
Random Variables Question Random Variable x TypeFamily x = Number of dependents Discretesize reported on tax returnDistance from x = Distance in miles from Continuoushome to store home to the store siteOwn dog x = 1 if own no pet; Discreteor cat = 2 if own dog(s) only; = 3 if own cat(s) only; = 4 if own dog(s) and cat(s) 40
40.
Discrete Probability Distributions The probability distribution for a random variable The probability distribution for a random variable describes how probabilities are distributed over describes how probabilities are distributed over the values of the random variable. the values of the random variable. 41
41.
Discrete Probability Distributions The probability distribution is defined by a The probability distribution is defined by a probability function,, denoted by ff((x), which provides probability function denoted by x), which provides the probability for each value of the random variable. the probability for each value of the random variable. The required conditions for a discrete probability The required conditions for a discrete probability function are: function are:>0 (probabilities are not negative)(x) = 1 (sum of all probabilities =1) Remember that any probability is a number between 0 and 1. 42
42.
Expected Value The expected value,, or mean, of a random variable The expected value or mean, of a random variable is a measure of its central location. is a measure of its central location. E(x) = µ = Σxf(x) The expected value does not have to be a value the The expected value does not have to be a value the random variable can assume. random variable can assume. 43
43.
Variance and Standard Deviation The variance summarizes the variability in the The variance summarizes the variability in the values of a random variable. values of a random variable. Var(x) = σ 2 = Σ(x - µ)2f(x) The standard deviation,, σ,, is defined as the positive The standard deviation σ is defined as the positive square root of the variance. square root of the variance. 44
44.
Binomial Probability Distribution Four Properties of a Binomial Experiment 1. The experiment consists of a sequence of n identical trials.2. Two outcomes, success and failure, are possible on each trial. 3. The probability of a success, denoted by p, does not change from trial to trial. 4. The trials are independent. 45
45.
Binomial Probability Distribution Our interest is in the number of successes occurring in the n trials. 46
46.
Binomial Probability Distribution Binomial Probability Function n x f ( x ) = p (1 − p )( n − x ) x where: x = the number of successes p = the probability of a success on one trial n = the number of trials f(x) = the probability of x successes in n trials n n! = ( 1 × 2 × 3L × n ) ÷= x (n − x )! x ! ( 1 × 2 × 3L × (n − x ) ) ( 1 × 2 × 3L × x )= `n choose x’ = number of ways x people can be chosen out of n 47
47.
Binomial Probability Distribution Binomial Probability Function n x f ( x ) = p (1 − p)( n − x ) x Probability of a particularNumber of experimental sequence of trial outcomes outcomes providing exactly with x successes in n trialsx successes in n trials These values are available in Table 5 of our textbook. 48
48.
Binomial Probability Distribution Example: IIT Entrance It is known that about 10% of the examinees taking the IIT entrance qualify. Thus, for any examinee chosen at random, there is a probability of 0.1 that the person will qualify. Choosing 3 examinees at random, what is the probability that exactly 1 of them will qualify? 49
49.
Binomial Probability Distribution Example: IIT Entrance Using the p = .10, n = 3, x = 1 probability function n! f ( x) = p x (1 − p ) (n − x ) x !( n − x )! 3! f (1) = (0.1)1 (0.9)2 = 3(.1)(.81) = .243 1!(3 − 1)! You can just check the binomial probability table in textbook for n= 3, p = 0.1, x = 1. Just f(1) if Or, in Excel, use ‘=BINOMDIST(1,3,0.1,FALSE)’ FALSE, f(0)+f(1) if TRUE 50
50.
Binomial Probability Distribution Expected Value E(x) = µ = np Variance Var(x) = σ 2 = np(1 − p) Standard Deviation σ = np(1 − p ) 51
51.
Binomial Probability Distribution Example: Evans Electronics • Expected Value E(x) = np = 3(.1) = .3 employees out of 3 • Variance Var(x) = np(1 – p) = 3(.1)(.9) = .27 • Standard Deviation σ = 3(.1)(.9) = .52 employees 52
52.
Poisson Probability Distribution A Poisson distributed random variable is often A Poisson distributed random variable is often useful in estimating the number of occurrences useful in estimating the number of occurrences over a specified interval of time or space over a specified interval of time or space It is a discrete random variable that may assume It is a discrete random variable that may assume an infinite sequence of values (x = 0, 1, 2, .. .. .. ). an infinite sequence of values (x = 0, 1, 2, ). 53
53.
Poisson Probability Distribution Examples of a Poisson distributed random variable: Examples of a Poisson distributed random variable: the number of defects in 14 pages of a book the number of defects in 14 pages of a book the number of customers arriving at the post the number of customers arriving at the post office in one hour office in one hour Bell Labs used the Poisson distribution to model the Bell Labs used the Poisson distribution to model the arrival of phone calls. arrival of phone calls. 54
54.
Poisson Probability Distribution Two Properties of a Poisson Experiment 1. The probability of an occurrence is the same 1. The probability of an occurrence is the same for any two time intervals of equal length. for any two time intervals of equal length. 2. The occurrence or nonoccurrence in any time 2. The occurrence or nonoccurrence in any time interval is independent of the occurrence or interval is independent of the occurrence or nonoccurrence in any other time interval. nonoccurrence in any other time interval. 55
55.
Poisson Probability DistributionPoisson Probability Function µ xe−µ f ( x) = x! where: x = the number of occurrences in an interval f(x) = the probability of x occurrences in an interval µ = mean number of occurrences in an interval e = 2.71828 These values are available in Table 7 of our textbook. 56
56.
Poisson Probability DistributionPoisson Probability Function Since there is no stated upper limit for the number of occurrences, the probability function f(x) is applicable for values x = 0, 1, 2, … without limit.In practical applications, x will eventually becomelarge enough so that f(x) is very small and negligible. 57
57.
Poisson Probability Distribution Example: Mercy Hospital Patients arrive at the emergency room of Mercy Hospital at the average rate of 6 per hour on weekend evenings. What is the probability of 4 arrivals in 30 minutes on a weekend evening? 58
58.
Poisson Probability Distribution Using the Example: Mercy Hospital probability function µ = 6/hour = 3/half-hour, x = 4 3 4 (2.71828)−3 f (4) = = .16801 4! Or, simply check the table for Poisson probabilities in the book for μ = 3, x = 4. Just f(4) if FALSE, In Excel, use ‘=POISSON(4,3,FALSE)’ f(0)+f(1)+...+f(4) if TRUE 59
59.
Poisson Probability Distribution Example: Mercy Hospital Poisson Probabilities 0.25 0.20 Probability 0.15 actually, the sequence 0.10 continues: 11, 12, … 0.05 0.00 0 1 2 3 4 5 6 7 8 9 10 Number of Arrivals in 30 Minutes 60
60.
Poisson Probability Distribution A property of the Poisson distribution is that A property of the Poisson distribution is that the mean and variance are equal. the mean and variance are equal. µ=σ2 61
61.
Continuous Probability Distributions A continuous random variable can assume any value in an interval on the real line or in a collection of intervals. It is not possible to talk about the probability of the random variable assuming a particular value. Instead, we talk about the probability of the random variable assuming a value within a given interval. 62
62.
We denote the ‘density function’ by f(x). Also f ( x ) ≥ 0; ∫ f ( x )dx = 1 E( X ) = ∫ xf ( x )dx Var ( X ) = ∫ ( x − E( X ) ) f ( x )dx 2 63
63.
Area as a Measure of Probability The area under the graph of f(x) and probability are identical. This is valid for all continuous random variables. The probability that x takes on a value between some lower value x1 and some higher value x2 can be found by computing the area under the graph of f(x) over the interval from x1 to x2. 64
64.
Normal Probability Distribution The normal probability distribution is the most important distribution for describing a continuous random variable. It is used in a wide variety of applications including: • Heights of people • Test scores • Rainfall amounts • Scientific measurements For a large number of similar variables that are unrelated, sum and average are approximately normal. 65
65.
Normal Probability DistributionNormal Probability Density Function 1 − ( x − µ )2 /2σ 2 f (x) = e σ 2π where: µ = mean σ = standard deviation π = 3.14159 e = 2.71828 66
66.
Normal Probability Distribution Characteristics The distribution is symmetric; its skewness measure is zero. x 67
67.
Normal Probability Distribution Characteristics The highest point on the normal curve is at the mean, the middle point. x 68
68.
Normal Probability Distribution Characteristics The mean can be any numerical value: negative, zero, or positive. x -10 0 25 69
69.
Normal Probability Distribution Characteristics The standard deviation determines the width of the curve: larger values result in wider, flatter curves. σ = 15 σ = 25 x 70
70.
Normal Probability Distribution Characteristics Probabilities for the normal random variable are given by areas under the curve. The total area under the curve is 1 (.5 to the left of the mean and .5 to the right). .5 .5 x 71
71.
Normal Probability Distribution Characteristics (basis for the empirical rule) 68.26% of values of a normal random variable 68.26% are within +/- 1 standard deviation of its mean. +/- 1 standard deviation 95.44% of values of a normal random variable 95.44% are within +/- 2 standard deviations of its mean. +/- 2 standard deviations 99.72% of values of a normal random variable 99.72% are within +/- 3 standard deviations of its mean. +/- 3 standard deviations 72
72.
Normal Probability Distribution Characteristics (basis for the empirical rule) 99.72% 95.44% 68.26% µ x µ – 3σ µ – 1σ µ + 1σ µ + 3σ µ – 2σ µ + 2σ 73
73.
Standard Normal Probability Distribution Characteristics A random variable having a normal distribution A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability said to have a standard normal probability distribution.. distribution 74
74.
Standard Normal Probability Distribution Characteristics The letter z is used to designate the standard normal random variable. σ=1 z 0 75
75.
Standard Normal Probability Distribution Converting to the Standard Normal Distribution x−µ z= σ 76
76.
Example: DemandThe daily demand of the new ipad in a store seemsto follow a normal distribution with an average of15 and a standard deviation of 6.The manager, who does not want to keep more than20 ipads in his store at a time, would like to knowthe probability of a stockout, i.e. that the demand ina day will exceed 20. P(x > 20) = ? 77
77.
Solving for the Stockout ProbabilityStep 1: Convert x to the standard normal distribution. z = (x - µ)/σ = (20 - 15)/6 = .83 78
78.
Step 2: Find the area under the standard normal curve to the left of z = .83.Cumulative Probability Table for Standard Normal Distribution z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09 . . . . . . . . . . . .5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224 .6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549 .7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852 .8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133 .9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389 . . . . . . . . . . . P(z < .83) These values are available in Table 1 of our textbook. 79
79.
Just f(0.83) if FALSE, area upto 0.83 ifIn Excel, use ‘=NORMDIST(0.83,0,1,TRUE)’ TRUEIn fact, you can straightaway use ‘=NORMDIST(20,15,6,TRUE)’ P(X ≤ 20) with μ = 15, σ = 6 80
80.
Standard Normal Probability Distribution Solving for the Stockout Probability Step 3: Compute the area under the standard normal Step 3: Compute the area under the standard normal curve to the right of z = .83. curve to the right of z = .83. P(z > .83) = 1 – P(z < .83) = 1- .7967 = .2033 Probability of a stockout P(x > 20) 81
81.
Standard Normal Probability Distribution Solving for the Stockout Probability Area = 1 - .7967 Area = .7967 = .2033 z 0 .83 These values are available in Table 1 of our textbook. 82
82.
Standard Normal Probability Distribution If the manager of wants the probability of a stockout during replenishment lead-time to be no more than . 05, what should the reorder point be? --------------------------------------------------------------- (Hint: Given a probability, we can use the standard normal table in an inverse fashion to find the corresponding z value. Give it a try.) 83
Be the first to comment