Statistics for Six Sigma
Why a Six Sigma practitioner needs to
know about statistics
› To be able to conduct a Six Sigma
investigation effectively. Without statistics it
would be very difficult to make decisions
based on the data collected.
› To further develop critical and analytic thinking
skills.
› To act as an informed investigator.
› To know how to properly analyze information
› To know how to draw conclusions about
populations based on sample information
Key Definitions
› A population (universe) is the collection of
things under consideration
› A sample is a portion of the population
selected for analysis
› A parameter is a summary measure computed
to describe a characteristic of the population
› A statistic is a summary measure computed to
describe a characteristic of the sample
Population and Sample
› Population: use parameters to summarize features
› Sample: use statistics to summarize features
› Inference on the population from the sample
Statistical Methods
› Descriptive statistics
– Collecting and describing data
› Inferential statistics
– Drawing conclusions and/or making decisions
concerning a population based only on sample data
Descriptive Statistics
› Collect data
– e.g. Survey
› Present data
– e.g. Tables and graphs
› Characterize data
– e.g. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
Why We Need Data
› To provide input to survey
› To provide input to study
› To measure performance of service or production
process
› To evaluate conformance to standards
› To assist in formulating alternative courses of action
› To satisfy curiosity
Data Sources
› Primary (data collection): observation, experimentation, survey
› Secondary (data compilation): print or electronic
Statistical Inquiry
Primary and Secondary Data
The difference between the primary and the secondary
data is only one of degree of detachment with the
original source. The data which is primary in the hands
of one may become secondary in the hands of others.
For example, if it is desired to conduct an investigation
into the working conditions of workers in textile mills,
the facts collected by the investigators directly from the
workers themselves would be termed primary
data. But if the information is obtained from a report
prepared by the labour department of the Government,
it will be called secondary data.
Types of Data
› Categorical (qualitative)
› Numerical (quantitative)
– Discrete
– Continuous
Key Terms
› Measures of central tendency: statistical measurements
such as the mean, median or mode that indicate how
data groups toward the center.
› Measures of variation or dispersion: statistical
measurements such as the range and standard deviation
that indicate how data is dispersed or spread.
Measures of Central Tendency
› Find the mean
› Find the median
› Find the mode
› Mean: the arithmetic average of a set of data or sum of
the values divided by the number of values.
› Median: the middle value of a data set when the values
are arranged in order of size.
› Mode: the value or values that occur most frequently in a
data set.
Key Terms
Find the mean of a data set.
1. Find the sum of the values.
2. Divide the sum by the total number of values.
Mean = (sum of values) / (number of values)
Here’s an example.
Sales figures for the last week for the Western
region have been as follows:
› Monday Rs 4,200
› Tuesday Rs 3,980
› Wednesday Rs 2,400
› Thursday Rs 3,100
› Friday Rs 4,600
› What is the average daily sales figure?
› Rs 3,656
Try these examples.
› Mileage for the new salesperson has been 243, 567, 766,
422 and 352 this week. What is the average number of
miles traveled?
› 470 miles daily
› Prices from different suppliers of 500 sheets of copier
paper are as follows: Rs 399, Rs 475, Rs 375 and Rs
425. What is the average price?
› Rs 418.50 (about Rs 419)
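The three mean calculations above can be checked with a short sketch. The function name `arithmetic_mean` and the variable names are illustrative choices, not part of the slides.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

sales = [4200, 3980, 2400, 3100, 4600]   # Rs, Monday to Friday
mileage = [243, 567, 766, 422, 352]      # miles per day
paper_prices = [399, 475, 375, 425]      # Rs per 500 sheets

print(arithmetic_mean(sales))         # 3656.0
print(arithmetic_mean(mileage))       # 470.0
print(arithmetic_mean(paper_prices))  # 418.5
```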
Find the median.
› Arrange the values in the data set from smallest to largest
(or largest to smallest) and select the value in the middle.
› If the number of values is odd, it will be exactly in the
middle.
› If the number of values is even, identify the two middle
values. Add them together and divide by two.
Here is an example.
› A recent survey of the used car market for the particular
model John was looking for yielded several different
prices. Find the median price.
› $9,400, $11,200, $5,900, $10,000, $4,700, $8,900,
$7,800 and $9,200.
› Arrange from highest to lowest:
$11,200, $10,000, $9,400, $9,200, $8,900, $7,800,
$5,900 and $4,700.
› Calculate the average of the two middle values.
› $9,050 is the median price.
Try this example.
› Five local moving companies quoted the following
prices to Bob’s Best Company: $4,900, $3,800, $2,700,
$4,400 and $3,300. Find the median price.
› $3,800
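The median procedure can be sketched as follows, covering both the odd and the even case. The function name `median` is an illustrative choice; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

car_prices = [9400, 11200, 5900, 10000, 4700, 8900, 7800, 9200]
moving_quotes = [4900, 3800, 2700, 4400, 3300]

print(median(car_prices))     # 9050.0
print(median(moving_quotes))  # 3800
```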
Find the mode.
› Find the mode in a data set by counting the
number of times each value occurs.
› Identify the value or values that occur most
frequently.
› There may be more than one mode if two or more
values tie for the highest number of occurrences.
› If no one value appears more than once, there is
no mode.
Find the mode in this data set.
› Results of a placement test in mathematics
included the following scores:
65, 80, 90, 85, 95, 85, 80, 70 and 80.
› Which score occurred the most frequently?
› 80 is the mode. It appeared three times.
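A minimal sketch of the counting rule for the mode, including the "more than one mode" and "no mode" cases. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value appears more than once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

scores = [65, 80, 90, 85, 95, 85, 80, 70, 80]
print(modes(scores))  # [80]
```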
› Range: the difference between the highest and lowest
values in a data set. (also called the spread)
› Deviation from the mean: the difference between a value
of a data set and the mean.
› Standard deviation: a statistical measurement that shows
how data is spread above and below the mean.
Key Terms
› Variance: a statistical measurement that is the
average of the squared deviations of data from
the mean. The square root of the variance is
the standard deviation.
› Square root: a number that, multiplied by itself,
gives the original number. The
square root of 81 is 9. (9 x 9 = 81)
› Normal distribution: a characteristic of many
data sets that shows that data graphs into a
bell-shaped curve around the mean.
Key Terms
Find the range in a data set
› Find the highest and lowest values.
› Find the difference between the two.
› Example: The grades on the last exam were 78, 99,
87, 84, 60, 77, 80, 88, 92, and 94.
The highest value is 99.
The lowest value is 60.
The difference or the range is 39.
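The same range calculation as a short sketch, using the exam grades from the example:

```python
grades = [78, 99, 87, 84, 60, 77, 80, 88, 92, 94]

highest = max(grades)
lowest = min(grades)
data_range = highest - lowest

print(highest, lowest, data_range)  # 99 60 39
```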
Find the standard deviation
› The deviation from the mean of a data value is
the difference between the value and the mean.
› Get a clearer picture of the data set by
examining how much each data point differs or
deviates from the mean.
Deviations from the mean
› When the value is smaller than the mean, the
difference is represented by a negative number
indicating it is below or less than the mean.
› Conversely, if the value is greater than the
mean, the difference is represented by a positive
number indicating it is above or greater than the
mean.
Find the deviation from the mean.
› Find the mean of a set of data.
› Mean = (Sum of data values) / (Number of values)
› Find the amount that each data value deviates or is
different from the mean.
› Deviation from the mean = Data value - Mean
Here’s an example.
› Data set: 38, 43, 45, 44
› Mean = 42.5
› First value: 38 – 42.5 = -4.5 (below the mean)
› Second value: 43 – 42.5 = 0.5 (above the mean)
› Third value: 45 – 42.5 = 2.5 (above the mean)
› Fourth value: 44 – 42.5 = 1.5 (above the mean)
Interpret the information
› One value is below the mean and its deviation is
-4.5.
› Three values are above the mean and the sum of
those deviations is 4.5.
› The sum of all deviations from the mean is zero.
This is true of all data sets.
› We have not gained any statistical insight or new
information by analyzing the sum of the
deviations from the mean.
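The deviations in this example, and the fact that they sum to zero, can be reproduced with a few lines. Variable names are illustrative.

```python
data = [38, 43, 45, 44]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]  # data value minus mean

print(mean)             # 42.5
print(deviations)       # [-4.5, 0.5, 2.5, 1.5]
print(sum(deviations))  # 0.0
```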
Average deviation
› Average deviation = (Sum of deviations) / (Number of values) = 0 / n = 0
Find the standard deviation
of a set of data.
› A statistical measure called the standard
deviation uses the square of each deviation
from the mean.
› The square of a negative value is always
positive.
› The squared deviations are averaged (mean)
and the result is called the variance.
› The square root is taken of the variance so that
the result can be interpreted within the context
of the problem.
› This formula averages the squared deviations by dividing by the
number of values (n).
› Several calculations are necessary and are best
organized in a table.
Find the standard deviation
of a set of data.
1. Find the mean.
2. Find the deviation of each value from the mean.
3. Square each deviation.
4. Find the sum of the squared deviations.
5. Divide the sum of the squared deviations by the
number of values in the data set. This amount is
called the variance.
6. Find the standard deviation by taking the square
root of the variance.
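The six steps can be sketched as one function. It divides by n, as the steps describe (the population form, which is also what `statistics.pstdev` computes); the function name and the reuse of the data set 38, 43, 45, 44 are illustrative choices.

```python
import math

def population_std_dev(values):
    """Follow the six steps; returns (standard deviation, variance)."""
    n = len(values)
    mean = sum(values) / n                    # step 1: mean
    deviations = [x - mean for x in values]   # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    total = sum(squared)                      # step 4: sum of squared deviations
    variance = total / n                      # step 5: variance
    return math.sqrt(variance), variance      # step 6: square root of the variance

std_dev, variance = population_std_dev([38, 43, 45, 44])
print(variance)           # 7.25
print(round(std_dev, 3))  # 2.693
```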
Find the standard deviation
of a set of data.
Standard Deviation
Standard deviation measures variation of values
from the mean, using the following formula:
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
where Σ = sum of, X = observed values, X̄
(X with a line over the top) = arithmetic mean,
and n = number of observations.
Standard Deviation (Contd..)
The standard deviation is, roughly, the average difference between any value in a series of values
and the mean of all the values in that series. This statistic is
a measure of the variation in a distribution of values.
If we plot enough values, we’ll likely find that the
distribution of values forms some variant of a bell-shaped
curve. This curve can assume various shapes. However, in
a normal curve, statisticians have determined that about
68.3% of the values will be within 1 standard deviation of
the mean, about 95.4% will be within 2 standard deviations,
and 99.7% will be within 3 standard deviations.
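The coverage percentages for a normal curve can be checked with a short sketch based on the standard library's `statistics.NormalDist`. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.2f}%")
```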
Standard Deviation (Contd..)
Specification limit
One of two values (lower and upper)
that indicate the boundaries of
acceptable or tolerated values for a
process.
Draw and interpret
a bar graph
› Write an appropriate title.
› Make appropriate labels for bars and scale. The
intervals should be equally spaced and include the
smallest and largest values.
› Draw horizontal or vertical bars to represent the
data. Bars should be of uniform width.
› Make additional notes as appropriate to aid
interpretation.
Here’s an example.
[Figure: horizontal bar graph “Sales Volume 2001-2004”. Bars for Product 1, Product 2 and Product 3; scale in thousands of units (0 to 50); legend: 2001, 2002, 2003, 2004]
Interpret and draw
a line graph
› Write an appropriate title.
› Make and label appropriate horizontal and
vertical scales, each with equally spaced
intervals. Often, the horizontal scale represents
time.
› Use points to locate data on the graph.
› Connect data points with line segments or a
smooth curve.
Here’s an example.
[Figure: line graph “First Semester Sales”. Horizontal axis: Jan to Jun; vertical axis: thousands of $ (0 to 100); one line each for Judy, Denise, Linda and Wally]
Interpret and draw
a circle graph (Pie-Graph).
› Write an appropriate title.
› Find the sum of values in the data set.
› Represent each value as a fraction or decimal
part of the sum of values.
› For each fraction, find the number of degrees in
the sector of the circle to be represented by the
fraction or decimal. (100% = 360°)
› Label each sector of the circle as appropriate.
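The conversion from values to sector angles can be sketched as below. The function name `sector_degrees` is hypothetical, and the four input values are the percentages shown in the daycare market-share example.

```python
def sector_degrees(values):
    """Convert each value to its share of the total, then to degrees (100% = 360 degrees)."""
    total = sum(values)
    return [360 * v / total for v in values]

shares = [43, 35, 16, 6]  # percentages from the market-share example
degrees = sector_degrees(shares)
print([round(d, 1) for d in degrees])  # [154.8, 126.0, 57.6, 21.6]
```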
Here’s an example.
[Figure: circle graph “Local Daycare Market Share”. Sector values: 43%, 35%, 16%, 6%. Legend: Teddy Bear, La La Land, Little Gems, Other]
Make and interpret a frequency distribution.
› Identify appropriate intervals for the data.
› Tally the data for the intervals.
› Count the number in each interval.
[Figure: bar chart by quarter (1st Qtr to 4th Qtr) with series East, West and North; vertical scale 0 to 90]
Key Terms
› Class intervals: special categories for grouping
the values in a data set.
› Tally: a mark that is used to count data in class
intervals.
› Class frequency: the number of tallies or
values in a class interval.
› Grouped frequency distribution: a compilation
of class intervals, tallies, and class frequencies
of a data set.
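A grouped frequency distribution can be sketched as below. The class width of 10 starting at 60 is an assumption made for illustration, the function name is hypothetical, and the data are the exam grades used earlier for the range example. The sketch assumes integer values no smaller than `start`.

```python
from collections import Counter

def grouped_frequency(values, start, width):
    """Return (low, high, frequency) for each class interval of the given width."""
    counts = Counter((v - start) // width for v in values)  # class index per value
    table = []
    for k in range(max(counts) + 1):
        low = start + k * width
        table.append((low, low + width - 1, counts.get(k, 0)))
    return table

grades = [78, 99, 87, 84, 60, 77, 80, 88, 92, 94]
distribution = grouped_frequency(grades, start=60, width=10)
for low, high, freq in distribution:
    print(f"{low}-{high}: {'|' * freq} ({freq})")  # tally marks and class frequency
```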
HISTOGRAM
A histogram is a graphical representation of a frequency
distribution, which is a summary of variation in a
product or process.
Dr. W. A. Shewhart, a physicist from Bell Laboratories,
explained variation in his 1931 publication
“Economic Control of Quality of Manufactured
Product”.
A histogram is basically a graphical presentation of a
series of measurements grouped into continuous
classes or intervals.
[Figure: histogram. Vertical axis: no. of crimes (0 to 45); horizontal axis: age of criminal in five-year classes from 10-15 to 50-55]
DISTRIBUTION
While individual measured values may all be different, as a
group they tend to exhibit a pattern. This is called the
distribution, which can be described by:
› Location (process level or centering)
› Spread or dispersion (range of values from smallest
to largest)
› Shape (pattern of variation, whether symmetrical or
skewed, etc.)
Distribution of Data
› Normal distributions
› Skewed distribution
[Figure: Spread. Change in process variation: A - original process; B - increase in spread with same location]
[Figure: Shape. Change in pattern of variation: A - original symmetrical pattern; B - pattern is skewed]
In the figure “Change in pattern of
variation”, the original pattern (A) is
symmetrical but the new pattern (B) is
skewed. Even though the centering is
the same, the shapes or patterns are
different.
STABILITY
If the distribution that characterises the process remains unchanged over a period
of time, then the process is said to be stable and repeatable. This can be
understood from the following depiction of the process over a period of time; see
the figure below:
[Figure: Stable and repeatable process. Process distributions plotted over time against the target]
This pattern results when only common causes are present in the process.
COMMON CAUSES
The common causes are minute and many and are
individually not measurable. The pattern resulting from
the influence of common causes is called “State of
statistical control” or sometimes, just “In control”.
It is called statistical because the variation can be
described by statistical laws. If only common causes are
present and do not change, the output of a process is
predictable.
The advantages of maintaining a state of statistical
control are:
› Variation (inherent) is restricted to common causes.
› Since variability exhibits a regularity in its pattern,
process is repeatable.
› Since process is repeatable, quality of future
production can be predicted.
However, process level and variation may change due
to influence of causes additional to common causes.
Such causes are called special causes.
Special Causes
Examples of special causes are changes in setting, operator, material
input, etc. When they occur, they make the (overall) process distribution
change. Unless they are arrested, they will continue to affect the process
output in unpredictable ways as shown below:
[Figure: Unstable process over time. Stages: original process; shift in process level; shift in process level and variation; increase in variation]
Changes in process pattern due to
special causes can be either
detrimental or beneficial. When
detrimental, they need to be
identified and eliminated. When
beneficial, they need to be
perpetuated by making them a
permanent part of the process.
PROCESS CONTROL
This is the state where only common causes
are present. The proof of this situation is
when the pattern of variation conforms to the
statistical normal distribution.
It involves continuous monitoring of the
process for special causes and eliminating
them. Evidence of special causes is provided
by systematic patterns in process variability.
PROCESS CAPABILITY
A process should not only be in control but
also satisfactory in the sense that all the
production should meet specification
requirements.
This ability of a process to produce within the
variation permitted by tolerance is called
process capability.
[Figure: Process with reference to specification limits (LSL, USL). Panels:
– Process not in control and not capable
– Process is capable but not in control because process level is not properly centered
– Process is in control but not satisfactory
– Process is in control (stable) and capable]
The above can be used to classify a process based on capability and control.
© Wiley 2007
Process Capability
› Product Specifications
– Preset product or service dimensions, tolerances
– e.g. bottle fill might be 16 oz. ±.2 oz. (15.8oz.-16.2oz.)
– Based on how product is to be used or what the customer expects
› Process Capability – Cp and Cpk
– Assessing capability involves evaluating process variability relative to
preset product or service specifications
– Cp assumes that the process is centered in the specification range
– Cpk helps to address a possible lack of centering of the process
$$C_p = \frac{\text{specification width}}{\text{process width}} = \frac{USL - LSL}{6\sigma}$$
$$C_{pk} = \min\left(\frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma}\right)$$
Process capability…. (contd.)
The goal of Six Sigma is to reduce the standard deviation
of your process variation to the point that six standard
deviations (six sigma) can fit between the process mean and each specification limit.
The capability index (Cp) of a process is usually expressed
as the specification width (the difference between USL and LSL)
divided by six times the standard deviation of
the process:
Cp = (USL – LSL) / 6σ
The higher your Cp, the less variation in your process relative to the specification width.
There’s a second process capability index, Cpk. In essence, this splits
the process capability of Cp into two values.
Cpk = the lesser of these two calculations:
(USL – mean) / 3σ or (mean – LSL) / 3σ
In addition to the lower and upper specification limits, there’s another
pair of limits that should be plotted for any process – the lower control
limit (LCL) and the upper control limit (UCL). These values mark the
minimum and maximum inherent limits of the process, based on data
collected from the process. If the control limits are within the
specification limits or align with them, then the process is considered to
be capable of meeting the specifications. If either or both of the control
limits are outside the specification limits, then the process is considered
incapable of meeting the specifications.
Process capability…. (contd.)
Relationship between Process Variability
and Specification Width
› Three possible ranges for Cp
– Cp = 1, as in Fig. (a), process
variability just meets specifications
– Cp < 1, as in Fig. (b), process not
capable of producing within specifications
– Cp > 1, as in Fig. (c), process
exceeds minimal specifications
› One shortcoming: Cp assumes that the
process is centered on the specification
range
› Cp=Cpk when process is centered
Computing the Cp Value at Cocoa Fizz: three bottling machines are being
evaluated for possible use at the Fizz plant. The machines must be capable of
meeting the design specification of 15.8-16.2 oz. with a process capability
index of at least 1.0 (Cp ≥ 1)
› The table below shows the information
gathered from production runs on each
machine. Are they all acceptable?

Machine   σ     USL-LSL   6σ
A         .05   .4        .3
B         .1    .4        .6
C         .2    .4        1.2

› Solution:
– Machine A: $C_p = \frac{USL - LSL}{6\sigma} = \frac{.4}{6(.05)} = 1.33$
– Machine B: $C_p = \frac{.4}{6(.1)} = .67$
– Machine C: $C_p = \frac{.4}{6(.2)} = .33$
– Only Machine A meets the requirement of Cp ≥ 1.
Computing the Cpk Value at Cocoa Fizz
› Design specifications call for a
target value of 16.0 ±0.2 OZ.
(USL = 16.2 & LSL = 15.8)
› Observed process output has now
shifted and has a µ of 15.9 and a
σ of 0.1 oz.
› Cpk is less than 1, revealing that
the process is not capable
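Both Cocoa Fizz calculations can be checked with a short sketch of the Cp and Cpk formulas. The function names `cp` and `cpk` are illustrative; the inputs are the specification limits, the three machine standard deviations and the shifted process mean given in the example.

```python
def cp(usl, lsl, sigma):
    """Specification width divided by process width (6 sigma)."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    """The lesser of the upper and lower one-sided capability ratios."""
    return min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))

USL, LSL = 16.2, 15.8

# Cp for the three bottling machines
for machine, sigma in {"A": 0.05, "B": 0.1, "C": 0.2}.items():
    value = cp(USL, LSL, sigma)
    verdict = "acceptable" if value >= 1 else "not acceptable"
    print(machine, round(value, 2), verdict)

# Cpk for the shifted process (mean 15.9 oz., sigma 0.1 oz.)
print(round(cpk(USL, LSL, mu=15.9, sigma=0.1), 2))  # 0.33
```

Because Cpk takes the smaller of the two ratios, it drops below Cp as soon as the mean moves off the centre of the specification range.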
$$C_{pk} = \min\left(\frac{16.2 - 15.9}{3(.1)},\ \frac{15.9 - 15.8}{3(.1)}\right)$$
$$C_{pk} = \frac{.1}{.3} = .33$$
As improvement efforts begin, many of the persistent chance causes
start to disappear and improvements follow. This
helps to reduce the present spread of ±3σ to a narrower and narrower span, as
shown in the picture below:
[Figure: process distribution about the specification mean. Labels: Spec. Mean; Spec. Limit (T); (−6σ) and (+6σ); (3σ) on either side of the mean]
The Six Sigma concept professes a similar idea,
with certain changes in approach.
With a Six Sigma strategy an organisation can
achieve an incredible level of efficiency, i.e. the
defect level can be brought down to 3.4
parts per million.
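The parts-per-million figures can be approximated from normal tail areas. This is a sketch under two assumptions that the slides do not state: the process output is normally distributed, and the 3.4 ppm figure uses the conventional allowance for a 1.5σ drift of the process mean toward one specification limit. The function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def ppm_centered(k):
    """Defects per million outside +/- k sigma limits for a centered process."""
    return 2 * Z.cdf(-k) * 1_000_000

def ppm_shifted(k, shift=1.5):
    """Defects per million when the mean has drifted `shift` sigma toward one limit."""
    near_tail = Z.cdf(-(k - shift))  # limit the mean moved toward
    far_tail = Z.cdf(-(k + shift))   # limit the mean moved away from
    return (near_tail + far_tail) * 1_000_000

print(round(ppm_centered(3)))    # about 2700 ppm at +/- 3 sigma
print(round(ppm_shifted(6), 1))  # about 3.4 ppm at +/- 6 sigma with the 1.5 sigma shift
```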
±6 Sigma versus ± 3 Sigma
› Motorola coined “Six-sigma” to
describe their higher quality efforts
in the 1980s
› Six-sigma quality standard is now a
benchmark in many industries
(including services)
– Before design, marketing ensures
customer product characteristics
– Operations ensures that product
design characteristics can be met
by controlling materials and
processes to 6σ levels
– Other functions like finance and
accounting use 6σ concepts to
control all of their processes
› PPM Defective for ±3σ versus
±6σ quality
Expected Defects listed for six processes with Cp values ranging from 1.00 to 2.00
The relationship between x and y
› Correlation: is there a relationship between two variables?
› Regression: how well does a certain independent variable predict
a dependent variable?
Correlation
› statistical technique that is used to measure and describe
a relationship between two variables (X and Y).
A scatter diagram is a graphical representation of the
relationship between two variables. It can be between
a cause and an effect or between two causes. It also
reveals the nature of the relationship between the two
variables and its approximate strength.
Dr. Buxton developed printed graph paper, which
spurred the use of the scatter diagram. In 1837 J. F. W.
Herschel, an Englishman, used a scatter diagram.
In the 1950s Dr. K. Ishikawa popularised the use of the scatter
diagram.
SCATTER DIAGRAM
SCATTER DIAGRAM (Contd..)
Let us say that we are interested in finding out how the orientation angle (measure)
before lapping relates to the angle after lapping in a quartz crystal unit.
Plot the data on the graph.
If the emerging picture is something like this we say that there is a positive
relationship or positive correlation.
[Figure: scatter diagram. Horizontal axis: angle before lapping (10 to 90); vertical axis: angle after lapping (10 to 90)]
Some examples of positive
correlation are:
› Heights and weights;
› Household income and expenditure;
› Amount of rainfall and yield of crops.
SCATTER DIAGRAM (Contd..)
If the picture is slightly spread like this then we say that there is a possibility
of positive correlation
[Figure: scatter diagram. Horizontal axis: angle before lapping (10 to 90); vertical axis: angle after lapping (10 to 90)]
SCATTER DIAGRAM (Contd..)
If it is like this we can say that there is ‘no correlation’ between them.
[Figure: scatter diagram. Horizontal axis: angle before lapping (10 to 90); vertical axis: angle after lapping (10 to 90)]
SCATTER DIAGRAM (Contd..)
Sometimes the emerging diagram can be like this; then we can say that there is
a possibility of negative correlation.
[Figure: scatter diagram. Horizontal axis: angle before lapping (10 to 90); vertical axis: angle after lapping (10 to 90)]
SCATTER DIAGRAM (Contd..)
If it is like this, we can say that there is a negative relationship or negative
correlation between the two variables
[Figure: scatter diagram. Horizontal axis: angle before lapping (10 to 90); vertical axis: angle after lapping (10 to 90)]
Some examples of negative
correlation are:
› Volume and pressure of a perfect gas;
› Current and resistance [keeping the
voltage constant] (R = V / I);
› Price and demand of goods.
SCATTER DIAGRAM (Contd..)
Sometimes we may also have a scatter like this, i.e. positive correlation up to
a certain level and then negative.
[Figure: scatter diagram. Horizontal axis: angle before lapping (10 to 90); vertical axis: angle after lapping (10 to 90)]
SCATTER DIAGRAM (Contd..)
It can also be the other way around, i.e. negative correlation up to a particular level and
then positive.
[Figure: scatter diagram. Horizontal axis: angle before lapping (10 to 90); vertical axis: angle after lapping (10 to 90)]
The Coefficient of Correlation
One of the most widely used statistics is the coefficient of
correlation ‘r’ which measures the degree of association
between the two values of related variables given in the data
set.
• It takes values from + 1 to – 1.
• If two sets of data have r = +1, they are said to be
perfectly positively correlated.
• If r = -1 they are said to be perfectly correlated
negatively; and if r = 0 they are uncorrelated.
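A minimal sketch of how r could be computed, assuming the usual Pearson product-moment definition (the slides do not give a formula). The function name and the small data sets are hypothetical and chosen so that the first pair is perfectly positive and the second perfectly negative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their individual variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [10, 20, 30, 40, 50]             # hypothetical measurements
print(pearson_r(x, [15, 25, 35, 45, 55]))  # rises with x: r = +1
print(pearson_r(x, [55, 45, 35, 25, 15]))  # falls with x: r = -1
```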
Regression
› Correlation tells you if there is an association between x and y
but it doesn’t describe the relationship or allow you to predict
one variable from the other.
› To do this we need REGRESSION!
Regression
› Is the statistical technique for finding the best-fitting straight
line for a set of data.
› To find the line that best describes the relationship for a
set of X and Y data.
Regression Analysis
› Question asked: Given one variable, can we predict
values of another variable?
› Examples: Given the weight of a person, can we predict
how tall he/she is; given the IQ of a person, can we
predict their performance in statistics; given the
basketball team’s wins, can we predict the extent of a
riot. ...
Best-fit Line
› Aim of linear regression is to fit a straight line, ŷ = ax + b, to the data that gives the best
prediction of y for any value of x
› This will be the line that
minimises the distance between the
data and the fitted line, i.e.
the residuals
[Figure: scatter plot with the fitted line ŷ = ax + b, labelling the slope, the intercept, a true value y_i, its predicted value ŷ, and the residual error ε]
Regression Equation
Suppose we have a sample of size ‘n’ and it has two sets
of measures, denoted by x and y. We can predict the values
of ‘y’ given the values of ‘x’ by using the equation, called the
regression equation.
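y* = a + bx
where the coefficients a and b are given by
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\,\bar{x}$$
The symbol y* refers to the predicted value of y from a given
value of x from the regression equation.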
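A minimal sketch of the regression equation y* = a + bx, assuming a and b are the ordinary least-squares estimates. The function name `fit_line` and the data points are hypothetical; the points were chosen to lie exactly on y = 25 + 5x so the result is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y* = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                 # intercept
    return a, b

x = [1, 2, 3, 4, 5]        # hypothetical x values
y = [30, 35, 40, 45, 50]   # hypothetical y values on the line y = 25 + 5x

a, b = fit_line(x, y)
print(a, b)        # 25.0 5.0
print(a + b * 10)  # predicted y* for x = 10: 75.0
```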
Example
› Local tennis club charges $5 per hour plus an annual
membership fee of $25.
› Compute the total cost of playing tennis for 10 hours
per month.
(predicted cost) Y = (constant) bX + (constant) a
When X = 10
Y= $5(10 hrs) + $25
Y = $75
When X = 30
Y= $5(30 hrs) + $25
Y = $175
Why Learn Probability?
› Nothing in life is certain. In everything we do, we
gauge the chances of successful outcomes, from
business to medicine to the weather
› A probability provides a quantitative description of
the chances or likelihoods associated with various
outcomes
› It provides a bridge between descriptive and
inferential statistics
[Figure: probability reasons from the population to the sample; statistics reasons from the sample to the population]
Probabilistic vs Statistical Reasoning
› Suppose I know exactly the proportions of car makes in
California. Then I can find the probability that the first car
I see in the street is a Ford. This is probabilistic reasoning
as I know the population and predict the sample
› Now suppose that I do not know the proportions of car
makes in California, but would like to estimate them. I
observe a random sample of cars in the street and then I
have an estimate of the proportions of the population.
This is statistical reasoning
What is Probability?
› We measure “how often” using
Relative frequency = f/n
• As n gets larger,
– Sample → Population
– “How often” = Relative frequency → Probability
Basic Concepts
› An experiment is the process by which an
observation (or measurement) is obtained.
› An event is an outcome of an experiment,
usually denoted by a capital letter.
– The basic element to which probability is applied
– When an experiment is performed, a particular event
either happens, or it doesn’t!
Experiments and Events
› Experiment: Record an age
– A: person is 30 years old
– B: person is older than 65
› Experiment: Toss a die
– A: observe an odd number
– B: observe a number greater than 2
Basic Concepts
› Two events are mutually exclusive if, when one
event occurs, the other cannot, and vice versa.
•Experiment: Toss a die
–A: observe an odd number
–B: observe a number greater than 2
–C: observe a 6
–D: observe a 3
Not Mutually
Exclusive
Mutually
Exclusive
B and C?
B and D?
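One way to check whether two events are mutually exclusive is to write each event as the set of die faces it contains and test whether the sets share any face. This is a sketch; the function name is hypothetical.

```python
A = {1, 3, 5}      # observe an odd number
B = {3, 4, 5, 6}   # observe a number greater than 2
C = {6}            # observe a 6
D = {3}            # observe a 3

def mutually_exclusive(first, second):
    """Two events are mutually exclusive when they have no outcome in common."""
    return first.isdisjoint(second)

print(mutually_exclusive(B, C))  # False: a 6 is also greater than 2
print(mutually_exclusive(B, D))  # False: a 3 is also greater than 2
print(mutually_exclusive(C, D))  # True: a toss cannot show both 6 and 3
```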
Basic Concepts
› An event that cannot be decomposed is
called a simple event.
› Denoted by E with a subscript.
› Each simple event will be assigned a
probability, measuring “how often” it
occurs.
› The set of all simple events of an
experiment is called the sample space,
S.
Example
›The die toss:
›Simple events:
– E1: observe a 1
– E2: observe a 2
– E3: observe a 3
– E4: observe a 4
– E5: observe a 5
– E6: observe a 6
›Sample space: S = {E1, E2, E3, E4, E5, E6}
Basic Concepts
› An event is a collection of one or more simple
events.
•The die toss:
–A: an odd number, A = {E1, E3, E5}
–B: a number > 2, B = {E3, E4, E5, E6}
[Figure: Venn diagram of the sample space S showing events A and B; E3 and E5 lie in both]
The Probability
of an Event
› The probability of an event A measures “how
often” A will occur. We write P(A).
› Suppose that an experiment is performed n
times. The relative frequency for an event A is
$$\frac{\text{Number of times A occurs}}{n} = \frac{f}{n}$$
• If we let n get infinitely large,
$$P(A) = \lim_{n \to \infty} \frac{f}{n}$$
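The idea that the relative frequency f/n settles toward P(A) as n grows can be illustrated by simulation. This is a sketch using a simulated fair die; the function name, the seed and the chosen values of n are arbitrary illustrative choices.

```python
import random

def relative_frequency(event, n, seed=0):
    """Toss a simulated fair die n times and return f/n for the given event."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    f = sum(1 for _ in range(n) if rng.randint(1, 6) in event)
    return f / n

odd = {1, 3, 5}  # event A: observe an odd number, P(A) = 1/2
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(odd, n))
```

With a small n the relative frequency can be far from 1/2; with a large n it is typically very close.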
The Probability
of an Event
› P(A) must be between 0 and 1.
– If event A can never occur, P(A) = 0. If event A
always occurs when the experiment is performed,
P(A) =1.
› The sum of the probabilities for all simple
events in S equals 1.
• The probability of an event A is
found by adding the probabilities of
all the simple events contained in A.
Finding Probabilities
› Probabilities can be found using
– Estimates from empirical studies
– Common sense estimates based on equally likely
events.
• Examples:
– Toss a fair coin. P(Head) = 1/2
– Suppose that 10% of the U.S. population has red hair. Then for a
person selected at random, P(Red hair) = .10
Using Simple Events
› The probability of an event A is equal to the
sum of the probabilities of the simple events
contained in A
› If the simple events in an experiment are
equally likely, you can calculate
$$P(A) = \frac{n_A}{N} = \frac{\text{number of simple events in A}}{\text{total number of simple events}}$$
Example
Toss a fair coin twice. What is the
probability of observing at least one head?
1st Coin   2nd Coin   Ei         P(Ei)
H          H          E1 = HH    1/4
H          T          E2 = HT    1/4
T          H          E3 = TH    1/4
T          T          E4 = TT    1/4

P(at least 1 head)
= P(E1) + P(E2) + P(E3)
= 1/4 + 1/4 + 1/4 = 3/4
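The same answer follows from listing the equally likely simple events and counting those in the event. This is a sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All simple events for two tosses of a fair coin: HH, HT, TH, TT
sample_space = list(product("HT", repeat=2))

# Event: at least one head
event = [outcome for outcome in sample_space if "H" in outcome]

# Equally likely simple events: P(A) = n_A / N
probability = Fraction(len(event), len(sample_space))
print(probability)  # 3/4
```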

CABT Math 8 measures of central tendency and dispersion
 
Statistics And Correlation
Statistics And CorrelationStatistics And Correlation
Statistics And Correlation
 
A General Manger of Harley-Davidson has to decide on the size of a.docx
A General Manger of Harley-Davidson has to decide on the size of a.docxA General Manger of Harley-Davidson has to decide on the size of a.docx
A General Manger of Harley-Davidson has to decide on the size of a.docx
 
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docxANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
 
introduction to biostat, standard deviation and variance
introduction to biostat, standard deviation and varianceintroduction to biostat, standard deviation and variance
introduction to biostat, standard deviation and variance
 
1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdf1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdf
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Topic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptxTopic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptx
 
Basic knowledge on statistics
Basic knowledge on statisticsBasic knowledge on statistics
Basic knowledge on statistics
 
3. Statistical Analysis.pptx
3. Statistical Analysis.pptx3. Statistical Analysis.pptx
3. Statistical Analysis.pptx
 
ANALYSIS OF DATA ANALYSIS TOOLS IN RESEARCH PPT
ANALYSIS OF DATA ANALYSIS TOOLS IN RESEARCH  PPTANALYSIS OF DATA ANALYSIS TOOLS IN RESEARCH  PPT
ANALYSIS OF DATA ANALYSIS TOOLS IN RESEARCH PPT
 

Recently uploaded

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Statistics for 6 Sigma.pptx

  • 2. Why a 6 sigma practitioner needs to know about statistics › To be able to effectively conduct a 6 sigma investigation. Without the use of statistics it would be very difficult to make decisions based on the data collected. › To further develop critical and analytic thinking skills. › To act as an informed investigator. › To know how to properly analyze information › To know how to draw conclusions about populations based on sample information
  • 3. Key Definitions › A population (universe) is the collection of things under consideration › A sample is a portion of the population selected for analysis › A parameter is a summary measure computed to describe a characteristic of the population › A statistic is a summary measure computed to describe a characteristic of the sample
  • 4. Population and Sample Population Sample Use parameters to summarize features Use statistics to summarize features Inference on the population from the sample
  • 5. Statistical Methods › Descriptive statistics – Collecting and describing data › Inferential statistics – Drawing conclusions and/or making decisions concerning a population based only on sample data
  • 6. Descriptive Statistics › Collect data – e.g. Survey › Present data – e.g. Tables and graphs › Characterize data – e.g. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
  • 7. Why We Need Data › To provide input to survey › To provide input to study › To measure performance of service or production process › To evaluate conformance to standards › To assist in formulating alternative courses of action › To satisfy curiosity
  • 8. Data Sources Primary Data Collection Secondary Data Compilation Observation Experimentation Survey Print or Electronic
  • 9. Statistical Inquiry Primary and Secondary Data The difference between primary and secondary data is only one of degree of detachment from the original source. Data which is primary in the hands of one may become secondary in the hands of others. For example, if it is desired to conduct an investigation into the working conditions of workers in textile mills, the facts collected by the investigators directly from the workers themselves would be termed primary data. But if the information is obtained from a report prepared by the labour department of the Government, it will be called secondary data.
  • 10. Types of Data › Categorical (Qualitative) › Numerical (Quantitative) – Discrete – Continuous
  • 11. Key Terms › Measures of central tendency: statistical measurements such as the mean, median or mode that indicate how data group toward the center. › Measures of variation or dispersion: statistical measurements such as the range and standard deviation that indicate how data are dispersed or spread.
  • 12. Measures of Central Tendency › Find the mean › Find the median › Find the mode
  • 13. Key Terms › Mean: the arithmetic average of a set of data or sum of the values divided by the number of values. › Median: the middle value of a data set when the values are arranged in order of size. › Mode: the value or values that occur most frequently in a data set.
  • 14. Find the mean of a data set. 1. Find the sum of the values. 2. Divide the sum by the total number of values. $\text{Mean} = \frac{\text{sum of values}}{\text{number of values}}$
  • 15. Here’s an example. Sales figures for the last week for the Western region have been as follows: › Monday Rs 4,200 › Tuesday Rs 3,980 › Wednesday Rs 2,400 › Thursday Rs 3,100 › Friday Rs 4,600 › What is the average daily sales figure? › Rs 3,656
  • 16. Try these examples. › Mileage for the new salesperson has been 243, 567, 766, 422 and 352 this week. What is the average number of miles traveled? › 470 miles daily › Prices from different suppliers of 500 sheets of copier paper are as follows: Rs 399, Rs 475, Rs 375 and Rs 425. What is the average price? › Rs 418.50, or about Rs 419
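The mean calculations in the two slides above can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the standard-library `statistics.mean` function stands in for the "sum divided by count" rule.

```python
from statistics import mean

# Daily sales for the Western region (Rs), from the worked example
sales = [4200, 3980, 2400, 3100, 4600]

# Weekly mileage and copier-paper prices (Rs) from the practice problems
mileage = [243, 567, 766, 422, 352]
paper_prices = [399, 475, 375, 425]

# mean() adds the values and divides by how many there are
average_sales = mean(sales)
average_mileage = mean(mileage)
average_price = mean(paper_prices)

print("Average daily sales (Rs):", average_sales)
print("Average miles per day:", average_mileage)
print("Average paper price (Rs):", average_price)
```

The paper-price average comes out as 418.5, which the slide rounds up to Rs 419.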
  • 17. Find the median. › Arrange the values in the data set from smallest to largest (or largest to smallest) and select the value in the middle. › If the number of values is odd, it will be exactly in the middle. › If the number of values is even, identify the two middle values. Add them together and divide by two.
  • 18. Here is an example. › A recent survey of the used car market for the particular model John was looking for yielded several different prices. Find the median price. › $9,400, $11,200, $5,900, $10,000, $4,700, $8,900, $7,800 and $9,200. › Arrange from highest to lowest: $11,200, $10,000, $9,400, $9,200, $8,900, $7,800, $5,900 and $4,700. › Calculate the average of the two middle values ($9,200 and $8,900). › $9,050 is the median price.
  • 19. Try this example. › Five local moving companies quoted the following prices to Bob’s Best Company: $4,900, $3,800, $2,700, $4,400 and $3,300. Find the median price. › $3,800
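A short Python sketch of the median procedure, using the two price lists from the slides above. The helper name `median_of` is hypothetical; it follows the described steps (sort, then take the middle value or average the two middle values), and the standard-library `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_of(values):
    """Sort the values, then pick the middle one (odd count)
    or average the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

car_prices = [9400, 11200, 5900, 10000, 4700, 8900, 7800, 9200]   # even count
moving_quotes = [4900, 3800, 2700, 4400, 3300]                    # odd count

car_median = median_of(car_prices)
moving_median = median_of(moving_quotes)

print("Median car price ($):", car_median)
print("Median moving quote ($):", moving_median)

# Cross-check against the standard library
assert car_median == median(car_prices)
assert moving_median == median(moving_quotes)
```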
  • 20. Find the mode. › Find the mode in a data set by counting the number of times each value occurs. › Identify the value or values that occur most frequently. › There may be more than one mode if two or more values share the highest number of occurrences. › If no value appears more than once, there is no mode.
  • 21. Find the mode in this data set. › Results of a placement test in mathematics included the following scores: 65, 80, 90, 85, 95, 85, 80, 70 and 80. › Which score occurred the most frequently? › 80 is the mode. It appeared three times.
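The counting rule for the mode can be sketched in Python as follows. The function name `modes` is an illustrative choice, and returning an empty list for "no mode" is an assumption made here to match the rule stated on the slide.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    if not values:
        return []
    counts = Counter(values)            # how many times each value occurs
    highest = max(counts.values())
    if highest == 1:                    # nothing repeats, so there is no mode
        return []
    return [value for value, count in counts.items() if count == highest]

# Placement test scores from the slide
scores = [65, 80, 90, 85, 95, 85, 80, 70, 80]
score_modes = modes(scores)
print("Mode(s):", score_modes)
```

A data set such as 1, 1, 2, 2, 3 would return two modes (1 and 2), which is the "more than one mode" case described above.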
  • 22. Key Terms › Range: the difference between the highest and lowest values in a data set (also called the spread). › Deviation from the mean: the difference between a value of a data set and the mean. › Standard deviation: a statistical measurement that shows how data is spread above and below the mean.
  • 23. Key Terms › Variance: a statistical measurement that is the average of the squared deviations of data from the mean. The square root of the variance is the standard deviation. › Square root: the number that, when multiplied by itself, gives the original number. The square root of 81 is 9. (9 x 9 = 81) › Normal distribution: a characteristic of many data sets in which the data graph into a bell-shaped curve around the mean.
  • 24. Find the range in a data set › Find the highest and lowest values. › Find the difference between the two. › Example: The grades on the last exam were 78, 99, 87, 84, 60, 77, 80, 88, 92, and 94. The highest value is 99. The lowest value is 60. The difference or the range is 39.
  • 25. Find the standard deviation › The deviation from the mean of a data value is the difference between the value and the mean. › Get a clearer picture of the data set by examining how much each data point differs or deviates from the mean.
  • 26. Deviations from the mean › When the value is smaller than the mean, the difference is represented by a negative number indicating it is below or less than the mean. › Conversely, if the value is greater than the mean, the difference is represented by a positive number indicating it is above or greater than the mean.
  • 27. Find the deviation from the mean. › Find the mean of a set of data. › $\text{Mean} = \frac{\text{Sum of data values}}{\text{Number of values}}$ › Find the amount that each data value deviates or is different from the mean. › Deviation from the mean = Data value - Mean
  • 28. Here’s an example. › Data set: 38, 43, 45, 44 › Mean = 42.5 › First value: 38 – 42.5 = -4.5 (below the mean) › Second value: 43 – 42.5 = 0.5 (above the mean) › Third value: 45 – 42.5 = 2.5 (above the mean) › Fourth value: 44 – 42.5 = 1.5 (above the mean)
  • 29. Interpret the information › One value is below the mean and its deviation is -4.5. › Three values are above the mean and the sum of those deviations is 4.5. › The sum of all deviations from the mean is zero. This is true of all data sets. › We have not gained any statistical insight or new information by analyzing the sum of the deviations from the mean.
  • 30. Average deviation › $\text{Average deviation} = \frac{\text{Sum of deviations}}{\text{Number of values}} = \frac{0}{n} = 0$
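The point that the deviations always cancel out can be seen with the data set 38, 43, 45, 44 from the earlier slide. A minimal Python sketch, with illustrative variable names:

```python
data = [38, 43, 45, 44]

mean = sum(data) / len(data)                  # 42.5
deviations = [x - mean for x in data]         # data value minus mean

sum_of_deviations = sum(deviations)
average_deviation = sum_of_deviations / len(data)

print("Deviations:", deviations)
print("Sum of deviations:", sum_of_deviations)
print("Average deviation:", average_deviation)
```

Because the positive and negative deviations cancel, the average deviation carries no information about spread, which is why the squared deviations are used next.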
  • 31. Find the standard deviation of a set of data. › A statistical measure called the standard deviation uses the square of each deviation from the mean. › The square of a negative value is always positive. › The squared deviations are averaged (mean) and the result is called the variance.
  • 32. Find the standard deviation of a set of data. › The square root is taken of the variance so that the result can be interpreted within the context of the problem. › This formula averages the squared deviations by dividing by the number of values (n). › Several calculations are necessary and are best organized in a table.
  • 33. Find the standard deviation of a set of data. 1. Find the mean. 2. Find the deviation of each value from the mean. 3. Square each deviation. 4. Find the sum of the squared deviations. 5. Divide the sum of the squared deviations by the number of values in the data set. This amount is called the variance. 6. Find the standard deviation by taking the square root of the variance.
  • 34. Standard Deviation Standard deviation measures variation of values from the mean, using the following formula: $\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$ where $\sum$ = sum of, $X$ = observed values, $\bar{X}$ (X with a line over the top) = arithmetic mean, and $n$ = number of observations.
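The six steps and the formula above can be followed line by line in Python. This sketch reuses the data set 38, 43, 45, 44 from the deviation example; the variable names are illustrative. It divides by n, as the slides do, so the standard-library cross-check uses `statistics.pstdev` (the population form) rather than `statistics.stdev`, which divides by n - 1.

```python
from math import sqrt
from statistics import pstdev

data = [38, 43, 45, 44]
n = len(data)

mean = sum(data) / n                              # step 1: the mean
deviations = [x - mean for x in data]             # step 2: deviation of each value
squared = [d ** 2 for d in deviations]            # step 3: square each deviation
sum_of_squares = sum(squared)                     # step 4: sum of squared deviations
variance = sum_of_squares / n                     # step 5: divide by n (variance)
std_dev = sqrt(variance)                          # step 6: square root

print("Variance:", variance)
print("Standard deviation:", round(std_dev, 3))

# The library's population standard deviation should agree
assert abs(std_dev - pstdev(data)) < 1e-9
```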
  • 35. Standard Deviation (Contd..) Average difference between any value in a series of values and the mean of all the values in that series. This statistic is a measure of the variation in a distribution of values. If we plot enough values, we’ll likely find that the distribution of values forms some variant of a bell-shaped curve. This curve can assume various shapes. However, in a normal curve, statisticians have determined that about 68.2% of the values will be within 1 standard deviation of the mean, about 95.5% will be within 2 standard deviations, and 99.7% will be within 3 standard deviations.
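The percentages quoted for the normal curve can be reproduced from the standard normal distribution. A small sketch, assuming an exactly normal distribution; the function name `coverage` is illustrative, and it relies on the identity that the share of values within k standard deviations of the mean is erf(k / √2).

```python
from math import erf, sqrt

def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {coverage(k) * 100:.2f}%")
```

The computed figures are close to the rounded values on the slide; small differences in the last digit come from rounding.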
  • 36. Standard Deviation (Contd..) Specification limit One of two values (lower and upper) that indicate the boundaries of acceptable or tolerated values for a process.
  • 37. Draw and interpret a bar graph › Write an appropriate title. › Make appropriate labels for bars and scale. The intervals should be equally spaced and include the smallest and largest values. › Draw horizontal or vertical bars to represent the data. Bars should be of uniform width. › Make additional notes as appropriate to aid interpretation.
  • 38. Here’s an example. [Bar graph: Sales Volume 2001-2004; scale in thousands of units (0 to 50); bars for Product 1, Product 2 and Product 3; one series per year, 2001 to 2004]
  • 39. Interpret and draw a line graph › Write an appropriate title. › Make and label appropriate horizontal and vertical scales, each with equally spaced intervals. Often, the horizontal scale represents time. › Use points to locate data on the graph. › Connect data points with line segments or a smooth curve.
  • 40. Here’s an example. [Line graph: First Semester Sales; vertical scale in thousands of $ (0 to 100); horizontal scale Jan to Jun; one line each for Judy, Denise, Linda and Wally]
  • 41. Interpret and draw a circle graph (Pie-Graph). › Write an appropriate title. › Find the sum of values in the data set. › Represent each value as a fraction or decimal part of the sum of values. › For each fraction, find the number of degrees in the sector of the circle to be represented by the fraction or decimal. (100% = 360°) › Label each sector of the circle as appropriate.
  • 42. Here’s an example. [Circle graph: Local Daycare Market Share; sector sizes 43%, 35%, 16% and 6%; sector labels Teddy Bear, La La Land, Little Gems and Other]
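The degree calculation for a circle graph can be sketched with the four percentages from the daycare example. Only the sector sizes are used here, since the transcript does not state which label belongs to which percentage; the variable names are illustrative.

```python
# Sector sizes (percent) from the Local Daycare Market Share example
shares = [43, 35, 16, 6]

total = sum(shares)                               # sum of the values
fractions = [share / total for share in shares]   # each value as a part of the sum
degrees = [f * 360 for f in fractions]            # 100% corresponds to 360 degrees

for share, angle in zip(shares, degrees):
    print(f"{share}% -> {angle:.1f} degrees")
```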
  • 43. Make and interpret a frequency distribution. › Identify appropriate intervals for the data. › Tally the data for the intervals. › Count the number in each interval.
  • 44. Key Terms › Class intervals: special categories for grouping the values in a data set. › Tally: a mark that is used to count data in class intervals. › Class frequency: the number of tallies or values in a class interval. › Grouped frequency distribution: a compilation of class intervals, tallies, and class frequencies of a data set.
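A grouped frequency distribution can be built in a few lines of Python. This sketch reuses the exam grades from the range example; the class width of 10 is an assumption chosen for illustration, not something the slides specify.

```python
from collections import Counter

# Exam grades from the range example
grades = [78, 99, 87, 84, 60, 77, 80, 88, 92, 94]
width = 10   # assumed class-interval width

# Map each grade to the lower bound of its class interval, then count
class_frequency = Counter((grade // width) * width for grade in grades)

for lower in sorted(class_frequency):
    count = class_frequency[lower]
    tally = "|" * count                            # one tally mark per value
    print(f"{lower}-{lower + width - 1}: {tally} ({count})")
```

Each printed row shows a class interval, its tally marks and its class frequency, which together make up the grouped frequency distribution.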
  • 45. HISTOGRAM A histogram is a graphical representation of a frequency distribution, which is a summary of variation in a product or process. Dr. W. A. Shewhart, a physicist from Bell Laboratories, explained variation in his 1931 publication “Economic Control of Quality of Manufactured Product”. A histogram is basically a graphical presentation of a series of measurements grouped into continuous classes or intervals.
  • 46. [Histogram: number of crimes (vertical scale 0 to 45) by age of criminal (five-year classes from 10-15 to 50-55)]
  • 47. DISTRIBUTION While individual measured values may all be different, as a group they tend to exhibit a pattern. This is called a distribution, which can be described by: › Location (Process level or centering) › Spread or dispersion (Range of values from smallest to largest) › Shape (Pattern of variation, whether symmetrical or skewed etc.)
  • 48. Distribution of Data › Normal distributions › Skewed distribution
  • 49. [Figure: Spread, change in process variation: original process (A) and increase in spread with the same location (B)] [Figure: Shape, change in pattern of variation: original symmetrical pattern (A) and skewed pattern (B)]
  • 50. In the figure “Change in pattern of variation”, the original pattern (A) is symmetrical but the new pattern (B) is skewed. Even though the centering is the same, the shapes or patterns are different.
  • 51. STABILITY If the process characterised by a distribution remains unchanged over a period of time, then the process is said to be Stable and Repeatable. This can be understood from the following depiction of the process over a period of time: [Figure: Stable and repeatable process; distributions plotted over time against the target] This pattern results when only common causes are present in the process.
  • 52. COMMON CAUSES The common causes are minute and many and are individually not measurable. The pattern resulting from the influence of common causes is called “State of statistical control” or sometimes, just “In control”. It is called statistical because the variation can be described by statistical laws. If only common causes are present and do not change, the output of a process is predictable.
  • 53. The advantages of maintaining a state of statistical control are: › Variation (inherent) is restricted to common causes. › Since variability exhibits a regularity in its pattern, process is repeatable. › Since process is repeatable, quality of future production can be predicted. However, process level and variation may change due to influence of causes additional to common causes. Such causes are called special causes.
  • 54. Special Causes Examples of special causes are changes in setting, operator, material input, etc. When they occur, they make the (overall) process distribution change. Unless they are arrested, they will continue to affect the process output in unpredictable ways as shown below: [Figure: Unstable process over time: original process, shift in process level, shift in process level and variation, increase in variation]
  • 55. Changes in process pattern due to special causes can be either detrimental or beneficial. When detrimental, they need to be identified and eliminated. When beneficial, they need to be perpetuated by making them a permanent part of the process.
  • 56. PROCESS CONTROL This is the state where only common causes are present. The proof of this situation is when the pattern of variation conforms to the statistical normal distribution. It involves continuous monitoring of the process for special causes and eliminating them. Evidence of special causes is provided by systematic patterns in process variability.
  • 57. PROCESS CAPABILITY A process should not only be in control but also satisfactory in the sense that all the production should meet specification requirements. This ability of a process to produce within the variation permitted by tolerance is called process capability.
  • 58. Process with reference to specification limits [Figure: four process distributions drawn against LSL and USL] › Process not in control and not capable › Process is capable but not in control because process level is not properly centered › Process is in control but not satisfactory › Process is in control (stable) and capable The above can be used to classify a process based on capability and control.
  • 59. © Wiley 2007 Process Capability › Product Specifications – Preset product or service dimensions, tolerances – e.g. bottle fill might be 16 oz. ±0.2 oz. (15.8 oz.-16.2 oz.) – Based on how product is to be used or what the customer expects › Process Capability – Cp and Cpk – Assessing capability involves evaluating process variability relative to preset product or service specifications – Cp assumes that the process is centered in the specification range – Cpk helps to address a possible lack of centering of the process $C_p = \frac{\text{specification width}}{\text{process width}} = \frac{USL - LSL}{6\sigma}$ $C_{pk} = \min\left(\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right)$
  • 60. Process capability…. (contd.) The goal of Six Sigma is to reduce the standard deviation of your process variation to the point that six standard deviations (six sigma) can fit between the process mean and the nearest specification limit. The capability index (CP) of a process is usually expressed as the specification width (the difference between USL and LSL) divided by six times the standard deviation (σ) of the process: CP = (USL – LSL) / 6σ. The higher your CP, the less variation in your process relative to the specification width.
  • 61. Process capability…. (contd.) There’s a second process capability index, CPK. In essence, this splits the process capability of CP into two values. CPK = the lesser of these two calculations: (USL – mean) / 3σ or (mean – LSL) / 3σ. In addition to the lower and upper specification limits, there’s another pair of limits that should be plotted for any process: the lower control limit (LCL) and the upper control limit (UCL). These values mark the minimum and maximum inherent limits of the process, based on data collected from the process. If the control limits are within the specification limits or align with them, then the process is considered to be capable of meeting the specifications. If either or both of the control limits are outside the specification limits, then the process is considered incapable of meeting the specifications.
  • 62. Relationship between Process Variability and Specification Width › Three possible ranges for Cp – Cp = 1, as in Fig. (a), process variability just meets specifications – Cp < 1, as in Fig. (b), process not capable of producing within specifications – Cp > 1, as in Fig. (c), process exceeds minimal specifications › One shortcoming: Cp assumes that the process is centered on the specification range › Cp = Cpk when the process is centered
  • 63. Computing the Cp Value at Cocoa Fizz: three bottling machines are being evaluated for possible use at the Fizz plant. The machines must be capable of meeting the design specification of 15.8-16.2 oz. with at least a process capability index of 1.0 (Cp ≥ 1) › The table below shows the information gathered from production runs on each machine. Are they all acceptable?
      Machine   σ     USL-LSL   6σ
      A         .05   .4        .3
      B         .1    .4        .6
      C         .2    .4        1.2
    › Solution: – Machine A: $C_p = \frac{USL - LSL}{6\sigma} = \frac{.4}{6(.05)} = 1.33$ – Machine B: $C_p = \frac{.4}{6(.1)} = 0.67$ – Machine C: $C_p = \frac{.4}{6(.2)} = 0.33$
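The Cp comparison for the three machines can be worked through in Python. This is a sketch rather than a prescribed method: the function name `cp_index` and the dictionary layout are illustrative, while the specification limits and standard deviations come from the slide.

```python
def cp_index(usl, lsl, sigma):
    """Cp = specification width / process width = (USL - LSL) / (6 * sigma)."""
    return (usl - lsl) / (6 * sigma)

USL, LSL = 16.2, 15.8                                   # design specification (oz.)
machine_sigma = {"A": 0.05, "B": 0.1, "C": 0.2}         # from the production runs

cp_values = {name: cp_index(USL, LSL, s) for name, s in machine_sigma.items()}
acceptable = [name for name, cp in cp_values.items() if cp >= 1.0]

for name, cp in cp_values.items():
    verdict = "acceptable" if cp >= 1.0 else "not acceptable"
    print(f"Machine {name}: Cp = {cp:.2f} ({verdict})")
```

With these figures only machine A reaches the required Cp of at least 1.0.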
  • 64. Computing the Cpk Value at Cocoa Fizz › Design specifications call for a target value of 16.0 ± 0.2 oz. (USL = 16.2 and LSL = 15.8) › Observed process output has now shifted and has a µ of 15.9 and a σ of 0.1 oz. $$C_{pk} = \min\left(\frac{16.2 - 15.9}{3(0.1)},\ \frac{15.9 - 15.8}{3(0.1)}\right) = \min(1.00,\ 0.33) = \frac{0.1}{0.3} = 0.33$$ › Cpk is less than 1, revealing that the process is not capable
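As a rough check of the Cpk arithmetic, the sketch below computes both one-sided ratios and takes the smaller. The function name `cpk` is an assumption for illustration; the inputs are the slide's values.

```python
def cpk(usl, lsl, mean, sigma):
    """Cpk: the smaller of the upper and lower one-sided capability ratios."""
    upper = (usl - mean) / (3 * sigma)  # room between the mean and USL
    lower = (mean - lsl) / (3 * sigma)  # room between the mean and LSL
    return min(upper, lower)

# Shifted process from the slide: mean 15.9 oz., sigma 0.1 oz.
shifted = cpk(16.2, 15.8, 15.9, 0.1)

# The same process centered on the 16.0 oz. target, for comparison
centered = cpk(16.2, 15.8, 16.0, 0.1)

print(f"Cpk (shifted)  = {shifted:.2f}")
print(f"Cpk (centered) = {centered:.2f}")
```

With the mean at 15.9 the lower side is the limiting one. Re-centering the same process raises Cpk to the Cp value of about 0.67, which is still below 1 because σ itself is too large for the specification.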
  • 65. When we start making efforts, many of the chance causes that were persisting start disappearing and improvement starts coming in. This helps to reduce the present spread of ±3σ to a narrower and narrower span, as shown in the picture below: [Figure: process distribution about the mean (target T), marking the −3σ and +3σ points and the specification limits at −6σ and +6σ]
  • 66. The Six Sigma concept professes a similar idea, with certain changes in approach. With a Six Sigma strategy an organisation can achieve an incredible level of efficiency, i.e. the defect level can be brought down to 3.4 parts per million.
  • 67. ±6 Sigma versus ±3 Sigma › Motorola coined “Six-sigma” to describe its higher quality efforts in the 1980s › The Six-sigma quality standard is now a benchmark in many industries (including services) – Before design, marketing ensures customer product characteristics – Operations ensures that product design characteristics can be met by controlling materials and processes to 6σ levels – Other functions like finance and accounting use 6σ concepts to control all of their processes › PPM Defective for ±3σ versus ±6σ quality
  • 68. Expected Defects listed for six processes with Cp values ranging from 1.00 to 2.00
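The link between Cp and expected defects can be sketched from the normal distribution. The code below is an illustration under stated assumptions: the output is normally distributed, the specification limits sit at ±3·Cp standard deviations from the target, and the optional `shift_sigma` parameter (a name chosen here) models a drift of the mean away from the target. The Cp values in the loop are hypothetical and are not the six values from the slide's table. The 3.4 ppm figure usually quoted for Six Sigma corresponds to Cp = 2 with the conventional 1.5σ shift.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def ppm_defective(cp, shift_sigma=0.0):
    """Expected defective parts per million for a normal process.

    cp          -- capability index; spec limits lie 3*cp sigma from target
    shift_sigma -- offset of the process mean from target, in sigma units
    """
    k = 3 * cp
    above_usl = normal_cdf(-(k - shift_sigma))  # tail beyond the upper limit
    below_lsl = normal_cdf(-(k + shift_sigma))  # tail beyond the lower limit
    return (above_usl + below_lsl) * 1e6

# Hypothetical Cp values for illustration only
for cp_value in (1.00, 1.33, 1.50, 1.67, 2.00):
    print(f"Cp = {cp_value:.2f}: {ppm_defective(cp_value):.4f} ppm (centered)")

three_sigma_ppm = ppm_defective(1.0)              # about 2,700 ppm
six_sigma_shifted_ppm = ppm_defective(2.0, 1.5)   # about 3.4 ppm
```

A centered ±3σ process leaves roughly 2,700 defective parts per million, while a ±6σ process with a 1.5σ shift leaves about 3.4.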
  • 69. The relationship between x and y › Correlation: is there a relationship between 2 variables? › Regression: how well does a certain independent variable predict the dependent variable?
  • 70. Correlation › A statistical technique that is used to measure and describe a relationship between two variables (X and Y).
  • 71. SCATTER DIAGRAM A scatter diagram is a graphical representation of the relationship between two variables. It can be between a cause and an effect or between two causes. It also reveals the nature of the relationship between the two variables and its approximate strength. Dr. Buxton developed printed graph paper, which spurred the use of scatter diagrams. In 1837 J.F.W. Herschel, an Englishman, used a scatter diagram. In the 1950s Dr. K. Ishikawa popularised the use of scatter diagrams.
  • 72. SCATTER DIAGRAM (Contd..) Let us say that we are interested in finding out the orientation angle (measure) before and after lapping in a quartz crystal unit. Plot the data on the graph. If the emerging picture is something like this, we say that there is a positive relationship or positive correlation. [Scatter plot: angle after lapping (y-axis) against angle before lapping (x-axis), both axes 10 to 90]
  • 73. Some examples of series with positive correlation are: › Heights and weights; › Household income and expenditure; › Amount of rainfall and yield of crops.
  • 74. SCATTER DIAGRAM (Contd..) If the picture is slightly spread like this, then we say that there is a possibility of positive correlation. [Scatter plot: angle after lapping (y-axis) against angle before lapping (x-axis), both axes 10 to 90]
  • 75. SCATTER DIAGRAM (Contd..) If it is like this, we can say that there is ‘no correlation’ between them. [Scatter plot: angle after lapping (y-axis) against angle before lapping (x-axis), both axes 10 to 90]
  • 76. SCATTER DIAGRAM (Contd..) Sometimes the emerging diagram can be like this; then we can say that there is a possibility of negative correlation. [Scatter plot: angle after lapping (y-axis) against angle before lapping (x-axis), both axes 10 to 90]
  • 77. SCATTER DIAGRAM (Contd..) If it is like this, we can say that there is a negative relationship or negative correlation between the two variables. [Scatter plot: angle after lapping (y-axis) against angle before lapping (x-axis), both axes 10 to 90]
  • 78. Some examples of series with negative correlation are: › Volume and pressure of a perfect gas (at constant temperature); › Current and resistance [keeping the voltage constant] (R = V / I); › Price and demand of goods.
  • 79. SCATTER DIAGRAM (Contd..) Sometimes we may have a scatter like this also, i.e. positive correlation up to a certain level and then negative. [Scatter plot: angle after lapping (y-axis) against angle before lapping (x-axis), both axes 10 to 90]
  • 80. SCATTER DIAGRAM (Contd..) It can be vice versa also, i.e. negative correlation up to a particular level and then positive. [Scatter plot: angle after lapping (y-axis) against angle before lapping (x-axis), both axes 10 to 90]
  • 81. The Coefficient of Correlation One of the most widely used statistics is the coefficient of correlation ‘r’, which measures the degree of association between the values of two related variables in the data set. • It takes values from +1 to −1. • If two sets of data have r = +1, they are said to be perfectly positively correlated. • If r = −1 they are said to be perfectly negatively correlated; and if r = 0 they are uncorrelated.
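The slide does not give a formula for r. A minimal sketch, assuming the usual Pearson product-moment definition (sum of cross-deviations divided by the square root of the product of the two sums of squared deviations), is shown below. The function name and the small data sets are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # points on a rising line
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling line
r_peak = pearson_r(x, [1, 2, 3, 2, 1])       # rises, then falls

print(r_positive, r_negative, r_peak)
```

The third data set rises and then falls, so r comes out at zero even though the variables are clearly related. This is why r should be read together with the scatter diagram: it only measures linear association.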
  • 82.
  • 83. Regression › Correlation tells you if there is an association between x and y but it doesn’t describe the relationship or allow you to predict one variable from the other. › To do this we need REGRESSION!
  • 84. Regression › Is the statistical technique for finding the best-fitting straight line for a set of data. › To find the line that best describes the relationship for a set of X and Y data.
  • 85. Regression Analysis › Question asked: Given one variable, can we predict values of another variable? › Examples: Given the weight of a person, can we predict how tall he/she is; given the IQ of a person, can we predict their performance in statistics; given the basketball team’s wins, can we predict the extent of a riot. ...
  • 86. Best-fit Line › The aim of linear regression is to fit a straight line, ŷ = ax + b, to the data that gives the best prediction of y for any value of x › This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals (in least-squares regression, the sum of the squared residuals) › In ŷ = ax + b: ŷ = predicted value, $y_i$ = true value, a = slope, b = intercept, and the residual error is $\varepsilon_i = y_i - \hat{y}_i$
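  • 87. Regression Equation Suppose we have a sample of size ‘n’ with two sets of measures, denoted by x and y. We can predict the values of ‘y’ given the values of ‘x’ by using the equation, called the regression equation, y* = a + bx (note that here a is the intercept and b the slope, unlike the ŷ = ax + b form on the previous slide), where the coefficients a and b are given by the usual least-squares formulas $$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\,\bar{x}$$ The symbol y* refers to the value of y predicted from a given value of x by the regression equation.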
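A minimal sketch of fitting y* = a + bx is given below, assuming the standard least-squares formulas for the slope b and intercept a. The function name `fit_line` and the data are hypothetical; the points were chosen to lie exactly on y = 25 + 5x so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y* = a + b*x; returns (a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # intercept
    return a, b

# Hypothetical data lying exactly on y = 25 + 5x
xs = [2, 4, 6, 8, 10]
ys = [35, 45, 55, 65, 75]

a, b = fit_line(xs, ys)
predicted = a + b * 30  # y* for x = 30

print(f"a = {a}, b = {b}, y*(30) = {predicted}")
```

With real, noisy data the points will not fall exactly on the line, and the differences between each y and its y* are the residuals that the fit keeps as small as possible in the squared sense.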
  • 88. Example › Local tennis club charges $5 per hour plus an annual membership fee of $25. › Compute the total cost of playing tennis for 10 hours. › Predicted cost: Y = bX + a, where b = $5 per hour and a = $25 are constants › When X = 10: Y = $5(10 hrs) + $25 = $75
  • 89. When X = 30: Y = $5(30 hrs) + $25 = $175
  • 90. Why Learn Probability? › Nothing in life is certain. In everything we do, we gauge the chances of successful outcomes, from business to medicine to the weather › A probability provides a quantitative description of the chances or likelihoods associated with various outcomes › It provides a bridge between descriptive and inferential statistics [Diagram: probability reasons from the population to the sample; statistics reasons from the sample to the population]
  • 91. Probabilistic vs Statistical Reasoning › Suppose I know exactly the proportions of car makes in California. Then I can find the probability that the first car I see in the street is a Ford. This is probabilistic reasoning as I know the population and predict the sample › Now suppose that I do not know the proportions of car makes in California, but would like to estimate them. I observe a random sample of cars in the street and then I have an estimate of the proportions of the population. This is statistical reasoning
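  • 92. What is Probability? › We measure “how often” using relative frequency = f/n › As n gets larger, the sample approaches the population and the relative frequency approaches the probability: – Sample: “how often” = relative frequency – Population: “how often” = probability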
  • 93. Basic Concepts › An experiment is the process by which an observation (or measurement) is obtained. › An event is an outcome of an experiment, usually denoted by a capital letter. – The basic element to which probability is applied – When an experiment is performed, a particular event either happens, or it doesn’t!
  • 94. Experiments and Events › Experiment: Record an age – A: person is 30 years old – B: person is older than 65 › Experiment: Toss a die – A: observe an odd number – B: observe a number greater than 2
  • 95. Basic Concepts › Two events are mutually exclusive if, when one event occurs, the other cannot, and vice versa. • Experiment: Toss a die – A: observe an odd number – B: observe a number greater than 2 – C: observe a 6 – D: observe a 3 • B and C? Not mutually exclusive (a 6 satisfies both). • B and D? Also not mutually exclusive, since 3 is greater than 2. • C and D are mutually exclusive: a single toss cannot show both a 6 and a 3.
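One way to check mutual exclusivity is to write each event as the set of die faces that make it happen. Two events are then mutually exclusive exactly when their sets share no face. The sketch below uses Python sets; the helper name `mutually_exclusive` is chosen here for illustration.

```python
# Events from the die-toss experiment, written as sets of faces
A = {1, 3, 5}      # an odd number
B = {3, 4, 5, 6}   # a number greater than 2
C = {6}            # a 6
D = {3}            # a 3

def mutually_exclusive(event1, event2):
    """True when the two events have no outcome in common."""
    return event1.isdisjoint(event2)

pairs = {"B and C": (B, C), "B and D": (B, D), "C and D": (C, D), "A and C": (A, C)}
for label, (e1, e2) in pairs.items():
    print(f"{label}: mutually exclusive = {mutually_exclusive(e1, e2)}")
```

B overlaps both C and D, whereas C and D (and likewise A and C) have no face in common.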
  • 96. Basic Concepts › An event that cannot be decomposed is called a simple event. › Denoted by E with a subscript. › Each simple event will be assigned a probability, measuring “how often” it occurs. › The set of all simple events of an experiment is called the sample space, S.
  • 97. Example › The die toss: › Simple events: face 1 = E1, face 2 = E2, face 3 = E3, face 4 = E4, face 5 = E5, face 6 = E6 › Sample space: S = {E1, E2, E3, E4, E5, E6}
  • 98. Basic Concepts › An event is a collection of one or more simple events. • The die toss: – A: an odd number, A = {E1, E3, E5} – B: a number > 2, B = {E3, E4, E5, E6} [Venn diagram: events A and B inside the sample space S]
  • 99. The Probability of an Event › The probability of an event A measures “how often” A will occur. We write P(A). › Suppose that an experiment is performed n times. The relative frequency for an event A is $$\frac{\text{Number of times A occurs}}{n} = \frac{f}{n}$$ • If we let n get infinitely large, $$P(A) = \lim_{n \to \infty} \frac{f}{n}$$
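The limiting behaviour of the relative frequency can be illustrated with a small simulation. This is a sketch under assumptions: a fair six-sided die, the event A "an odd number", and Python's pseudo-random generator with a fixed seed so that runs are repeatable. The function name is chosen for the example.

```python
import random

def relative_frequency_of_odd(n, seed=0):
    """Toss a simulated fair die n times; return f/n for the event 'odd'."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    f = sum(1 for _ in range(n) if rng.randint(1, 6) % 2 == 1)
    return f / n

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: f/n = {relative_frequency_of_odd(n):.4f}")
```

For small n the relative frequency can sit well away from 1/2; as n grows it settles close to P(A) = 1/2.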
  • 100. The Probability of an Event › P(A) must be between 0 and 1. – If event A can never occur, P(A) = 0. If event A always occurs when the experiment is performed, P(A) =1. › The sum of the probabilities for all simple events in S equals 1. • The probability of an event A is found by adding the probabilities of all the simple events contained in A.
  • 101. Finding Probabilities › Probabilities can be found using – Estimates from empirical studies – Common sense estimates based on equally likely events. • Examples: – Toss a fair coin. P(Head) = 1/2 – Suppose that 10% of the U.S. population has red hair. Then for a person selected at random, P(Red hair) = .10
  • 102. Using Simple Events › The probability of an event A is equal to the sum of the probabilities of the simple events contained in A › If the simple events in an experiment are equally likely, you can calculate $$P(A) = \frac{n_A}{N} = \frac{\text{number of simple events in A}}{\text{total number of simple events}}$$
  • 103. Example Toss a fair coin twice. What is the probability of observing at least one head? › Simple events (1st coin, 2nd coin): E1 = HH, E2 = HT, E3 = TH, E4 = TT, each with P(Ei) = 1/4 › P(at least 1 head) = P(E1) + P(E2) + P(E3) = 1/4 + 1/4 + 1/4 = 3/4
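The two-coin example can be reproduced by listing the sample space and counting, assuming all simple events are equally likely. The sketch below uses `itertools.product` to enumerate outcomes and `fractions.Fraction` to keep the result exact; the helper name `probability` is an assumption for illustration.

```python
from fractions import Fraction
from itertools import product

# Enumerate the sample space for two tosses: HH, HT, TH, TT
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

def probability(event, space):
    """P(event) = (simple events in the event) / (total simple events)."""
    favourable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favourable), len(space))

p_at_least_one_head = probability(lambda outcome: "H" in outcome, sample_space)

print(sample_space)
print(p_at_least_one_head)  # 3/4
```

The same helper works for any experiment with equally likely simple events; only the sample space and the event test change.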