DESIGN OF
EXPERIMENTS
By
Dr. Virendra Kumar, (Ph.D IITD)
Email: virendra.kumar@niet.co.in
Web site: https://sites.google.com/view/virendra
Experimental Design (or DOE) economically maximizes information
LECTURE 1
SYLLABUS
Recommended Books
Textbooks:
1. D.C. Montgomery, Design and Analysis of
Experiments, Wiley India, 5th Edition, 2006, ISBN
– 812651048- X.
2. Madhav S. Phadke, Quality Engineering Using
Robust Design, Prentice Hall PTR, Englewood
Cliffs, New Jersey 07632,1989, ISBN:
0137451679.
Reference Books
1. Robert H. Lochner, Joseph E. Matar, Designing
for Quality: An Introduction to the Best of Taguchi
and Western Methods of Statistical Experimental
Design, Chapman and Hall, 1990, ISBN –
0412400200.
2. Philip J. Ross, Taguchi Techniques for Quality
Engineering: Loss Function, Orthogonal
Experiments, Parameter and Tolerance Design,
McGraw-Hill, 2nd Edition, 1996, ISBN:
0070539588.
What is Experiment?
• The term experiment is defined as the systematic
procedure carried out under controlled conditions in
order to discover an unknown effect, to test or establish
a hypothesis, or to illustrate a known effect.
• When analyzing a process, experiments are often used
to evaluate which process inputs have a significant
impact on the process output, and what the target
level of those inputs should be to achieve a desired
result (output).
• Experiments can be designed in many different ways to
collect this information.
• Design of Experiments (DOE) is also referred to
as Designed Experiments or Experimental Design -
all of the terms have the same meaning.
Aim of Design of Experiments
Experimental design can be
used at the point of greatest
leverage to reduce design
costs by speeding up the
design process, reducing late
engineering design changes,
and reducing product material
and labor complexity.
Designed Experiments are
also powerful tools to achieve
manufacturing cost savings
by minimizing process
variation and reducing
rework, scrap, and the need
for inspection.
What is experimental design?
• In an experiment, we deliberately change one or more process variables (or factors) in order to observe the effect the changes have on one or more response variables.
• The (statistical) design of experiments (DOE) is an efficient procedure for planning experiments so that the data obtained can be analyzed to yield valid and objective conclusions.
• DOE begins with determining the objectives of an experiment and selecting the process factors for the study.
• An Experimental Design is the laying out of a detailed experimental plan in advance of doing the experiment.
• Well chosen experimental designs maximize the amount of "information" that can be obtained for a given amount of experimental effort.
• The statistical theory underlying DOE generally begins with the concept of process models.
BLACK BOX PROCESS MODEL
Definition of Design of Experiments (DOE)
Design of experiments (DOE) can be defined
as a set of statistical tools that deal with the
planning, executing, analyzing, and
interpretation of controlled tests to
determine which factors will impact and
drive the outcomes of your process.
Development of DOE
The agricultural origins, 1908 – 1940s
• W.S. Gossett and the t-test (1908)
• R. A. Fisher & his co-workers
• Profound impact on agricultural science
• Factorial designs, ANOVA
The first industrial era, 1951 – late 1970s
• Box & Wilson, response surfaces
• Applications in the chemical & process industries
The second industrial era, late 1970s – 1990
• Quality improvement initiatives in many companies
• Taguchi and robust parameter design, process robustness
The modern era, beginning circa 1990
LECTURE 2
DOE Approaches?
Two of the most common approaches to DOE are a full factorial DOE and a
fractional factorial DOE.
Full factorial DOE: aims to determine at what settings of your
process inputs you will optimize the values of your process outcomes.
• Which combination of machine speed, fill
speed, and carbonation level will give you the
most consistent fill?
• The experimentation using all possible factor
combinations is called a full factorial design.
• These combinations are called Runs.
• With three variables, machine speed, fill speed, and carbonation level, how many different
unique combinations would you have to test to explore all the possibilities?
Total number of Runs = 2^k
where k is the number of variables and 2 is the number of levels, such as
(High/Low) or (100 ml per minute/200 ml per minute).
• What if you aren’t able to run the entire set of combinations of a full
factorial? What if you have monetary or time constraints, or too many
variables?
• This is when you might choose to run a fractional factorial, also referred
to as a screening DOE, which uses only a fraction of the total runs.
• That fraction can be one-half, one-quarter, one-eighth, and so forth
depending on the number of factors or variables.
• While there is a formula to calculate the number of runs, suffice it to say
you can just calculate your full factorial runs and divide by the fraction
that you and your Black Belt or Master Black Belt determine is best for
your experiment.
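As a quick illustration (not from the slides), here is a minimal Python sketch that enumerates the 2^k runs of a two-level full factorial for the bottling example; the factor names follow the slide, while the level labels are hypothetical.

```python
# A minimal sketch: enumerating the 2**k runs of a two-level full factorial
# with the standard library. Level labels are hypothetical.
from itertools import product

factors = {
    "machine_speed": ["low", "high"],
    "fill_speed": ["low", "high"],
    "carbonation_level": ["low", "high"],
}

# Every combination of levels is one run of the full factorial.
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(runs))  # 2**3 = 8 runs for k = 3 factors

# A half-fraction (screening) design would keep 2**(k-1) = 4 of these runs,
# chosen by a design generator rather than arbitrarily.
for run in runs:
    print(run)
```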
Factorial Designs Example
• In a factorial experiment, all possible
combinations of factor levels can be tested.
• The golf experiment:
• Type of driver
• Type of ball
• Walking vs. riding
• Type of beverage
• Time of round
• Weather
• Type of golf spike
• Etc, etc, etc
Factorial Designs Example
• Consider the golf experiment and suppose that only two factors,
type of driver and type of ball, are of interest.
• Figure shows a two-factor factorial experiment for studying the
joint effects of these two factors on golf score.
• Notice that this factorial experiment has both factors at two levels
and that all possible combinations of the two factors across their
levels are used in the design.
• Geometrically, the four runs form the corners of a square.
• This particular type of factorial experiment is called a 2² factorial
design (two factors, each at two levels).
• Because I can reasonably expect to play eight rounds of golf to
investigate these factors, a reasonable plan would be to play two
rounds of golf at each combination of factor levels shown in
Figure.
• An experimental designer would say that we have replicated the
design twice.
• This experimental design would enable the experimenter to
investigate the individual effects of each factor (or the main
effects) and to determine whether the factors interact.
Fig. A two-factor factorial experiment involving type of driver and type of ball
• The scores from each round of golf played at the four test
combinations are shown at the corners of the square.
• Notice that there are four rounds of golf that provide information
about using the regular-sized driver and four rounds that provide
information about using the oversized driver.
Factorial Designs Example
• By finding the average difference in the scores on the right- and left-hand sides
of the square (as in Figure b), we have a measure of the effect of switching from
the oversized driver to the regular-sized driver, or
Driver effect = (average score, regular-sized driver) − (average score, oversized driver) = 3.25
• That is, on average, switching from the oversized to the
regular-sized driver increases the score by 3.25 strokes per
round.
• Figure (a) shows the results of performing the factorial experiment.
Factorial Designs Example
• Similarly, the average difference in the four scores at the top
of the square and the four scores at the bottom measures the
effect of the type of ball used (see Figure c):
Ball effect = (average of top scores) − (average of bottom scores)
• Finally, a measure of the interaction effect between the type of
ball and the type of driver can be obtained by subtracting the
average scores on the left-to-right diagonal in the square from
the average scores on the right-to-left diagonal (see Figure d),
resulting in
Interaction effect = (average, right-to-left diagonal) − (average, left-to-right diagonal)
Conclusion of Factorial Designs Example
• The results of this factorial experiment indicate that driver effect is larger than
either the ball effect or the interaction.
• Statistical testing could be used to determine whether any of these effects differ
from zero.
• In fact, it turns out that there is reasonably strong statistical evidence that the
driver effect differs from zero and the other two effects do not.
• Therefore, this experiment indicates that I should always play with the oversized
driver.
• As this simple example shows, factorials make the most efficient use of the
experimental data.
• Notice that this experiment included eight observations, and all eight
observations are used to calculate the driver, ball, and interaction effects.
• No other strategy of experimentation makes such an efficient use of the data.
• This is an important and useful feature of factorials.
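To make the arithmetic concrete, here is a small Python sketch of the effect calculations for a replicated 2² design. The scores below are purely illustrative (the actual values appear in the figure, which is not reproduced here), so the printed numbers only demonstrate the computation, not the experiment's results.

```python
# Illustrative sketch of the 2^2 effect calculations; two replicate
# rounds per (driver, ball) combination. Data are made up.
scores = {
    ("oversized", "regular_ball"): [88, 91],
    ("regular",   "regular_ball"): [92, 94],
    ("oversized", "new_ball"):     [88, 90],
    ("regular",   "new_ball"):     [93, 91],
}

def mean_of(cells):
    vals = [v for cell in cells for v in scores[cell]]
    return sum(vals) / len(vals)

drivers = ["oversized", "regular"]
balls = ["regular_ball", "new_ball"]

# Driver main effect: right-hand side of the square minus left-hand side.
driver_effect = (mean_of([("regular", b) for b in balls])
                 - mean_of([("oversized", b) for b in balls]))

# Ball main effect: top of the square minus bottom.
ball_effect = (mean_of([(d, "new_ball") for d in drivers])
               - mean_of([(d, "regular_ball") for d in drivers]))

# Interaction: one diagonal's average minus the other's.
interaction = (mean_of([("oversized", "regular_ball"), ("regular", "new_ball")])
               - mean_of([("regular", "regular_ball"), ("oversized", "new_ball")]))

print(driver_effect, ball_effect, interaction)  # 3.25 -0.75 -0.25 here
```

Note that all eight observations enter every one of the three estimates, which is the efficiency property described above.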
Benefits of DOE
Doing a designed experiment as opposed to using a trial-and-error approach has a number of benefits.
1. Identify the main effects of your factors: A main effect is the impact of a
specific variable on your output.
2. Identify interactions: Interactions occur if the impact of one factor on your
response is dependent upon the setting of another factor.
3. Determine optimal settings for your variables: After analyzing all of your
main effects and interactions, you will be able to determine what the settings
should be for your factors or variables.
Why is DOE important to understand?
1. Choosing Between Alternatives: A common use is planning an experiment to gather data to make
a decision between two or more alternatives. These are types of comparative studies.
2. Selecting the Key Factors Affecting a Response: Selecting the few factors that matter from the
many possible ones.
3. Response Surface Modeling a Process: Some reasons to model a process are below:
• Hitting a Target: Often we want to "fine tune" a process to consistently hit a target.
• Maximizing or Minimizing a Response: Optimizing a process output is a common goal.
• Reducing Variation: Processes that are on target, on the average, may still have too much
variability.
• Making a Process Robust: The less a process or product is affected by external conditions, the
better it is - this is called "Robustness".
• Seeking Multiple Goals: Sometimes we have multiple outputs and we have to compromise to
achieve desirable outcomes - DOE can help here.
• Regression Modeling: Regression models are used to fit more precise models.
Best practices when thinking about DOE
• Experiments take planning and proper execution, otherwise the results
may be meaningless. Here are a few hints for making sure you properly
run your DOE.
Carefully identify your variables
• Your process variables have different impacts on your output. Some are
statistically important, and some are just noise. You need to understand
which is which.
• Use existing data and data analysis to try and identify the most logical
factors for your experiment.
• Regression analysis is often a good source for selecting potentially
significant factors.
Prevent contamination of your experiment
• During your experiment, you will have your experimental factors as well as
other environmental factors around you that you aren't interested in; keep
these from contaminating your runs.
Use screening experiments to reduce cost and time
• Unless you've done some prior screening of your potential factors, you
might want to start your DOE with a screening or fractional factorial
design.
• This will provide information as to potentially significant factors without
consuming your whole budget.
• Once you've identified the best potential factors, you can do a full factorial
with the reduced number of factors.
What are the steps of DOE?
Obtaining good results from a DOE involves these seven steps:
1. Set objectives
2. Select process variables
3. Select an experimental design
4. Execute the design
5. Check that the data are consistent with the experimental assumptions
6. Analyze and interpret the results
7. Use/present the results (may lead to further runs or DOE's)
A checklist of practical considerations
Important practical considerations in planning and running experiments
are
Check performance of gauges/measurement devices first.
Keep the experiment as simple as possible.
Check that all planned runs are feasible.
Watch out for process drifts and shifts during the run.
Avoid unplanned changes (e.g., swap operators at halfway point).
Allow some time (and back-up material) for unexpected events.
Obtain buy-in from all parties involved.
Maintain effective ownership of each step in the experimental plan.
Preserve all the raw data--do not keep only summary averages!
Record everything that happens.
Reset equipment to its original state after the experiment.
LECTURE 3
Basic principles of experimental design
• randomization
• replication
• blocking
Sometimes we add the factorial principle to these three.
Randomization
• Randomization is the cornerstone underlying the use of statistical methods in
experimental design.
• By randomization we mean that both the allocation of the experimental material and
the order in which the individual runs of the experiment are to be performed are
randomly determined.
• Statistical methods require that the observations (or errors) be independently
distributed random variables.
• By properly randomizing the experiment, we also assist in “averaging out” the effects of
extraneous factors that may be present.
For example, suppose that the specimens in the hardness experiment are of
slightly different thicknesses and that the effectiveness of the quenching
medium may be affected by specimen thickness. If all the specimens subjected to
the oil quench are thicker than those subjected to the saltwater quench, we
may be introducing systematic bias into the experimental results. This bias handicaps
one of the quenching media and consequently invalidates our results. Randomly
assigning the specimens to the quenching media alleviates this problem.
• Computer software programs are widely used to assist experimenters in
selecting and constructing experimental designs.
• These programs often present the runs in the experimental design in random
order.
• This random order is created by using a random number generator.
• Even with such a computer program, it is still often necessary to assign units
of experimental material, operators, gauges or measurement devices, and so
forth for use in the experiment.
• Sometimes experimenters encounter situations where randomization of some
aspect of the experiment is difficult.
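A minimal sketch of how run order might be randomized in software, assuming the eight runs of the replicated golf experiment (the labels are hypothetical); a fixed seed makes the randomization reproducible and auditable.

```python
# Sketch: randomizing run order with a seeded random number generator.
import random

runs = [(driver, ball, rep)
        for driver in ("oversized", "regular")
        for ball in ("regular_ball", "new_ball")
        for rep in (1, 2)]

rng = random.Random(42)  # fixed seed: the random order is reproducible
rng.shuffle(runs)

for position, run in enumerate(runs, start=1):
    print(position, run)
```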
Replication
• By replication we mean an independent repeat run of each factor combination.
For example: In the metallurgical experiment, replication would consist of treating a specimen by oil
quenching and treating a specimen by saltwater quenching. Thus, if five specimens are treated in each
quenching medium, we say that five replicates have been obtained.
• Each of the 10 observations should be run in random order.
• Replication has two important properties.
First, it allows the experimenter to obtain an estimate of the experimental error. This estimate of error
becomes a basic unit of measurement for determining whether observed differences in the data are really
statistically different.
Second, if the sample mean (ȳ) is used to estimate the true mean response for one of the factor levels in
the experiment, replication permits the experimenter to obtain a more precise estimate of this parameter.
For example, if σ² is the variance of an individual observation and there are n replicates, the variance of
the sample mean is
σ²_ȳ = σ²/n
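A short simulation (an illustration, not part of the original text) of the stated property: with n replicates of an observation having variance σ², the sample mean's variance is close to σ²/n.

```python
# Simulation sketch: the variance of the sample mean shrinks as sigma**2 / n.
import random
import statistics

sigma, n, trials = 2.0, 5, 20000
rng = random.Random(0)

sample_means = [
    statistics.mean(rng.gauss(0.0, sigma) for _ in range(n))
    for _ in range(trials)
]
print(statistics.variance(sample_means))  # close to sigma**2 / n = 0.8
```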
Blocking
• Blocking is a design technique used to improve the precision with which comparisons
among the factors of interest are made.
• Often blocking is used to reduce or eliminate the variability transmitted from nuisance
factors—that is, factors that may influence the experimental response but in which we
are not directly interested.
For example, an experiment in a chemical process may require two batches of
raw material to make all the required runs. However, there could be differences
between the batches due to supplier-to-supplier variability, and if we are not specifically
interested in this effect, we would think of the batches of raw material as a
nuisance factor.
• Generally, a block is a set of relatively homogeneous experimental conditions.
• In the chemical process example, each batch of raw material would form a block,
because the variability within a batch would be expected to be smaller than the
variability between batches.
• Typically, as in this example, each level of the nuisance factor becomes a block.
• Then the experimenter divides the observations from the statistical design into groups
that are run in each block.
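As a sketch of how such a layout might look in code, the following assumes the chemical-process example: each batch of raw material is a block, every treatment (hypothetical labels T1 to T4) appears once per block, and run order is randomized only within each block.

```python
# Sketch of a randomized complete block layout: randomize within each block.
import random

treatments = ["T1", "T2", "T3", "T4"]  # hypothetical treatment labels
blocks = {"batch_1": treatments[:], "batch_2": treatments[:]}

rng = random.Random(7)
for block, block_runs in blocks.items():
    rng.shuffle(block_runs)  # randomize run order within the block only
    print(block, block_runs)
```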
Guidelines for Designing Experiments
To use the statistical approach in designing and analyzing an experiment, it is
necessary for everyone involved in the experiment to have a clear idea in advance of
exactly what is to be studied, how the data are to be collected, and at least a
qualitative understanding of how these data are to be analyzed.
STEP 1: Recognition of and statement of the
problem.
• to realize that a problem requiring experimentation exists.
• to develop a clear and generally accepted statement of the problem.
• It is important to solicit input from all concerned parties: engineering, quality
assurance, manufacturing, marketing, management, the customer, and
operating personnel.
• It is helpful to prepare a list of specific problems or questions that are to
be addressed by the experiment.
• Always keep the overall objectives of the experiment in mind.
• There are several broad reasons for running experiments, and each type of
experiment will generate its own list of specific questions that need to be addressed.
Guidelines for Designing Experiments
There are several broad reasons for running experiments some are
follows:
STEP 1: Recognition of and statement of the
problem
Factor screening or characterization: which factors have the most influence on the
response(s) of interest.
Optimization: find the settings or levels of the important factors that result in
desirable values of the response.
Confirmation: verify that the system operates or behaves in a manner that is
consistent with some theory or past experience.
Discovery: in discovery experiments, the experimenters are usually trying to
determine what happens when we explore new materials, new factors, or new
ranges for factors.
Robustness: under what conditions do the response variables of interest seriously
degrade? Or what conditions would lead to unacceptable variability in the response?
STEP 2: Selection of the response variable
Guidelines for Designing Experiments
• In selecting the response variable, the experimenter should be certain that this
variable really provides useful information about the process under study.
• Most often, the average or standard deviation (or both) of the measured
characteristic will be the response variable.
• Multiple responses are not unusual.
• The experimenters must decide how each response will be measured, and
address issues such as how will any measurement system be calibrated and
how this calibration will be maintained during the experiment.
• The gauge or measurement system capability (or measurement error) is also
an important factor.
• It is usually critically important to identify issues related to defining the
responses of interest and how they are to be measured before conducting the
experiment.
• Sometimes designed experiments are employed to study and improve the
performance of measurement systems.
STEP 3: Choice of factors, levels, and range
Guidelines for Designing Experiments
• The experimenter should classify these factors as either potential design
factors or nuisance factors.
• Potential design factors can be further classified as design factors,
held-constant factors, and allowed-to-vary factors.
• The design factors are the factors actually selected for study in the
experiment.
• Held-constant factors are variables that may exert some effect on the
response, but for purposes of the present experiment these factors are not of
interest, so they will be held at a specific level.
• With allowed-to-vary factors, the experimental units or the "materials" to which
the design factors are applied are usually nonhomogeneous, yet we often
ignore this unit-to-unit variability and rely on randomization to balance out any
material or experimental unit effect.
• We often assume that the effects of held-constant factors and allowed-to-
vary factors are relatively small.
STEP 3: Choice of factors, levels, and range
Guidelines for Designing Experiments
• When the objective of the experiment is factor screening or process
characterization, it is usually best to keep the number of factor levels low
(Generally two levels).
Cause-and-effect diagram (fishbone diagram): a useful technique for
organizing some of the information generated in pre-experimental planning.
FIGURE: A cause-and-effect diagram for the etching process experiment
FIGURE: A cause-and-effect diagram for the CNC machine
STEP 4: Choice of experimental design.
Guidelines for Designing Experiments
• Choice of design involves consideration of sample size (number of replicates),
selection of a suitable run order for the experimental trials, and determination
of whether or not blocking or other randomization restrictions are involved.
• There are several interactive statistical software packages that support this
phase of experimental design.
• The experimenter can enter information about the number of factors, levels,
and ranges, and these programs will either present a selection of designs for
consideration or recommend a particular design.
• We usually prefer to see several alternatives instead of relying entirely on a
computer recommendation in most cases.
• Most software packages also provide some diagnostic information about how
each design will perform and helps in finding best design alternative.
• These programs will usually also provide a worksheet (with the order of the
runs randomized) for use in conducting the experiment.
STEP 4: Choice of experimental design.
Guidelines for Designing Experiments
• Design selection also involves thinking about and selecting a tentative
empirical model to describe the results.
• The model is just a quantitative relationship (equation) between the
response and the important design factors.
• In many cases, a low-order polynomial model will be appropriate.
• A first-order model in two variables is
y = β₀ + β₁x₁ + β₂x₂ + ε
where y is the response, the x's are the design factors, the β's are unknown parameters that will be
estimated from the data in the experiment, and ε is a random error term that accounts for the
experimental error in the system that is being studied.
• The first-order model is also sometimes called a main effects model. First-order
models are used extensively in screening or characterization experiments.
• A common extension of the first-order model is to add an interaction
term, say
y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + ε
STEP 4: Choice of experimental design.
Guidelines for Designing Experiments
where the cross-product term x₁x₂ represents the two-factor interaction between the design
factors.
• Because interaction between factors is relatively common, the first-order model with
interaction is widely used.
• Higher-order interactions can also be included in experiments with more than
two factors if necessary.
• Another widely used model is the second-order model
y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + β₁₁x₁² + β₂₂x₂² + ε
• Second-order models are often used in optimization experiments.
• In selecting the design, it is important to keep the experimental objectives in
mind.
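To illustrate how such an empirical model can be estimated, here is a sketch that fits the first-order model with interaction by ordinary least squares on hypothetical coded (−1/+1) factorial data; the responses are made up for the example.

```python
# Sketch: fitting y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + e by least squares.
import numpy as np

x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
y = np.array([88.0, 92.0, 88.0, 93.0, 91.0, 94.0, 90.0, 91.0])  # made up

# Model matrix: intercept, main effects, and the cross-product interaction.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates of b0, b1, b2, b12
```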
STEP 5: Performing the experiment
Guidelines for Designing Experiments
• When running the experiment, it is vital to monitor the process carefully to
ensure that everything is being done according to plan.
• Errors in experimental procedure at this stage will usually destroy experimental
validity.
• One of the most common mistakes is that the people conducting the experiment
fail to set the variables to the proper levels on some runs.
• Someone should be assigned to check factor settings before each run.
• Up-front planning to prevent mistakes like this is crucial to success.
• It is easy to underestimate the logistical and planning aspects of running a
designed experiment in a complex manufacturing or research and development
environment.
• Coleman and Montgomery (1993) suggest that prior to conducting the
experiment a few trial runs or pilot runs are often helpful.
• These runs provide information about consistency of experimental material, a
check on the measurement system, a rough idea of experimental error, and a
chance to practice the overall experimental technique.
STEP 6: Statistical analysis of the data
Guidelines for Designing Experiments
• Statistical methods should be used to analyze the data so that results and
conclusions are objective rather than judgmental in nature.
• There are many excellent software packages designed to assist in data analysis,
and many of the programs used in step 4 to select the design provide a
seamless, direct interface to the statistical analysis.
• Often, we find that simple graphical methods play an important role in data
analysis and interpretation.
• It also helps to express the results of many experiments in terms of an empirical
model.
• Statistical methods provide only guidelines as to the reliability and validity of
results.
• When properly applied, statistical methods do not allow anything to be proved
experimentally, but they do allow us to measure the likely error in a conclusion.
• The primary advantage of statistical methods is that they add objectivity to the
decision-making process.
STEP 7: Conclusions and recommendations
Guidelines for Designing Experiments
• Once the data have been analyzed, the experimenter must draw practical
conclusions about the results and recommend a course of action.
• Graphical methods are often useful in this stage, particularly in presenting the
results to others.
• Follow-up runs and confirmation testing should also be performed to validate the
conclusions from the experiment.
• Experimentation is an iterative process. It is usually a major mistake to design a
single, large, comprehensive experiment at the start of a study.
• A successful experiment requires knowledge of the important factors, the ranges
over which these factors should be varied, the appropriate number of levels to
use, and the proper units of measurement for these variables.
• Generally, we do not perfectly know the answers to these questions, but we
learn about them as we go along.
• As an experimental program progresses, we often drop some input variables,
add others, change the region of exploration for some factors, or add new
response variables.
STEP 7: Conclusions and recommendations
Guidelines for Designing Experiments
• Consequently, we usually experiment sequentially, and as a general rule, no
more than about 25 percent of the available resources should be invested in the
first experiment.
• This will ensure that sufficient resources are available to perform confirmation
runs and ultimately accomplish the final objective of the experiment.
• Finally, it is important to recognize that all experiments are designed
experiments.
• The important issue is whether they are well designed or not.
• Good pre-experimental planning will usually lead to a good, successful
experiment.
LECTURE 4
Concepts of random variable
• Random means unpredictable. Hence, a random variable is a variable
whose future value is unpredictable despite knowing its past performance.
• A random variable is a variable whose possible values are the numerical outcomes of
a random experiment.
• Therefore, it is a function which associates a unique numerical value with every
outcome of an experiment.
• Further, its value varies with every trial of the experiment.
• For example, when you toss an unbiased coin, the outcome can be a head or a tail.
Even if you keep tossing the coin indefinitely, the outcomes are either of the two. Also,
you would never know the outcome in advance.
Random Experiment: A random experiment is a process which leads to an uncertain
outcome.
• Usually, it is assumed that the experiment is repeated indefinitely under homogeneous
conditions.
• While the result of a random experiment is not unique, it is one of the possible outcomes.
• In a random experiment, the outcomes are not always numerical.
• But we need numbers as outcomes for calculations.
• Therefore, we define a random variable as a function which associates a unique
numerical value with every outcome of a random experiment.
• For example, in the case of the tossing of an unbiased coin, if there are 3 trials,
then the number of times a ‘head’ appears can be a random variable. This has
values 0, 1, 2, or 3 since, in 3 trials, you can get a minimum of 0 heads and a
maximum of 3 heads.
Concepts of random variable
• Random variables are classified based on their probability distribution.
• A random variable either has an associated probability distribution (Discrete
Random Variable), or a probability density function (Continuous Random
Variable).
• Therefore, we have two types of random variables – Discrete and Continuous.
Types of Random variables
Discrete Random Variables:
• Discrete random variables take on only a countable number of distinct values.
• Usually, these variables are counts (not necessarily though).
• If a random variable can take only a finite number of distinct values, then it is
discrete.
• Number of members in a family, number of defective light bulbs in a box of 10
bulbs, etc. are some examples of discrete random variables.
• The probability distribution of these variables is a list of probabilities associated
with each of its possible values.
• It is also called the probability function or the probability mass function.
Types of Random variables
Example of Discrete Random Variables
• You toss a coin 10 times. The random variable X is the number of times you get
a ‘tail’. X can only take values 0, 1, 2, … , 10. Therefore, X is a discrete random
variable.
• Let’s look at the probability of getting 8 tails.
• p8 (the probability of getting 8 tails) falls in the range 0 to 1. Also, the sum of
probabilities for all possible values of tails is p0 + p1 + … + p10 = 1.
If a random variable (X) takes ‘k’ different values, with the probability
that X = xi is defined as P(X = xi) =pi, then it must satisfy the following:
0 < pi < 1 (for each ‘i’)
p1 + p2 + p3 + … + pk = 1
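A small sketch verifying these two conditions for the coin-toss example, using the standard binomial PMF for the number of tails in 10 fair tosses:

```python
# Sketch: the PMF of "number of tails in 10 fair tosses" satisfies both
# conditions: each p_k lies in [0, 1] and the p_k sum to 1.
from math import comb

n, p = 10, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

print(pmf[8])             # P(exactly 8 tails) = 45/1024, about 0.044
print(sum(pmf.values()))  # 1.0 (up to floating-point rounding)
```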
Types of Random variables
Continuous Random Variables:
• Continuous random variables take up an infinite number of possible values
which are usually in a given range.
• Typically, these are measurements like weight, height, the time needed to finish
a task, etc.
• For example, the life of an individual in a community is a continuous random
variable. Let’s say that the average lifespan of an individual in a community is
110 years. Therefore, a person can die immediately on birth (where life = 0
years) or after he attains an age of 110 years. Within this range, he can die at
any age. Therefore, the variable ‘Age’ can take any value between 0 and 110.
• Hence, continuous random variables do not have specific values since the
number of values is infinite.
• Also, the probability at a specific value is almost zero.
• However, there is always a non-negative probability that a certain outcome will
lie within the interval between two values.
Probability
• Probability means possibility.
• Probability is a measure of the likelihood of an event to occur.
• Many events cannot be predicted with total certainty.
• Using probability, we can predict only the chance of an event occurring, i.e.
how likely it is to happen.
• Probability can range from 0 to 1, where 0 means the event is impossible
and 1 indicates a certain event.
• The probability of all the events in a sample space adds up to 1.
The best example for understanding probability is flipping a coin: There are
two possible outcomes—heads or tails. What's the probability of the coin landing
on heads, P(H)? You might intuitively know that the likelihood is half/half, or 50%.
Probability = (Number of favourable outcomes) / (Total number of outcomes)
Probability Density Function (PDF)/Density of a continuous random variable
FIGURE: A density curve for Y = exact amount of rain (inches) tomorrow, with height (up to 0.5) on
the vertical axis and expected rain from 0 to 5 inches on the horizontal axis.
Is P(Y = 2) = 0.5? What is the probability that the random variable Y is exactly 2 inches: not 2.01 or
2.0001, and not 1.99 or 1.9999? It is zero; we do not even have a tool that can measure exactly
2 inches. What we can compute is the probability that Y is almost 2 inches, within some tolerance,
say 0.1:
P(|Y − 2| < 0.1) = P(1.9 < Y < 2.1) = ∫ from 1.9 to 2.1 of f(x) dx
That is, the probability is the area under the density function f(x) between 1.9 and 2.1.
• It is a function whose value at any given sample (or point) in the sample
space (the set of possible values taken by the random variable) can be
interpreted as providing a relative likelihood that the value of the random
variable would be close to that sample.
• In other words, while the absolute likelihood for a continuous random
variable to take on any particular value is 0 (since there is an infinite set of
possible values to begin with), the value of the PDF at two different samples
can be used to infer, in any particular draw of the random variable, how
much more likely it is that the random variable would be close to one sample
compared to the other sample.
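As a numeric illustration of the integral above, here is a sketch that approximates P(|Y − 2| < 0.1) with a midpoint-rule sum; the normal density with mean 2 and standard deviation 0.5 is a hypothetical stand-in for the rain density f(x).

```python
# Numeric sketch of P(|Y - 2| < 0.1) as an area under an assumed density.
import math

def f(x, mu=2.0, sd=0.5):  # hypothetical density, normal(2, 0.5)
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

a, b, steps = 1.9, 2.1, 1000
h = (b - a) / steps
area = sum(f(a + (i + 0.5) * h) for i in range(steps)) * h  # midpoint rule
print(area)  # about 0.159 for this assumed density
```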
LECTURE 5
Cumulative Distribution Function(CDF)
• It is used to calculate the area under the curve to the left of a point of interest.
• It is used to evaluate the accumulated probability.
• For continuous probability distributions, the probability=area under the curve.
Total area=1
• The probability density function (PDF) f(x) describes the shape of
the distribution (uniform, exponential, or normal).
Example: the uniform distribution has PDF
f(x) = 1/(b − a) on [a, b]
FIGURE: The uniform PDF is a rectangle of height 1/(b − a) between a and b. The shaded area
to the left of a point x is
Area = base × height = (x − a) × f(x)
Cumulative Distribution Function (CDF)
Area = (x − a) × 1/(b − a)
Area to the left of the point of interest: P(X ≤ x) = (x − a)/(b − a)
This function, P(X ≤ x) = (x − a)/(b − a), is called the CDF.
Example: the exponential distribution has PDF
f(x) = λe^(−λx), where λ = 1/μ
FIGURE: The graph starts at height λ and its value decreases with x.
The CDF is the area A_l to the left of x:
A_l = P(X ≤ x) = 1 − e^(−λx)
Cumulative Distribution Function (CDF)
• Total area: A_t = 1
• A_l + A_r = 1
• 1 − e^(−λx) + A_r = 1
• A_r = e^(−λx)
Cumulative Distribution Function (CDF)
FIGURE: The area between a and b under the exponential PDF.
Area = P(a < X < b) = P(X < b) − P(X < a)
CDF: P(a < X < b) = [1 − e^(−λb)] − [1 − e^(−λa)]
(the area left of b minus the area left of a)
Remember that P(a ≤ X ≤ b) = P(a < X < b) for a continuous probability distribution:
P(X = a) = 0 because x = a is only a line, which has height but no width.
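A brief sketch of these CDF relations for the exponential distribution, with a hypothetical rate λ:

```python
# Sketch of the exponential CDF relations above, with an assumed rate lambda.
import math

lam = 0.5  # hypothetical rate; lambda = 1/mu

def cdf(x):
    return 1 - math.exp(-lam * x)  # P(X <= x)

a, b = 1.0, 3.0
print(cdf(b) - cdf(a))     # P(a < X < b), area between a and b
print(math.exp(-lam * a))  # A_r: right-tail area P(X > a)
```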
Sampling and Sampling Distributions
• A sampling distribution is the probability distribution of a statistic obtained through
repeated sampling from a larger population.
• It describes the range of possible outcomes of a statistic, such as the mean or
mode of some variable, as it truly exists in a population.
• The majority of data analyzed by researchers are actually drawn from samples,
and not populations.
• In statistics, a population is the entire pool from which a statistical sample is
drawn.
• A population may refer to an entire group of people, objects, events, hospital
visits, or measurements.
Understanding Sampling Distribution
• A lot of data drawn and used by academicians, statisticians, researchers, marketers, analysts, etc.
are actually samples, not populations. A sample is a subset of a population.
• For example, a medical researcher who wants to compare the average weight of all babies born
in Uttar Pradesh from 1995 to 2005 to those born in Delhi within the same time period cannot,
within a reasonable amount of time, draw the data for the entire population of over a million
childbirths that occurred over the ten-year time frame.
• He will instead only use the weight of, say, 100 babies in each region to make a conclusion.
• The weight of the 200 babies used is the sample, and the average weight calculated is the sample
mean.
• Now suppose that instead of taking just one sample of 100 newborn weights from each region,
the medical researcher takes repeated random samples from the general population and computes
the average weight of each sample set.
• The average weight computed for each sample set forms the sampling distribution of the
mean.
• Not just the mean can be calculated from a sample. Other statistics, such as the
standard deviation, variance, proportion, and range can be calculated from sample
data.
• The standard deviation and variance measure the variability of the sampling
distribution.
• The number of observations in a population, the number of observations in a sample
and the procedure used to draw the sample sets determine the variability of a sampling
distribution.
• The standard deviation of a sampling distribution is called the standard error.
• While the mean of a sampling distribution is equal to the mean of the population, the
standard error depends on the standard deviation of the population, the size of the
population and the size of the sample.
Understanding Sampling Distribution
• For example, suppose that y1, y2, . . . , yn represents a sample. Then the sample mean is
ȳ = (y1 + y2 + · · · + yn)/n
and the sample variance is
s² = Σ(yi − ȳ)²/(n − 1), with the sum running over i = 1, . . . , n
• These quantities are measures of the central tendency and dispersion of the
sample, respectively.
• Sometimes s = √s², called the sample standard deviation, is used as a
measure of dispersion.
• Experimenters often prefer to use the standard deviation to measure
dispersion because its units are the same as those for the variable of interest
y.
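These three quantities are one line each in Python's standard library; the observations below are illustrative:

```python
# Sketch: sample mean, sample variance (n - 1 divisor), and sample
# standard deviation for a small illustrative sample.
import statistics

y = [16.85, 16.40, 17.21, 16.35, 16.52]

print(statistics.mean(y))      # sample mean
print(statistics.variance(y))  # sample variance s**2
print(statistics.stdev(y))     # sample standard deviation s
```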
LECTURE 6
Measures of Central Tendency: Mean, Median, and Mode
• A measure of central tendency is a summary statistic that represents the
center point or typical value of a dataset.
• These measures indicate where most values in a distribution fall and are
also referred to as the central location of a distribution.
• You can think of it as the tendency of data to cluster around a middle value.
• In statistics, the three most common measures of central tendency are
the mean, median, and mode.
• Each of these measures calculates the location of the central point using a
different method.
• Choosing the best measure of central tendency depends on the type of data
you have.
• The three distributions below represent different data conditions.
• In each distribution, look for the region where the most common values fall.
• Even though the shapes and types of data are different, you can find that central
location: the area in the distribution where the most common values are located.
Mean
• The mean is the arithmetic average, and it is probably the measure of central
tendency that you are most familiar with.
• Calculating the mean is very simple. You just add up all of the values and divide
by the number of observations in your dataset.
• The calculation of the mean incorporates all
values in the data. If you change any value,
the mean changes.
• However, the mean doesn’t always locate
the center of the data accurately.
• Observe the histograms where I showed the
mean in the distributions.
• Extreme values in an extended tail pull the
mean away from the center.
Median
• The median is the middle value.
• It is the value that splits the dataset in half.
• To find the median, order your data from smallest to largest,
and then find the data point that has an equal amount of values
above it and below it.
• The method for locating the median varies slightly depending on
whether your dataset has an even or odd number of values.
• When there is an even number of values, you count in to the two innermost
values and then take the average. For example, the average of 27 and 29 is 28;
consequently, 28 is the median of that dataset.
• In the examples, I used whole numbers for simplicity, but you can have
decimal places.
• In the dataset with the odd number of observations, notice how the
number 12 has six values above it and six below it. Therefore, 12 is the
median of this dataset.
• Outliers and skewed data have a smaller effect on the median.
• For example: we have the Median dataset below and find that the
median is 46. However, we discover data entry errors and need to
change four values, which are shaded in the Median Fixed dataset.
We’ll make them all significantly higher so that we now have a
skewed distribution with large outliers.
• As you can see, the median doesn’t change at all. It is still 46.
• Unlike the mean, the median value doesn’t depend on all the values
in the dataset.
• Consequently, when some of the values are more extreme, the
effect on the median is smaller.
• Of course, with other types of changes, the median can change.
• When you have a skewed distribution, the median is a better
measure of central tendency than the mean.
Comparing the mean and median
• In a symmetric distribution, the mean and median both find the center accurately.
They are approximately equal.
• In a skewed distribution, the outliers in the tail pull the mean away from the
center towards the longer tail. For this example, the mean and median differ by
over 9000, and the median better represents the central tendency for the
distribution.
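A quick numeric illustration of this contrast (the data are made up): adding a single extreme value drags the mean far into the tail while barely moving the median.

```python
# Illustrative data: one outlier drags the mean but barely moves the median.
import statistics

data = [23, 25, 27, 29, 31, 33, 35]
skewed = data + [10000]  # add one extreme value

print(statistics.mean(data), statistics.median(data))      # 29, 29
print(statistics.mean(skewed), statistics.median(skewed))  # 1275.375, 30.0
```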
Mode
• The mode is the value that occurs the most frequently in your data
set.
• On a bar chart, the mode is the highest bar.
• If the data have multiple values that are tied for occurring the most
frequently, you have a multimodal distribution.
• If no value repeats, the data do not have a mode.
• In the dataset, the value 5 occurs most frequently, which makes it the
mode. These data might represent a 5-point Likert scale.
• Typically, you use the mode with categorical, ordinal, and discrete data.
• In fact, the mode is the only measure of central tendency that you can use
with categorical data—such as the most preferred flavor of ice cream.
• However, with categorical data, there isn’t a central value because you can’t
order the groups.
• With ordinal and discrete data, the mode can be a value that is not in the center.
Again, the mode represents the most common value.
When should you use the mean, median or mode?
Confidence Level: What is it?
• When a poll is reported in the media, a confidence level is often included in the
results.
• For example, a survey might report a 95 percent confidence level. But what
exactly does this mean? At first glance you might think that it means it's 95
percent accurate. That's close to the truth, but like many things in statistics, the
actual meaning is a little more precise.
• It is often expressed as a % whereby
a population mean lies between an
upper and lower interval.
• Due to natural sampling variability, the
sample mean (center of the CI) will
vary from sample to sample.
• As the sample size increases, the range of interval values will narrow,
meaning that you know that mean with much more accuracy compared with a
smaller sample
• Accordingly, there is a 5% chance that the population mean lies outside of the
upper and lower confidence interval (as illustrated by the 2.5% of outliers on
either side of the 1.96 z-scores).
Why do researchers use confidence intervals?
• It is more or less impossible to study every single
person in a population so researchers select a sample
or sub-group of the population.
• This means that the researcher can only estimate the
parameters (i.e. characteristics) of a population, the
estimated range being calculated from a given set of
sample data.
• Therefore, a confidence interval is simply a way to measure how well your sample
represents the population you are studying.
• The probability that the confidence interval includes the true mean value within a
population is called the confidence level of the CI.
• You can calculate a CI for any confidence level you like, but the most commonly
used value is 95%. A 95% confidence level means you can be 95% certain.
Factors that Affect Confidence Intervals (CI)
• Population size: this does not usually affect the CI but can be a factor if you
are working with small and known groups of people.
• Sample Size: the smaller your sample, the less likely it is you can be confident
the results reflect the true population parameter.
• Percentage: Extreme answers come with better accuracy. For example, if 99
percent of voters are for gay marriage, the chances of error are small. However,
if 49.9 percent of voters are “for” and 50.1 percent are “against” then the
chances of error are bigger.
0% and 100% Confidence Level
• A 0% confidence level means you have no faith at all that if you repeated the
survey that you would get the same results.
• A 100% confidence level means there is no doubt at all that if you repeated the
survey you would get the same results.
• In reality, you would never publish the results from a survey where you had no
confidence at all that your statistics were accurate (you would probably repeat
the survey with better techniques).
• A 100% confidence level doesn’t exist in statistics, unless you surveyed an
entire population — and even then you probably couldn’t be 100 percent sure
that your survey wasn’t open to some kind or error or bias.
• The confidence coefficient is the confidence level stated as a proportion,
rather than as a percentage. For example, if you had a confidence level of
99%, the confidence coefficient would be 0.99.
How do I calculate a confidence interval?
• To calculate the confidence interval, start by computing the mean and standard deviation of the
sample.
• Remember, you must calculate an upper and lower score for the confidence interval using the z-
score for the chosen confidence level (see table below).
Confidence Interval Formula
X̄ ± Z × s/√n
Where:
X̄ is the mean
Z is the chosen Z-value (1.96 for 95%)
s is the standard deviation
n is the sample size
An Example
• X̄ (mean) = 86
• Z = 1.960 (from the table above for 95%)
• s (standard deviation) = 6.2
• n (sample size) = 46
Lower Value: 86 − 1.960 × 6.2/√46 = 86 − 1.79 = 84.21
Upper Value: 86 + 1.960 × 6.2/√46 = 86 + 1.79 = 87.79
So the population mean is likely to be between 84.21 and 87.79.
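The same arithmetic in a few lines of Python, for checking the hand calculation:

```python
# The confidence-interval arithmetic from the example above.
import math

x_bar, z, s, n = 86, 1.960, 6.2, 46
margin = z * s / math.sqrt(n)  # 1.96 * 6.2 / sqrt(46), about 1.79

print(x_bar - margin, x_bar + margin)  # about 84.21 and 87.79
```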

More Related Content

What's hot

design of experiments
design of experimentsdesign of experiments
design of experiments
sigma-tau
 
Principles of design of experiments (doe)20 5-2014
Principles of  design of experiments (doe)20 5-2014Principles of  design of experiments (doe)20 5-2014
Principles of design of experiments (doe)20 5-2014
Awad Albalwi
 
General Factor Factorial Design
General Factor Factorial DesignGeneral Factor Factorial Design
General Factor Factorial Design
Noraziah Ismail
 

What's hot (20)

Design of experiments formulation development exploring the best practices ...
Design of  experiments  formulation development exploring the best practices ...Design of  experiments  formulation development exploring the best practices ...
Design of experiments formulation development exploring the best practices ...
 
Design of Experiments (Pharma)
Design of Experiments (Pharma)Design of Experiments (Pharma)
Design of Experiments (Pharma)
 
design of experiments
design of experimentsdesign of experiments
design of experiments
 
Fractional Factorial Designs
Fractional Factorial DesignsFractional Factorial Designs
Fractional Factorial Designs
 
Taguchi design of experiments nov 24 2013
Taguchi design of experiments nov 24 2013Taguchi design of experiments nov 24 2013
Taguchi design of experiments nov 24 2013
 
Design of Experiments
Design of ExperimentsDesign of Experiments
Design of Experiments
 
Design of experiments-Box behnken design
Design of experiments-Box behnken designDesign of experiments-Box behnken design
Design of experiments-Box behnken design
 
Principles of design of experiments (doe)20 5-2014
Principles of  design of experiments (doe)20 5-2014Principles of  design of experiments (doe)20 5-2014
Principles of design of experiments (doe)20 5-2014
 
Introduction to Design of Experiments by Teck Nam Ang (University of Malaya)
Introduction to Design of Experiments by Teck Nam Ang (University of Malaya)Introduction to Design of Experiments by Teck Nam Ang (University of Malaya)
Introduction to Design of Experiments by Teck Nam Ang (University of Malaya)
 
Optimization techniques
Optimization  techniquesOptimization  techniques
Optimization techniques
 
How conduct a Design of Experiments
How conduct a Design of ExperimentsHow conduct a Design of Experiments
How conduct a Design of Experiments
 
Design of Experiments
Design of Experiments Design of Experiments
Design of Experiments
 
Crossover design ppt
Crossover design pptCrossover design ppt
Crossover design ppt
 
Experimental design
Experimental designExperimental design
Experimental design
 
LATIN SQUARE DESIGN - RESEARCH DESIGN
LATIN SQUARE DESIGN - RESEARCH DESIGNLATIN SQUARE DESIGN - RESEARCH DESIGN
LATIN SQUARE DESIGN - RESEARCH DESIGN
 
General Factor Factorial Design
General Factor Factorial DesignGeneral Factor Factorial Design
General Factor Factorial Design
 
Optimization through statistical response surface methods
Optimization through statistical response surface methodsOptimization through statistical response surface methods
Optimization through statistical response surface methods
 
design of experiments.ppt
design of experiments.pptdesign of experiments.ppt
design of experiments.ppt
 
Minitab- A statistical tool
Minitab- A statistical tool Minitab- A statistical tool
Minitab- A statistical tool
 
Factorial design M Pharm 1st Yr.
Factorial design M Pharm 1st Yr.Factorial design M Pharm 1st Yr.
Factorial design M Pharm 1st Yr.
 

Similar to introduction to design of experiments

Planning of experiment in industrial research
Planning of experiment in industrial researchPlanning of experiment in industrial research
Planning of experiment in industrial research
pbbharate
 

Similar to introduction to design of experiments (20)

Planning of experiment in industrial research
Planning of experiment in industrial researchPlanning of experiment in industrial research
Planning of experiment in industrial research
 
PE-2021-306 OVAT and DoE.pptx
PE-2021-306 OVAT and DoE.pptxPE-2021-306 OVAT and DoE.pptx
PE-2021-306 OVAT and DoE.pptx
 
Design of Experiments
Design of ExperimentsDesign of Experiments
Design of Experiments
 
Design of experiments
Design of experimentsDesign of experiments
Design of experiments
 
om
omom
om
 
om
omom
om
 
mel705-15.ppt
mel705-15.pptmel705-15.ppt
mel705-15.ppt
 
mel705-15.ppt
mel705-15.pptmel705-15.ppt
mel705-15.ppt
 
computer aided formulation development
 computer aided formulation development computer aided formulation development
computer aided formulation development
 
Design of experiments BY Minitab
Design of experiments BY MinitabDesign of experiments BY Minitab
Design of experiments BY Minitab
 
Unit-1 DOE.ppt
Unit-1 DOE.pptUnit-1 DOE.ppt
Unit-1 DOE.ppt
 
Unit-1 DOE.ppt
Unit-1 DOE.pptUnit-1 DOE.ppt
Unit-1 DOE.ppt
 
Concept of optimization Optimization parameters.pptx
Concept of optimization Optimization parameters.pptxConcept of optimization Optimization parameters.pptx
Concept of optimization Optimization parameters.pptx
 
Introduction to Statistics and Probability:
Introduction to Statistics and Probability:Introduction to Statistics and Probability:
Introduction to Statistics and Probability:
 
DAE1.pptx
DAE1.pptxDAE1.pptx
DAE1.pptx
 
Experimental Design.pptx
Experimental Design.pptxExperimental Design.pptx
Experimental Design.pptx
 
Design of Experiment ppt by Ganesh Asabe
Design of Experiment ppt by Ganesh AsabeDesign of Experiment ppt by Ganesh Asabe
Design of Experiment ppt by Ganesh Asabe
 
plackett-burmandesignppt.pptx
plackett-burmandesignppt.pptxplackett-burmandesignppt.pptx
plackett-burmandesignppt.pptx
 
Tema 4. Diseño experimental para un factor
Tema 4. Diseño experimental para un factorTema 4. Diseño experimental para un factor
Tema 4. Diseño experimental para un factor
 
1_Design and Analysis of Experiment_Data Science.pptx
1_Design and Analysis of Experiment_Data Science.pptx1_Design and Analysis of Experiment_Data Science.pptx
1_Design and Analysis of Experiment_Data Science.pptx
 

Recently uploaded

Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
EADTU
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Recently uploaded (20)

PANDITA RAMABAI- Indian political thought GENDER.pptx
PANDITA RAMABAI- Indian political thought GENDER.pptxPANDITA RAMABAI- Indian political thought GENDER.pptx

Introduction to Design of Experiments

  • 10. Development of DOE (continued) • Taguchi and robust parameter design, process robustness • The modern era, beginning circa 1990
  • 11. DESIGN OF EXPERIMENTS: LECTURE 2
  • 12. DOE Approaches Two of the most common approaches to DOE are the full factorial DOE and the fractional factorial DOE. Full factorial DOE: determines at which settings of your process inputs the values of your process outcomes are optimized. • Which combination of machine speed, fill speed, and carbonation level will give you the most consistent fill? • Experimentation using all possible factor combinations is called a full factorial design. • These combinations are called runs. • With three variables, machine speed, fill speed, and carbonation level, how many different unique combinations would you have to test to explore all the possibilities?
  • 13. Total number of runs = 2^k, where k is the number of variables and 2 is the number of levels, such as High/Low or 100 ml per minute/200 ml per minute. • What if you aren't able to run the entire set of combinations of a full factorial? What if you have monetary or time constraints, or too many variables? • This is when you might choose to run a fractional factorial, also referred to as a screening DOE, which uses only a fraction of the total runs. • That fraction can be one-half, one-quarter, one-eighth, and so forth, depending on the number of factors or variables. • While there is a formula to calculate the number of runs, suffice it to say you can just calculate your full factorial runs and divide by the fraction that you and your Black Belt or Master Black Belt determine is best for your situation.
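A minimal sketch of this run-count arithmetic, using the bottling factors named above; the choice of a half fraction is an illustrative assumption, not a value from the lecture:

```python
# Run counts for two-level designs: full factorial vs. a 1/p fraction.

def full_factorial_runs(k: int, levels: int = 2) -> int:
    """Total runs for a full factorial: levels ** k."""
    return levels ** k

def fractional_runs(k: int, fraction: int, levels: int = 2) -> int:
    """Runs for a 1/fraction fractional factorial (fraction=2 is a half fraction)."""
    return levels ** k // fraction

# Three factors from the example: machine speed, fill speed, carbonation level.
k = 3
print(full_factorial_runs(k))   # 8 runs for the full 2^3 design
print(fractional_runs(k, 2))    # 4 runs for a hypothetical half fraction
```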
  • 14. Factorial Designs Example • In a factorial experiment, all possible combinations of factor levels can be tested. • The golf experiment: • Type of driver • Type of ball • Walking vs. riding • Type of beverage • Time of round • Weather • Type of golf spike • Etc, etc, etc
  • 15. Factorial Designs Example • Consider the golf experiment and suppose that only two factors, type of driver and type of ball, are of interest. • The figure shows a two-factor factorial experiment for studying the joint effects of these two factors on golf score. • Notice that this factorial experiment has both factors at two levels and that all possible combinations of the two factors across their levels are used in the design. • Geometrically, the four runs form the corners of a square. • This particular type of factorial experiment is called a 2² factorial design (two factors, each at two levels). • Because I can reasonably expect to play eight rounds of golf to investigate these factors, a reasonable plan would be to play two rounds of golf at each combination of factor levels shown in the figure. • An experimental designer would say that we have replicated the design twice. • This experimental design would enable the experimenter to investigate the individual effects of each factor (the main effects) as well as their interaction. Fig.: A two-factor factorial experiment involving type of driver and type of ball
  • 16. Factorial Designs Example • Figure (a) shows the results of performing the factorial experiment. • The scores from each round of golf played at the four test combinations are shown at the corners of the square. • Notice that there are four rounds of golf that provide information about using the regular-sized driver and four rounds that provide information about using the oversized driver. • By finding the average difference in the scores on the right- and left-hand sides of the square (as in Figure b), we have a measure of the effect of switching from the oversized driver to the regular-sized driver: Driver effect = (average score, regular-sized driver) − (average score, oversized driver) = 3.25. • That is, on average, switching from the oversized to the regular-sized driver increases the score by 3.25 strokes per round.
  • 17. Factorial Designs Example • Similarly, the average difference in the four scores at the top of the square and the four scores at the bottom measures the effect of the type of ball used (see Figure c): Ball effect = (average score, top of square) − (average score, bottom of square). • Finally, a measure of the interaction effect between the type of ball and the type of driver can be obtained by subtracting the average scores on the left-to-right diagonal of the square from the average scores on the right-to-left diagonal (see Figure d): Interaction effect = (average of right-to-left diagonal scores) − (average of left-to-right diagonal scores).
  • 18. Conclusion of Factorial Designs Example • The results of this factorial experiment indicate that the driver effect is larger than either the ball effect or the interaction. • Statistical testing could be used to determine whether any of these effects differ from zero. • In fact, it turns out that there is reasonably strong statistical evidence that the driver effect differs from zero and that the other two effects do not. • Therefore, this experiment indicates that I should always play with the oversized driver. • As this simple example shows, factorials make the most efficient use of the experimental data. • Notice that this experiment included eight observations, and all eight observations are used to calculate the driver, ball, and interaction effects. • No other strategy of experimentation makes such efficient use of the data. • This is an important and useful feature of factorials.
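A sketch of the 2² effect calculations described above. The individual round scores were shown only in the figure, so the scores below are hypothetical replicates, chosen so that the driver effect reproduces the 3.25 strokes quoted in the lecture; the ball and interaction values are purely illustrative:

```python
# Effects in a 2^2 factorial with two replicates per cell.
scores = {
    # (driver, ball): two replicate golf scores (hypothetical)
    ("oversized", "regular"): [88, 91],
    ("oversized", "balata"):  [88, 90],
    ("regular",   "regular"): [92, 94],
    ("regular",   "balata"):  [93, 91],
}

def avg(rounds):
    return sum(rounds) / len(rounds)

# Driver effect: right-hand side of the square minus left-hand side.
right = avg(scores[("regular", "regular")] + scores[("regular", "balata")])
left  = avg(scores[("oversized", "regular")] + scores[("oversized", "balata")])
driver_effect = right - left   # 3.25 with these scores

# Ball effect: top of the square minus bottom.
top    = avg(scores[("oversized", "balata")] + scores[("regular", "balata")])
bottom = avg(scores[("oversized", "regular")] + scores[("regular", "regular")])
ball_effect = top - bottom

# Interaction: one diagonal's average minus the other's.
diag1 = avg(scores[("oversized", "regular")] + scores[("regular", "balata")])
diag2 = avg(scores[("oversized", "balata")] + scores[("regular", "regular")])
interaction = diag1 - diag2

print(driver_effect, ball_effect, interaction)
```

Note that all eight observations enter every one of the three effect calculations, which is the efficiency property the slide highlights.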
  • 19. Benefits of DOE Doing a designed experiment as opposed to using a trial-and-error approach has a number of benefits. 1. Identify the main effects of your factors: a main effect is the impact of a specific variable on your output. 2. Identify interactions: interactions occur if the impact of one factor on your response depends on the setting of another factor. 3. Determine optimal settings for your variables: after analyzing all of your main effects and interactions, you will be able to determine what the settings should be for your factors or variables.
  • 20. Why is DOE important to understand? 1. Choosing Between Alternatives: a common use is planning an experiment to gather data to make a decision between two or more alternatives; these are types of comparative studies. 2. Selecting the Key Factors Affecting a Response: selecting the few factors that matter from the many possible factors. 3. Response Surface Modeling of a Process: some reasons to model a process are below. • Hitting a Target: often we want to "fine tune" a process to consistently hit a target. • Maximizing or Minimizing a Response: optimizing a process output is a common goal. • Reducing Variation: processes that are on target, on average, may still have too much variability. • Making a Process Robust: the less a process or product is affected by external conditions, the better it is; this is called "robustness". • Seeking Multiple Goals: sometimes we have multiple outputs and we have to compromise to achieve desirable outcomes; DOE can help here. • Regression Modeling: regression models are used to fit more precise models.
  • 21. Best practices when thinking about DOE • Experiments take planning and proper execution; otherwise the results may be meaningless. Here are a few hints for making sure you properly run your DOE. Carefully identify your variables: • Your process variables have different impacts on your output. Some are statistically important, and some are just noise. You need to understand which is which. • Use existing data and data analysis to try to identify the most logical factors for your experiment. • Regression analysis is often a good source for selecting potentially significant factors. Prevent contamination of your experiment: • During your experiment, you will have your experimental factors as well as other environmental factors around you that you aren't interested in; these extraneous factors can contaminate your results if not controlled.
  • 22. Best practices when thinking about DOE: Use screening experiments to reduce cost and time • Unless you've done some prior screening of your potential factors, you might want to start your DOE with a screening or fractional factorial design. • This will provide information about potentially significant factors without consuming your whole budget. • Once you've identified the best potential factors, you can do a full factorial with the reduced number of factors.
  • 23. What are the steps of DOE? Obtaining good results from a DOE involves these seven steps: 1. Set objectives. 2. Select process variables. 3. Select an experimental design. 4. Execute the design. 5. Check that the data are consistent with the experimental assumptions. 6. Analyze and interpret the results. 7. Use/present the results (may lead to further runs or DOEs).
  • 24. A checklist of practical considerations Important practical considerations in planning and running experiments: • Check performance of gauges/measurement devices first. • Keep the experiment as simple as possible. • Check that all planned runs are feasible. • Watch out for process drifts and shifts during the run. • Avoid unplanned changes (e.g., swapping operators at the halfway point). • Allow some time (and back-up material) for unexpected events. • Obtain buy-in from all parties involved. • Maintain effective ownership of each step in the experimental plan. • Preserve all the raw data; do not keep only summary averages! • Record everything that happens. • Reset equipment to its original state after the experiment.
  • 25. DESIGN OF EXPERIMENTS: LECTURE 3
  • 26. Basic principles of experimental design: randomization, replication, and blocking. Sometimes we add the factorial principle to these three.
  • 27. Randomization • Randomization is the cornerstone underlying the use of statistical methods in experimental design. • By randomization we mean that both the allocation of the experimental material and the order in which the individual runs of the experiment are to be performed are randomly determined. • Statistical methods require that the observations (or errors) be independently distributed random variables. • Randomization usually makes this assumption valid. • By properly randomizing the experiment, we also assist in "averaging out" the effects of extraneous factors that may be present. For example, suppose that the specimens in the hardness experiment are of slightly different thicknesses and that the effectiveness of the quenching medium may be affected by specimen thickness. If all the specimens subjected to the oil quench are thicker than those subjected to the saltwater quench, we may be introducing systematic bias into the experimental results. This bias handicaps one of the quenching media and consequently invalidates our results. Randomly assigning the specimens to the quenching media alleviates this problem.
  • 28. • Computer software programs are widely used to assist experimenters in selecting and constructing experimental designs. • These programs often present the runs in the experimental design in random order. • This random order is created by using a random number generator. • Even with such a computer program, it is still often necessary to assign units of experimental material, operators, gauges or measurement devices, and so forth for use in the experiment. • Sometimes experimenters encounter situations where randomization of some aspect of the experiment is difficult.
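A minimal sketch of what such software does internally: enumerate the factor combinations and shuffle them with a random number generator. The factor names and levels here are illustrative (the quench medium echoes the hardness example; the second factor is an assumption):

```python
# Generate a randomized run order for a small two-level factorial.
import itertools
import random

factors = {
    "quench_medium": ["oil", "saltwater"],
    "temperature": ["low", "high"],   # hypothetical second factor
}

runs = list(itertools.product(*factors.values()))  # all factor combinations
random.seed(42)        # fixed seed so the printed plan is reproducible
random.shuffle(runs)   # randomize the order in which runs are performed

for i, run in enumerate(runs, start=1):
    print(f"Run {i}: {dict(zip(factors, run))}")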
  • 29. Replication • By replication we mean an independent repeat run of each factor combination. For example, in the metallurgical experiment, replication would consist of treating a specimen by oil quenching and treating a specimen by saltwater quenching. Thus, if five specimens are treated in each quenching medium, we say that five replicates have been obtained. • Each of the 10 observations should be run in random order. • Replication has two important properties. First, it allows the experimenter to obtain an estimate of the experimental error. This estimate of error becomes a basic unit of measurement for determining whether observed differences in the data are really statistically different. Second, if the sample mean (ȳ) is used to estimate the true mean response for one of the factor levels in the experiment, replication permits the experimenter to obtain a more precise estimate of this parameter. For example, if σ² is the variance of an individual observation and there are n replicates, the variance of the sample mean is σ²_ȳ = σ²/n.
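A quick simulation of that replication property, under assumed values (true mean 50, σ = 2, n = 5 replicates); the variance of the sample mean should come out close to σ²/n:

```python
# Check empirically that Var(sample mean) ≈ sigma**2 / n.
import random
import statistics

random.seed(1)
sigma = 2.0
n = 5  # replicates per treatment, as in the quenching example

# Simulate many experiments, each averaging n observations.
means = [statistics.mean(random.gauss(50.0, sigma) for _ in range(n))
         for _ in range(100_000)]

print(statistics.variance(means))  # empirical variance of the mean
print(sigma**2 / n)                # theoretical value: 0.8
```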
  • 30. Blocking • Blocking is a design technique used to improve the precision with which comparisons among the factors of interest are made. • Often blocking is used to reduce or eliminate the variability transmitted from nuisance factors, that is, factors that may influence the experimental response but in which we are not directly interested. For example, an experiment in a chemical process may require two batches of raw material to make all the required runs. However, there could be differences between the batches due to supplier-to-supplier variability, and if we are not specifically interested in this effect, we would think of the batches of raw material as a nuisance factor. • Generally, a block is a set of relatively homogeneous experimental conditions. • In the chemical process example, each batch of raw material would form a block, because the variability within a batch would be expected to be smaller than the variability between batches. • Typically, as in this example, each level of the nuisance factor becomes a block. • Then the experimenter divides the observations from the statistical design into groups that are run in each block.
  • 31. Guidelines for Designing Experiments To use the statistical approach in designing and analyzing an experiment, it is necessary for everyone involved in the experiment to have a clear idea in advance of exactly what is to be studied, how the data are to be collected, and at least a qualitative understanding of how these data are to be analyzed. STEP 1: Recognition of and statement of the problem • Realize that a problem requiring experimentation exists. • Develop a clear and generally accepted statement of the problem. • It is important to solicit input from all concerned parties: engineering, quality assurance, manufacturing, marketing, management, customers, and operating personnel. • It is helpful to prepare a list of specific problems or questions that are to be addressed by the experiment. • Always keep the overall objectives of the experiment in mind. • There are several broad reasons for running experiments, and each type of experiment will generate its own list of specific questions that need to be addressed.
  • 32. Guidelines for Designing Experiments STEP 1: Recognition of and statement of the problem There are several broad reasons for running experiments; some are as follows: • Factor screening or characterization: which factors have the most influence on the response(s) of interest? • Optimization: find the settings or levels of the important factors that result in desirable values of the response. • Confirmation: verify that the system operates or behaves in a manner that is consistent with some theory or past experience. • Discovery: in discovery experiments, the experimenters are usually trying to determine what happens when we explore new materials, new factors, or new ranges for factors. • Robustness: under what conditions do the response variables of interest seriously degrade? Or what conditions would lead to unacceptable variability in the response variables?
  • 33. Guidelines for Designing Experiments STEP 2: Selection of the response variable • In selecting the response variable, the experimenter should be certain that this variable really provides useful information about the process under study. • Most often, the average or standard deviation (or both) of the measured characteristic will be the response variable. • Multiple responses are not unusual. • The experimenters must decide how each response will be measured, and address issues such as how any measurement system will be calibrated and how this calibration will be maintained during the experiment. • The gauge or measurement system capability (or measurement error) is also an important factor. • It is usually critically important to identify issues related to defining the responses of interest and how they are to be measured before conducting the experiment. • Sometimes designed experiments are employed to study and improve the performance of measurement systems.
  • 34. Guidelines for Designing Experiments STEP 3: Choice of factors, levels, and range • The experimenter should classify the factors as either potential design factors or nuisance factors. • Potential design factors can be further classified as design factors, held-constant factors, and allowed-to-vary factors. • The design factors are the factors actually selected for study in the experiment. • Held-constant factors are variables that may exert some effect on the response, but for purposes of the present experiment these factors are not of interest, so they will be held at a specific level. • For allowed-to-vary factors, the experimental units or the "materials" to which the design factors are applied are usually nonhomogeneous, yet we often ignore this unit-to-unit variability and rely on randomization to balance out any material or experimental unit effect. • We often assume that the effects of held-constant factors and allowed-to-vary factors are relatively small.
  • 35. Guidelines for Designing Experiments STEP 3: Choice of factors, levels, and range • When the objective of the experiment is factor screening or process characterization, it is usually best to keep the number of factor levels low (generally two levels). • Cause-and-effect diagram (fishbone diagram): a useful technique for organizing some of the information generated in pre-experimental planning. FIGURE: A cause-and-effect diagram for the etching process experiment. FIGURE: A cause-and-effect diagram for the CNC
  • 36. Guidelines for Designing Experiments STEP 4: Choice of experimental design • Choice of design involves consideration of sample size (number of replicates), selection of a suitable run order for the experimental trials, and determination of whether or not blocking or other randomization restrictions are involved. • There are several interactive statistical software packages that support this phase of experimental design. • The experimenter can enter information about the number of factors, levels, and ranges, and these programs will either present a selection of designs for consideration or recommend a particular design. • We usually prefer to see several alternatives instead of relying entirely on a computer recommendation. • Most software packages also provide some diagnostic information about how each design will perform, which helps in finding the best design alternative. • These programs will usually also provide a worksheet (with the order of the runs randomized) for use in conducting the experiment.
  • 37. Guidelines for Designing Experiments STEP 4: Choice of experimental design • Design selection also involves thinking about and selecting a tentative empirical model to describe the results. • The model is just a quantitative relationship (equation) between the response and the important design factors. • In many cases, a low-order polynomial model will be appropriate. • A first-order model in two variables is y = β₀ + β₁x₁ + β₂x₂ + ε, where y is the response, the x's are the design factors, the β's are unknown parameters that will be estimated from the data in the experiment, and ε is a random error term that accounts for the experimental error in the system being studied. • The first-order model is also sometimes called a main effects model. First-order models are used extensively in screening or characterization experiments.
  • 38. Guidelines for Designing Experiments STEP 4: Choice of experimental design • A common extension of the first-order model is to add an interaction term, say y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + ε, where the cross-product term x₁x₂ represents the two-factor interaction between the design factors. • Because interaction between factors is relatively common, the first-order model with interaction is widely used. • Higher-order interactions can also be included in experiments with more than two factors if necessary. • Another widely used model is the second-order model, y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + β₁₁x₁² + β₂₂x₂² + ε. • Second-order models are often used in optimization experiments. • In selecting the design, it is important to keep the experimental objectives in mind.
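A sketch of fitting the first-order-with-interaction model above by least squares. The coded design points and the responses are hypothetical, chosen only to show the mechanics:

```python
# Fit y = b0 + b1*x1 + b2*x2 + b12*x1*x2 to a 2^2 design by least squares.
import numpy as np

# 2^2 design in coded units (-1, +1): the four corner runs.
x1 = np.array([-1.0,  1.0, -1.0, 1.0])
x2 = np.array([-1.0, -1.0,  1.0, 1.0])
y  = np.array([30.0, 40.0, 35.0, 55.0])   # hypothetical responses

# Model matrix: intercept, two main effects, one interaction term.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(dict(zip(["b0", "b1", "b2", "b12"], beta.round(3))))
```

With four runs and four parameters the fit is exact; replication would add degrees of freedom for estimating the error term ε.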
  • 39. Guidelines for Designing Experiments STEP 5: Performing the experiment • When running the experiment, it is vital to monitor the process carefully to ensure that everything is being done according to plan. • Errors in experimental procedure at this stage will usually destroy experimental validity. • One of the most common mistakes is that the people conducting the experiment fail to set the variables to the proper levels on some runs. • Someone should be assigned to check factor settings before each run. • Up-front planning to prevent mistakes like this is crucial to success. • It is easy to underestimate the logistical and planning aspects of running a designed experiment in a complex manufacturing or research and development environment. • Coleman and Montgomery (1993) suggest that prior to conducting the experiment a few trial runs or pilot runs are often helpful. • These runs provide information about the consistency of experimental material, a check on the measurement system, a rough idea of experimental error, and a chance to practice the overall experimental technique.
  • 40. Guidelines for Designing Experiments STEP 6: Statistical analysis of the data • Statistical methods should be used to analyze the data so that results and conclusions are objective rather than judgmental in nature. • There are many excellent software packages designed to assist in data analysis, and many of the programs used in step 4 to select the design provide a seamless, direct interface to the statistical analysis. • Often, we find that simple graphical methods play an important role in data analysis and interpretation. • It also helps to express the results of many experiments in terms of an empirical model. • Statistical methods only provide guidelines as to the reliability and validity of results. • When properly applied, statistical methods do not allow anything to be proved experimentally, but they do allow us to measure the likely error in a conclusion. • The primary advantage of statistical methods is that they add objectivity to the decision-making process.
  • 41. Guidelines for Designing Experiments STEP 7: Conclusions and recommendations • Once the data have been analyzed, the experimenter must draw practical conclusions about the results and recommend a course of action. • Graphical methods are often useful in this stage, particularly in presenting the results to others. • Follow-up runs and confirmation testing should also be performed to validate the conclusions from the experiment. • Experimentation is an iterative process. It is usually a major mistake to design a single, large, comprehensive experiment at the start of a study. • A successful experiment requires knowledge of the important factors, the ranges over which these factors should be varied, the appropriate number of levels to use, and the proper units of measurement for these variables. • Generally, we do not perfectly know the answers to these questions, but we learn about them as we go along. • As an experimental program progresses, we often drop some input variables, add others, change the region of exploration for some factors, or add new response variables.
  • 42. Guidelines for Designing Experiments STEP 7: Conclusions and recommendations • Consequently, we usually experiment sequentially, and as a general rule, no more than about 25 percent of the available resources should be invested in the first experiment. • This will ensure that sufficient resources are available to perform confirmation runs and ultimately accomplish the final objective of the experiment. • Finally, it is important to recognize that all experiments are designed experiments. • The important issue is whether they are well designed or not. • Good pre-experimental planning will usually lead to a good, successful experiment.
  • 43. DESIGN OF EXPERIMENTS: LECTURE 4
  • 44. Concepts of random variable • Random means unpredictable. Hence, a random variable is a variable whose future value is unpredictable despite knowing its past performance. • A random variable is a variable whose possible values are the numerical outcomes of a random experiment. • Therefore, it is a function which associates a unique numerical value with every outcome of an experiment. • Further, its value varies with every trial of the experiment. • For example, when you toss an unbiased coin, the outcome can be a head or a tail. Even if you keep tossing the coin indefinitely, the outcomes are one of these two. Also, you would never know the outcome in advance. Random Experiment: A random experiment is a process which leads to an uncertain outcome. • Usually, it is assumed that the experiment is repeated indefinitely under homogeneous conditions. • While the result of a random experiment is not unique, it is one of the possible outcomes.
  • 45. • In a random experiment, the outcomes are not always numerical. • But we need numbers as outcomes for calculations. • Therefore, we define a random variable as a function which associates a unique numerical value with every outcome of a random experiment. • For example, in the case of the tossing of an unbiased coin, if there are 3 trials, then the number of times a ‘head’ appears can be a random variable. This has values 0, 1, 2, or 3 since, in 3 trials, you can get a minimum of 0 heads and a maximum of 3 heads. Concepts of random variable
  • 46. • Random variables are classified based on their probability distribution. • A random variable has either an associated probability distribution (discrete random variable) or a probability density function (continuous random variable). • Therefore, we have two types of random variables: discrete and continuous. Types of Random Variables Discrete Random Variables: • Discrete random variables take on only a countable number of distinct values. • Usually, these variables are counts (though not necessarily). • If a random variable can take only a finite number of distinct values, then it is discrete. • The number of members in a family, the number of defective light bulbs in a box of 10 bulbs, etc. are some examples of discrete random variables. • The probability distribution of these variables is a list of probabilities associated with each of its possible values. • It is also called the probability function or the probability mass function.
  • 47. Types of Random Variables Example of Discrete Random Variables • You toss a coin 10 times. The random variable X is the number of times you get a 'tail'. X can only take the values 0, 1, 2, …, 10. Therefore, X is a discrete random variable. • Let's look at the probability of getting 8 tails. • p₈ (the probability of getting 8 tails) falls in the range 0 to 1. Also, the sum of probabilities for all possible numbers of tails is p₀ + p₁ + … + p₁₀ = 1. If a random variable X takes k different values, with the probability that X = xᵢ defined as P(X = xᵢ) = pᵢ, then it must satisfy the following: 0 < pᵢ < 1 (for each i) and p₁ + p₂ + p₃ + … + pₖ = 1.
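A sketch checking the coin example with the binomial probability mass function (a fair coin is assumed, so p = 0.5): it prints P(exactly 8 tails) and confirms that the probabilities over all possible values sum to 1.

```python
# Probability mass function for the number of tails in 10 fair tosses.
from math import comb

n, p = 10, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

print(round(pmf[8], 4))              # P(X = 8) = 45/1024 ≈ 0.0439
print(round(sum(pmf.values()), 10))  # all probabilities sum to 1.0
```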
  • 48. Types of Random Variables Continuous Random Variables: • Continuous random variables take on an infinite number of possible values, usually within a given range. • Typically, these are measurements like weight, height, the time needed to finish a task, etc. • For example, the life of an individual in a community is a continuous random variable. Let's say that the maximum lifespan of an individual in a community is 110 years. A person can die immediately on birth (where life = 0 years) or after attaining an age of 110 years; within this range, he can die at any age. Therefore, the variable 'Age' can take any value between 0 and 110. • Hence, continuous random variables do not have specific values, since the number of possible values is infinite. • Also, the probability at any specific value is zero. • However, there is always a non-negative probability that an outcome will lie within the interval between two values.
  • 49. Probability • Probability means possibility. • Probability is a measure of the likelihood of an event occurring. • Many events cannot be predicted with total certainty. • We can predict only the chance of an event occurring, i.e., how likely it is to happen. • Probability ranges from 0 to 1, where 0 means the event is impossible and 1 indicates a certain event. • The probabilities of all the events in a sample space add up to 1. The best example for understanding probability is flipping a coin: there are two possible outcomes, heads or tails. What's the probability of the coin landing on heads? We can find out using the equation P(H) = ?. You might intuitively know that the likelihood is half/half, or 50%. Probability = (Number of favourable outcomes) / (Total number of outcomes).
  • 50. Probability Density Function (PDF) / Density of a continuous random variable [Figure: a density curve for Y = exact amount of rain (inches) expected tomorrow, with the x-axis running from 0 to 5 and the height in %.] • Let Y be the exact amount of rain tomorrow. Is P(Y = 2) = 0.5? What is the probability that the random variable Y is exactly 2 inches: not 2.01 or 2.0001, not 1.99 or 1.9999? We do not even have a tool which can measure exactly 2 inches. • Instead, we ask for the probability of almost 2 inches, within a tolerance: P(|Y − 2| < 0.1), i.e., P(1.9 < Y < 2.1). • This probability is the area under the probability density function f(x): P(|Y − 2| < 0.1) = ∫ from 1.9 to 2.1 of f(x) dx.
  • 51. • The PDF is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would be close to that sample. • In other words, while the absolute likelihood for a continuous random variable to take on any particular value is 0 (since there is an infinite set of possible values to begin with), the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample compared to the other.
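A sketch of the rain example in these terms. The slide does not specify the density, so a bell-shaped curve (normal with mean 2 and standard deviation 0.8) is a hypothetical stand-in; the area between 1.9 and 2.1 is approximated numerically:

```python
# P(|Y - 2| < 0.1) as the area under a hypothetical rainfall density.
from math import exp, pi, sqrt

def f(x, mu=2.0, sd=0.8):
    """Hypothetical density for tomorrow's rainfall (inches)."""
    return exp(-((x - mu) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the area under g on [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(f, 1.9, 2.1))  # P(1.9 < Y < 2.1): a small but nonzero area
print(integrate(f, 2.0, 2.0))  # P(Y = exactly 2) = 0: a point has no area
```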
  • 52. DESIGN OF EXPERIMENTS: LECTURE 5
  • 53. Cumulative Distribution Function (CDF) • The CDF is used to calculate the area under the curve to the left of a point of interest. • It is used to evaluate the accumulated probability. • For continuous probability distributions, probability = area under the curve, and the total area = 1. • The probability density function (PDF) f(x) describes the shape of the distribution (uniform, exponential, or normal). [Figure: a uniform density on the interval (a, b).] • For a uniform distribution, f(x) = 1/(b − a) on (a, b). • The area to the left of a point x is a rectangle: Area = base × height = (x − a) × f(x).
  • 54. Cumulative Distribution Function (CDF) • Area = (x − a) × 1/(b − a), so the area to the left of the point of interest is P(X ≤ x) = (x − a)/(b − a). This function is the CDF of the uniform distribution. [Figure: an exponential density, whose value decreases with x.] • For the exponential distribution, the PDF is f(x) = λe^(−λx), with rate λ = 1/μ. • The CDF is the area to the left of x: A_left = P(X ≤ x) = 1 − e^(−λx).
  • 55. Cumulative Distribution Function (CDF) • The total area under the density is 1, so the areas to the left and right of x satisfy A_left + A_right = 1. • Since A_left = 1 − e^(−λx), we get (1 − e^(−λx)) + A_right = 1, and therefore A_right = e^(−λx).
  • 56. Cumulative Distribution Function (CDF) • The area between a and b is P(a < X < b) = P(X < b) − P(X < a). • For the exponential distribution, P(a < X < b) = (1 − e^(−λb)) − (1 − e^(−λa)), i.e., the area left of b minus the area left of a. • Remember that P(a ≤ X ≤ b) = P(a < X < b) for a continuous probability distribution, because P(X = a) = 0: the event X = a is only a line, which has height but no width (and hence no area).
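A sketch of these exponential-CDF identities; the rate λ = 0.5 and the interval (1, 3) are arbitrary illustrative values:

```python
# Exponential CDF: P(X <= x) = 1 - exp(-lam*x), complement, and intervals.
from math import exp

lam = 0.5

def cdf(x):
    return 1 - exp(-lam * x)

a, b = 1.0, 3.0
print(cdf(a))            # area to the left of a
print(exp(-lam * a))     # complementary area to the right of a
print(cdf(b) - cdf(a))   # P(a < X < b) = CDF(b) - CDF(a)
```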
  • 57. Sampling and Sampling Distributions • A sampling distribution is the distribution of a statistic obtained through repeated sampling from a larger population. • It describes the range of possible outcomes of a statistic, such as the mean or mode of some variable, as it truly exists in a population. • The majority of data analyzed by researchers are actually drawn from samples, not populations. • In statistics, a population is the entire pool from which a statistical sample is drawn. • A population may refer to an entire group of people, objects, events, hospital visits, or measurements.
  • 58. Understanding Sampling Distribution • A lot of data drawn and used by academicians, statisticians, researchers, marketers, analysts, etc. are actually samples, not populations. A sample is a subset of a population. • For example, a medical researcher who wanted to compare the average weight of all babies born in Uttar Pradesh from 1995 to 2005 to those born in Delhi within the same time period cannot, within a reasonable amount of time, draw the data for the entire population of over a million childbirths that occurred over the ten-year time frame. • He will instead only use the weight of, say, 100 babies in each region to draw a conclusion. • The weight of the 200 babies used is the sample, and the average weight calculated is the sample mean. • Now suppose that instead of taking just one sample of 100 newborn weights from each region, the medical researcher takes repeated random samples from the general population and computes the sample mean for each sample set.
  • 59. • The average weight computed for each sample set is the sampling distribution of the mean. • Not just the mean can be calculated from a sample. Other statistics, such as the standard deviation, variance, proportion, and range can be calculated from sample data. • The standard deviation and variance measure the variability of the sampling distribution. • The number of observations in a population, the number of observations in a sample and the procedure used to draw the sample sets determine the variability of a sampling distribution. • The standard deviation of a sampling distribution is called the standard error. • While the mean of a sampling distribution is equal to the mean of the population, the standard error depends on the standard deviation of the population, the size of the population and the size of the sample. Understanding Sampling Distribution
  • 60. • For example, suppose that y₁, y₂, …, yₙ represents a sample. Then the sample mean is ȳ = (y₁ + y₂ + … + yₙ)/n, and the sample variance is s² = Σ(yᵢ − ȳ)²/(n − 1). • These quantities are measures of the central tendency and dispersion of the sample, respectively. • Sometimes s = √(s²), called the sample standard deviation, is used as a measure of dispersion. • Experimenters often prefer to use the standard deviation to measure dispersion because its units are the same as those for the variable of interest y.
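A sketch computing these sample statistics both by hand and with the standard library, so the formulas above can be checked directly; the data values are hypothetical:

```python
# Sample mean, sample variance (n-1 denominator), and standard deviation.
import statistics

y = [12.0, 15.0, 11.0, 14.0, 13.0]   # hypothetical sample
n = len(y)

y_bar = sum(y) / n                                  # sample mean
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)   # sample variance
s = s2 ** 0.5                                       # sample standard deviation

assert abs(y_bar - statistics.mean(y)) < 1e-12
assert abs(s2 - statistics.variance(y)) < 1e-12
print(y_bar, s2, s)   # 13.0, 2.5, ~1.581
```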
  • 61. DESIGN OF EXPERIMENTS: LECTURE 6
  • 62. Measures of Central Tendency: Mean, Median, and Mode • A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. • These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution. • You can think of it as the tendency of data to cluster around a middle value. • In statistics, the three most common measures of central tendency are the mean, median, and mode. • Each of these measures calculates the location of the central point using a different method. • Choosing the best measure of central tendency depends on the type of data you have.
  • 63. • The three distributions below represent different data conditions. • In each distribution, look for the region where the most common values fall. • Even though the shapes and types of data are different, you can find that central location. • That's the area in the distribution where the most common values are located.
  • 64. Mean • The mean is the arithmetic average, and it is probably the measure of central tendency that you are most familiar with. • Calculating the mean is very simple: you just add up all of the values and divide by the number of observations in your dataset. • The calculation of the mean incorporates all values in the data. If you change any value, the mean changes. • However, the mean doesn't always locate the center of the data accurately. • Observe the histograms where the mean is marked in each distribution. • Extreme values in an extended tail pull the mean away from the center.
  • 65. Median • The median is the middle value. • It is the value that splits the dataset in half. • To find the median, order your data from smallest to largest, and then find the data point that has an equal number of values above it and below it. • The method for locating the median varies slightly depending on whether your dataset has an even or odd number of values. • In the dataset with the odd number of observations, notice how the number 12 has six values above it and six below it. Therefore, 12 is the median of this dataset. • When there is an even number of values, you count in to the two innermost values and then take their average. The average of 27 and 29 is 28; consequently, 28 is the median of that dataset. • In the examples, whole numbers are used for simplicity, but you can have decimal places.
  • 66. • Outliers and skewed data have a smaller effect on the median. • For example: we have the Median dataset below and find that the median is 46. However, we discover data entry errors and need to change four values, which are shaded in the Median Fixed dataset. We'll make them all significantly higher, so that we now have a skewed distribution with large outliers. • As you can see, the median doesn't change at all. It is still 46. • Unlike the mean, the median value doesn't depend on all the values in the dataset. • Consequently, when some of the values are more extreme, the effect on the median is smaller. • Of course, with other types of changes, the median can change. • When you have a skewed distribution, the median is a better measure of central tendency than the mean.
  • 67. Comparing the mean and median • In a symmetric distribution, the mean and median both find the center accurately. They are approximately equal. • In a skewed distribution, the outliers in the tail pull the mean away from the center towards the longer tail. For this example, the mean and median differ by over 9000, and the median better represents the central tendency for the distribution.
  • 68. Mode • The mode is the value that occurs the most frequently in your data set. • On a bar chart, the mode is the highest bar. • If the data have multiple values that are tied for occurring the most frequently, you have a multimodal distribution. • If no value repeats, the data do not have a mode. • In the dataset, the value 5 occurs most frequently, which makes it the mode. These data might represent a 5-point Likert scale. • Typically, you use the mode with categorical, ordinal, and discrete data. • In fact, the mode is the only measure of central tendency that you can use with categorical data—such as the most preferred flavor of ice cream. • However, with categorical data, there isn’t a central value because you can’t order the groups. • With ordinal and discrete data, the mode can be a value that is not in the center. Again, the mode represents the most common value.
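A short sketch comparing the three measures on one small dataset with a single large outlier; the numbers are hypothetical, but they show the behavior described above (the mean is pulled by the outlier, the median and mode are not):

```python
# Mean vs. median vs. mode on data with one extreme value.
import statistics

data = [3, 5, 5, 6, 7, 8, 95]    # 95 is an outlier

print(statistics.mean(data))     # ~18.43: pulled toward the outlier
print(statistics.median(data))   # 6: insensitive to the outlier's size
print(statistics.mode(data))     # 5: the most frequent value
```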
  • 69. When should you use the mean, median or mode?
  • 70. Confidence Level: What is it? • When a poll is reported in the media, a confidence level is often included in the results. • For example, a survey might report a 95 percent confidence level. But what exactly does this mean? At first glance you might think that it means it's 95 percent accurate. That's close to the truth, but like many things in statistics, the definition is actually a little more precise. • A confidence level is often expressed as the percentage of intervals, constructed this way, that would contain the population mean between their upper and lower limits. • Due to natural sampling variability, the sample mean (the center of the CI) will vary from sample to sample.
  • 71. • As the sample size increases, the range of interval values narrows, meaning that you know the mean with much more accuracy compared with a smaller sample. • Accordingly, for a 95% CI there is a 5% chance that the population mean lies outside the upper and lower confidence limits (as illustrated by the 2.5% in each tail beyond the ±1.96 z-scores).
  • 72. Why do researchers use confidence intervals? • It is more or less impossible to study every single person in a population, so researchers select a sample or sub-group of the population. • This means that the researcher can only estimate the parameters (i.e., characteristics) of a population, the estimated range being calculated from a given set of sample data. • Therefore, a confidence interval is simply a way to measure how well your sample represents the population you are studying. • The probability that the confidence interval includes the true mean value within a population is called the confidence level of the CI. • You can calculate a CI for any confidence level you like, but the most commonly used value is 95%. A 95% confidence level means you can be 95% certain.
  • 73. Factors that Affect Confidence Intervals (CI) • Population size: this does not usually affect the CI but can be a factor if you are working with small and known groups of people. • Sample Size: the smaller your sample, the less likely it is you can be confident the results reflect the true population parameter. • Percentage: Extreme answers come with better accuracy. For example, if 99 percent of voters are for gay marriage, the chances of error are small. However, if 49.9 percent of voters are “for” and 50.1 percent are “against” then the chances of error are bigger.
  • 74. 0% and 100% Confidence Level • A 0% confidence level means you have no faith at all that if you repeated the survey you would get the same results. • A 100% confidence level means there is no doubt at all that if you repeated the survey you would get the same results. • In reality, you would never publish the results from a survey where you had no confidence at all that your statistics were accurate (you would probably repeat the survey with better techniques). • A 100% confidence level doesn't exist in statistics, unless you surveyed an entire population, and even then you probably couldn't be 100 percent sure that your survey wasn't open to some kind of error or bias. • The confidence coefficient is the confidence level stated as a proportion, rather than as a percentage. For example, if you had a confidence level of 99%, the confidence coefficient would be 0.99.
  • 75. How do I calculate a confidence interval? • To calculate the confidence interval, start by computing the mean and standard deviation of the sample. • Then calculate the lower and upper limits of the confidence interval using the z-score for the chosen confidence level (see the table of z-values). Confidence Interval Formula: X̄ ± Z × s/√n, where: X̄ is the sample mean, Z is the chosen z-value (1.96 for 95%), s is the sample standard deviation (so s/√n is the standard error), and n is the sample size.
  • 76. An Example • X̄ (mean) = 86 • Z = 1.960 (from the table for 95%) • s (standard deviation) = 6.2 • n (sample size) = 46 Lower limit: 86 − 1.960 × 6.2/√46 = 86 − 1.79 = 84.21. Upper limit: 86 + 1.960 × 6.2/√46 = 86 + 1.79 = 87.79. So the population mean is likely to be between 84.21 and 87.79.
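A sketch reproducing this worked example, using the CI formula from the previous slide:

```python
# 95% confidence interval: x_bar ± z * s / sqrt(n).
from math import sqrt

x_bar, z, s, n = 86.0, 1.960, 6.2, 46

margin = z * s / sqrt(n)          # ≈ 1.79
print(round(x_bar - margin, 2))   # 84.21
print(round(x_bar + margin, 2))   # 87.79
```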

Editor's Notes

  1. What is Hypothesis? Hypothesis is an assumption that is made on the basis of some evidence. This is the initial point of any investigation that translates the research questions into a prediction. It includes components like variables, population and the relation between the variables. A research hypothesis is a hypothesis that is used to test the relationship between two or more variables.
  2. It is common to begin with a process model of the `black box' type, with several discrete or continuous input factors that can be controlled--that is, varied at will by the experimenter--and one or more measured output responses. Experimental data are used to derive an empirical (approximation) model linking the outputs and inputs. These empirical models generally contain first and second-order terms. Often the experiment has to account for a number of uncontrolled factors that may be discrete, such as different machines or operators, and/or continuous such as ambient temperature or humidity. The objectives of the experiment may include the following: Determining which variables are most influential on the response y Determining where to set the influential x’s so that y is almost always near the desired nominal value Determining where to set the influential x’s so that variability in y is small Determining where to set the influential x’s so that the effects of the uncontrollable variables z1, z2, . . . , zq are minimized.
  3. Developed by Ronald Fisher, ANOVA stands for Analysis of Variance. One-Way Analysis of Variance tells you if there are any statistical differences between the means of three or more independent groups.
  4. Full factorial: A full factorial design consists of all possible factor combinations in a test and, most importantly, varies the factors simultaneously rather than one factor at a time. Fractional factorial: A fractional factorial design is a reduced version of the full factorial design, meaning only a fraction of the runs are used. A fractional factorial design allows for a more efficient use of resources as it reduces the sample size of a test, but it comes with a tradeoff (a balance achieved between two desirable qualities) in information. If your output is the fill level of a bottle of carbonated drink, and your primary process variables are machine speed, fill speed, and carbonation level, then what combination of those factors will give you the desired consistent fill level of the bottle?
  5. Regression analysis:
  6. In an experiment, an extraneous variable is any variable that you're not investigating that can potentially affect the outcomes of your research study.
  7. The potential design factors are those factors that the experimenter may wish to vary in the experiment. A controllable nuisance factor is one whose levels may be set by the experimenter. For example, the experimenter can select different batches of raw material or different days of the week when conducting the experiment.
  9. Source link: https://youtu.be/Fvi9A_tEmXQ
  10. Video source: https://www.youtube.com/watch?v=3xAIWiTJCvE