Experimental Design Fundamentals

Experimental design
Introduction :
The design of experiments (DOE, DOX, or experimental design) is the design of any task that
aims to describe or explain the variation of information under conditions that are hypothesized to
reflect the variation. The term is generally associated with true experiments in which the design
introduces conditions that directly affect the variation, but may also refer to the design of quasi-
experiments, in which natural conditions that influence the variation are selected for observation.
In its simplest form, an experiment aims at predicting the outcome by introducing a change of the
preconditions, which is reflected in a variable called the predictor (independent). The change in
the predictor is generally hypothesized to result in a change in the second variable, hence called
the outcome (dependent) variable. Experimental design involves not only the selection of suitable
predictors and outcomes, but planning the delivery of the experiment under statistically optimal
conditions given the constraints of available resources.
Main concerns in experimental design include the establishment of validity, reliability, and
replicability. For example, these concerns can be partially addressed by carefully choosing the
predictor, reducing the risk of measurement error, and ensuring that the documentation of the
method is sufficiently detailed. Related concerns include achieving appropriate levels of
statistical power and sensitivity.
What is Experimental Design?
Experimental design is a way to carefully plan experiments in advance so that your results are
both objective and valid. Ideally, your experimental design should:
Describe how participants are allocated to experimental groups. A common method is completely
randomized design, where participants are assigned to groups at random. A second method is
randomized block design, where participants are divided into homogeneous blocks (for example,
age groups) before being randomly assigned to groups.
Minimize or eliminate confounding variables, which can offer alternative explanations for the
experimental results.
Allow you to make inferences about the relationship between independent variables and
dependent variables. Reduce variability, to make it easier for you to find differences in treatment
outcomes.

Types of Experimental Design:
a. Between Subjects Design.
b. Completely Randomized Design.
c. Factorial Design.
d. Matched-Pairs Design.
e. Observational Study
f. Longitudinal Research
g. Cross Sectional Research
h. Pretest-Posttest Design.
i. Quasi-Experimental Design.
j. Randomized Block Design.
k. Randomized Controlled Trial
l. Within subjects Design.
Completely randomized design(CRD):
In the design of experiments, completely randomized designs are for studying the effects of one
primary factor without the need to take other nuisance variables into account. This article
describes completely randomized designs that have one primary factor. The experiment compares
the values of a response variable based on the different levels of that primary factor. For
completely randomized designs, the levels of the primary factor are randomly assigned to the
experimental units
Description of the Design:
-Simplest design to use.
-Design can be used when experiment al units are essentially homogeneous.
-Because of the homogeneity requirement, it may be difficult to use this design for field
experiments.
-The CRD is best suited for experiments with a small number of treatments
Randomization Procedure:
-Treatments are assigned to experimental units completely at random.
-Every experimental unit has the sameprobability of receiving any treatment.
-Randomization is performed using a random number table, computer, program, etc.

Example of Randomizatin:
-Given you have 4 treatments (A, B, C, and D) and 5 replicates, how many experimentalunits
would you have?
-Note that there is no “blocking” ofexperimental units into replicates.
-Every experimental unit has the sameprobability of receiving any treatment.
Advantages of a CRD:
1.Very flexible design (i.e. number of treatments and replicates is only limited by the available
number ofexperimental units).
2.Statistical analysis is simple compared to other designs.
3.Loss of information due to missing data is small compared to other designs due to the larger
number of degrees of freed om for the error source of variation.
Disadvantages of a CRD:
1.If experimental units are not homogeneous and you fail to minimize this variation using
blocking, there may be a loss of precision.
2.Usually the least efficient design unless experimental units are homogeneous.
3.Not suited for a large number of treatments.
Randomized Complete Block Design(RCBD):
in Randomized Complete Design (CRD), there is no restriction on the allocation of the treatments
to experimental units. But in practical life there are situations where there is relatively large
variability in the experimental material, it is possible to make blocks (in simpler sense groups) of

the relatively homogeneous experimental material or units. The design applied in such situations
is named as Randomized Complete Block Design (RCBD).
Actually RCBD is a one restrictional design, used to control a variable which is influence the
response variable. The main aim of the restriction is to control the variable causing the
variability in response. Efforts of blocking is done to create the situation of homogeneity within
block. A blocking is a source of variability. An example of blocking factor might be the gender
of a patient (by blocking on gender), this is source of variability controlled for, leading to
greater accuracy. RCBD is a mixed model in which a factor is fixed and other is random. The
main assumption of the design is that there is no contact between the treatment and block effect.
Randomized Complete Block design is said to be complete design because in this design the
experimental units and number of treatments are equal. Each treatment occurs in each block.
The general model is defined as:
Yij=μ+ηi+ξj+eij
where i=1,2,3⋯,t and j=1,2,⋯,b with t treatments and b blocks. μ is the overall mean based on
all observations, ηi is the effect of the ith treatment response, ξ is the effect of jth block and eij
is the corresponding error term which is assumed to be independent and normally distributed
with mean zero and constant variance.
The main objective of blocking is to reduce the variability among experimental units within a
block as much as possible and to maximize the variation among blocks; the design would not
contribute to improve the precision in detecting treatment differences.
Definition:
The Randomized Complete Block Design may be defined as the design in which the
experimental material is divided into blocks/groups of homogeneous experimental units
(experimental units have same characteristics) and each block/group contains a complete set of
treatments which are assigned at random to the experimental units
Randomized Complete Block Design Experimental Layout:
Suppose there are t treatments and r blocks in a randomized complete block design, then each
block contains homogeneous plots one of each treatment. An experimental layout for such a
design using four treatments in three blocks be as follows.

Block 1 Block 2 Block 3
A B C
B C D
C D A
D A B
From RCBD layout we can see that:
 The treatments are assigned at random within blocks of adjacent subjects and each of the
treatment appears once in a block.
 The number of block represents the number of replications
 Any treatment can be adjacent to any other treatment, but not to the same treatment
within the block.
 Variation in an experiment is controlled by accounting spatial effects.
Advantages of the RCBD:
1.Generally more precise than the CRD.
2.No restriction on the number of
treatments or replicates.
3.Some treatments may be replicated more times than others.
4.Missing plots are easily estimated.
5.Whole treatments or entire replicatesmay be deleted from the analysis.
6.If experimental error is heterogeneous,valid comparisons can still be made.
Disadvantages of the RCBD:
1.Error df is smaller than that for the CRD (problem with a small number oftreatments).
2.If there is a large variation between experimental unitswithin a block, a large errorterm may
result (this may be due to too many treatments).
3.If there are missing data, a RCBD experiment may be less efficient than a CRD
NOTE: The most important item to consider when choosing a design is theuniformity of
the experimental units.

Hypothesis Testing
What is a hypothesis test?
A hypothesis test is a statistical test that is used to determine whether there is enough evidence in
a sample of data to infer that a certain condition is true for the entire population.
A hypothesis test examines two opposing hypotheses about a population: the null hypothesis and
the alternative hypothesis. The null hypothesis is the statement being tested. Usually the null
hypothesis is a statement of "no effect" or "no difference". The alternative hypothesis is the
statement you want to be able to conclude is true.
Based on the sample data, the test determines whether to reject the null hypothesis. You use a p-
value, to make the determination. If the p-value is less than or equal to the level of significance,
which is a cut-off point that you define, then you can reject the null hypothesis.
A common misconception is that statistical hypothesis tests are designed to select the more likely
of two hypotheses. Instead, a test will remain with the null hypothesis until there is enough
evidence (data) to support the alternative hypothesis.
Examples of questions you can answer with a hypothesis test include:
 Does the mean height of undergraduate women differ from 66 inches?
 Is the standard deviation of their height equal less than 5 inches?
 Do male and female undergraduates differ in height?
Statistical Hypotheses:
The best way to determine whether a statistical hypothesis is true would be to examine the entire population.
Since that is often impractical, researchers typically examine a random sample from the population. If sample
data are not consistent with the statistical hypothesis, the hypothesis is rejected.
There are two types of statistical hypotheses:
 Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample observations
result purely from chance.
 Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample
observations are influenced by some non-random cause.
For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis might be
that half the flips would result in Heads and half, in Tails. The alternative hypothesis might be that the number
of Heads and Tails would be very different. Symbolically, these hypotheses would be expressed as

H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we
would be inclined to reject the null hypothesis. We would conclude, based on the evidence, that
the coin was probably not fair and balanced.
Significance of hypothesis testing:
Statistics are helpful in analyzing most collections of data. This is equally true of hypothesis
testing which can justify conclusions even when no scientific theory exists. In the Lady tasting
tea example, it was "obvious" that no difference existed between (milk poured into tea) and (tea
poured into milk). The data contradicted the "obvious".
Real world applications of hypothesis testing include:
 Testing whether more men than women suffer from nightmares
 Establishing authorship of documents
 Evaluating the effect of the full moon on behavior
 Determining the range at which a bat can detect an insect by echo
 Deciding whether hospital carpeting results in more infections
 Selecting the best means to stop smoking
 Checking whether bumper stickers reflect car owner behavior
 Testing the claims of handwriting analysts
Statistical hypothesis testing plays an important role in the whole of statistics and in statistical
inference. For example, Lehmann (1992) in a review of the fundamental paper by Neyman and
Pearson (1933) says: "Nevertheless, despite their shortcomings, the new paradigm formulated in
the 1933 paper, and the many developments carried out within its framework continue to play a
central role in both the theory and practice of statistics and can be expected to do so in the
foreseeable future".
Significance testing has been the favored statistical tool in some experimental social sciences
(over 90% of articles in the Journal of Applied Psychology during the early 1990s).[13] Other
fields have favored the estimation of parameters (e.g., effect size). Significance testing is used as
a substitute for the traditional comparison of predicted value and experimental result at the core
of the scientific method. When theory is only capable of predicting the sign of a relationship, a
directional (one-sided) hypothesis test can be configured so that only a statistically significant
result supports theory. This form of theory appraisal is the most heavily criticized application of
hypothesis testingOne-sample tests are appropriate when a sample is being compared to the
population from a hypothesis. The population characteristics are known from theory or are
calculated from the population.

Two-sample tests:
Two-sample tests are appropriate for comparing two samples, typically experimental and
control samples from a scientifically controlled experiment.
Paired tests:
Paired tests are appropriate for comparing two samples where it is impossible to control important
variables. Rather than comparing two sets, members are paired between samples so the difference
between the members becomes the sample. Typically the mean of the differences is then
compared to zero. The common example scenario for when a paired difference test is appropriate
is when a single set of test subjects has something applied to them and the test is intended to
check for an effect.
Z-tests are appropriate for comparing means under stringent conditions regarding normality and
a known standard deviation.
A t-test is appropriate for comparing means under relaxed conditions (less is assumed).
Tests of proportions are analogous to tests of means (the 50% proportion).
Chi-squared tests use the same calculations and the same probability distribution for different
applications:
 Chi-squared tests for variance are used to determine whether a normal population has a
specified variance. The null hypothesis is that it does.
 Chi-squared tests of independence are used for deciding whether two variables are
associated or are independent. The variables are categorical rather than numeric. It can be
used to decide whether left-handedness is correlated with libertarian politics (or not). The
null hypothesis is that the variables are independent. The numbers used in the calculation
are the observed and expected frequencies of occurrence (from contingency tables).
 Chi-squared goodness of fit tests are used to determine the adequacy of curves fit to data.
The null hypothesis is that the curve fit is adequate. It is common to determine curve shapes
to minimize the mean square error, so it is appropriate that the goodness-of-fit calculation
sums the squared errors.
F-tests (analysis of variance, ANOVA) are commonly used when deciding whether groupings of
data by category are meaningful. If the variance of test scores of the left-handed in a class is much
smaller than the variance of the whole class, then it may be useful to study lefties as a group. The
null hypothesis is that two variances are the same – so the proposed grouping is not meaningful.

One-Tailed and Two-Tailed Tests:
A test of a statistical hypothesis, where the region of rejection is on only one side of the
sampling distribution, is called a one-tailed test. For example, suppose the null hypothesis
states that the mean is less than or equal to 10. The alternative hypothesis would be that the
mean is greater than 10. The region of rejection would consist of a range of numbers located on
the right side of sampling distribution; that is, a set of numbers greater than 10.
A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling
distribution, is called a two-tailed test. For example, suppose the null hypothesis states that the
mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater
than 10. The region of rejection would consist of a range of numbers located on both sides of
sampling distribution; that is, the region of rejection would consist partly of numbers that were
less than 10 and partly of numbers that were greater than 10.
ANOVA
The ANOVA Test:
An ANOVA test is a way to find out if survey or experiment results are significant. In other
words, they help you to figure out if you need to reject the null hypothesis or accept the alternate
hypothesis. Basically, you’re testing groups to see if there’s a difference between them. Examples
of when you might want to test different groups:
 A group of psychiatric patients are trying three different therapies: counseling, medication
and biofeedback. You want to see if one therapy is better than the others.
 A manufacturer has two different processes to make light bulbs. They want to know if one
process is better than the other.
 Students from different colleges take the same exam. You want to see if one college
outperforms the other.
Types of Tests:
There are two main types: one-way and two-way. Two-way tests can be with or without
replication.One-way ANOVA between groups: used when you want to test two groups to see if
there’s a difference between them.
Two way ANOVA without replication: used when you have one group and you’re double-testing
that same group. For example, you’re testing one set of individuals before and after they take a
medication to see if it works or not.

Two way ANOVA with replication: Two groups, and the members of those groups are doing
more than one thing. For example, two groups of patients from different hospitals trying two
different therapie
One Way ANOVA:
A one way ANOVA is used to compare two means from two independent (unrelated) groups
using the F-distribution. The null hypothesis for the test is that the two means are equal.
Therefore, a significant result means that the two means are unequal.
When to use a one way ANOVA?
Situation 1: You have a group of individuals randomly split into smaller groups and completing
different tasks. For example, you might be studying the effects of tea on weight loss and form
three groups: green tea, black tea, and no tea.
Situation 2: Similar to situation 1, but in this case the individuals are split into groups based on
an attribute they possess. For example, you might be studying leg strength of people according
to weight. You could split participants into weight categories (obese, overweight and normal)
and measure their leg strength on a weight machine.
Limitations of the One Way ANOVA:
A one way ANOVA will tell you that at least two groups were different from each other. But it
won’t tell you what groups were different. If your test returns a significant f-statistic, you may
need to run an ad hoc test (like the Least Significant Difference test) to tell you exactly which
groups had a difference in means.
Two Way ANOVA:
A Two Way ANOVA is an extension of the One Way ANOVA. With a One Way, you have one
independent variable affecting a dependent variable. With a Two Way ANOVA, there are two
independents. Use a two way ANOVA when you have one measurement variable (i.e. a
quantitative variable) and two nominal variables. In other words, if your experiment has a
quantitative outcome and you have two categorical explanatory variables, a two way ANOVA is
appropriate.
For example, you might want to find out if there is an interaction between income and gender for
anxiety level at job interviews. The anxiety level is the outcome, or the variable that can be
measured. Gender and Income are the two categorical variables. These categorical variables are
also the independent variables, which are called factors in a Two Way ANOVA.
The factors can be split into levels. In the above example, income level could be split into three
levels: low, middle and high income. Gender could be split into three levels: male, female, and

transgender. Treatment groups and all possible combinations of the factors. In this example there
would be 3 x 3 = 9 treatment group
Let’s start with a brief review of few critical points from one-way ANOVA that we’ll be facing
again. First, there’s the question of whether one should use a common error-term for all tests or
use unique error-terms for each separate test. In the case of one-way ANOVA under which the
initial ANOVA only includes one test (i.e., the test of the one and only factor), this question
didn’t arise until we reached the follow-up, pair-wise tests that are needed when the factor is
significantnand has three or more levels. In the case of multi-way ANOVA there are multiple
tests included in the initial ANOVA -- one for each factor and then additional tests for each
combination of factors -- so the question will come up much sooner. Recall, from before, that the
typical answer to the error-term question depended on type of design. We usually use a common
error-term (based on a pooled estimate of variance) for between-subject analyses, because we
want every subject to be included in every test, because our “target of generalization” is people
(plus the standard analysis assumes equal variance, anyway). In contrast, we always use unique
error-terms for within-subject analyses, because these will automatically include every subject
without any pooling, because every subject is in every condition (plus this avoids the equal-
variance assumption which can sometimes cause grief). We’ll be making the same decisions when
it comes to pure-betweens and pure-withins that are
Multi-way: common error-terms for betweens and unique error-terms for withins. The fun
isgoing to be encountered when we get to mixed multi-ways, where we have at least one between-
subjects factor and at least one within-subjects factor. (More on this later.) The second question
that we faced in one-way ANOVA was the issue of follow-up tests when a factor is significant
and has more than two levels. We’ll face that again for multi-way ANOVA, but -- in some cases
-- there’s going to be another layer of follow-up tests to do before we get to the pairwise
comparisons. This will occur when there’s a significant interaction between two or more factors
in the initial analysis.
Interactions There are several ways to describe interactions; here are two: (1) an interaction is
when the DV depends on the specific combination of two or more factors, instead of just each of
the factors (separately); and (2) an interaction is when the effect of a given factor on the DV
depends on the level of some other factor. As a relatively simple example using the second
wording, imagine that you have run a Stroop-like (ignore the word and name the color)
experiment with two factors: congruent vs incongruent (as is standard) combined with words-in-
English vs words-in-Klingon (or some other language that the subjects don’t know). It probably
won’t surprise you to learn that the first factor – congruence -- only has an effect when the level
of the second factor is “English”; the congruence of words in a language (and script) that the
subjects can’t read has no effect color-naming. That’s an interaction: the dependence of one factor
(congruence) on the level of another (language).
At this point we need to be clear about the (correct) labels various effects. The initial ANOVA
for a two-way design will provide the results from three tests: a test of the overall effect of the
first factor, a test of the overall effect of the second factor, and a test of the interaction between
the two factors. The correct label for the overall effect of a factor (in the initial ANOVA) is “main

effect.” For example, the Stroop Effect is extremely robust; even when you include some extra
conditions where it doesn’t occur (e.g., sometimes the words are in a language that the subjects
can’t read), the overall effect of congruence is almost always significant. The correct way to say
this is that “the main effect of Stroop congruence is almost always significant.” In contrast, the
effect of a factor at a specific level of another factor -- e.g., the effect of congruence when the
words are in Klingon -- has a different label. These are called simple main effects.”
Note that you should always specify the level of the second factor when discussing a simple main
effect. For example, you might end up saying something like: “the simple main effect of
congruence for words-in-English was significant, while the simple main effect of congruence
forwords-in-Klingon was not.” Avoid ever saying “the simple main effect of congruence was [or
wasn’t] significant” without mentioning the level of the factor that defines the separate simple
main effects.
Getting back to the analysis (and assuming a two-way design), if the interaction is not significant,
you can -- in effect -- treat the data as if they came from two separate one-way experiments that
happened to be run at the same time. If a given main effect is significant and has more than two
levels, you need to conduct some pair-wise comparisons to see which levels of this factor are
different from which. This will be done in the same way as you did for a one-way design. That’s
easy (but probably not what you hoped for when it comes to getting the experiment published).
In contrast, when you have a significant interaction, the main effects are pretty much ignored and
the interaction must be analyzed or followed-up-on before you move on to any pairwise
comparisons. In order to do this, you have to make an important decision: you have to decide
which factor (involved in the interaction) will be analyze and which factor will set the levels for
the separate tests. In technical terms and using the running example: you have to decide whether
to look at the simple main effects of language for each level of congruence (which doesn’t make
much sense) or to look at the simple main effects of congruence for each level of language (which
does make sense). In many cases, the decision is easy (as it just was). In other cases, the decision
is much harder. Parsing and Analyzing an Interaction There are two general approaches to
deciding how to parse and then analyze an interaction. The first approach is based on statistics
and doesn’t always apply, but takes precedence when it does apply: if the design is mixed-factors,
you make the decision based on which factor is within-subjects vs which is between. (More on
this later.) In all other cases, you should think of the general issue that really ought to be in the
background whenever you perform any statistical
Analysis:
Does this analysis match the story that I’m trying to tell to the reader. If you make some lternative
plots of the data (or just find some plots of similar data in your favorite journal), you might be
able to figure this rule out without being told. When all else fails and you simply can’t decide in
any principled way how to parse the interaction, you should decide this issue in terms of the sizes
of the factors (i.e., how many levels they have). In general, it is better to examine the larger factor
at each level of the smaller factor, because this requires the smaller number of follow-up tests.
For example, if you’ve run a 2×3 design and found an interaction but have no idea how to parse
it, examine the three-level factor at each level of the two-level factor, so you’ll only be doing two

sets of follow-up tests, instead of three. One last comment for now (even if this was already said):
when you find a significant interaction, the main effects of the factors should be ignored. Yes, the
journal will probably demand that you report them, but don’t spend any more time on them than
it takes to scribble them down. Most ofall: don’t do any pair-wise tests, even if the main effect is
significant and has more than two levels. Main effects can be highly misleading when an
interaction is present. The less said about the main effects (when there’s an interaction), the better.
Your focus needs to be on the simple main effects, instead. Upshot: when conducting a multi-
way ANOVA, the first thing that you look at in the output from the initial ANOVA is the test of
the interaction. If the interaction is not significant, you are free to look at the main effects and
then conduct any needed pair-wise comparisons. Nothing new here. In contrast, if the interaction
is significant, you aren’t even close to being done. You must decide how to parse the interaction
and then conduct the required tests of the simple main effects. And if any of the simple main
effects is significant and has more than two levels, you must run pair-wise comparisons inside
that simple main effect.
Multiple comparisons:
In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one
considers a set of statistical inferences simultaneously or infers a subset of parameters selected
based on the observed values In certain fields it is known as the look-elsewhere effect.
The more inferences are made, the more likely erroneous inferences are to occur. Several
statistical techniques have been developed to prevent this from happening, allowing significance
levels for single and multiple comparisons to be directly compared. These techniques generally
require a higher significance threshold for individual comparisons, so as to compensate for the
number of inferences being made.
De Multiple comparisons arise when a statistical analysis involves multiple statistical tests, each
of which has a potential to produce a "discovery." Failure to compensate for multiple comparisons
can have important real-world consequences, as illustrated by the following examples:
Suppose the treatment is a new way of teaching writing to students, and the control is the standard
way of teaching writing. Students in the two groups can be compared in terms of grammar,
spelling, organization, content, and so on. As more attributes are compared, it becomes
increasingly likely that the treatment and control groups will appear to differ on at least one
attribute due to random sampling error alone.
Suppose we consider the efficacy of a drug in terms of the reduction of any one of a number of
disease symptoms. As more symptoms are considered, it becomes increasingly likely that the
drug will appear to be an improvement over existing drugs in terms of at least one symptom.
In both examples, as the number of comparisons increases, it becomes more likely that the groups
being compared will appear to differ in terms of at least one attribute. Our confidence that a result
will generalize to independent data should generally be weaker if it is observed as part of an

analysis that involves multiple comparisons, rather than an analysis that involves only a single
comparison.
For example, if one test is performed at the 5% level and the corresponding null hypothesis is
true, there is only a 5% chance of incorrectly rejecting the null hypothesis. However, if 100 tests
are conducted and all corresponding null hypotheses are true, the expected number of incorrect
rejections (also known as false positives or Type I errors) is 5. If the tests are statistically
independent from each other, the probability of at least one incorrect rejection is 99.4%.
The multiple comparisons problem also applies to confidence intervals. A single confidence
interval with a 95% coverage probability level will contain the population parameter in 95% of
experiments. However, if one considers 100 confidence intervals simultaneously, each with 95%
coverage probability, the expected number of non-covering intervals is 5. If the intervals are
statistically independent from each other, the probability that at least one interval does not contain
the population parameter is 99.4%.
Techniques have been developed to prevent the inflation of false positive rates and non-coverage
rates that occur with multiple statistical tests.
Classification of multiple hypothesis tests:
the following table defines the possible outcomes when testing multiple null hypotheses. Suppose
we have a number m of null hypotheses, denoted by: H1, H2, ..., Hm. Using a statistical test, we
reject the null hypothesis if the test is declared significant. We do not reject the null hypothesis if
the test is non-significant. Summing each type of outcome over all Hi yields the following
random variables:
 is the total number hypotheses tested
 is the number of true null hypotheses, an unknown parameter
 is the number of true alternative hypotheses
 is the number of false positives (Type I error) (also called "false discoveries")
 is the number of true positives (also called "true discoveries")
 is the number of false negatives (Type II error)

 is the number of true negatives
 Is the number of rejected null hypotheses In hypothesis tests of which are true null
hypotheses, is an observable random variable, and are unobservable random variables.
What are type I and type II errors?
When you do a hypothesis test, two types of errors are possible: type I and type II. The risks of
these two errors are inversely related and determined by the level of significance and the power
for the test. Therefore, you should determine which error has more severe consequences for your
situation before you define their risks.
No hypothesis test is 100% certain. Because the test is based on probabilities, there is always a
chance of drawing an incorrect conclusion.
Type I error:
When the null hypothesis is true and you reject it, you make a type I error. The probability of
making a type I error is α, which is the level of significance you set for your hypothesis test. An
α of 0.05 indicates that you are willing to accept a 5% chance that you are wrong when you reject
the null hypothesis. To lower this risk, you must use a lower value for α. However, using a lower
value for alpha means that you will be less likely to detect a true difference if one really exists.
Type II error:
When the null hypothesis is false and you fail to reject it, you make a type II error. The probability
of making a type II error is β, which depends on the power of the test. You can decrease your risk
of committing a type II error by ensuring your test has enough power. You can do this by ensuring
your sample size is large enough to detect a practical difference when one truly exists.
The probability of rejecting the null hypothesis when it is false is equal to 1–β. This value is the
power of the test.
Null Hypothesis
Decision True False
Fail to
reject
Correct Decision (probability = 1 - α) Type II Error - fail to reject the null when it is
false (probability = β)
Reject Type I Error - rejecting the null when it is
true (probability = α)
Correct Decision (probability = 1 - β)

Example of type I and type II error:
To understand the interrelationship between type I and type II error, and to determine which error
has more severe consequences for your situation, consider the following example:
A medical researcher wants to compare the effectiveness of two medications. The null and
alternative hypotheses are:
 Null hypothesis (H0): μ1= μ2
The two medications are equally effective.
 Alternative hypothesis (H1): μ1≠ μ2
The two medications are not equally effective.
A type I error occurs if the researcher rejects the null hypothesis and concludes that the two
medications are different when, in fact, they are not. If the medications have the same
effectiveness, the researcher may not consider this error too severe because the patients still
benefit from the same level of effectiveness regardless of which medicine they take. However, if
a type II error occurs, the researcher fails to reject the null hypothesis when it should be rejected.
That is, the researcher concludes that the medications are the same when, in fact, they are
different. This error is potentially life-threatening if the less-effective medication is sold to the
public instead of the more effective one.
As you conduct your hypothesis tests, consider the risks of making type I and type II errors. If
the consequences of making one type of error are more severe or costly than making the other
type of error, then choose a level of significance and a power for the test that will reflect the
relative severity of those consequences.
Correlation
Introduction:
Correlation is a statistical technique that can show whether and how strongly pairs of variables
are related. For example, height and weight are related; taller people tend to be heavier than
shorter people. The relationship isn't perfect. People of the same height vary in weight, and you
can easily think of two people you know where the shorter one is heavier than the taller one.
Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'',
and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how
much of the variation in peoples' weights is related to their heights.

Although this correlation is fairly obvious your data may contain unsuspected correlations. You
may also suspect there are correlations, but don't know which are the strongest. An intelligent
correlation analysis can lead to a greater understanding of your data.
Techniques in Determining Correlation:
There are several different correlation techniques. The Survey System's optional Statistics
Module includes the most common type, called the Pearson or product-moment correlation. The
module also includes a variation on this type called partial correlation. The latter is useful when
you want to look at the relationship between two variables while removing the effect of one or
two other variables.
Like all statistical techniques, correlation is only appropriate for certain kinds of data. Correlation
works for quantifiable data in which numbers are meaningful, usually quantities of some sort. It
cannot be used for purely categorical data, such as gender, brands purchased, or favorite color.
Rank correlation coefficients:
Main articles: Spearman's rank correlation coefficient and Kendall tau rank correlation
coefficient
Rank correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's
rank correlation coefficient (τ) measure the extent to which, as one variable increases, the other
variable tends to increase, without requiring that increase to be represented by a linear
relationship. If, as the one variable increases, the other decreases, the rank correlation
coefficients will be negative. It is common to regard these rank correlation coefficients as
alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to make
the coefficient less sensitive to non-normality in distributions. However, this view has little
mathematical basis, as rank correlation coefficients measure a different type of relationship than
the Pearson product-moment correlation coefficient, and are best seen as measures of a different
type of association, rather than as alternative measure of the population correlation coefficient.
To illustrate the nature of rank correlation, and its difference from linear correlation, consider
the following four pairs of numbers (x, y):
(0, 1), (10, 100), (101, 500), (102, 2000).
As we go from each pair to the next pair x increases, and so does y. This relationship is perfect,
in the sense that an increase in x is always accompanied by an increase in y. This means that we
have a perfect rank correlation, and both Spearman's and Kendall's correlation coefficients are
1, whereas in this example Pearson product-moment correlation coefficient is 0.7544, indicating
that the points are far from lying on a straight line. In the same way if y always decreases when
x increases, the rank correlation coefficients will be −1, while the Pearson product-moment
correlation coefficient may or may not be close to −1, depending on how close the points are to
a straight line. Although in the extreme cases of perfect rank correlation the two coefficients are

both equal (being both +1 or both −1), this is not generally the case, and so values of the two
coefficients cannot meaningfully be compared.[7]
For example, for the three pairs (1, 1) (2, 3)
(3, 2) Spearman's coefficient is 1/2, while Kendall's coefficient is 1/3.
Multiple Correlation:
We can also calculate the correlation between more than two variables.
Definition 1: Given variables x, y and z, we define the multiple correlation coefficient
where rxz, ryz, rxy are as defined in Definition 2 of Basic Concepts of Correlation. Here x and y
are viewed as the independent variables and z is the dependent variable.
We also define the multiple coefficient of determination to be the square of the multiple
correlation coefficient.
Often the subscripts are dropped and the multiple correlation coefficient and multiple
coefficient of determination are written simply as R and R2
respectively. These definitions may
also be expanded to more than two independent variables. With just one independent variable
the multiple correlation coefficient is simply r.
Unfortunately R is not an unbiased estimate of the population multiple correlation coefficient,
which is evident for small samples. A relatively unbiased version of R is given by R adjusted.
Definition 2: If R is Rz,xy as defined above (or similarly for more variables) then the adjusted
multiple coefficient of determination is

where k = the number of independent variables and n = the number of data elements in the sample
for z (which should be the same as the samples for x and y).
Excel Data Analysis Tools: In addition to the various correlation functions described elsewhere,
Excel provides the Covariance and Correlation data analysis tools. The Covariance tool
calculates the pairwise population covariances for all the variables in the data set. Similarly the
Correlation tool calculates the various correlation coefficients as described in the following
example.
Definition 3: Given x, y and z as in Definition 1, the partial correlation of x and z holding y
constant is defined as follows:
In the semi-partial correlation, the correlation between x and y is eliminated, but not the
correlation between x and z and y and z:
Regression
Introduction:
Regression is a statistical measure used in finance, investing and other disciplines that attempts
to determine the strength of the relationship between one dependent variable (usually denoted by
Y) and a series of other changing variables (known as independent variables). Regression helps
investment and financial managers to value assets and understand the relationships between
variables, such as commodity prices and the stocks of businesses dealing in those commodities.
Linear Regression:
Linear regression attempts to model the relationship between two variables by fitting a linear
equation to observed data. One variable is considered to be an explanatory variable, and the other
is considered to be a dependent variable. For example, a modeler might want to relate the weights
of individuals to their heights using a linear regression model.

Before attempting to fit a linear model to observed data, a modeler should first determine whether
or not there is a relationship between the variables of interest. This does not necessarily imply
that one variable causes the other (for example, higher SAT scores do not cause higher college
grades), but that there is some significant association between the two variables. A scatterplot can
be a helpful tool in determining the strength of the relationship between two variables. If there
appears to be no association between the proposed explanatory and dependent variables (i.e., the
scatterplot does not indicate any increasing or decreasing trends), then fitting a linear regression
model to the data probably will not provide a useful model. A valuable numerical measure of
association between two variables is the correlation coefficient, which is a value between -1 and
1 indicating the strength of the association of the observed data for the two variables.
A linear regression line has an equation of the form Y = a + bX, where X is the explanatory
variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the
value of y when x = 0).
Curvilinear Regression:
Simple linear regression has been developed to fit straight lines to data points. However,
sometimes the relationship between two variables may be represented by a curve instead of a
straight line. Such "non-linear" relationships need not be non-linear in a mathematical sense. For
example, a parabolic relationship may be well-modeled by a (modified) linear regression, since a
parabola is a linear equation, as far as its parameters are concerned. Sometimes, such relationships
are called "curvilinear".
There are several ways to fit a curve other than a line (or, generally speaking, an n-dimensional
hyperplane) to the data:
 deriving the proper regression formula, which may be cumbersome, and requires some
calculus knowledge,
 using linear regression after transforming the data into a linear problem,
 using optimization algorithms to minimize the error surface, and
 using non-linear modeling techniques, such as neural networks.
The first two approaches require the type of functional relationship to be known. In many
standard cases, the second approach may be appropriate.
Coefficient of Determination:
The coefficient of determination (denoted by R2
) is a key output of regression analysis. It is
interpreted as the proportion of the variance in the dependent variable that is predictable from the
independent variable.

 The coefficient of determination is the square of the correlation (r) between predicted y
scores and actual y scores; thus, it ranges from 0 to 1.
 With linear regression, the coefficient of determination is also equal to the square of the
correlation between x and y scores.
 An R2
of 0 means that the dependent variable cannot be predicted from the independent
variable.
 An R2
of 1 means the dependent variable can be predicted without error from the
independent variable.
 An R2
between 0 and 1 indicates the extent to which the dependent variable is predictable.
An R2
of 0.10 means that 10 percent of the variance in Y is predictable from X; an R2
of
0.20 means that 20 percent is predictable; and so on.
The formula for computing the coefficient of determination for a linear regression model with
one independent variable is given below.

Conclution:
Introductory Statistics, Third Edition is a rigorous introductory statistics textbook, with integrated
online supplements that offer the best of today's web technologies, including: student engagement
and performance analytics, search, tag clouds, social networking, video and podcasting. Students
can read the textbook online, have it printed on-demand and delivered to their door, or download
and listen to the audio podcast series while on-the-go.
In the classroom, your teaching is supported with integrated lecture slides and AutoGrade pop
quizzes. Away from class, your students can put the theory they’re learning into practice with
VirtualTutor questions that immediately provide personalized, step-by-step feedback. And when
you set homework, your textbook’s customized Algorithmic Homework exercises will prevent
students from cheating and automatically grade your students’ work.
But most importantly, the Introductory Statistics, Third Edition textbook offers a tightly
integrated pedagogy that blends all of these technologies with the easy-to-understand
conversational teaching style of Dr Shaun Thompson, to make the mastery of accounting more
achievable for your students. Your textbook will immerse students in reading, watching, listening
to, and experiencing accounting - while supportively encouraging them to test their knowledge
and reflect on their learning.
Reference :
1. Dr. Abdur Rashid Ahmed ; Methods Of Statistic ; 2nd
edition
2. https://www.perdisco.com/stats/
3. http://www.investopedia.com/terms/r/regression.asp
4. http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-
tests/basics/what-is-a-hypothesis-test/
5. https://en.wikipedia.org/wiki/Statistical_hypothesis_testing
6. http://www.statisticshowto.com/anova/
7. http://www.statisticshowto.com/anova/

Content
Serial no. Name of content
01 Experimental design
02 Hypothesis Testing
03 Anova testing
04 Correlation and Regression
05 Conclusion:
06 reference

Experimental Design Fundamentals

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Experimental Design Fundamentals

Similar to Experimental Design Fundamentals (20)

More from As Siyam

More from As Siyam (20)

Recently uploaded

Recently uploaded (20)

Experimental Design Fundamentals