Understanding Validity in
Research Design
Internal Validity
External & Internal Validity
• Internal validity is
concerned with the degree
to which the results of the
study are due to the
independent variable(s)
under consideration and
not due to anything else.
• External validity relates to
the degree to which
findings can be
generalized or transferred to
other populations or situations.
Relationship between Internal &
External Validity
• Internal and External Validity are
Interconnected
• Necessary Condition for Causality
Challenges in Research Design
• Complexity of Isolating Variables
• Designing an Effective Study
Goal of Presentation
• To outline common threats to internal validity.
• To explain how they distort research results.
• To offer methods to mitigate these threats.
Extraneous Factors Affecting
Internal and External Validity
• Campbell and Stanley (1963), Cook and
Campbell (1979), and others have identified
multiple factors that threaten both internal
and external validity.
• Gall et al. (1996) summarized these factors
in their research.
• Miles and Huberman (1994) presented a
similar list of validity threats.
Threats to Internal Validity
(by Fred L. Perry)
• A list of threats compiled by Fred L. Perry:
14 primary threats
9 subordinate ones
• He illustrates these extraneous factors with the
Research Minefield and calls the threats
‘mines’.
Threats to Internal Validity
1. History
2. Maturation
3. Differential Selection
4. Statistical Regression
5. Subject Attrition
6. Competing Group
Contamination
7. Testing
8. Researcher and Data Gatherer
Effect
9. Pygmalion Effect
10. Hawthorne Effect
11. Treatment Intervention
12. Accumulative Treatment Effect
13. Treatment Fidelity
14. Treatment Strength-Time
Interaction
1. History
• History refers to the influence on the dependent variable of
events, other than the independent variable, that take place at
different points in time during the study.
• Example of History in Practice: In a study examining the
effect of a new teaching methodology on children’s
second language (L2) learning over several months, an
external event (the airing of a new bilingual TV program)
could influence language behavior, potentially interacting
with the new teaching methodology.
1. History (cont…)
• History as a Threat to Longitudinal Studies
• Azpillaga et al. (2001): The study aimed to investigate the
effects of a drama-based teaching method on English
language achievement over two years. The independent variable
was the type of teaching method (dramatized vs.
nondramatized), and the dependent variable was language
achievement (aural comprehension and oral production).
2. Maturation
• Maturation refers to natural developmental changes in participants
over time that are unrelated to the treatment. This could include
physical, cognitive, or emotional development.
• Example from Piaget’s Theory
• Azpillaga et al. (2001): While the study participants were all the same
age and likely developed at similar rates, maturation could still play a
role. For instance, early pubescent children may respond differently to
drama-based teaching methods than pre-pubescent children. The study
didn’t explore this possibility, but it highlights a potential interaction
between the treatment and participants’ developmental stages.
3. Differential Selection
• Differential selection occurs when participants are not randomly
selected and assigned to different groups (e.g., treatment vs.
control).
• Pre-existing differences between the groups can affect the
results.
• Azpillaga et al. (2001): The researchers did not use
random sampling but took steps to match participants on
various criteria (e.g., shy students, trouble-makers, and
average students). This effort helped control for
preexisting differences between the groups.
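For comparison with the matching approach described above, here is a minimal Python sketch of simple random assignment, the procedure whose absence gives rise to differential selection. The function name `randomly_assign`, the participant IDs, and the fixed seed are illustrative assumptions and do not come from any of the cited studies.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle a copy of the participant list and split it into two groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs, for illustration only.
ids = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(ids, seed=42)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because every participant has the same chance of landing in either group, pre-existing differences tend to balance out across groups as the sample grows, though a small sample can still be unbalanced by chance.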
4. Statistical Regression
• Statistical regression is the tendency of participants chosen for their extreme scores
(either very high or very low) to score closer to the average on subsequent
measurements. This is not due to any intervention; it is a natural statistical
phenomenon.
• Example: If a study selects participants who perform very poorly on a language
test and later tests them after a treatment, any improvement might not be due to
the treatment. Instead, it could be due to the tendency of extreme scores to move
toward the mean (average).
• In qualitative studies, researchers might specifically choose extreme cases (such
as very low or very high performers) because they provide rich data. In this case,
statistical regression may not be a concern, as the focus is on in-depth
understanding rather than generalizing results.
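The language-test example above can be illustrated with a small simulation. This is a sketch under stated assumptions, not data from any study: each hypothetical participant has a stable "true ability", each test adds independent random noise, and no treatment is applied between the two tests. The function name `simulate_retest` and all numeric settings (population mean 50, standard deviations of 10, lowest 10% selected) are invented for illustration.

```python
import random
import statistics

def simulate_retest(n=2000, seed=0):
    """Return mean scores on test 1 and test 2 for the lowest 10% on test 1."""
    rng = random.Random(seed)
    # Assumption: stable true ability plus independent noise on each test.
    ability = [rng.gauss(50, 10) for _ in range(n)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]  # no treatment in between

    # Select participants because of their extreme (very low) first score.
    cutoff = sorted(test1)[n // 10]
    low = [i for i in range(n) if test1[i] < cutoff]

    mean1 = statistics.mean([test1[i] for i in low])
    mean2 = statistics.mean([test2[i] for i in low])
    return mean1, mean2

first_mean, retest_mean = simulate_retest()
print(f"Lowest scorers, first test: {first_mean:.1f}")
print(f"Same people, second test:   {retest_mean:.1f}")
```

In this simulation the selected group scores higher on the second test even though nothing was done to them, which is the pattern a researcher could mistake for a treatment effect.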
5. Subject Attrition
• Subject attrition refers to the loss of participants during the course of a study, which
can distort the results if the attrition is not random. Those who drop out may differ
systematically from those who remain in the study, affecting the generalizability
and internal validity of the findings.
• Example: A researcher conducts a 12-week study to evaluate the effect of a
new exercise program on weight loss.
Participants are divided into two groups:
Group A: Follows the new exercise program.
Group B: Does not participate in the exercise program (control group).
The researcher measures participants’ weight at the beginning and end of the
study.
• By the end of the study, 30% of participants in Group A have dropped out, citing reasons
such as the program being too intense or scheduling conflicts. In contrast, only 5%
of participants in Group B have dropped out. Because attrition is much higher in Group A,
the participants who remain may differ systematically from those who left, so comparing
only those who finished may misrepresent the program’s effect.
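To make the exercise example concrete, the following Python sketch simulates how unequal dropout could create an apparent effect where none exists. Everything except the 30% and 5% dropout rates is an assumption made for illustration: the program is assumed to have no true effect, Group A's dropouts are assumed to be the participants who lost the least weight, Group B's dropouts are assumed to be random, and the function name `completer_means` and the weight-change distribution are hypothetical.

```python
import random
import statistics

def completer_means(n_per_group=200, seed=1):
    """Mean weight change (kg, negative = loss) among those who finish."""
    rng = random.Random(seed)
    # Assumption: NO true effect, so both groups share one distribution.
    group_a = [rng.gauss(-1.0, 2.0) for _ in range(n_per_group)]
    group_b = [rng.gauss(-1.0, 2.0) for _ in range(n_per_group)]

    # Assumption: in Group A, the 30% who lost the least weight drop out,
    # so only the 70% with the largest losses remain.
    keep_a = sorted(group_a)[: n_per_group * 70 // 100]
    # Assumption: in Group B, a random 5% drop out.
    keep_b = rng.sample(group_b, n_per_group * 95 // 100)

    return statistics.mean(keep_a), statistics.mean(keep_b)

mean_a, mean_b = completer_means()
print(f"Group A completers: {mean_a:.2f} kg")
print(f"Group B completers: {mean_b:.2f} kg")
```

Under these assumptions the Group A completers show a larger average weight loss than the Group B completers, even though the simulated program does nothing. Real dropout patterns may be less extreme, but the direction of the distortion is the point.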
6. Competing Group Contamination
• Competing group contamination occurs when there are multiple
treatment groups or a lack of proper control groups in a study. If
the groups are not well controlled or coordinated, external factors
could influence the results, leading to confounded outcomes.
• Azpillaga et al. (2001): The study compared two teaching
methods: dramatized format versus non-dramatized format for
teaching a third language. The treatment was applied consistently
across multiple groups, but the control groups came from
different schools with no coordination between them.
6. Competing Group Contamination
(cont…)
• Competing group contamination can take four different directions:
▪ Competing Group Rivalry (John Henry Effect): participants in competing
groups change their behavior to outdo one another. In
the Azpillaga et al. (2001) study, the control group and the experimental group
came from the same sociogeographical location. Although the possibility of the
John Henry effect was not mentioned, the fact that the experimental group
outperformed the control group suggests that rivalry was not a significant issue.
▪ Experimental Treatment Diffusion (Compromise): participants in competing
groups gain knowledge about the treatment conditions in other groups and
incorporate these factors into their own treatment. In the
Azpillaga et al. (2001) study, while there was no explicit mention of
experimental treatment diffusion, there was a potential for participants in the
control group to become aware of the experimental treatment.
6. Competing Group Contamination
(cont…)
▪ Compensatory Equalization of Treatments: when researchers, in an
attempt to make the control group feel less disadvantaged, provide them
with extra materials or special treatment that effectively turns them into a
new treatment group. In the Azpillaga et al. (2001) study, since the
experimental group outperformed the control group, compensatory equalization
likely did not occur.
▪ Demoralization (Boycott) of the Control Group: when participants in
the control group feel resentful or demoralized because they perceive that
the treatment group is receiving better or more interesting treatments.
This resentment may lead to decreased effort and motivation in the
control group. In the Azpillaga et al. (2001) study, since both the treatment
and control groups were from the same sociogeographical location, the control
group might have become demoralized if they learned they were not receiving the
new, more interesting dramatized format.
7. Testing
• Testing refers to ways in which measuring the dependent variable(s) can distort the results of
a study.
• There are five sources to which the consumer of research needs to pay attention:
1. Instrumentation: when different instruments are used to assess performance at different
stages of the study (e.g., pretest vs. posttest).
2. Measurement-treatment interaction: when the results of an intervention (or treatment)
only become apparent through the use of a specific type of measurement, such as a
particular kind of test or assessment.
3. Pretest effect: when the test administered before the treatment (the pretest) heightens
participants’ awareness of certain material that they might not have paid as much attention
to otherwise.
4. Posttest effect: when the design of the posttest inadvertently helps participants make
associations or connections that they would not have made otherwise, potentially making
the treatment appear more effective than it truly is.
5. Time of measurement effect: when the results depend on how soon after the treatment
the measurements are taken.
8. Researcher and Data Gatherer Effect
• When the identity or behavior of the person administering
the treatment or collecting data influences the results.
• This could be due to biases or expectations from the
researcher, or the influence of the data-gathering process
itself (e.g., whether a research assistant, a tape recorder, or
a video camera is used).
• The mere presence of a researcher or data collector can
change participants’ behavior or responses.
• Bejarano et al.’s (1997) study.
• Wesche and Paribakht’s (2000) study.
9. Pygmalion Effect (Researcher Effect)
• When the researcher’s expectations influence
their observations and judgments of the
participants.
• If a researcher believes that certain
participants have higher ability, they may
unknowingly treat them more leniently or be
more encouraging, which could lead to biased
results.
10. Hawthorne Effect
• When participants alter their behavior simply because they
are aware that they are part of a study. This awareness can
lead them to act in ways that they wouldn’t in a normal,
non-research environment.
• In Gray’s (1998) study, teacher trainees were asked to
write in interactive diaries outside of class hours. The
students were flattered by their involvement and might
have altered their behavior due to the awareness of being
part of an educational program, which could have affected
the quality of the diaries.
11. Treatment Intervention
• Treatment intervention can affect the results of a study in at
least two undesirable ways: Novelty and Disruption.
• Novelty Effect: New treatments may create an initial boost
in motivation simply because they are novel. This effect could
distort results if it fades over time.
• Disruption Effect: Unfamiliar treatments or tools may
disrupt performance. For example, students using computers for
the first time might perform poorly due to unfamiliarity with the
technology, which may obscure the treatment’s effectiveness.
• Example
12. Accumulative Treatment Effect
(Multiple-Treatment Interference or Order Effect)
• When the order in which treatments are
presented influences the outcomes.
• Mehnert (1998) studied the effects of
planning time on L2 German speakers’ speech
performance.
13. Treatment Fidelity
• Treatment fidelity refers to whether the treatment was
implemented as intended. If the treatment is not applied
consistently or properly, the results may not reflect the true effects
of the treatment.
• Bejarano et al. (1997) trained teachers to use specific group work
techniques and then monitored them through lesson plans and
teacher logs to ensure that both the treatment and control groups
received the intended treatment.
14. Treatment Strength–Time Interaction
• Some treatments may require more time to show their effects.
Short treatment durations may not provide enough time for the
treatment to have a noticeable impact, leading to misleading
conclusions.
• In the study by Rodriguez and Sadoski (2000), participants
were given only one session to learn mnemonic strategies.
Internal Validity Evaluation Inventory

Internal Validity in Research Methodology (Factors that affect Internal Validity)
