The document discusses validity and reliability in research. It defines validity as the appropriateness and meaningfulness of inferences made from an instrument, and reliability as the consistency of scores from one administration to another. There are three main types of validity evidence: content, criterion, and construct validity. Reliability can be measured through test-retest, parallel forms, and internal consistency methods. Researchers must ensure the inferences they draw from data are valid and supported by evidence to minimize threats to internal validity.
3. Validity refers to the appropriateness,
meaningfulness, correctness and usefulness of the
inferences a researcher makes.
Reliability refers to the consistency of scores or
answers from one administration of an instrument to
another, and from one set of items to another.
4. Validation is the process of collecting and analyzing
evidence to support inferences.
The term validity refers to the degree to which
evidence supports any inferences a researcher makes
based on the data he/she collects using a particular
instrument.
The validation process is on not on the instrument
but on the inferences a researcher makes.
5. An appropriate inference would be the one that is
relevant ( related to the purposes of the study).
A meaningful inference says something about the
meaning of the information (such as test scores)
obtained through the use of an instrument.
Validity depends on the amount and type of evidence
there is to support the interpretations researchers
wish to make concerning data they have collected.
6. 1) Content-related evidence of validity
2) Criterion-related evidence of
validity
3) Construct-related evidence of validity
7. This evidence refers to the content and format of the
instrument.
The content and format must be consistent with the
definition of the variable and the sample of subjects
to be measured.
One key element in this kind of evidence is the
adequacy of the sampling.
8. Content validation is a matter of determining if the
content that the instrument contains is an adequate
sample of the domain of content it is supposed to
represent.
The other aspect of content validation has to do with
the format of the instrument such as clarity of printing
and appropriateness of language.
However, valid results cannot be obtained if an
adequate questions in an instrument presented in an
inappropriate format ( such as giving a test written in
English to children whose English is minimal).
9. A common way to do this is to have someone look at
the content and format of the instrument and judge
whether or not it is appropriate.
However, the qualifications of the judges are always
in important consideration, and the judges must keep
in mind the characteristics of the intended sample.
10. Criterion – A second test or other assessment
procedure presumed to measure the same variable.
Researcher usually will compare the performance
from one instrument with other performance
obtained from another criterion.
11. Concurrent
Allow a time interval to
elapse between
administration of
instrument and obtaining
criterion score
Both data from the
instrument and criterion
are collected in near same
time
12. A key index to both forms of criterion-related
evidence.
Symbolized by the letter r
Indicating the degree of relationship that exist
between the scores individuals obtained by the two
instrument
Can have positive and negative relationship.
The difference between +1.00 and -1.00
If the r is .00, it means that there is no relation.
13.
14. More typically associated with research studies than
testing.
Multiple sources are used to collect evidence.
A combination of observation, surveys, focus groups,
and other measures are used to identify how much of
the trait being measured is possessed by the observee.
15. The variable being measured is clearly
defined.
Hypotheses, are formed about how people
who possess a lot versus a little of the
variable will behave in a particular
situation.
The hypotheses are tested both logically
and empirically.
16.
17. Reliability refers to the consistency of the scores
obtained.
If scores are completely inconsistent for a person,
they provide no useful information.
The distinction between reliability and validity is
shown in Figure 8.2
Indeed, if the data is unreliable, it cannot be valid but
if the data is valid, it will always be reliable to be used.
18. There are many factors lead to errors of measurements
such as:
1) Differences in motivation
2) Energy
3) Anxiety
4) Different testing situation
A reliability coefficient expresses relationship between
scores of the same individuals on the same instrument at
two different times, or on two parts of the same
instrument.
20. Test-retest reliability is a measure of reliability obtained
by administering the same test twice over a period of time
to a group of individuals. The scores from Time 1 and
Time 2 can then be correlated in order to evaluate the test
for stability over time.
A reliability coefficient is then calculated to indicate the
relationship between the two sets of scores obtained.
Stability of scores over a two- to three-month is usually
viewed as sufficient evidence of test-retest reliability for
most educational research.
21. In reporting test-retest reliability coefficients, the
time interval between the two testings should always
be reported.
As an example, a test designed to assess student
learning in psychology could be given to a group of
students twice, with the second administration
perhaps coming a week after the first. The obtained
correlation coefficient would indicate the stability of
the scores.
22. The shorter the time gap, the higher the correlation;
the longer the time gap, the lower the correlation.
This is because the two observations are related over
time -- the closer in time we get the more similar the
factors that contribute to error.
Since this correlation is the test-retest estimate of
reliability, you can obtain considerably different
estimates depending on the interval.
23. Two different but equivalent forms of an instrument
are administered to the same group of people.
Knows as ‘alternate’ or ‘parallel’.
Containing the same content but constructed
differently
High coefficient will indicate a strong evidence of
reliability and vice versa
Can be combined with test-retest where the
coefficient will also cover the consistency over time.
24. Correlation coefficient will be calculated from the
scoring two halves (usually the odd vs. the even) of
the test.
The internal consistency of the test will be described
by the relativity of the two halves.
Calculated using the Spearman-Brown Prophecy
Formula (pg. 156).
The reliability of the test may be increased by adding
more item to it (have to be similar to the original
item.)
25. Most frequently applied method in determining the
internal consistency.
Need 3 information
Number of items on the test – K
The mean – M
Standard deviation of the test – SD
Example – pg. 157
26. Also known as Cronbach alpha
Is a general form of the KR20 formula
To be used in calculating the reliability of items that
are not scored by right vs. wrong (more than one
answer is possible)
27.
28. An index that shows the extent to which a measurement
would vary under changed circumstances.
Hence, there are many possible standard errors for a given
score.
eg: IQ tests;
SMEas over one year period with different content = 5 points
10-year period
= 8 points’
We doubled standard errors in measurement in computing
the ranges within which the second score is expected to fall.
This was done so 95% sure that our estimates were correct.
29. Instrument s that use direct observation are highly
vulnerable to observer differences.
Researchers are obliged to investigate and report the
degree of scoring agreement
Enhanced by training the observers and increasing
the number of observation periods.
30. All researches should ensure that any inferences they
draw that are based on data obtained through the use
of an instrument are appropriate, credible, and
backed up by evidence.
31.
32. Internal validity refers to the degree to which
observed differences on the dependent variable are
directed to the independent variable, not to some
other (uncontrolled) variable.
When a study has internal validity, it means that any
relationship observed between two or more variables
should be unambiguous as to what it means rather
than being due to “something else”.
33. The “something else” may be any one (or more) of a
number of factors, such as the age or the ability of the
subjects, the conditions under which the study is
conducted, or the types of materials used.
If these factors are not in some way or another
controlled or accounted for, the researcher can never
be sure that they are not the reason for any observed
results.
In qualitative research, a study is said to have good
internal validity if the alternative explanations (the
“something else”) have been systematically ruled out.
34. Regardless of whether the study is qualitative or
quantitative, if these “rival hypotheses” are not
controlled or accounted for in some way, the
researcher can never be sure that they are not the
reason for any observed results.
35. 1) Subject characteristics
2) Loss of subjects (Mortality)
3) Location
4) Instrumentation
5) Data collector characteristics
37. Selection bias happens when the selection of people
for a study may result in the individuals (or groups)
differing from one another in unintended ways that
are related to the variables to be studied.
Some examples that might affect the results of a study
include:
1) Age
2) Strength
3) Maturity
4) Gender ethnicity
38. 5) Coordination
6) Speed
7) Intelligence
8) Vocabulary
9) Attitude
10) Reading ability
11) Fluency
12) Manual dexterity
13) Socioeconomic status
14) Religious beliefs
15) Political beliefs
39. Mortality threat refers to the possibility that results
are due to the fact that subjects who are for whatever
reason “lost” to a study may differ from those who
remain so that their absence has an important effect
on the results of the study.
This threat is due to some reasons such as illness,
family relocation, requirements of other activities or
some individuals may drop out of the study.
40. Loss of subjects not only limits generalizability but also
introduce bias- if those subjects who are lost would have
responded differently from those from whom data were
obtained.
However, there is an attempt to eliminate the problem of
mortality is to provide evidence that the subjects lost
were similar to those remaining on pertinent
characteristics such as age, gender, ethnicity, pretest
scores, or other variables that presumably that might be
related to the study outcomes.
Indeed, the best solution to this threat is preventing or
minimizing the loss of subjects.
41. Experimental mortality which is also known as the loss of
subjects.
Example:
In a Web-based instruction project entitled Eruditio, it
started with 161 subjects and only 95 of them completed
the entire module. Those who stayed in the project all
the way to end may be more motivated to learn and thus
achieved higher performance.
42. Location threat happens when the particular
locations in which an intervention is carried out, may
create alternative explanations for results.
The best method to control this problem is to hold
location constant-that is, keep it the same for all
participants.
43. The particular location in which data are collected or
in which an intervention is carried out, may create
alternative explanations for results. This is called a
location threat.
Example:
Classrooms in which students are taught may have
more or less resources, workstations, lighting, or
teachers who may skew the results inadvertently. The
location in which tests are administered may affect
responses. Parent assessments of their children may be
different when done at home than at school or if done
individually or in groups.
44. Student performance on tests may be lower if tests are
given in noisy or poorly lighted rooms. Observations of
student interaction may be affected by the physical
arrangement in certain classrooms.
The best method of control for a location threat is to
hold location constant that is, keep it the same for all
participants.
45. Instrumentation is the process where instruments and
procedures are used in collecting data in a study.
Instrument decay happens when instrumentation
creates problem if the nature of the instrument
(including the scoring procedure) is changed in some
way or another.
This is often the case when instrument permits
different interpretations of results (as in essay tests) or
is especially long or difficult to score, thereby resulting
in fatigue of the scorer.
46. Instruments are devices used by researchers to collect
information.
Examples are: questionnaires, surveys, tests,
observation, participation, studies….
Instrument Decay can be a problem if the nature of
the instrument is changed over time.
This may be due to fatigue or repetition on the part
of the person administering the test, taking the test,
or correcting the test.
47. Fatigue often happens when a researcher scores a
number of tests one after the other; he/she becomes
tired and scores the tests differently.
For example, more rigorously at first, more
generously later.
Data Collector Characteristics is an inevitable part of
most instruments and can affect results. The
individual who collects the data may affect the results
unintentionally.
48. Example:
People may be more willing to be interviewed by females
rather than males. Other characteristics could be
language patterns, ethnicity, age, size….
Also, individuals may present information, researchers
may collect data differently, or counselors may use
different tactics when presenting orally.
These threats, know as the implementer effect need to be
controlled for as much as possible.
49. The characteristics of the data gatherers may tamper
the data.
Female data collector will elicit more of a confession
from the situation compared
Primary ways to control the threat
Use the same data collector(s)
Analysing data separately between each collector
Ensuring each collector were used equally in a group
setting
50. The data collectors or the scorer may unconsciously
distort the data
More time allowed in the exam
Interviewers asking leading questions
Technique to handle data collectors bias :
Standardize all procedure
Require some training
Ensure data collectors lack the information that require
them to distort the data
Also known Planned Ignorance
51. Testing : the use of any instrumentation.
Testing Threat : the subjects already ‘practiced’ the
post-test using the pre-test given to them prior to the
subject
A pre-test sometimes regarded as a practice thus
making the subjects alert/aware of the questions.
52. One or more unanticipated and unplanned event
occur during the course of the study that can affect
the result/outcome
Construction noises
Death of a certain eminent person
Researcher must be alert on any events or occurrence
during the study.
53. During an intervention, the change happen with the
influence of time rather than the intervention itself.
It is a serious threat to pre-post studies or studies that
span over years of time.
The best way overcome this is to have a good
comparison group in the study.
54. Hawthorne Effect
•
This positive effect, resulting from increased attention and recognition
of subjects
eg: productivity increased were made in physical working conditions
(increase in the number of breaks)
Opposite Effect
The negative effect, resulting in becoming demoralized or resentful and
hence perform more poorly than the treatment group.
eg: productivity decreased when the control group receive no treatment
at all
Remedy
•
Provide the control or comparison group(s) with a special treatment
comparable to that received by the experimental group.
55. Presence of regression threat
Change is studied in a group that is extremely low or
high in its preintervention performance.
eg: a class of students of markedly low ability may
are given special help. Six months later, their
average score on test involving similar problem
has improved, but not necessarily because of the
special help
56. A possibility where an experimental group may be
treated in ways that are unintended and not
necessarily part of the method, yet which give them
an advantage of one sort or another.
Different individuals
implement different methods
different outcomes
Some individuals have a personal bias in favor of one
method over the other.
57. Evaluate the individuals who implement each method
on pertinent (relevant) characteristic, and then try to
equate the treatment groups on these dimensions.
To require that each method be taught by all teachers in
the study.
58. Some teachers may have different abilities to
implement the different methods.
Use several different individuals to implement each
method, thereby reducing the chances of an
advantage to either method.
Allow individuals to choose the method they wish to
implement.
Have all methods used by all implementers, but with
their preferences known beforehand.
59. Standardizing the conditions under which the study
occurs.
(Location, Instrumentation, Subject attitude and Implementation threats)
Obtaining and using more information on the subjects
of the study.
(Subject characteristics, Mortality, Maturation and Regression threats)
Obtaining and using more information on the details of
the study.
( Location, Instrumentation, History, Subject attitude and Implementation
threats)
Choosing an appropriate design.
60.
61. The degree to which results are generalizable, or
applicable, to groups and environments outside the
research setting.
The extent to which the results of a study can be
generalized determines the external validity of the
study.
External validity is related to generalizing. That's the
major thing you need to keep in mind.
62. A study that has a large, randomly selected sample or
a carefully matched sample is said to have external
validity.
Recall that validity refers to the approximate truth of
propositions, inferences, or conclusions
so external validity refers to the approximate truth of
conclusions the involve generalizations.
In simpler words, external validity is the degree to
which the conclusions in your study would hold for
other persons in other places and at other times.
63. Population refers to any set of people or events from
which the sample is selected and to which the study
results will generalize.
Population generalizability refers to the degree to which
a sample represents the population of interest.
However, if the results of a study only apply to the
group being studied and if that group is fairly small or is
narrowly defined, the usefulness of any findings is
seriously limited.
64. This is why trying to find a representative sample is so
important because researchers usually want the
results of an investigation to be as widely applicable
as possible.
Representativeness refers only to the essential, or
relevant, characteristics of a population.
In science there are two major approaches on how we
provide evidence for a generalization.
65. The first approach is the Sampling Model.
In the sampling model, you start by identifying the
population you would like to generalize to. Then, you
draw a fair sample from that population and conduct
your research with the sample. Finally, because the
sample is representative of the population, you can
automatically generalize your results back to the
population.
66. However, there are several problems with this
approach.
First, perhaps you don't know at the time of your
study who you might ultimately like to generalize to.
Second, you may not be easily able to draw a fair or
representative sample.
Third, it's impossible to sample across all times that
you might like to generalize to (like next year).
67. A non-random sample reduces the external validity of
the study.
Much medical research is done on the patients one
sees in the clinic, this is a non-random sample that is
not representative of a larger population. It will not
generalize because it is not a fatal flaw in the study.
A study with a non-random sample still identifies true
facts about the sample and perhaps those findings will
be true for others as well. It is best to define your
population first, and then obtain a random sample.
68. The sample size required depends on the
requirements of the study and size of the population.
As a rule the bigger the better. If the sample is too
small then the performance of a few individuals can
have a big effect on the data, and render the data less
representative of the population.
69. Researches should describe the sample as thoroughly
ad possible (in detail; age, gender, ethnicity and others) so that
interested others can judge for themselves the degree
that they want.
Replication; repeats the study using different
groups of subjects in different situations.
70. Have not been used;
Educational researches may be unaware of the hazards involved in
generalizing when one does not have a random sample.
It is simply not feasible for a researcher to invest the time, money or
other resources to obtain a random sample.
71. The degree to which the result of a study can be
extended to other settings or condition.
The researcher must ensure that the important aspect
must match in order to generalize the finding from
another study
What hold true for another
subject/material/condition/time doesn’t mean it will
remain true with the other
Hence, researcher must be careful in generalizing the
findings from another research with the other