Chapter 8 compilation

Validity refers to the appropriateness,

meaningfulness, correctness and usefulness of the
inferences a researcher makes.
Reliability refers to the consistency of scores or

answers from one administration of an instrument to
another, and from one set of items to another.

Validation is the process of collecting and analyzing

evidence to support inferences.
The term validity refers to the degree to which

evidence supports any inferences a researcher makes
based on the data he/she collects using a particular
instrument.
The validation process is on not on the instrument

but on the inferences a researcher makes.

An appropriate inference would be the one that is

relevant ( related to the purposes of the study).
A meaningful inference says something about the

meaning of the information (such as test scores)
obtained through the use of an instrument.
Validity depends on the amount and type of evidence

there is to support the interpretations researchers
wish to make concerning data they have collected.

1) Content-related evidence of validity
2) Criterion-related evidence of
validity
3) Construct-related evidence of validity

This evidence refers to the content and format of the

instrument.
The content and format must be consistent with the

definition of the variable and the sample of subjects
to be measured.
One key element in this kind of evidence is the

adequacy of the sampling.

Content validation is a matter of determining if the

content that the instrument contains is an adequate
sample of the domain of content it is supposed to
represent.
The other aspect of content validation has to do with

the format of the instrument such as clarity of printing
and appropriateness of language.
However, valid results cannot be obtained if an

adequate questions in an instrument presented in an
inappropriate format ( such as giving a test written in
English to children whose English is minimal).

A common way to do this is to have someone look at

the content and format of the instrument and judge
whether or not it is appropriate.
However, the qualifications of the judges are always

in important consideration, and the judges must keep
in mind the characteristics of the intended sample.

Criterion – A second test or other assessment

procedure presumed to measure the same variable.
Researcher usually will compare the performance
from one instrument with other performance
obtained from another criterion.

Concurrent
Allow a time interval to
elapse between
administration of
instrument and obtaining
criterion score

Both data from the
instrument and criterion
are collected in near same
time

A key index to both forms of criterion-related

evidence.
Symbolized by the letter r
Indicating the degree of relationship that exist
between the scores individuals obtained by the two
instrument
Can have positive and negative relationship.
The difference between +1.00 and -1.00
If the r is .00, it means that there is no relation.

 More typically associated with research studies than

testing.
 Multiple sources are used to collect evidence.
 A combination of observation, surveys, focus groups,
and other measures are used to identify how much of
the trait being measured is possessed by the observee.

 The variable being measured is clearly

defined.
 Hypotheses, are formed about how people
who possess a lot versus a little of the
variable will behave in a particular
situation.
 The hypotheses are tested both logically
and empirically.

Reliability refers to the consistency of the scores

obtained.
If scores are completely inconsistent for a person,

they provide no useful information.
The distinction between reliability and validity is

shown in Figure 8.2
Indeed, if the data is unreliable, it cannot be valid but

if the data is valid, it will always be reliable to be used.

There are many factors lead to errors of measurements

such as:
1) Differences in motivation
2) Energy
3) Anxiety
4) Different testing situation
A reliability coefficient expresses relationship between

scores of the same individuals on the same instrument at
two different times, or on two parts of the same
instrument.

1) Test-retest method
2) Equivalent forms method
3) Internal-consistency methods

 Test-retest reliability is a measure of reliability obtained

by administering the same test twice over a period of time
to a group of individuals. The scores from Time 1 and
Time 2 can then be correlated in order to evaluate the test
for stability over time.

 A reliability coefficient is then calculated to indicate the

relationship between the two sets of scores obtained.
 Stability of scores over a two- to three-month is usually

viewed as sufficient evidence of test-retest reliability for
most educational research.

In reporting test-retest reliability coefficients, the

time interval between the two testings should always
be reported.
As an example, a test designed to assess student

learning in psychology could be given to a group of
students twice, with the second administration
perhaps coming a week after the first. The obtained
correlation coefficient would indicate the stability of
the scores.

The shorter the time gap, the higher the correlation;

the longer the time gap, the lower the correlation.
 This is because the two observations are related over

time -- the closer in time we get the more similar the
factors that contribute to error.
Since this correlation is the test-retest estimate of

reliability, you can obtain considerably different
estimates depending on the interval.

Two different but equivalent forms of an instrument

are administered to the same group of people.
Knows as ‘alternate’ or ‘parallel’.
Containing the same content but constructed
differently
High coefficient will indicate a strong evidence of
reliability and vice versa
Can be combined with test-retest where the
coefficient will also cover the consistency over time.

Correlation coefficient will be calculated from the

scoring two halves (usually the odd vs. the even) of
the test.
The internal consistency of the test will be described
by the relativity of the two halves.
Calculated using the Spearman-Brown Prophecy
Formula (pg. 156).
The reliability of the test may be increased by adding
more item to it (have to be similar to the original
item.)

Most frequently applied method in determining the

internal consistency.
Need 3 information
 Number of items on the test – K
 The mean – M
 Standard deviation of the test – SD

Example – pg. 157

Also known as Cronbach alpha
Is a general form of the KR20 formula
To be used in calculating the reliability of items that

are not scored by right vs. wrong (more than one
answer is possible)

 An index that shows the extent to which a measurement

would vary under changed circumstances.
 Hence, there are many possible standard errors for a given
score.
eg: IQ tests;
SMEas over one year period with different content = 5 points
10-year period

= 8 points’

 We doubled standard errors in measurement in computing

the ranges within which the second score is expected to fall.
 This was done so 95% sure that our estimates were correct.

 Instrument s that use direct observation are highly

vulnerable to observer differences.
 Researchers are obliged to investigate and report the
degree of scoring agreement
Enhanced by training the observers and increasing
the number of observation periods.

 All researches should ensure that any inferences they

draw that are based on data obtained through the use
of an instrument are appropriate, credible, and
backed up by evidence.

Internal validity refers to the degree to which

observed differences on the dependent variable are
directed to the independent variable, not to some
other (uncontrolled) variable.
When a study has internal validity, it means that any

relationship observed between two or more variables
should be unambiguous as to what it means rather
than being due to “something else”.

 The “something else” may be any one (or more) of a

number of factors, such as the age or the ability of the
subjects, the conditions under which the study is
conducted, or the types of materials used.
 If these factors are not in some way or another

controlled or accounted for, the researcher can never
be sure that they are not the reason for any observed
results.
In qualitative research, a study is said to have good

internal validity if the alternative explanations (the
“something else”) have been systematically ruled out.

Regardless of whether the study is qualitative or

quantitative, if these “rival hypotheses” are not
controlled or accounted for in some way, the
researcher can never be sure that they are not the
reason for any observed results.

1) Subject characteristics
2) Loss of subjects (Mortality)
3) Location
4) Instrumentation
5) Data collector characteristics

6) Testing
7) History
8) Maturation
9) Attitude of subjects
10)Regression
11) Implementation

Selection bias happens when the selection of people

for a study may result in the individuals (or groups)
differing from one another in unintended ways that
are related to the variables to be studied.
Some examples that might affect the results of a study

include:
1) Age
2) Strength
3) Maturity
4) Gender ethnicity

5) Coordination
6) Speed
7) Intelligence
8) Vocabulary
9) Attitude
10) Reading ability
11) Fluency
12) Manual dexterity
13) Socioeconomic status
14) Religious beliefs
15) Political beliefs

Mortality threat refers to the possibility that results

are due to the fact that subjects who are for whatever
reason “lost” to a study may differ from those who
remain so that their absence has an important effect
on the results of the study.
This threat is due to some reasons such as illness,

family relocation, requirements of other activities or
some individuals may drop out of the study.

 Loss of subjects not only limits generalizability but also

introduce bias- if those subjects who are lost would have
responded differently from those from whom data were
obtained.
However, there is an attempt to eliminate the problem of

mortality is to provide evidence that the subjects lost
were similar to those remaining on pertinent
characteristics such as age, gender, ethnicity, pretest
scores, or other variables that presumably that might be
related to the study outcomes.
Indeed, the best solution to this threat is preventing or

minimizing the loss of subjects.

Experimental mortality which is also known as the loss of

subjects.

Example:
In a Web-based instruction project entitled Eruditio, it
started with 161 subjects and only 95 of them completed
the entire module. Those who stayed in the project all
the way to end may be more motivated to learn and thus
achieved higher performance.

Location threat happens when the particular

locations in which an intervention is carried out, may
create alternative explanations for results.
The best method to control this problem is to hold

location constant-that is, keep it the same for all
participants.

The particular location in which data are collected or

in which an intervention is carried out, may create
alternative explanations for results. This is called a
location threat.
Example:
Classrooms in which students are taught may have
more or less resources, workstations, lighting, or
teachers who may skew the results inadvertently. The
location in which tests are administered may affect
responses. Parent assessments of their children may be
different when done at home than at school or if done
individually or in groups.

Student performance on tests may be lower if tests are

given in noisy or poorly lighted rooms. Observations of
student interaction may be affected by the physical
arrangement in certain classrooms.
The best method of control for a location threat is to

hold location constant that is, keep it the same for all
participants.

Instrumentation is the process where instruments and

procedures are used in collecting data in a study.

Instrument decay happens when instrumentation

creates problem if the nature of the instrument
(including the scoring procedure) is changed in some
way or another.
This is often the case when instrument permits

different interpretations of results (as in essay tests) or
is especially long or difficult to score, thereby resulting
in fatigue of the scorer.

Instruments are devices used by researchers to collect

information.
Examples are: questionnaires, surveys, tests,

observation, participation, studies….
Instrument Decay can be a problem if the nature of

the instrument is changed over time.
 This may be due to fatigue or repetition on the part

of the person administering the test, taking the test,
or correcting the test.

Fatigue often happens when a researcher scores a

number of tests one after the other; he/she becomes
tired and scores the tests differently.
For example, more rigorously at first, more

generously later.
Data Collector Characteristics is an inevitable part of

most instruments and can affect results. The
individual who collects the data may affect the results
unintentionally.

Example:
People may be more willing to be interviewed by females
rather than males. Other characteristics could be
language patterns, ethnicity, age, size….
Also, individuals may present information, researchers

may collect data differently, or counselors may use
different tactics when presenting orally.
These threats, know as the implementer effect need to be

controlled for as much as possible.

The characteristics of the data gatherers may tamper

the data.
 Female data collector will elicit more of a confession

from the situation compared

Primary ways to control the threat
 Use the same data collector(s)
 Analysing data separately between each collector
 Ensuring each collector were used equally in a group

setting

The data collectors or the scorer may unconsciously

distort the data
 More time allowed in the exam
 Interviewers asking leading questions

Technique to handle data collectors bias :
 Standardize all procedure
Require some training

 Ensure data collectors lack the information that require

them to distort the data
Also known Planned Ignorance

Testing : the use of any instrumentation.
Testing Threat : the subjects already ‘practiced’ the

post-test using the pre-test given to them prior to the
subject
A pre-test sometimes regarded as a practice thus
making the subjects alert/aware of the questions.

One or more unanticipated and unplanned event

occur during the course of the study that can affect
the result/outcome
 Construction noises
 Death of a certain eminent person

Researcher must be alert on any events or occurrence

during the study.

During an intervention, the change happen with the

influence of time rather than the intervention itself.
It is a serious threat to pre-post studies or studies that
span over years of time.
The best way overcome this is to have a good
comparison group in the study.

 Hawthorne Effect
•

This positive effect, resulting from increased attention and recognition
of subjects
eg: productivity increased were made in physical working conditions
(increase in the number of breaks)

 Opposite Effect
 The negative effect, resulting in becoming demoralized or resentful and

hence perform more poorly than the treatment group.

eg: productivity decreased when the control group receive no treatment
at all
 Remedy
•

Provide the control or comparison group(s) with a special treatment
comparable to that received by the experimental group.

 Presence of regression threat
 Change is studied in a group that is extremely low or

high in its preintervention performance.

eg: a class of students of markedly low ability may
are given special help. Six months later, their
average score on test involving similar problem
has improved, but not necessarily because of the
special help

 A possibility where an experimental group may be

treated in ways that are unintended and not
necessarily part of the method, yet which give them
an advantage of one sort or another.
 Different individuals

implement different methods

different outcomes
 Some individuals have a personal bias in favor of one

method over the other.

Evaluate the individuals who implement each method

on pertinent (relevant) characteristic, and then try to
equate the treatment groups on these dimensions.
 To require that each method be taught by all teachers in

the study.

Some teachers may have different abilities to

implement the different methods.
 Use several different individuals to implement each

method, thereby reducing the chances of an
advantage to either method.
 Allow individuals to choose the method they wish to

implement.
 Have all methods used by all implementers, but with

their preferences known beforehand.

 Standardizing the conditions under which the study

occurs.
(Location, Instrumentation, Subject attitude and Implementation threats)

 Obtaining and using more information on the subjects

of the study.
(Subject characteristics, Mortality, Maturation and Regression threats)

 Obtaining and using more information on the details of

the study.
( Location, Instrumentation, History, Subject attitude and Implementation
threats)

 Choosing an appropriate design.

The degree to which results are generalizable, or

applicable, to groups and environments outside the
research setting.
The extent to which the results of a study can be

generalized determines the external validity of the
study.
External validity is related to generalizing. That's the

major thing you need to keep in mind.

A study that has a large, randomly selected sample or

a carefully matched sample is said to have external
validity.
Recall that validity refers to the approximate truth of

propositions, inferences, or conclusions
so external validity refers to the approximate truth of
conclusions the involve generalizations.
In simpler words, external validity is the degree to

which the conclusions in your study would hold for
other persons in other places and at other times.

Population refers to any set of people or events from

which the sample is selected and to which the study
results will generalize.
Population generalizability refers to the degree to which

a sample represents the population of interest.
However, if the results of a study only apply to the

group being studied and if that group is fairly small or is
narrowly defined, the usefulness of any findings is
seriously limited.

This is why trying to find a representative sample is so

important because researchers usually want the
results of an investigation to be as widely applicable
as possible.
Representativeness refers only to the essential, or

relevant, characteristics of a population.
In science there are two major approaches on how we

provide evidence for a generalization.

The first approach is the Sampling Model.
In the sampling model, you start by identifying the

population you would like to generalize to. Then, you
draw a fair sample from that population and conduct
your research with the sample. Finally, because the
sample is representative of the population, you can
automatically generalize your results back to the
population.

However, there are several problems with this

approach.
First, perhaps you don't know at the time of your

study who you might ultimately like to generalize to.
Second, you may not be easily able to draw a fair or

representative sample.
Third, it's impossible to sample across all times that

you might like to generalize to (like next year).

A non-random sample reduces the external validity of

the study.
Much medical research is done on the patients one

sees in the clinic, this is a non-random sample that is
not representative of a larger population. It will not
generalize because it is not a fatal flaw in the study.
A study with a non-random sample still identifies true

facts about the sample and perhaps those findings will
be true for others as well. It is best to define your
population first, and then obtain a random sample.

The sample size required depends on the

requirements of the study and size of the population.
As a rule the bigger the better. If the sample is too

small then the performance of a few individuals can
have a big effect on the data, and render the data less
representative of the population.

Researches should describe the sample as thoroughly

ad possible (in detail; age, gender, ethnicity and others) so that
interested others can judge for themselves the degree
that they want.
 Replication; repeats the study using different

groups of subjects in different situations.

 Have not been used;
 Educational researches may be unaware of the hazards involved in
generalizing when one does not have a random sample.
 It is simply not feasible for a researcher to invest the time, money or

other resources to obtain a random sample.

The degree to which the result of a study can be

extended to other settings or condition.
The researcher must ensure that the important aspect
must match in order to generalize the finding from
another study
What hold true for another
subject/material/condition/time doesn’t mean it will
remain true with the other
Hence, researcher must be careful in generalizing the
findings from another research with the other

Chapter 8 compilation

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Chapter 8 compilation

Similar to Chapter 8 compilation (20)

Recently uploaded

Recently uploaded (20)

Chapter 8 compilation

Editor's Notes