The document discusses the concepts of validity and reliability in testing. It defines different types of validity including content validity, face validity, criterion-oriented validity, concurrent validity, and construct validity. It also defines internal validity and external validity in research studies. The document then defines reliability and lists different types of reliability such as test-retest reliability, parallel forms reliability, inter-rater reliability, and internal consistency reliability.
It talks about the different types of validity in assessment.
* Face Validity
* Content Validity
* Predictive Validity
* Concurrent Validity
* Construct Validity
This short SlideShare presentation explores a basic overview of test reliability and test validity. Validity is the degree to which a test measures what it is supposed to measure. Reliability is the degree to which a test consistently measures whatever it measures. Examples are given as well as a slide on considerations for writing test questions that demand higher-order thinking.
Topic: What is Reliability and its Types?
Student Name: Kanwal Naz
Class: B.Ed 1.5
Project Name: “Young Teachers' Professional Development (TPD)”
Project Founder: Prof. Dr. Amjad Ali Arain
Faculty of Education, University of Sindh, Pakistan
What does ‘Reliability’ mean?
Types of Reliability.
Factors which can affect the scores of test papers (reliability).
What does ‘Validity’ mean?
Understanding the differences between reliability and validity.
Observation is one of the most important and basic methods, techniques, or tools for collecting data in almost all types of research studies (experimental or laboratory research, descriptive research, or qualitative research).
Observation as a research tool is used in understanding, measuring, predicting, and modifying human behavior.
The observer can opt for either participant or non-participant observation. This technique can be used to observe the behavior or attitude of an individual or a group. The observer has to plan carefully and execute the plans to collect accurate information. The observation can be supported with interaction.
This method gives an opportunity to gain insight into the group, and the observer will naturally be able to obtain more factual data than with other methods of data collection.
Validity:
Validity refers to how well a test measures what it is purported to measure.
Types of Validity:
1. Logical validity:
Validity judged through logical analysis of the test content rather than through empirical data. It has two types.
I. Face Validity:
It is the extent to which the measurement method appears “on its face” to measure the construct of interest.
• Example:
• Suppose you were taking an instrument that reportedly measures your attractiveness, but the questions asked you to identify the correctly spelled word in each list; the instrument would lack face validity.
II. Content Validity:
The extent to which a test measures all the aspects that contribute to the variable of interest.
Example:
If temperature, height, and stamina are to be assessed for physical fitness, then a test of fitness must include content about temperature, height, and stamina.
2. Criterion validity:
It is the extent to which people's scores are correlated with other variables (criteria) that reflect the same construct.
Example:
An IQ test should correlate positively with school performance.
An occupational aptitude test should correlate positively with work performance.
Types of Criterion Validity
Concurrent validity:
• When the criterion is something that is happening or being assessed at the same time as the construct of interest, it is called concurrent validity.
• Example:
Beef test.
Predictive validity:
• When the criterion is something that will happen or be assessed in the future, this is called predictive validity. (By contrast, a new measure of self-esteem correlating positively with an old, established measure given at the same time would illustrate concurrent validity.)
• Example:
GAT, SAT
Other types of validity
Internal Validity:
It is basically the extent to which a study is free from flaws, so that any differences in a measurement are due to the independent variable and nothing else.
External Validity
• It is the extent to which the results of a research study can be generalized to different situations, different groups of people, different settings, different conditions, etc.
3. Submitted to: Dr. Ayaz Khan
Submitted by (group members):
Abdul Rehman 47
Affera Mujahid 01
Iqra Ilyas 16
Ishrat Fatima 17
Waqas Ahmad 59
5. Very simply, validity is the extent to which a test measures what it is supposed to measure. We can divide the types of validity into logical and empirical.
7. When we want to find out if the entire content of the behavior/construct/area is represented in the test, we compare the test tasks with the content of the behavior. This is a logical method, not an empirical one.
8. Basically, face validity refers to the degree to which a test appears to measure what it purports to measure. Face validity is a measure of how representative a research project is "at face value," and whether it appears to be a good project.
9. When you are predicting a future performance based on the scores obtained currently by the measure, correlate the scores obtained with the later performance. The later performance is called the criterion, and the current score is the predictor.
10. Concurrent validity is the degree to which the scores on a test are related to the scores on another, already established test administered at the same time, or to some other valid criterion available at the same time.
11. Construct validity is the degree to which a test measures an intended hypothetical construct.
14. Internal validity: the extent to which an observed outcome can be attributed to a planned intervention.
16. Refers to the occurrence of events that could alter the outcome or the results of a study.
Concurrent history – occurs during the study.
E.g. Studying the effectiveness of using musical activities to teach mathematics concepts: while one teacher uses the standard curriculum, another teacher is using the musical activities curriculum.
17. Pertains to any changes that occur in the subjects during the course of the study that are not part of the study and that might affect the results of the study.
Biological (growth processes) – e.g. weight gain or increase in height due to a breakfast or lunch program.
Psychological (learning or development) – e.g. in studying the effects of certain instructional techniques on concept learning of sixth graders, the attainment of certain operational thought during that period has to be considered.
18. Is concerned with the effects on the outcome of a study of the inconsistent use of a measurement instrument (what the instrument is measuring changes during the course of the study).
E.g. The effects of fatigue on an achievement test.
19. Relates to the possible effects of a pretest on the performance of participants in a study on the posttest.
May alert subjects to the fact that they are being studied.
May affect performance on later administrations.
20. Refers to the tendency of extreme scores to move (or regress) toward the mean score on subsequent retesting.
E.g. Students scoring below 25% (lowest extreme) on an IQ test are given a posttest. A higher posttest score is expected.
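Regression toward the mean can be demonstrated with a short simulation, a sketch under the standard assumption that an observed score is a true score plus random error (all numbers below are invented):

```python
import random

random.seed(42)

# Simulate a true ability for each person plus independent random
# measurement error on two separate testings
true_scores = [random.gauss(50, 10) for _ in range(10_000)]
test1 = [t + random.gauss(0, 10) for t in true_scores]
test2 = [t + random.gauss(0, 10) for t in true_scores]

# Select the lowest-scoring ~10% on the first test...
cutoff = sorted(test1)[len(test1) // 10]
low_group = [i for i, s in enumerate(test1) if s <= cutoff]

# ...and compare that group's means on the two administrations;
# the retest mean moves back toward the population mean of 50
mean1 = sum(test1[i] for i in low_group) / len(low_group)
mean2 = sum(test2[i] for i in low_group) / len(low_group)
print(f"first test mean = {mean1:.1f}, retest mean = {mean2:.1f}")
```

The extreme group scores higher on retest not because of any intervention, but because part of its initially low scores was random error that does not recur.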
21. Refers to the loss of subjects from a study due to their initial nonavailability or subsequent withdrawal from the study.
E.g. More high-scoring people drop out from the experimental group than from the control group.
23. Pertains to the possibility that groups in a study may possess different characteristics and that those differences may affect the results.
Differences in age, ability, gender, or racial/ethnic composition, or any of an almost unlimited number of other ways.
26. Refers to the extent to which the results of a research study can be generalized confidently to a group larger than the group that participated in the study. (Bracht & Glass)
27. Given that there is probably a causal relationship from construct A to construct B, how generalizable is this relationship across persons, settings, and times?
Interaction of selection and treatment (sample to population)
Interaction of setting and treatment
Interaction of history and treatment (time)
29. Reliability is the degree to which a test consistently measures whatever it measures. The errors of measurement that affect reliability are random errors; systematic errors affect validity rather than reliability.
30. Test-retest reliability (same people, different time)
Parallel forms reliability (same people, same time, different test)
Inter-rater reliability (different people, same test)
Internal consistency reliability (different questions, same construct)
A. Average inter-item correlation
B. Split-half reliability
31. Test-retest
An assessment or test of a person should give the same results whenever you apply the test. Test-retest reliability evaluates reliability across time. This method is particularly used in experiments.
32. Various questions for a personality test are tried out with a class of students over several years. This helps the researcher determine those questions and combinations that have better reliability.
33. When multiple people are giving assessments of some kind or are the subjects of some test, then similar raters should arrive at the same resulting scores. It can be used to calibrate people, for example those being used as observers in an experiment.
Inter-rater reliability thus evaluates reliability across different people.
Two major ways in which inter-rater reliability is used are (a) testing how similarly people categorize items, and (b) testing how similarly people score items.
34. This is the best way of assessing reliability when you are using observation, as observer bias very easily creeps in. It does, however, assume you have multiple observers, which is not always the case.
Inter-rater reliability is also known as inter-observer reliability.
35. Two people may be asked to categorize pictures of animals as being dogs or cats. A perfectly reliable result would be that they both classify the same pictures in the same way.
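Agreement between two raters in an example like this is often summarized as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch with hypothetical ratings (the labels below are invented for illustration):

```python
from collections import Counter

# Hypothetical ratings: two observers classify the same 10 pictures
rater_a = ["dog", "dog", "cat", "cat", "dog", "cat", "dog", "dog", "cat", "dog"]
rater_b = ["dog", "dog", "cat", "dog", "dog", "cat", "dog", "cat", "cat", "dog"]

n = len(rater_a)
# Observed agreement: proportion of pictures classified identically
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement, from each rater's marginal category proportions
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in counts_a)

# Kappa rescales observed agreement relative to the chance baseline
kappa = (p_o - p_e) / (1 - p_e)
print(f"observed agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```

Note how kappa is lower than the raw agreement: when both raters label most pictures "dog," a good deal of agreement would occur by chance alone.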
36.
37. Used to assess the consistency of the results of two tests constructed in the same way from the same content domain.
38. Used to assess the consistency of results across items within a test.
Average inter-item correlation is a subtype of internal consistency reliability. It is obtained by taking all of the items on a test that probe the same construct (e.g., reading comprehension), determining the correlation coefficient for each pair of items, and finally taking the average of all of these correlation coefficients.
39. Split-half reliability is another subtype of internal consistency reliability. The process of obtaining split-half reliability begins by "splitting in half" all items of a test that are intended to probe the same area of knowledge (e.g., World War II) in order to form two "sets" of items.
40. Factors that can affect test reliability:
* Test length
* Test-retest interval
* Variability of scores
* Guessing
* Variation within the test situation