2. Validity
• Validity can be defined as the degree to which a test measures what
it is supposed to measure. For example, does an intelligence test
really measure intelligence? Does a self-esteem scale really measure
self-esteem?
• In a research project there are several types of validity that may be
sought.
• These are content validity, construct validity (convergent and
discriminant), and criterion-related validity (predictive and
concurrent). Others include face validity, internal validity, external
validity, and conclusion validity.
3. Evaluating Validity
• This process requires empirical evidence. A measurement method
cannot be declared valid or invalid until it has been used and
the resulting scores have been thoroughly analyzed.
• It is an ongoing process. The conclusion that a measurement
method is valid generally depends on the results of many studies
done over a period of years. In fact, every new study using that
measurement method provides additional evidence for or against its
validity.
• Validity is not an all-or-none property of a measurement method. It
is possible for a measurement method to be judged "somewhat valid"
or for one measure to be considered "more valid" than another.
• It is also the case that a measurement method might be valid for
one subject population or in one context, but not in another.
• For example, an English-language achievement test might be judged
valid for children who are native English speakers but not for
children who are still in the process of learning English.
4. Types of Validity
There are three primary types of validity
1) Content Validity
2) Construct Validity: Convergent validity and discriminant
(divergent) validity
3) Criterion-related Validity: Predictive and concurrent
Validity
Others are:
1) Internal Validity
2) External Validity
3) Face validity
4) Conclusion validity
5. Content Validity
• When we want to find out if the entire content
of the behavior/construct/area is represented in
the test we compare the test task with the
content of the behavior.
• This is a logical method, not an empirical one.
• For example, if we want to test knowledge of FMD
vaccination, it is not fair to have most questions
limited to the vaccination of HS
6. Content validity
• Content validity occurs when the experiment provides
adequate coverage of the subject being studied.
• This includes measuring the right things as well as having an
adequate sample.
• Samples should be both large enough and be taken for
appropriate target groups.
• The perfect question gives a complete measure of all aspects
of what is being investigated. However, in practice this is
seldom achievable; for example, a single addition problem does
not test the whole of mathematical ability.
• Content validity is related very closely to good experimental
design. A high content validity question covers more of what is
sought. The key with all questions is to ensure that all of the
target content is covered (preferably uniformly).
7. Construct validity
• Construct validity is the degree to which a test measures an
intended hypothetical construct.
• Many times psychologists assess/measure abstract attributes
or constructs.
• The process of validating the interpretations about that
construct as indicated by the test score is construct validation.
• This can be done experimentally, e.g., if we want to validate a
measure of anxiety.
• Example: if we hypothesize that anxiety increases when
subjects are under the threat of an electric shock, then the
threat of an electric shock should increase anxiety scores
(note: not all construct validation is this dramatic!)
8. Construct validity
• The term construct in this instance is defined as a property
that is offered to explain some aspect of human behavior,
such as mechanical ability, intelligence, or introversion (Van
Dalen, 1979).
• The construct validity approach concerns the degree to which
the test measures the construct it was designed to measure.
• There are two parts to the evaluation of the construct validity
of a test.
• First and most important, the theory underlying the construct
to be measured must be considered.
• Second, the adequacy of the test in measuring the construct is
evaluated (Mason and Bramble, 1989).
9. For example
• Suppose that a researcher is interested in measuring the introverted
nature of first year teachers.
• The researcher defines introverted as the overall lack of social skills
such as conversing, meeting and greeting people, and attending
faculty social functions.
• This definition is based upon the researcher’s own observations.
• A panel of experts is then asked to evaluate this construct of
introversion.
• The panel cannot agree that the qualities pointed out by the
researcher adequately define the construct of introversion.
• Furthermore, the researcher cannot find evidence in the research
literature supporting the introversion construct as defined here.
• Using this information, the validity of the construct itself can be
questioned. In this case the researcher must reformulate the
previous definition of the construct.
10. Construct validity (contd.)
• Once the researcher has developed a meaningful, useable construct,
the adequacy of the test used to measure it must be evaluated.
• First, data concerning the trait being measured should be gathered
and compared with data from the test being assessed.
• The data from other sources should be similar or convergent. If
convergence exists, construct validity is supported.
• After establishing convergence, the discriminant validity of the test
must be determined.
• This involves demonstrating that the construct can be differentiated
from other constructs that may be somewhat similar.
• In other words, the researcher must show that the construct being
measured is not the same as one that was measured under a
different name.
11. Construct validity
• Construct validity exists when the theoretical constructs of cause
and effect accurately represent the real-world situations they are
intended to model.
• This is related to how well the experiment is
operationalized.
• Construct validity is thus an assessment of the quality of
an instrument or experimental design.
It has two components:
12. a) Convergent validity
• Here we examine the degree to which the operationalization is
similar to (converges on) other operationalizations that it
theoretically should be similar to.
• It occurs where measures of constructs that are expected
to correlate do so.
• E.g., to assess the convergent validity of a test of arithmetic
skills, we might correlate the scores on our test with scores on
other tests that purport to measure basic math ability,
where high correlations would be evidence of
convergent validity.
• This is similar to concurrent validity.
13. b) Discriminant validity
• Here we examine the degree to which the
operationalization is not similar to (diverges from) other
operationalizations that it theoretically should not be
similar to.
• It occurs where constructs that are expected not to relate do
not, such that it is possible to discriminate between these
constructs.
• Example: to show the discriminant validity of a test of arithmetic
skills, we might correlate the scores on our test with scores on
tests of verbal ability, where low correlations would be
evidence of discriminant validity.
14. Discriminant validity
• Imagine, for example, that a researcher with a new measure
of self-esteem claims that self-esteem is independent of
mood; a person with high self-esteem can be in either a good
mood or a bad mood (and a person with low self-esteem can
too).
• Then this researcher should be able to show that his self-
esteem measure is not correlated (or only weakly correlated)
with a valid measure of mood.
• If these two measures were highly correlated, then we would
wonder whether his new measure really reflected self-esteem
as opposed to mood.
15. Discriminant validity
• Another example: consider a need-for-recognition scale that
measures the extent to which people like to be appreciated,
which is supposed to be largely independent of people’s
intelligence.
• Just because someone is intelligent does not mean that he or
she has a high need for recognition, and just because
someone is less intelligent does not mean that he or she has a
low need for recognition.
• In this case, one would expect to see that need-for-recognition
scores and intelligence test scores were not highly
correlated. Otherwise it would look like need for recognition
was just another measure of intelligence.
16. Cont.
• Convergent validity and discriminant validity together
demonstrate construct validity.
• Convergence and discrimination are often demonstrated
by correlating the measures used for the constructs.
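The correlational check described above can be sketched in code. All scores below are hypothetical, invented purely to illustrate the expected pattern: an arithmetic test should correlate highly with another math test (convergence) and only weakly with a verbal test (discrimination).

```python
# Sketch of convergent and discriminant validity checks via
# Pearson correlations. All scores are made up for illustration.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for ten subjects on three tests.
our_arithmetic_test = [12, 15, 9, 18, 11, 14, 16, 8, 13, 17]
other_math_test     = [30, 36, 25, 42, 28, 33, 39, 24, 31, 40]  # should converge
verbal_ability_test = [25, 24, 20, 21, 22, 19, 26, 27, 18, 28]  # should diverge

r_convergent = pearson(our_arithmetic_test, other_math_test)
r_discriminant = pearson(our_arithmetic_test, verbal_ability_test)

print(f"convergent r = {r_convergent:.2f}")     # high: supports convergence
print(f"discriminant r = {r_discriminant:.2f}")  # near zero: supports discrimination
```

With real data one would also report sample sizes and significance; this sketch only shows the high-versus-low correlation contrast that the two forms of evidence rest on.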
17. Criterion-related validity
• This examines the ability of the measure to predict a variable
that is designated as a criterion.
• A criterion may well be an externally-defined 'gold standard'.
• Achieving this level of validity thus makes results more
credible.
• Criterion-related validity is related to external validity.
• This approach is concerned with detecting the presence or
absence of one or more criteria considered to represent traits
or constructs of interest.
• One of the easiest ways to test for criterion-related validity is
to administer the instrument to a group that is known to
exhibit the trait to be measured.
• This group may be identified by a panel of experts.
18. Criterion-related validity
• For example, suppose one wanted to develop an instrument
that would identify teachers who are good at dealing with
abused children.
• First, a panel of unbiased experts identifies 100 teachers out
of a larger group that they judge to be best at handling abused
children.
• The researcher develops 400 yes/no items that will be
administered to the whole group of teachers, including those
identified by the experts.
• The responses are analyzed, and the items to which the
expert-identified teachers and the other teachers respond
differently are taken to be the questions that will identify
teachers who are good at dealing with abused children.
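The item-selection step above can be sketched as follows. All response data are invented, the groups are tiny, and the 0.5 threshold is an arbitrary illustrative cut-off; a real study with 400 items would use a proper item-discrimination statistic.

```python
# Sketch of known-group item analysis: keep the yes/no items whose
# answer rates differ most between expert-identified teachers and
# the rest. All data below are hypothetical.

expert_identified = [  # teachers judged best at handling abused children
    [True,  True,  False, True],
    [True,  True,  True,  True],
    [True,  False, False, True],
    [True,  True,  False, True],
]
other_teachers = [
    [False, True,  False, False],
    [False, True,  True,  False],
    [True,  True,  False, False],
    [False, False, False, False],
]

def yes_rate(group, item):
    """Proportion of teachers in `group` answering yes to `item`."""
    return sum(t[item] for t in group) / len(group)

n_items = len(expert_identified[0])
# Keep items whose yes-rates differ substantially between the groups
# (0.5 is an arbitrary threshold chosen for this toy example).
discriminating = [
    i for i in range(n_items)
    if abs(yes_rate(expert_identified, i) - yes_rate(other_teachers, i)) >= 0.5
]
print("items that separate the groups:", discriminating)
```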
19. Criterion-related validity
• For example,
• an IQ test should correlate positively with school
performance.
• An occupational aptitude test should correlate positively with
work performance.
• A new measure of self-esteem should correlate positively with
an old established measure.
20. Predictive validity
• It is a measure of how well a test predicts abilities.
• This measures the extent to which a future level of a variable
can be predicted from a current measurement.
• It involves testing a group of subjects for a certain
construct and then comparing them with results
obtained at some point in the future.
• Here we assess the operationalization's ability to predict something
it should theoretically be able to predict.
• For example, a political poll intends to measure future voting
intent.
• College entry tests should have a high predictive validity with
regard to final exam results.
21. Example
• For instance, we might theorize that a measure of math ability
should be able to predict how well a person will do in an
engineering-based profession.
• We could give our measure to experienced engineers and see
if there is a high correlation between scores on the measure
and their salaries as engineers.
• A high correlation would provide evidence for predictive
validity –
• it would show that our measure can correctly predict
something that we theoretically think it should be able to
predict.
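A minimal sketch of such a predictive check, using the college-entry example and invented scores: fit a least-squares line from current entry-test scores to final-exam results observed later, and examine how much of the variance in the outcome the test explains.

```python
# Sketch of predictive validity: entry scores are measured now,
# final-exam results a year later. All numbers are invented.

entry_scores = [55, 62, 70, 48, 66, 75, 58, 80]   # measured now
final_exams  = [58, 65, 74, 50, 68, 79, 60, 85]   # observed later

n = len(entry_scores)
mean_x = sum(entry_scores) / n
mean_y = sum(final_exams) / n

# Ordinary least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(entry_scores, final_exams))
sxx = sum((x - mean_x) ** 2 for x in entry_scores)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# r^2: proportion of variance in final exams explained by entry scores.
syy = sum((y - mean_y) ** 2 for y in final_exams)
r_squared = sxy ** 2 / (sxx * syy)

predicted = slope * 64 + intercept  # predicted final mark for entry score 64
print(f"r^2 = {r_squared:.3f}, predicted final for entry 64: {predicted:.1f}")
```

A high r² on such data would be evidence of predictive validity; a low r² would suggest the test does not forecast the criterion it was meant to predict.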
22. Concurrent validity
• This measures the relationship between measures made with
existing tests. The existing test is thus the criterion.
• It measures the test against a benchmark test and high
correlation indicates that the test has strong criterion
validity.
• For example a measure of creativity should correlate with
existing measures of creativity.
23. Face Validity
• Basically face validity refers to the degree to which a test
appears to measure what it purports to measure.
• Occurs where something appears to be valid.
• This of course depends very much on the judgment of the
observer.
• Measures often start out with face validity, as the researcher
selects those which seem likely to prove the point.
• Face validity is a measure of how representative a research
project is ‘at face value,' and whether it appears to be a good
project.
24. Face validity (Example)
• In face validity we look at the operationalization and see whether
"on its face" it seems like a good translation of the construct.
• This is probably the weakest way to try to demonstrate construct
validity.
• For instance, you might look at a measure of math ability, read
through the questions, and decide that yes, it seems like this is a
good measure of math ability (i.e., the label "math ability" seems
appropriate for this measure).
• Of course, if this is all you do to assess face validity, it would clearly
be weak evidence because it is essentially a subjective judgment
call. (Note that just because it is weak evidence doesn't mean that it
is wrong. We need to rely on our subjective judgment throughout
the research process. It's just that this form of judgment won't be
very convincing to others.)
25. Example contd..
• We can improve the quality of face validity assessment
considerably by making it more systematic.
• For instance, if you are trying to assess the face validity of a
math ability measure, it would be more convincing if you sent
the test to a carefully selected sample of experts on math
ability testing and they all reported back with the judgment
that your measure appears to be a good measure of math
ability.
26. Internal Validity
• Internal validity occurs when it can be concluded that
there is a causal relationship between the variables being
studied.
• A danger is that changes might be caused by other
factors.
• It is a measure which ensures that a researcher's
experiment design closely follows the principle of cause
and effect.
• It indicates that values of the dependent variables are solely the
result of the manipulations of the independent variables.
27. External Validity
• Occurs when the causal relationship discovered can
be generalized to other people, times and contexts.
• It concerns the extent to which an effect found in research can be
generalized to other populations, settings, treatment variables, and
measurement variables.
• Correct sampling will allow generalization and hence give
external validity.
28. Conclusion validity
• Conclusion validity occurs when you can conclude that there is
a relationship of some kind between the two variables being
examined.
• The relationship may be a positive or negative correlation.
29. Criteria for Validity of a scale
1. Logical validation:
• According to this principle, a scale is said to be valid if it
accords with our common-sense reasoning.
• Thus, in a social distance scale for Americans, if greater distance is
shown towards Englishmen and less towards Koreans, the scale
appears erroneous on its very face, as it can hardly be believed
that Americans would show greater preference for Koreans.
• This method of validation is not very reliable because of too much
subjectivity.
• In the above case the difference in attitudes is obvious and large. At
times the difference is so small that it cannot be detected by
common sense, and in that case validity cannot be tested in this
way. It is thus a crude method, applicable only when the difference is
very great.
30. Criteria of Validity of a Scale
2. Jury opinion:
This is a more reliable and frequently used method.
In this case we do not rely upon the common sense of one
person but on the judgment of a number of persons.
In this respect the method is superior to, and less subject to
bias than, the previous method.
But it cannot be said to be completely free from bias.
A large number of people are not necessarily free from bias in
their opinions.
A more objective measurement may therefore be used for this
purpose.
31. Criteria of Validity of a Scale
3. Known group: According to this method the scale is
administered to persons who are known to hold a
particular opinion or belong to a particular category, and the
results are then compared with the known facts.
• Example: if the attitude of a person toward communism is to be
measured, we deliberately select persons who are either
known conservatives or staunch communists, administer
the scale, and assess the results thus secured. If they tally with
the known facts, the scale is said to be valid.
• This method is very similar to logical validation and
therefore has more or less the same weaknesses.
• The only difference in this case is that we rely upon
our knowledge of facts rather than on logical reasoning.
32. Criteria of Validity of a Scale
4. Independent Criteria: According to this method the scale is
tested on the basis of the various variables involved. If all or most of
the tests show the same results, the scale is said to be
valid.
• Example: the social status of a person depends upon various
factors, viz. economic status, education, sociability, etc.
• The scores obtained by a person are tested on these various
counts, and if they agree, the scale may be said to be valid.
• The main difficulty with this test arises from the fact that the
independent criteria may not themselves be good indices;
but this is generally not so, and the method is therefore thought
to be the most reliable one for testing the validity of a scale.