2. INTRODUCTION
1. Content validity
2. Criterion-related validity
3. Other forms of evidence for construct validity
4. Validity in scoring
5. Face validity
6. How to make more valid tests
3. Content validity
It refers to how accurately an assessment or measurement tool
taps into various aspects of the specific construct in question. In
other words, do the questions really assess the construct in
question?
A test needs to be related to the content of the class (relevant
content).
HOW TO JUDGE IT? We need a specification of the skills or
structures that the test is meant to cover.
Not all the course content needs to appear in the test.
4. The importance of content validity
Language specifications provide the test constructor with a basis
for making a principled selection of elements to include in the test.
A comparison between test specification and test content is the
basis for judgments related to content validity.
The greater a test’s content validity, the more likely it is to be an
accurate measure of what it is supposed to measure.
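The comparison between specification and test content can be sketched in a few lines of Python. This is only an illustration: the skill names, the item labels and the `compare_to_specification` helper are hypothetical and do not come from the CEFR or from any real test. The result is a starting point for judgment, not a validity score, since a test only needs a principled sample of the specification.

```python
def compare_to_specification(specification, test_items):
    """Compare the skills a test covers with the skills in its specification.

    specification: set of skill names the test is meant to cover.
    test_items: dict mapping each item label to the set of skills it tests.
    """
    # Collect every skill that at least one item claims to test.
    tested = set()
    for skills in test_items.values():
        tested |= set(skills)

    return {
        "covered": specification & tested,       # in the specification and tested
        "not_sampled": specification - tested,   # in the specification, not tested
        "off_specification": tested - specification,  # tested, not in the specification
    }


# Hypothetical specification and test content, for illustration only.
specification = {"greetings", "numbers", "present simple", "daily routines"}
test_items = {
    "item 1": {"greetings"},
    "item 2": {"present simple", "daily routines"},
    "item 3": {"past simple"},
}

report = compare_to_specification(specification, test_items)
for label, skills in report.items():
    print(label, sorted(skills))
```

Skills listed under "off_specification" deserve the closest look, because they suggest the test is measuring something the specification never asked for.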
5. ACTIVITY 1
1. According to the CEFR, which specification of the language
skills would you need to take into account to test an A1 student?
PLEASE USE YOUR DEVICES TO ACCESS THE CEFR
LANGUAGE SPECIFICATION.
2. Do you think teachers care about those specifications while
testing a student?
6. Criterion-related validity
It is the degree to which test results agree with those provided by
some independent and highly dependable assessment of the
candidate’s ability.
The independent assessment is the criterion measure against
which the test is validated.
TWO TYPES:
CONCURRENT VALIDITY, PREDICTIVE VALIDITY
7. CONCURRENT VALIDITY
It refers to the extent to which the results of a particular test or
measurement correspond to those of a previously established
measurement for the same construct.
Is it possible to test everything you need to test in a short time?
This will always depend on how many functions are tested in the
component, and how representative they are among the complete
set of functions included in the objectives.
8. How is the level of agreement measured?
Using the “correlation coefficient”. This is a mathematical measure
of similarity.
Perfect agreement = 1
Total lack of agreement = 0
The level of agreement is regarded as satisfactory, depending on
the purpose of the test and the decisions that are made based on it.
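As a rough illustration of how such a coefficient is obtained, the sketch below computes the Pearson correlation coefficient, which is one common choice; the slides do not say which coefficient is meant, so this is an assumption. The `pearson_r` function name and both score lists are invented for the example. Note that Pearson's coefficient can also be negative (down to -1) when high scores on one measure go with low scores on the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")

    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six candidates: the test being validated,
# and the independent criterion measure for the same candidates.
new_test = [52, 61, 70, 45, 88, 77]
criterion = [55, 60, 72, 50, 85, 80]

r = pearson_r(new_test, criterion)
print(f"validity coefficient: {r:.2f}")
```

Whether the resulting value counts as satisfactory still depends on the purpose of the test and the decisions based on it.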
9. PREDICTIVE VALIDITY
This topic concerns the degree to which a test can predict a
candidate’s future performance.
How helpful is it to use final outcomes as the criterion measure
when so many factors other than ability in English (such as
subject knowledge, intelligence, motivation, health and
happiness) will have contributed to every outcome?
Example: placement tests
10. Other forms of evidence for construct validity
We cannot be sure that the items of the test are measuring what
we expect them to measure.
Construct validity: “construct” refers to any underlying ability
that is hypothesized in a theory of language ability.
It is important to establish if distinct abilities exist, if they can be
measured and if they are measured in a test.
Research is needed for evidence.
11. Another way of obtaining evidence about the construct validity of a
test is to investigate what test takers actually do when they respond
to an item.
TWO PRINCIPAL METHODS:
THINK ALOUD
Test takers voice their thoughts as they respond to the item.
Problem: The very voicing of thoughts may interfere with what would
be the natural response to the item.
RETROSPECTION
Test takers try to recollect what their thinking was as they responded.
Problem: Thoughts may be forgotten.
12. VALIDITY IN SCORING
It is worth pointing out that if a test is to have validity, not only
must the items be valid, but the way in which the responses are scored
must be valid as well.
If the test is meant to test one specific skill and the scoring is
influenced by a different skill, then the test is not valid.
13. ACTIVITY 2
To discuss:
Do the PUCE language tests lack scoring validity? Yes or no?
So, why do teachers take away points when they are grading the
reading and comprehension part?
What is the reason to do so? Do you think this is fair for the
student?
14. FACE VALIDITY
A test is said to have face validity if it looks as if it measures what
it is supposed to measure.
A test which does not have face validity may not be accepted by
candidates, teachers, education authorities or employers.
15. How to make tests more valid
Write explicit specifications for the test which take into account all
that is known about the constructs that are to be measured.
Make sure that you include a representative sample of the content
of these in the test.
Use direct testing.
Make sure that the scoring of responses relates directly to what is
being tested.
Do everything possible to make the test reliable. If the test is not
reliable, it cannot be valid.
16. ACTIVITY 3
In pairs, answer the following test. Then check whether it is valid
enough to say that the student knows all the content of the A1
level.
Would you say that the test is measuring the most important
content of the A1 level?
Which contents would you test instead of those and why?
17. Conclusion
It is important to take into account the language specifications.
It is necessary to do research to say that a test has content validity.