Principles of Language Assessment
Prepared By: Ameer Salman Hussein
Principles of Language Assessment
For a test to be effective, dependable, and an accurate measure of what we want it to
measure, it should meet five cardinal criteria for "testing a test". These
criteria are:
1. Practicality
2. Reliability
3. Validity
4. Authenticity
5. Washback
Practicality
The test is practical when it:
1. Is not excessively expensive (it should save both time and money).
2. Stays within appropriate time constraints. (A test that requires
individual monitoring is impractical for a group of a hundred students
and only a few examiners.)
3. Is relatively easy to administer (it shouldn't take hours to evaluate).
4. Has a scoring procedure that is specific and time-efficient (it shouldn't
have to be scored only by computer).
Reliability
Reliability is the degree of consistency of a measure. A test will be
reliable when it gives the same repeated result under the same
conditions.
There are FOUR subdivisions of reliability:
1. Student-Related reliability.
2. Rater-Reliability.
3. Test Administration Reliability.
4. Test Reliability.
1. Student-Related reliability:
The most common learner-related issue in reliability is
caused by temporary illness, fatigue, a "bad day",
anxiety, and other physical or psychological factors,
which may make an "observed" score deviate from
one's "true" score.
2. Rater-Reliability:
Human error, subjectivity, and bias may enter into the scoring process. Here, we
have to differentiate between two types of rater reliability: inter-rater
reliability and intra-rater reliability.
Inter-rater unreliability occurs when two or more scorers yield inconsistent
scores for the same test, possibly because of lack of attention to scoring criteria,
inexperience, inattention, or even preconceived biases.
Intra-rater unreliability is a common occurrence for classroom teachers because of
unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students,
or simple carelessness.
In tests of writing skills, rater reliability is particularly hard to achieve since writing
proficiency involves numerous traits that are difficult to define. The careful specification of an
analytical scoring instrument, however, can increase rater reliability.
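As an illustration of how such a rater-reliability check might look in practice (not part of the original lecture; the raters, the five-band scale, and all scores below are invented for the example), the following Python sketch computes two standard agreement indices for a pair of raters scoring the same set of essays: the proportion of exact agreements and Cohen's kappa, which corrects that proportion for chance agreement.

    # Illustrative only: two hypothetical raters scoring ten essays on a 1-5 band scale.
    from collections import Counter

    rater_a = [4, 3, 5, 2, 4, 3, 4, 5, 2, 3]
    rater_b = [4, 3, 4, 2, 5, 3, 4, 5, 3, 3]

    def percent_agreement(a, b):
        # Proportion of essays on which the two raters gave identical scores.
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def cohens_kappa(a, b):
        # Agreement corrected for chance: kappa = (p_o - p_e) / (1 - p_e).
        n = len(a)
        p_o = percent_agreement(a, b)
        count_a, count_b = Counter(a), Counter(b)
        p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in set(a) | set(b))
        return (p_o - p_e) / (1 - p_e)

    print(f"exact agreement: {percent_agreement(rater_a, rater_b):.2f}")
    print(f"Cohen's kappa:   {cohens_kappa(rater_a, rater_b):.2f}")

A kappa close to 1 suggests the raters are applying the scoring criteria consistently; a low kappa signals the kind of inter-rater unreliability described above.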
3. Test Administration Reliability:
Unreliability may also result from the conditions in which the test is
administered, such as an aural comprehension test played through audio
equipment that some test-takers cannot hear clearly. Other sources of
unreliability are found in photocopying variations, the amount of light in
different parts of the room, variations in temperature, and even the
condition of desks and chairs.
4. Test Reliability:
Sometimes the nature of the test itself can cause measurement
errors. If a test is too long, test-takers may become fatigued by
the time they reach the later items and hastily respond
incorrectly. Timed tests may discriminate against students who
do not perform well on a test with a time limit. Poorly written
test items may also be a further source of test unreliability.
Validity
Validity can be defined as “the extent to which inferences made from
assessment results are appropriate, meaningful, and useful in terms of
the purpose of the assessment”. Simply put, a valid test of reading
measures reading ability and nothing else. For a valid test of writing
ability, we have to pay attention to comprehensibility, rhetorical and
discourse elements, and the organization of ideas, rather than simply
counting the number of words.
How is the validity of a test established?
There are several different kinds of evidence that may be examined to
support the validity of a test. They are:
1. Content-Related Evidence.
2. Criterion-Related Evidence.
3. Construct-Related Evidence.
4. Consequential Validity.
5. Face Validity.
1. Content-Related Evidence.
Content validity refers to the extent to which the items on a test
are fairly representative of the entire subject that the test seeks
to measure.
For example, if you are trying to assess a person’s ability to
speak a second language in a conversational setting, asking the
learner to answer paper-and-pencil multiple-choice questions
requiring grammatical judgments lacks content validity.
There are a few ways of understanding content validity:
- It is possible to contend, for example, that standard language proficiency tests,
with their context-reduced and academically oriented language, lack content
validity since they do not require the full spectrum of communicative
performance on the part of the learner.
- Another way is to consider the difference between direct and indirect testing.
Direct testing involves the test-taker in actually performing the target task, while
in an indirect test, learners do not perform the task itself but rather a task that
is related to it in some way.
- The most important rule for achieving content validity in classroom assessment
is to test performance directly. Consider, for example, a listening/speaking class
that is doing a unit on greetings and exchanges that includes discourse for asking
for personal information (name, address, hobbies, etc.) with some form-focus on
the verb to be, personal pronouns, and question formation.
2. Criterion-Related Evidence.
A second form of evidence of the validity of a test may be
found in what is called criterion-related evidence, also
referred to as criterion-related validity, or the extent to which
the "criterion" of the test has actually been reached.
Most classroom-based assessment with teacher-designed
tests fits the concept of criterion-referenced assessment. In the
case of teacher-made classroom assessments, criterion-related
evidence is best demonstrated through a comparison of results
of an assessment with results of some other measure of the
same criterion.
Criterion-related evidence usually falls into one of two
categories: concurrent and predictive validity.
A test has concurrent validity if its results are supported by
other concurrent performance beyond the assessment itself.
The predictive validity of an assessment becomes important
in the case of placement tests, admissions assessment
batteries, language aptitude tests, and the like.
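As a simple illustration of comparing an assessment's results with another measure of the same criterion (not part of the original lecture; both score lists are invented for the example), the Python sketch below correlates scores from a hypothetical teacher-made test with an independent measure of the same criterion. A strong positive correlation supports concurrent validity; for predictive validity, the second list would instead hold a later outcome, such as subsequent course performance.

    # Illustrative only: hypothetical scores for eight students on a teacher-made
    # test and on an independent measure of the same criterion.
    from math import sqrt

    classroom_test = [62, 75, 88, 54, 91, 70, 66, 83]
    other_measure  = [58, 72, 90, 50, 94, 65, 70, 80]

    def pearson_r(x, y):
        # Pearson correlation between the two sets of scores.
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
        sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
        return cov / (sd_x * sd_y)

    print(f"correlation with the other measure: r = {pearson_r(classroom_test, other_measure):.2f}")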
3. Construct-Related Evidence:
A third kind of evidence that can support validity is construct-
related evidence, commonly referred to as construct validity. A
construct is any theory, hypothesis, or model that attempts to
explain observed phenomena in our universe of perceptions.
Constructs may or may not be directly or empirically measured,
and their verification often requires inferential data.
In the field of assessment, construct validity asks, "Does this
test actually tap into the theoretical construct as it has been
defined?"
Construct validity is a major issue in validating large-scale
standardized tests of proficiency.
4. Consequential Validity:
Consequential validity encompasses all the consequences of a
test, including such considerations as its accuracy in measuring
intended criteria, its impact on the preparation of test-takers, its
effect on the learner, and the (intended and unintended) social
consequences of a test's interpretation and use.
5. Face Validity:
Face validity refers to the degree to which a test looks right,
and appears to measure the knowledge or abilities it claims to
measure, based on the subjective judgment of the examinees
who take it, the administrative personnel who decide on its use,
and other psychometrically unsophisticated observers.
Face validity will likely be high if learners encounter:
• A well-constructed, expected format with familiar tasks.
• A test that is clearly practical within the allotted time limit.
• Items that are clear and uncomplicated.
• Directions that are crystal clear.
• Tasks that relate to their course work (content validity).
• A difficulty level that presents a reasonable challenge.
Authenticity
Bachman and Palmer (1996) define authenticity as "the degree of
correspondence of the characteristics of a given language test task to the
features of a target language task," and then, they suggest an agenda for
identifying those target language tasks and for transforming them into
valid test items.
In a test, authenticity may be present in the following ways:
1. The language in the test is as natural as possible.
2. Items are contextualized rather than isolated.
3. Topics are meaningful (relevant, interesting) for the learner.
4. Some thematic organization to items is provided, such as through a
story line or episode.
5. Tasks represent, or closely approximate, real-world tasks.
Washback
➢ In large-scale assessment, washback generally refers to the effects
the tests have on instruction in terms of how students prepare for
the test; "cram" courses and "teaching to the test" are examples of such
washback.
➢ Another form of washback that occurs more in classroom assessment
is the information that “washes back" to students in the form of
useful diagnoses of strengths and weaknesses.
➢ Washback also includes the effects of an assessment on teaching and
learning prior to the assessment itself, that is, on preparation for the
assessment.
➢ Finally, washback also implies that students have ready access to you
to discuss the feedback and evaluation you have given.
Thank You for Listening and for Your Attention
