Research Methodology: https://www.facebook.com/groups/689682314444236/
Testing and Evaluation
By : Abla BEN BELLAL
1. DEFINITION OF TESTING:
It is a technique of obtaining information needed for evaluation purposes. Tests,
quizzes, and measuring instruments are devices used to obtain such information.
A TEST is "a method of measuring a person's ability or knowledge in a given
area".
2. DEFINITION OF ASSESSMENT :
- It is the process of collecting information or evidence of a learner's learning
progress and achievement over a period of time in order to improve teaching
and learning.
- Assessment is typically used to describe processes to examine or measure
student learning that results from academic programs.
- Assessment is an ongoing process aimed at improving student learning.
- Assessment is not based on one test or one task, nor is it expressed by a mark or
grade, but rather in a report form with scales or levels as well as description and
comments from the teacher.
2.1. TYPES OF ASSESSMENT:
2.1.1. Formative Assessment: teachers use it to check on the progress of
their students, to see how far they have mastered what they should have
learnt, and then use this information to modify their future teaching plans.
*Informal assessment is part of formative assessment. It can take a
number of forms: unplanned comments, verbal feedback to students,
observing students perform a task or work in small groups, and so on.
2.1.2. Summative Assessment: is used at the end of the term, semester, or year
in order to measure what has been achieved both by groups and by
individuals.
*Formal assessment is part of summative assessment, i.e. exercises or
procedures which are systematic and give students and teachers an appraisal
of students' achievement.
3. DEFINITION OF EVALUATION:
It is the process of making an overall judgment about one's work or a whole
school's work.
Evaluation is typically a broader concept than assessment, as it focuses on the
overall, or summative, experience.
When we ASSESS our students we are commonly interested in "how and
how much our students have learnt", but when we EVALUATE them we
are concerned with "how the learning process is developing".
4. WHAT ARE THE MAIN REASONS FOR TESTING ?
a. Achievement/Attainment tests: usually more formal, designed to show
mastery of a particular syllabus (e.g. end-of-year tests, school-leaving exams,
public tests) though similar (re-syllabus) to progress tests. Rarely constructed
by classroom teacher for a particular class. Designed primarily to measure
individual progress rather than as a means of motivating or reinforcing
language.
b. Progress Tests: Most classroom tests take this form. They assess the progress
students make in mastering material taught in the classroom and are often given
to motivate students. They also enable teachers and students to assess the degree
of success of teaching and learning and to identify areas of weakness and
difficulty. Progress tests can also be diagnostic to some degree.
c. Diagnostic Tests : can include Progress, Achievement and Proficiency tests,
enabling teachers to identify specific weaknesses/difficulties so that an
appropriate remedial programme can be planned. Diagnostic Tests are primarily
designed to assess students' knowledge & skills in particular areas before a
course of study is begun. Reference back to class-work. Motivation. Remedial
work.
d. Placement Tests : sort new students into teaching groups so that they are
approx. the same level as others when they start. Present standing. General
ability rather than specific points of learning. Variety of tests necessary.
Reference forward to future learning. Results of Placement Tests are needed
quickly. Administrative load.
e. Proficiency Tests : measure students' achievements in relation to a specific
task which they are later required to perform (e.g. follow a university course in
the English medium; do a particular job). Reference forward to particular
application of language acquired: future performance rather than past
achievement. They rarely take into account the syllabus that students have
followed. Definition of operational needs. Practical situations. Authentic
strategies for coping. Common standard e.g. driving test regardless of previous
learning. Application of common standard whether the syllabus is known or
unknown.
f. Aptitude Tests: measure students' probable performance. Reference forward,
but can be distinguished from proficiency tests. Aptitude tests assess aptitude
for language learning (e.g. will the student experience difficulty in
identifying sounds or the grammatical structure of a new language?) while
proficiency tests measure adequacy of control in L2 for studying other things
through the medium of that language.
5. TYPES OF TESTS :
5.1. Proficiency tests:
*They are designed to measure students' ability in a language.
*Their content is not based on the content or objectives of language courses that
the people taking them may have followed. They are based on a specification of
what candidates have to be able to do in the language in order to be considered
proficient.
*A proficiency test measures how much of a language a person knows or has
learnt. It is not bound to any curriculum or syllabus, but is intended to check the
learners' language competence. An example of such tests is the American Test of
English as a Foreign Language (TOEFL), which is used to measure learners'
general knowledge of English in order to allow them to enter higher-education
establishments or to take up a job in the USA.
*A proficiency test is used to assess the general knowledge or skill commonly
required for entry into a group of similar institutions. Because of the general
nature of proficiency decisions, a proficiency test must be designed so that the
general abilities and skills of students are reflected in a wide distribution of
scores. Thus, proficiency decisions must be based on the best obtainable
proficiency test scores as well as other information about the students.

5.2. Achievement tests:
*Achievement tests are "more formal"; Hughes (1989:8) assumes that this type of
test will fully involve teachers, for they will be responsible for preparing such
tests and giving them to the learners.
*They are directly related to language courses, their purpose being to establish
how successful individual students, or the courses themselves, have been in
achieving objectives. An achievement test at the end of the course checks the
acquisition of the material covered during the study year.
*There are two types. Final achievement tests are administered at the end of a
course of study; they may be written and administered by ministries of education,
official examining boards, or members of teaching institutions, and their content
must be related to the courses with which they are concerned. Progress
achievement tests are intended to measure the progress that students are making;
they contribute to formative assessment.
*Achievement tests must be not only very specifically designed to measure the
objectives of a given course but also flexible enough to help teachers readily
respond to what they learn from the test about the students' abilities, needs, and
learning of the course objectives.
*Achievement tests are mainly given at definite times of the school year.
Moreover, they can be extremely crucial for the students, for they are intended to
make the students pass or fail. Alderson (ibid.) mentions two usage types of
achievement tests: formative and summative. The notion of a formative test
denotes the idea that the teacher, after evaluating the results of the test, will be
able to reconsider his/her teaching and syllabus design, and even slow down the
pace of study to consolidate the material if necessary. Summative usage deals
precisely with the students' success or failure; the teacher can immediately take
up remedial activities to improve the situation.
*Students are tested to find out how much each person has learnt within the
program. Achievement decisions are about the amount of learning that students
have done.

5.3. Diagnostic tests:
*They are used to identify learners' strengths and weaknesses and are intended to
ascertain what learning still needs to take place.
*A diagnostic test helps us evaluate our teaching, the syllabus, and the materials
used, in addition to locating difficulties and planning appropriate remedial
teaching.
*Diagnostic testing often requires more detailed information about the very
specific areas in which students have strengths and weaknesses. The purpose is to
help students and their teachers focus their efforts where they will be most
effective.
*The most effective use of a diagnostic test is to report the performance level on
each objective (as a percentage) to each student so that he or she can decide how
and where to invest time and energy most profitably.
*They are designed to determine the degree to which the specific objectives of the
course have been accomplished, as well as to assess students' strengths and
weaknesses so as to correct individual deficiencies before it is too late. These
tests aim at fostering achievement by promoting strengths and eliminating
weaknesses. In other words, the purpose of this type of test is to diagnose
students' problems during the learning process so that efforts can be focused
where they will be most effective.

5.4. Placement tests:
*They are intended to provide information that will help to place students at the
stage of the teaching programme most appropriate to their abilities. They are
used to assign students to classes at different levels.
*Placement tests are designed to help decide what each student's appropriate
level will be within a specific program, skill area, or course.
*The purpose of such tests is to reveal which students have more, or less, of a
particular knowledge or skill so that students with similar levels of ability can be
grouped together.
*A placement test could typically take the form of dictations, interviews,
grammar tests, etc.
*A placement test is designed and given in order to use the information about the
students' knowledge to put them into groups according to their level of the
language.
6. PRINCIPLES OF LANGUAGE ASSESSMENT
There are five principles of language assessment; they are practicality, reliability,
validity, authenticity, and washback.
6.1. PRACTICALITY
An effective test is practical. This means that it:
- is not excessively expensive. A test that is prohibitively expensive is
impractical.
- stays within appropriate time constraints. A test of language proficiency that
takes a student 10 hours to complete is impractical.
- is relatively easy to administer. A test that takes a few minutes for a student to
take and several hours for an examiner to evaluate is, for most classroom
situations, impractical.
- has a scoring/evaluation procedure that is specific and time-efficient. A test
that can be scored only by computer is impractical if the test takes place a
thousand miles away from the nearest computer.
Furthermore, for a test to be practical:
- administrative details should be clearly established before the test,
- students should be able to complete the test reasonably within the set time
frame,
- all materials and equipment should be ready,
- the cost of the test should be within budgeted limits,
- the scoring/evaluation system should be feasible in the teacher's time frame.
Validity and reliability are not enough to build a test. The test should also be
practical in terms of time, cost, and energy: tests should be efficient to construct,
administer, and evaluate, and they must be affordable. A valid and reliable test is of
little use if it cannot be administered in remote areas because it requires a computer
(Heaton, 1975: 158-159; Weir, 1990: 34-35; Brown, 2004: 19-20).
6.2. RELIABILITY
A reliable test is consistent and dependable. A number of sources of unreliability
may be identified:
a. Student-related Reliability
A test yields unreliable results because of factors beyond the control of the test taker,
such as illness, fatigue, a “bad day”, or no sleep the night before.
b. Rater (Scorer) Reliability
Rater reliability refers to the consistency of scoring by two or more scorers.
Human error, subjectivity, and bias may enter into the scoring process. Inter-
rater unreliability occurs when two or more scorers yield inconsistent scores for the
same test, possibly because of lack of attention to scoring criteria, inexperience, or
inattention. Intra-rater unreliability arises from unclear scoring criteria, fatigue, and
bias toward particular "good" and "bad" students.
c. Test Administration Reliability
Unreliability may result from the conditions in which the test is administered. An
example is a test of aural comprehension using a tape recorder: when the tape
recorder played the items, the students sitting next to the windows could not hear
the tape accurately because of the street noise outside the building.
d. Test Reliability
If a test is too long, test-takers may become fatigued by the time they reach the later
items and hastily respond incorrectly.
e. Test and test administration reliability can be achieved by making sure that all
students receive the same quality of input. Part of achieving test reliability
depends on the physical context: making sure, for example, that every student has
a cleanly photocopied test sheet, sound amplification is clearly audible to everyone
in the room, video input is equally visible to all, and lighting, temperature, and other
classroom conditions are equal (and optimal) for all students.
Reliability refers to consistency and dependability. The same test delivered to the
same student at different administrations must yield the same results. Factors
affecting reliability are (Heaton, 1975: 155-156; Brown, 2004: 21-22):
1. student-related reliability: personal factors such as motivation, illness,
and anxiety can keep students from their 'real' performance,
2. rater reliability: either intra-rater or inter-rater, leading to subjectivity,
error, and bias during scoring,
3. test administration reliability: when the same test is administered on
different occasions, it can yield different results,
4. test reliability: dealing with the duration of the test and the test
instructions. If a test takes a long time to do, it may affect the test takers'
performance through fatigue, confusion, or exhaustion. Some test takers do
not perform well in timed tests. Test instructions must be clear to all test
takers, since they are affected by mental pressure.
Some methods are employed to gauge the reliability of assessment (Heaton, 1975:
156; Weir, 1990: 32; Gronlund and Waugh, 2009: 59-64). They are:
1. test-retest/re-administer: the same test is administered after a lapse of
time. The two sets of scores are then correlated.
2. parallel-form/equivalent-forms method: administering two cloned tests at
the same time to the same test takers. The results of the tests are then correlated.
3. split-half method: a test is divided into two, corresponding scores are
obtained, and the extent to which they correlate with each other governs the
reliability of the test as a whole.
4. test-retest with equivalent forms: a mixed method of test-retest and parallel
forms. Two cloned tests are administered to the same test takers on different
occasions.
5. intra-rater and inter-rater: employing one person to score the same test at
different times is called intra-rater. Some hints to minimize unreliability are
employing a rubric, avoiding fatigue, giving scores on the same numbers, and
asking students to write their names on the back of the test paper. When two
people score the same test, it is inter-rater. The tests done by the test takers are
divided into two. A rubric and a discussion must be developed first in order to
have the same perception. The two scores, whether from intra- or inter-rater, are
correlated.
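The correlation step shared by these methods can be sketched numerically. The sketch below is illustrative only (the item scores and the helper names `pearson_r` and `split_half_reliability` are invented for the example): it implements the split-half method, with the Spearman-Brown correction to estimate the reliability of the full-length test.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def split_half_reliability(item_scores):
    """Split-half method: divide the test into odd- and even-numbered
    items, correlate the two half scores, then apply the Spearman-Brown
    correction for the full test length."""
    half_a = [sum(row[0::2]) for row in item_scores]  # odd-numbered items
    half_b = [sum(row[1::2]) for row in item_scores]  # even-numbered items
    r = pearson_r(half_a, half_b)
    return 2 * r / (1 + r)

# Invented results for 3 students on a 4-item test (1 = correct, 0 = wrong).
scores = [[1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(split_half_reliability(scores))  # the two halves agree perfectly here: 1.0
```

The same `pearson_r` step would serve the test-retest or parallel-forms estimates above, where the two score lists come from two administrations rather than two halves of one test.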
6.3. VALIDITY
Validity is the extent to which inferences made from assessment results are appropriate,
meaningful, and useful for the purpose of the assessment. It is the most complicated
yet the most important principle. Validity can be measured using statistical correlation
with other related measures.
According to Bynom (Forum, 2001), validity deals with what is tested and the degree to
which a test measures what it is supposed to measure (Longman Dictionary, LTAL). For
example, if we test students' writing skills by giving them a composition test on Ways
of Cooking, we cannot call such a test valid, for it can be argued that it tests not
the ability to write, but knowledge of cooking as a skill.
A. Content-related Validity
A test is said to have content validity when it actually samples the subject matter about
which conclusions are to be drawn, and requires the test-taker to perform the behavior
being measured. For example, speaking ability is tested using a speaking performance,
not a pencil-and-paper test. Content validity can be identified when we can define the
achievement being measured.
It can be achieved by making the test performance direct. For example, to test
pronunciation, the teacher should require the students to pronounce the target words orally.
Two questions are used in applying content validity to classroom tests:
1. Are classroom objectives identified and appropriately framed? The objective should
include a performance verb and a specific linguistic target.
2. Are lesson objectives represented in the form of test specifications? A test should
have a structure that follows logically from the lesson or unit being tested. It can be
designed by dividing the objectives into sections, offering students a variety of item
types, and giving appropriate weight to each section.
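Question 2 above amounts to drafting a test specification. A minimal sketch (the objectives, item types, and weights are all invented for illustration; none come from the source) lays out such a blueprint as a mapping from objectives to sections and checks that the section weights cover the whole test:

```python
# Hypothetical test specification: each lesson objective becomes a section
# with its own item type and weight (percentage of the total test).
test_spec = {
    "use the past simple in short narratives": {"items": "gap-fill", "weight": 30},
    "identify main ideas in a short text": {"items": "multiple choice", "weight": 40},
    "write a guided paragraph": {"items": "composition", "weight": 30},
}

# The weights of all sections should account for the whole test.
total_weight = sum(section["weight"] for section in test_spec.values())
assert total_weight == 100, f"section weights must sum to 100, got {total_weight}"

for objective, section in test_spec.items():
    print(f'{section["weight"]:>3}%  {section["items"]:<16} {objective}')
```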
B. Criterion-related Validity
This is the extent to which the "criterion" of the test has actually been reached. It is best
demonstrated through a comparison of the results of an assessment with the results of
some other measure of the same criterion.
Criterion-related validity usually falls into two categories:
1. Concurrent Validity: the test results are supported by other concurrent performance
beyond the assessment (e.g. a high score in an English final exam supported by actual
proficiency in English).
2. Predictive Validity: assesses or predicts the test-taker's likelihood of future success
(e.g. placement tests, admission assessments).
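Comparing assessment results with another measure of the same criterion is usually quantified with a correlation coefficient. A rough sketch of the concurrent case, with invented scores (the data and the `pearson_r` helper are made up for illustration; the source only says the two sets of results are compared):

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between test scores and a criterion measure."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Invented data: English final-exam scores vs. an independent proficiency
# rating for the same six students (the concurrent-validity case above).
exam_scores = [55, 62, 70, 78, 85, 90]
proficiency_rating = [50, 60, 65, 80, 82, 95]

validity_coefficient = pearson_r(exam_scores, proficiency_rating)
print(round(validity_coefficient, 2))  # a value near 1 supports concurrent validity
```

For predictive validity the second list would instead be a later outcome measure, e.g. end-of-course results for students sorted by a placement test.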
C. Construct-related Validity
Construct validity asks: "Does the test actually tap into the theoretical construct
as it has been defined?" An informal construct validation of virtually every
classroom test is both essential and feasible. For example, the scoring analysis of an
interview may include pronunciation, fluency, grammatical accuracy, vocabulary use,
and sociolinguistic appropriateness: this is the theoretical construct of oral proficiency.
Construct validity is a major issue in validating large-scale standardized tests of
proficiency.
D. Consequential Validity
It includes all the consequences of a test, such as its accuracy in measuring the
intended criteria, its impact on the test-takers' preparation, its effect on the learner, and
the social consequences of the test's interpretation and use. One aspect of consequential
validity which draws special attention is the effect of test-preparation courses and
manuals on performance.
E. Face Validity
Face validity is the extent to which students view the assessment as fair, relevant, and
useful for improving learning; that is, students perceive the test to be valid. A test
will be perceived as valid if it samples the actual content of what the learners have
achieved or expect to achieve. The psychological state of the test-taker
(confidence, anxiety) is also an important aspect of peak performance.
A test with high face validity has the following characteristics:
• a well-constructed, expected format with familiar tasks,
• clearly doable within the allotted time,
• clear and uncomplicated test items,
• crystal-clear directions,
• tasks that relate to students' course work,
• a difficulty level that presents a reasonable challenge.
Another phrase associated with face validity is "biased for best". Teachers can
make a test which is "biased for best" by offering students appropriate review and
preparation for the test, suggesting strategies that will be beneficial, or structuring the
test so that the best students will be modestly challenged and the weaker students will
not be overwhelmed.
The concept of face validity, according to Heaton (1975: 153) and Brown (2004:
26), is that a test item looks right to other testers, teachers, moderators, and test-
takers; in addition, the test appears to measure the knowledge or abilities it claims to
measure. Heaton argues that if a test is examined by other people, some absurdities
and ambiguities can be discovered.
Face validity is important in maintaining test takers' motivation and
performance (Heaton, 1975: 153; Weir, 1990: 26). If a test does not have face validity,
it may not be acceptable to students or teachers. If students do not take the test as
valid, they will show adverse reactions (poor study behavior, low motivation). In other
words, they will not perform in a way which truly reflects their abilities.
Brown (2004: 27) states that face validity will likely be high if learners
encounter:
1. a well-constructed, expected format with familiar tasks,
2. a test that is clearly doable within the allotted time limit,
3. items that are clear and uncomplicated,
4. directions that are crystal clear,
5. tasks that relate to their course work (content validity), and
6. a difficulty level that presents a reasonable challenge.
To examine face validity, no statistical analysis is needed. Judgmental
responses from experts, colleagues, or test takers may be involved. They can read
through the whole set of items, or just glance at them, and then relate them to the
ability that the test is meant to measure. If a speaking test consists of vocabulary
items, it may not have face validity.
6.4. AUTHENTICITY
Authenticity is the degree of correspondence of the characteristics of a given
language test task to the features of a target language task. It also means a task that is
likely to be encountered in the “real world”.
Authenticity can be achieved by:
• using natural language,
• contextualizing the test items,
• giving meaningful (relevant, interesting) topics to the learners,
• providing thematic organization to the items (e.g. through a story line or episode),
• giving tests which represent, or closely approximate, real-world tasks.
6.5. WASHBACK
In general terms, washback means the effect of testing on teaching and learning. In
large-scale assessment, it refers to the effects that tests have on instruction in terms
of how students prepare for the test. In classroom assessment, washback means the
beneficial information that washes back to the students in the form of useful
diagnoses of strengths and weaknesses.
In enhancing washback, the teachers should comment generously and specifically
on test performance, respond to as many details as possible, praise strengths, criticize
weaknesses constructively, and give strategic hints to improve performance.
Teachers should use classroom tests as learning devices through which
washback is achieved. Students' incorrect responses can become windows of insight
into further work. Their correct responses need to be praised, especially when they
represent accomplishments in a student's interlanguage.
Washback enhances a number of basic principles of language acquisition: Intrinsic
motivation, autonomy, self-confidence, language ego, inter-language, and strategic
investment, among others.
One way to enhance washback is to comment generously and specifically on test
performance. Washback implies that students have ready access to the teacher to
discuss the feedback and evaluation he/she has given.
The effects of tests on teaching and learning are called washback. Teachers must be
able to create classroom tests that serve as learning devices through which washback is
achieved. Washback enhances intrinsic motivation, autonomy, self-confidence,
language ego, interlanguage, and strategic investment in the students. Instead of giving
letter grades and numerical scores, which give no information about the students'
performance, giving generous and specific comments is a way to enhance washback
(Brown, 2004: 29).
Heaton (1975: 161-162) refers to this as the backwash effect, which falls into
macro and micro aspects. In the macro aspect, tests impact society and the education
system, such as the development of the curriculum. In the micro aspect, tests impact
the individual student or teacher, such as by improving the teaching and learning
process.
Washback can also be negative or positive (Saehu, 2012: 124-127). It is easy
to find negative washback, such as narrowing language competencies down to only
those involved in tests and neglecting the rest: while language is a tool of
communication, most students and teachers in language classes focus only on the
language competencies in the test. On the other hand, a test has positive washback if it
encourages better teaching and learning; however, this is quite difficult to achieve. An
example of the positive washback of a test is the National Matriculation English Test
in China: after the test was administered, students' proficiency in English for actual or
authentic language-use situations improved.
Washback can be strong or weak (Saehu, 2012: 122-123). An example of the strong
effect of a test is a national examination; an example of a weak effect is the impact of
a formative test. Let us compare how most students and teachers react to those two
kinds of test.
7. WAYS OF TESTING:
7.1. Direct and Indirect Testing:
Direct testing:
*Hughes (1989:14) defines direct testing as the involvement of the skill that is
supposed to be tested. This means that when applying direct testing the teacher is
interested in testing a particular skill: if the aim of the test is to check listening
comprehension, the students will be given a test that checks their listening skills,
such as listening to a tape and doing the accompanying tasks. Such a test will not
engage other skills.
*Testing is direct when it requires the learner to perform precisely the skill that we
wish to measure. If we want to know how well learners can write compositions,
we get them to write compositions. The tasks and the texts used should be as
authentic as possible.
*The advantage of direct testing is said to be that it tests certain specific abilities,
and preparation for it usually involves persistent practice of those skills.
Nevertheless, the skills tested are divorced from authentic situations, which may
later cause difficulties for the students in using them.

Indirect testing:
*Indirect testing differs from direct testing in that it measures a skill through some
other skill. It can mean the incorporation of various skills that are connected with
each other, e.g. listening and speaking skills.
*Indirect testing, according to Hughes, tests the use of the language in real-life
situations.
*It is argued that indirect testing is more effective than direct testing, for it covers
a broader part of the language. The learners are not constrained to one particular
skill and a single relevant exercise; they are free to deploy all four skills, and what
is checked is their ability to operate with those skills and apply them in various,
even unpredictable, situations. This is the true indicator of the learner's real
knowledge of the language.
7.2. Discrete point and Integrative Testing:
Discrete-point testing:
*A discrete-point test is a language test that is meant to test a particular language
item, e.g. tenses. The basis of this type of test is that we can test components of
the language (grammar, vocabulary, pronunciation, and spelling) and language
skills (listening, reading, speaking, and writing) separately.

Integrative testing:
*An integrative test intends to check several language skills and language
components together, or simultaneously. Hughes (1989:15) stipulates that
integrative tests display the learners' knowledge of grammar, vocabulary, and
spelling together, not as separate skills or items.
7.3. Norm-referenced and Criterion-referenced Testing:
They are not focused directly on the language items, but on the scores the students
can get
Norm Referenced(proficiency & placement tests) Criterion Referenced (achievement & diagnostic)
*Norm-referenced tests refer to standardized tests that
are designed to compare and rank test takers in relation
to one another. This type of tests reports whether test
takers performed better or worse than a hypothetical
average student. It is designed to measure global
language abilities, such as overall English language
proficiency and academic listening ability, in which
each student’s score is interpreted relative to the scores
of all other students who took the test.
*Criterion-referenced tests are designed to measure
students’ performance against a fixed set of criteria
or learning standards. That is to say, they are written
descriptions of what students are expected to know
and be able to do a lot at a specific stage of their
education. CRTs provide information on whether
students have attained a predetermined level of
performance called “mastery”.
Research Methodology :https://www.facebook.com/groups/689682314444236/
* Norm-referenced test that measures the knowledge of
the learner and compares it with the knowledge of
another member of his/her group. The learner’s score is
compared with the scores of the other students
*In NRTs, testers interpret each student’s performance
in relationship to the performances of other students in
the norm group. In other words, NRTs examine the
relationship of a given student’s performance to that of
all other students in percentile terms. Scores are
expressed with no references to the actual number of
test questions answered correctly .This means that
teachers are mainly concerned with the student’s
percentile score which informs them about the
proportion of students who scored above and below the
student in question.
*Tests are used to measure general abilities such as
language proficiency in English. This type of tests has
subtests that are general in nature. For example,
measuring listening comprehension, reading
comprehension and writing
*the purpose is to generate scores that spread the
students out along a continuum of general abilities.
Thus, any existing difference between individuals can
be distinguished since a student performance is
compared to others in the same group
*The test is very long and contains a variety of
question types. The content is diverse, and students
find it difficult to know exactly what will be tested,
because the test is made up of a few subtests on
general language skills such as reading and listening
comprehension.
*Students know the general format of the questions but
not the language points or content to be tested by those
questions.
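The percentile interpretation of NRT scores described above can be sketched in a few lines (the norm-group scores here are hypothetical, invented purely for illustration): a student’s percentile rank is the share of norm-group scores that fall below his or her own score.

```python
def percentile_rank(score, norm_group):
    """Percentage of norm-group scores strictly below the given score."""
    below = sum(1 for s in norm_group if s < score)
    return 100.0 * below / len(norm_group)

# Hypothetical norm group of ten raw scores
norm_group = [32, 41, 45, 50, 52, 55, 58, 61, 67, 73]
print(percentile_rank(55, norm_group))  # 50.0: five of the ten scores fall below 55
```

Note that, as the text says, the percentile rank tells the teacher nothing about how many items were answered correctly, only where the student stands relative to the norm group.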
*The aim of testing is not to compare the results of
the students; it is concerned with the learners’
knowledge of the subject. As Hughes (1989:16) puts
it, criterion-referenced tests check the actual
language abilities of the students. They distinguish
the weak and strong points of the students, and the
students either manage to pass the test or fail it.
*The primary focus in interpreting scores is on how
much of the material each student has learnt in
absolute terms. That is, teachers are concerned with
how much of the material the students know (the
percentage). They care about the percentage of
questions the students answered correctly on the
material at hand, without reference to the students’
relative positions. A high percentage score simply
means that the students had mastered the material
being tested.
*CRTs are designed to provide precise information
about each individual’s performance on well-defined
learning points. Subtests for a notional-functional
language course might consist of a short interview
in which ratings are made of students’ ability to
perform greetings, express opinions and so on.
*The purpose is to assess the amount of skill learnt by
each student. That is to say, the focus here is on a
student’s performance relative to the amount of
material known by that student, and not on the
distribution of scores.
*A CRT consists of numerous short subtests, in
which each objective in the course has its own
subtest. To save time and effort, subtests are often
collapsed together, which makes it difficult for an
outsider to identify them.
*Students can predict both the question formats on
the test and the language points to be tested.
Teaching to such a test should help teachers and
students stay on track. Besides, the results should
provide useful feedback on the effectiveness of the
teaching and learning processes.
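The criterion-referenced interpretation above can be sketched as follows (the item counts and the 80% mastery cutoff are hypothetical assumptions, not taken from the source): the score is simply the percentage of items answered correctly, judged against a fixed cutoff rather than against other students.

```python
def crt_result(correct, total, mastery_cutoff=80.0):
    """Percentage-correct score and mastery decision for a criterion-referenced test."""
    pct = 100.0 * correct / total
    return pct, pct >= mastery_cutoff

# Hypothetical case: 34 of 40 items correct, judged against an 80% cutoff
pct, mastered = crt_result(34, 40)
print(pct, mastered)  # 85.0 True
```

Unlike the NRT case, every student can reach “mastery” here; the score distribution plays no role in the decision.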
7.4. Objective and Subjective Testing:
Objective:
*An objective test is one that cannot be interpreted
differently, because it is scored against fixed
answers with numerical values.
*When testing students objectively, the teacher
usually checks only their knowledge of the topic.
Subjective:
*A subjective test is one that can possibly be interpreted differently.
*A subjective test involves the personal judgement of the examiner.
*Testing subjectively may involve the teacher’s ideas and judgements. This
can be encountered during a speaking test, where the student can make either a
positive or a negative impression on the teacher. Moreover, the teacher’s
impression and his/her knowledge of the students’ true abilities can seriously
influence the assessment process. For example, a student may have failed the test;
however, the teacher knows the true abilities of that student and will therefore
assess his/her work differently, taking all the factors into account.
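Objective scoring can be sketched as a mechanical comparison against a key (the answer key and the student’s responses below are invented for illustration): because the marking is a fixed key lookup, every scorer reaches the same result, which is exactly what makes the interpretation objective.

```python
# Hypothetical answer key and one student's responses for an objective test
key       = ["B", "D", "A", "C", "B"]
responses = ["B", "D", "C", "C", "B"]

# Mechanical key comparison: no examiner judgement is involved
score = sum(1 for k, r in zip(key, responses) if k == r)
print(score, "/", len(key))  # 4 / 5
```

A subjective test, by contrast, cannot be reduced to such a comparison, since the examiner’s judgement enters into the rating itself.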
7.5. Communicative Testing:
*It involves knowledge of grammar and how it can be applied in written and oral language; knowledge of
when to speak and what to say in an appropriate situation; and knowledge of verbal and non-verbal
communication. All these types of knowledge should be used successfully in a situation.
*Without a context, a communicative language test would not function. The context should be as
close to real life as possible; it is required in order to help the student feel at ease, as in a natural
environment.
*The student has to possess some communicative skills, that is, how to behave in a certain situation, how
to use body language, etc.
*Communicative language testing involves the learner’s ability to operate with the language s/he knows
and apply it in the situation s/he is placed in. S/he should be capable of behaving in a real-life
situation with confidence and be ready to supply the information required by that situation.
We can therefore speak of communicative language testing as testing of the student’s ability to
behave as he or she would in everyday life: we evaluate the learner’s performance.
8. A SHORT HISTORY OF LANGUAGE TESTING:
Spolsky (1975) identifies three stages in the recent history of language testing: 1) The
pre-scientific 2) the psychometric-structuralist and 3) the psycho-linguistic-
sociolinguistic.
8.1. The Pre-scientific Period:
Language testing has its roots in the pre-scientific stage, in which no special skill or
expertise in testing is required. This stage is characterized by a lack of concern for
statistical considerations or for such notions as objectivity and reliability (Heaton, 1988;
Weir, 1990; Farhady et al., 1994). In its simplest form, this trend assumes that one can
and must rely completely on the subjective judgment of an experienced teacher, who can
identify, after a few minutes of conversation or after reading a student’s essay, what
mark to give him/her for the related language ability.
The pre-scientific movement is characterized by translation tests developed
exclusively by classroom teachers. One problem that arises with these types of
tests is that they are relatively difficult to score objectively; thus, subjectivity becomes
an important factor in their scoring (Brown, 1996). It is inferred from
Hinofotis’s article (1981) that the pre-scientific movement ended with the onset of the
psychometric-structuralist movement, but clearly such movements have no end in
language teaching and testing, because such teaching and testing practices are
indubitably still going on in many parts of the world, depending on the needs of
specific academic contexts.
8.2. The Psychometric Structuralist Period :
With the onset of the psychometric-structuralist movement of language testing,
language tests became increasingly scientific, reliable, and precise. In this era, the
testers and psychologists, being responsible for the development of modern theories
and techniques of educational measurement, were trying to provide objective
measures, using various statistical techniques to assure reliability and a certain kind of
validity. According to Carroll (1972), psychometric-structuralist tests typically set out
to measure the discrete structural elements of the language being taught in audio-lingual
and related teaching methods of the time. The standard tests, constructed according to
the discrete-point approach, were easy to administer and score and were carefully
constructed to be objective, reliable and valid. Therefore, they were considered an
improvement on the testing practices of the pre-scientific movement (Brown, 1996).
In the psychometric-structuralist period, there was a remarkable congruence between
the American structuralist view of language, psychological theories, and the practical
needs of testers. On the theoretical side, both agreed that language learning was chiefly
concerned with the systematic acquisition of a set of habits; on the practical side,
testers wanted, and structuralists knew how to deliver, long lists of small items which
could be sampled and tested objectively.
The coalescence of the two fields meant that discrete-point tests achieved the following
three objectives:
1) diagnosing learner strengths;
2) prescribing curricula targeting particular skills;
3) developing scientific strategies to help learners overcome particular weaknesses.
The psychometric-structuralist movement was important because for the first time
language test development followed scientific principles. In addition, Brown (1996)
maintains that psychometric-structuralist tests could be easily handled by trained
linguists and language testers. As a result, statistical analyses were used for the first
time. Interestingly, psychometric-structuralist tests are still very much in evidence
around the world, but they have been supplemented by what Carroll (1972)
called integrative tests.
8.3. The Integrative-Sociolinguistic Period :
With the attention of linguists inclined toward generativism and that of psychologists
toward cognition, language teachers adopted the cognitive-code learning approach for
teaching a second and/or foreign language. Language professionals began to believe
that language is more than the sum of the discrete elements being tested during the
psychometric-structuralist movement (Brown, 1996; Heaton, 1991; Oller, 1979).
The criticism came largely from Oller (1979), who argued that competence is a unified
set of interacting abilities that cannot be separated and tested adequately. The claim
was that communicative competence is so global that it requires the integration of all
linguistic abilities. Such global nature cannot be captured in additive tests of grammar,
reading, vocabulary, and other discrete points of language. According to Oller (1983),
if discrete items take language skill apart, integrative tests put it back together;
whereas discrete items attempt to test knowledge of language a bit at a time,
integrative tests attempt to assess a learner’s capacity to use many bits all at the same
time.
This movement certainly has its roots in the argument that language is creative.
Beginning with the work of sociolinguists like Hymes (1967), it was felt that the
development of communicative competence depended on more than simple
grammatical control of the language; communicative competence also hinges on the
knowledge of the language appropriate for different situations.
Tests typical of this movement were the cloze test and dictation, both of which assess
the students’ ability to manipulate language within a context of extended text rather
than in a collection of discrete-point questions. The possibility of testing language in
context led to further arguments that linguistic and extralinguistic elements of
language are interrelated and relevant to human experience and operate in
orchestration.
Consequently, broader views of language, language use, language teaching, and
language acquisition widened the scope of language testing, and brought
about a challenge articulated by Canale (1984) as the shift in emphasis from
language form to language use. This shift of focus placed new demands on language
teaching as well as on language testing.
Evaluation within a communicative approach must necessarily address, for example,
new content areas such as sociolinguistic appropriateness rules, new testing formats to
permit and encourage creative, open-ended language use, new test administration
procedures to emphasize interpersonal interaction in authentic situations, and new
scoring procedures of a manual and judgmental nature (Canale 1984, p. 79, cited in
Bachman, 1995).
For both theory and practice, the challenge is thus to develop tests that reflect current
views of language and language use, in that they are capable of measuring a wide
range of abilities generally associated with ‘communicative competence’ or
‘communicative language ability’, and include tasks that themselves embody the
essential features of communicative language use (Bachman 1995, p. 296).
  • 1. Research Methodology :https://www.facebook.com/groups/689682314444236/ Testing and Evaluation By : Abla BEN BELLAL 1. DEFINITION OF TESTING: It is a technique of obtaining infomation needed for evaluation purposes. (Tests, quizzes, measuring instruments) are devices used to obtain such information. A TEST is ÂŤ a method of measuring a person’s ability on knowledge in a given area Âť 2. DEFINITION OF ASSESSMENT :  It is the process of collecting information or evidence of a learner’s learning progress and achievement over a period of time in order to improve teaching and learning.  Assessment is typically used to describe processes to examine or measure student learning that results from academic programs.  Assessment is an ongoing process aimed at imroving student learning.  Assessment is not based on one test or one task, nor it is expressed by mark or grade, but rather in a report form with scales or levels as well as description and comment from the teacher. 2.1. TYPES OF ASSESSMENT: 2.1.1. Formative Assessment : when teachers use it to check on the progress of their students, to see how far they have mastered what they should have learnt, and then use this information to modify their future teaching plans. *Informal assessment is a part of formative assessment. It can take a number of forms : Unplanned comments, verbal feedback to students, observing students perform a task of work in small groups and so on. 2.1.2. Summative Assessment :is used at the end of the term, semester, or year in order to measure what has been achieved both by groups and by individuals.
  • 2. Research Methodology :https://www.facebook.com/groups/689682314444236/ *Formal assessment is part of summative assessment. i.e. Exercises or procedures which are systematic and give students and teachers an appraisal of students’ achievement. 3. DEFINITION OF EVALUATION: It is the process of making overall judgment about one’s work or a whole school’s work. Evaluation is typically broader concept than assessment as it focuses on the overall, or summative experience. when we ASSESS our students we commonly are interestedin“how and how much our students have learnt” , but when we EVALUATE them we are concerned with “how the learning process is developing” . 4. WHAT ARE THE MAIN REASONS FOR TESTING ? a. Achievement/Attainment tests: usually more formal, designed to show mastery of a particular syllabus (e.g. end-of-year tests, school-leaving exams, public tests) though similar (re-syllabus) to progress tests. Rarely constructed by classroom teacher for a particular class. Designed primarily to measure individual progress rather than as a means of motivating or reinforcing language. b. Progress Tests: Most classroom tests take this form. Assess progress students make in mastering material taught in the classroom. Often given to motivate students. They also enable students to assess the degree of success of teaching and learning and to identify areas of weakness & difficulty. Progress tests can also be diagnostic to some degree. c. Diagnostic Tests : can include Progress, Achievement and Proficiency tests, enabling teachers to identify specific weaknesses/difficulties so that an appropriate remedial programme can be planned. Diagnostic Tests are primarily designed to assess students' knowledge & skills in particular areas before a course of study is begun. Reference back to class-work. Motivation. Remedial work.
d. Placement Tests: sort new students into teaching groups so that they are at approximately the same level as others when they start. Present standing. General ability rather than specific points of learning. A variety of tests is necessary. Reference forward to future learning. Results of placement tests are needed quickly. Administrative load.
e. Proficiency Tests: measure students' achievements in relation to a specific task which they are later required to perform (e.g. follow a university course in the English medium; do a particular job). Reference forward to a particular application of the language acquired: future performance rather than past achievement. They rarely take into account the syllabus that students have followed. Definition of operational needs. Practical situations. Authentic strategies for coping. A common standard, e.g. a driving test, regardless of previous learning. Application of a common standard whether the syllabus is known or unknown.
f. Aptitude Tests: measure students' probable performance. Reference forward, but they can be distinguished from proficiency tests: aptitude tests assess proficiency in language for language use (e.g. will the student experience difficulty in identifying sounds or the grammatical structure of a new language?), while proficiency tests measure adequacy of control in the L2 for studying other things through the medium of that language.

5. TYPES OF TESTS:

Proficiency tests:
*They are designed to measure students' ability in a language.
*Their content is not based on the content or objectives of language courses that the test takers may have followed. It is based on a specification of what candidates have to be able to do in the language in order to be considered proficient.
*A proficiency test is a test which measures how much of a language a person knows or has learnt. It is not bound to any curriculum or syllabus, but is intended to check the learner's language competence. An example of such a test is the Test of English as a Foreign Language (TOEFL), which is used to measure learners' general knowledge of English in order to allow them to enter a higher educational establishment or to take up a job in the USA.
*A proficiency test is used to assess the general knowledge or skill commonly required for entry into a group of similar institutions. Because of the general nature of proficiency decisions, a proficiency test must be designed so that the general abilities and skills of students are reflected in a wide distribution of scores. Thus, proficiency decisions must be based on the best obtainable proficiency test scores as well as on other information about students.

Achievement tests:
*Achievement tests are "more formal"; Hughes (1989:8) assumes that this type of test will fully involve teachers, for they will be responsible for preparing such tests and giving them to the learners.
*They are directly related to language courses, their purpose being to establish how successful individual students, or the courses themselves, have been in achieving objectives.
*An achievement test is given at the end of the course to check the acquisition of the material covered during the study year.
*There are two types. Final achievement tests are administered at the end of a course of study; they may be written and administered by ministries of education, official examining boards, or by members of teaching institutions, and their content must be related to the courses with which they are concerned. Progress achievement tests are intended to measure the progress that students are making; they contribute to formative assessment.
*Achievement tests must be not only very specifically designed to measure the objectives of a given course but also flexible enough to help teachers readily respond to what they learn from the test about the students' abilities, needs, and learning of the course objectives.
*Achievement tests are mainly given at definite times of the school year. Moreover, they can be extremely crucial for the students, for they are intended to make the students either pass or fail. Alderson (ibid.) mentions two usage types of achievement tests: formative and summative. The formative usage denotes the idea that, after evaluating the results of the test, the teacher will be able to reconsider his/her teaching and syllabus design, and even slow down the pace of studying to consolidate the material if necessary. The summative usage deals precisely with the students' success or failure: the teacher can immediately take up remedial activities to improve the situation.
*Students are tested to find out how much each person has learnt within the program. Achievement decisions are about the amount of learning that students have done.

Diagnostic tests:
*They are used to identify learners' strengths and weaknesses.
*They are intended to ascertain what learning still needs to take place. A diagnostic test helps us evaluate our teaching, the syllabus, and the material used, in addition to locating difficulties and planning appropriate remedial teaching.
*Diagnostic testing often requires more detailed information about the very specific areas in which students have strengths and weaknesses.
*The purpose is to help students and their teachers focus their efforts where they will be most effective.
*The most effective use of a diagnostic test is to report the performance level on each objective (as a percentage) to each student so that he or she can decide how and where to invest time and energy most profitably.
*They are designed to determine the degree to which the specific objectives of the course have been accomplished, as well as to assess students' strengths and weaknesses so as to correct individual deficiencies before it is too late. These tests aim at fostering achievement by promoting strengths and eliminating weaknesses. In other words, the purpose of this type of test is to diagnose students' problems during the learning process so that effort can be focused where it will be most effective.

Placement tests:
*They are intended to provide information that will help to place students at the stage of the teaching programme most appropriate to their abilities.
*They are used to assign students to classes at different levels. Placement tests are designed to help decide what each student's appropriate level will be within a specific program, skill area, or course.
*The purpose of such tests is to reveal which students have more of, or less of, a particular knowledge or skill so that students with similar levels of ability can be grouped together.
*A placement test can typically take the form of dictations, interviews, grammar tests, etc.
*A placement test is designed and given in order to use the information about the students' knowledge to put them into groups according to their level of the language.

6. PRINCIPLES OF LANGUAGE ASSESSMENT:
There are five principles of language assessment: practicality, reliability, validity, authenticity, and washback.

6.1. PRACTICALITY:
An effective test is practical. This means that it:
* is not excessively expensive. A test that is prohibitively expensive is impractical.
* stays within appropriate time constraints. A test of language proficiency that takes a student 10 hours to complete is impractical.
* is relatively easy to administer.
A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is, for most classroom situations, impractical.
* has a scoring/evaluation procedure that is specific and time-efficient. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer.
Furthermore, for a test to be practical:
* administrative details should be clearly established before the test,
* students should be able to complete the test reasonably within the set time frame,
* all materials and equipment should be ready,
* the cost of the test should be within budgeted limits,
* the scoring/evaluation system should be feasible in the teacher's time frame.
Validity and reliability are not enough to build a test; the test should also be practical in terms of time, cost, and energy. Regarding time and energy, tests should be efficient to construct, to take, and to evaluate; they must also be affordable. A valid and reliable test is quite useless if it cannot be administered in remote areas because it requires an expensive computer (Heaton, 1975: 158-159; Weir, 1990: 34-35; Brown, 2004: 19-20).

6.2. RELIABILITY:
A reliable test is consistent and dependable. A number of sources of unreliability may be identified:
a. Student-related Reliability
A test may yield unreliable results because of factors beyond the control of the test taker, such as illness, fatigue, a "bad day", or no sleep the night before.
b. Rater (Scorer) Reliability
Rater reliability refers to the consistency of scoring by two or more scorers. Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly because of a lack of attention to scoring criteria, inexperience, or inattention. Intra-rater unreliability arises from unclear scoring criteria, fatigue, and bias toward particular "good" and "bad" students.
c. Test Administration Reliability
Unreliability may result from the conditions in which the test is administered. An example is a test of aural comprehension delivered with a tape recorder: when the tape recorder played the items, the students sitting next to the windows could not hear the tape accurately because of the street noise outside the building.
d. Test Reliability
If a test is too long, test takers may become fatigued by the time they reach the later items and hastily respond incorrectly.
Test and test administration reliability can be achieved by making sure that all students receive the same quality of input. Part of achieving test reliability depends on the physical context: making sure, for example, that every student has a cleanly photocopied test sheet, that sound amplification is clearly audible to everyone in the room, that video input is equally visible to all, and that lighting, temperature, and other classroom conditions are equal (and optimal) for all students.
Reliability refers to consistency and dependability: the same test delivered to the same student across administrations must yield the same results. Factors affecting reliability are (Heaton, 1975: 155-156; Brown, 2004: 21-22):
1. student-related reliability: personal factors such as motivation, illness, or anxiety can keep students from their "real" performance,
2. rater reliability: either intra-rater or inter-rater, leading to subjectivity, error, or bias during the scoring of tests,
3. test administration reliability: when the same test is administered on different occasions, it can yield different results,
4. test reliability: concerning the duration of the test and the test instructions. If a test takes a long time to do, it may affect test takers' performance through fatigue, confusion, or exhaustion, and some test takers do not perform well on timed tests. Test instructions must be clear for all test takers, since they are affected by mental pressure.
Several methods are employed to estimate the reliability of an assessment (Heaton, 1975: 156; Weir, 1990: 32; Gronlund and Waugh, 2009: 59-64):
1. test-retest/re-administration: the same test is administered after a lapse of time, and the two sets of scores are then correlated.
2. parallel-form/equivalent-forms method: two cloned tests are administered at the same time to the same test takers, and the results of the two tests are correlated.
3. split-half method: a test is divided into two halves and corresponding scores are obtained; the extent to which they correlate with each other governs the reliability of the test as a whole.
4. test-retest with equivalent forms: a mixed method of test-retest and parallel forms. Two cloned tests are administered to the same test takers on different occasions.
5. intra-rater and inter-rater: employing one person to score the same test at different times is called intra-rater reliability. Some hints to minimize unreliability are employing a rubric, avoiding fatigue, scoring the same item numbers across all papers, and asking students to write their names on the back of the test paper. When two people score the same test, it is inter-rater reliability; a rubric must first be developed and discussed in order to establish a shared perception. The two scores, whether from intra- or inter-rating, are then correlated.

6.3. VALIDITY:
Validity is the extent to which inferences made from assessment results are appropriate, meaningful, and useful for the purpose of the assessment. It is the most complicated yet the most important principle.
Validity can be measured using statistical correlation with other related measures.
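The reliability-estimation methods above, like the correlational validity check just mentioned, all reduce to correlating two sets of scores. A minimal sketch in Python of the split-half procedure, assuming hypothetical score lists and illustrative helper names (`pearson`, `spearman_brown`) that are not from the source:

```python
# Sketch: split-half reliability. Two halves of one test are scored
# separately, the half-scores are correlated, and the Spearman-Brown
# formula steps the correlation up to an estimate for the whole test.

def pearson(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman_brown(half_r):
    """Estimate full-test reliability from a split-half correlation."""
    return 2 * half_r / (1 + half_r)

# Hypothetical data: each student's score on the odd- and even-numbered items
odd_half = [8, 6, 9, 4, 7, 5, 9, 3]
even_half = [7, 6, 8, 5, 7, 4, 9, 4]

r_half = pearson(odd_half, even_half)
print(f"split-half correlation: {r_half:.2f}")
print(f"estimated full-test reliability: {spearman_brown(r_half):.2f}")
```

The same `pearson` helper would serve for test-retest and parallel-form estimates (correlating two administrations) and for inter-rater checks (correlating two raters' scores).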
According to Bynom (Forum, 2001), validity deals with what is tested and the degree to which a test measures what it is supposed to measure (Longman Dictionary, LTAL). For example, if we test students' writing skills by giving them a composition test on "Ways of Cooking", we cannot call such a test valid, for it can be argued that it tests not their ability to write but their knowledge of cooking as a skill.

A. Content-related Validity
A test is said to have content validity when it actually samples the subject matter about which conclusions are to be drawn and requires the test taker to perform the behavior being measured. For example, speaking ability is tested using a speaking performance, not a pencil-and-paper test. Content validity can be identified when we can define the achievement being measured, and it can be achieved by making the test a direct test of performance: to test pronunciation, for example, the teacher should require the students to pronounce the target words orally.
Two questions are used when applying content validity to a classroom test:
1. Are classroom objectives identified and appropriately framed? The objectives should include a performance verb and a specific linguistic target.
2. Are lesson objectives represented in the form of test specifications? A test should have a structure that follows logically from the lesson or unit being tested. It can be designed by dividing the objectives into sections, offering students a variety of item types, and giving appropriate weight to each section.

B. Criterion-related Validity
The extent to which the "criterion" of the test has actually been reached. It is best demonstrated through a comparison of the results of an assessment with the results of some other measure of the same criterion. Criterion-related validity usually falls into two categories:
1. Concurrent validity: the test results are supported by other concurrent performance beyond the assessment.
(e.g.: high score in English final exam supported by actual proficiency in English)
2. Predictive validity: to assess or predict the test taker's likelihood of future success (e.g. placement tests, admission assessments).

C. Construct-related Validity
Construct validity asks: "Does the test actually tap into the theoretical construct as it has been defined?" An informal construct validation of virtually every classroom test is both essential and feasible. For example, the scoring analysis of an interview may include pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness; together these make up the theoretical construct of oral proficiency. Construct validity is a major issue in validating large-scale standardized tests of proficiency.

D. Consequential Validity
It includes all the consequences of a test, such as its accuracy in measuring the intended criteria, its impact on test takers' preparation, its effect on the learner, and the social consequences of the test's interpretation and use. One aspect of consequential validity which draws special attention is the effect of test-preparation courses and manuals on performance.

E. Face Validity
Face validity is the extent to which students view the assessment as fair, relevant, and useful for improving learning; that is, students perceive the test to be valid. A test will be perceived as valid if it samples the actual content of what the learners have achieved or expect to achieve. The psychological state of the test taker (confidence, anxiety) is also an important aspect of peak performance. A test with high face validity has the following characteristics:
• A well-constructed, expected format with familiar tasks.
• Clearly doable within the allotted time.
• Clear and uncomplicated test items.
• Crystal-clear directions.
• Tasks that relate to students' course work.
• A difficulty level that presents a reasonable challenge.
Another phrase associated with face validity is "biased for best". Teachers can make a test which is "biased for best" by offering students appropriate review and preparation for the test, suggesting strategies that will be beneficial, or structuring the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed.
The concept of face validity, according to Heaton (1975: 153) and Brown (2004: 26), is that a test item looks right to other testers, teachers, moderators, and test takers, and appears to measure the knowledge or abilities it claims to measure. Heaton argues that if a test is examined by other people, absurdities and ambiguities can be discovered. Face validity is important in maintaining test takers' motivation and performance (Heaton, 1975: 153; Weir, 1990: 26). If a test does not have face validity, it may not be acceptable to students or teachers; if students do not take the test to be valid, they will show adverse reactions (poor study habits, low motivation). In other words, they will not perform in a way which truly reflects their abilities. Brown (2004: 27) states that face validity will likely be high if learners encounter:
1. a well-constructed, expected format with familiar tasks,
2. a test that is clearly doable within the allotted time limit,
3. items that are clear and uncomplicated,
4. directions that are crystal clear,
5. tasks that relate to their course work (content validity), and
6. a difficulty level that presents a reasonable challenge.
To examine face validity, no statistical analysis is needed. Judgmental responses from experts, colleagues, or test takers may be involved: they can read through all the items, or simply glance at them, and relate them to the ability that the test wants to measure.
If a speaking test consists of vocabulary items, it may not have face validity.

6.4. AUTHENTICITY:
Authenticity is the degree of correspondence of the characteristics of a given language test task to the features of a target language task. It also means a task that is likely to be encountered in the "real world". Authenticity can be achieved by:
• using natural language,
• contextualizing the test items,
• giving meaningful (relevant, interesting) topics to the learners,
• providing thematic organization to the items (e.g. through a story line or episode),
• giving tests which represent, or closely approximate, real-world tasks.

6.5. WASHBACK:
In general terms, washback means the effect of testing on teaching and learning. In large-scale assessment, it refers to the effects that tests have on instruction in terms of how students prepare for the test. In classroom assessment, washback means the beneficial information that washes back to the students in the form of useful diagnoses of strengths and weaknesses. To enhance washback, teachers should comment generously and specifically on test performance, respond to as many details as possible, praise strengths, criticize weaknesses constructively, and give strategic hints for improving performance. Teachers should make classroom tests serve as learning devices through which washback is achieved. Students' incorrect responses can become windows of insight into further work; their correct responses need to be praised, especially when they represent accomplishments in a student's interlanguage. Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment, among others. One way to enhance washback is to comment generously and specifically on test performance. Washback implies that students have ready access to the teacher to discuss the feedback and evaluation he/she has given.
The effects of tests on teaching and learning are called washback. Teachers must be able to create classroom tests that serve as learning devices through which washback is achieved. Washback enhances intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment in the students. Instead of giving letter grades and numerical scores, which give no information about the students' performance, giving generous and specific comments is a way to enhance washback (Brown, 2004: 29). Heaton (1975: 161-162) calls this the backwash effect, which has macro and micro aspects: in the macro aspect, tests impact society and the education system, such as the development of the curriculum; in the micro aspect, tests impact the individual student or teacher, such as by improving the teaching and learning process. Washback can also be negative or positive (Saehu, 2012: 124-127). It is easy to find negative washback, such as narrowing language competencies down to only those involved in tests and neglecting the rest: although language is a tool of communication, many students and teachers in language classes focus only on the language competencies in the test. On the other hand, a test produces positive washback if it encourages better teaching and learning, though this is quite difficult to achieve. An example of the positive washback of a test is the National Matriculation English Test in China: after the test was administered, students' proficiency in English for actual or authentic language-use situations improved. Washback can also be strong or weak (Saehu, 2012: 122-123). An example of a strong test effect is a national examination, while a weak test effect is the impact of a formative test. Compare how most students and teachers react to those two kinds of test.

7. WAYS OF TESTING:
7.1.
Direct and Indirect Testing:

Direct testing:
*Hughes (1989:14): the involvement of the skill that is supposed to be tested. When applying direct testing, the teacher is interested in testing one particular skill; e.g. if the aim of the test is to check listening comprehension, the students are given a test that checks their listening skills, such as listening to a tape and doing the accompanying tasks. Such a test will not engage the testing of other skills.
*Testing is direct when it requires the learner to perform precisely the skill that we wish to measure. If we want to know how well learners can write compositions, we get them to write compositions. The tasks and the texts used should be as authentic as possible.
*The advantage of direct testing is that it is intended to test certain abilities, and preparation for it usually involves persistent practice of certain skills. Nevertheless, the skills tested are deprived of the authentic situation, which may later cause difficulties for the students in using them.

Indirect testing:
*It differs from direct testing in that it measures a skill through some other skill. It can mean the incorporation of various skills that are connected with each other, e.g. listening and speaking skills.
*Indirect testing, according to Hughes, tests the usage of the language in real-life situations.
*Indirect testing is held to be more effective than direct testing, for it covers a broader part of the language. Learners are not constrained to one particular skill and a relevant exercise; they are free to deploy all four skills, and what is checked is their ability to operate with those skills and apply them in various, even unpredictable, situations. This is the true indicator of the learner's real knowledge of the language.

7.2. Discrete-point and Integrative Testing:

Discrete-point testing:
*A discrete-point test is a language test that is meant to test a particular language item, e.g. tenses. The basis of this type of test is that we can test the components of the language (grammar, vocabulary, pronunciation, and spelling) and the language skills (listening, reading, speaking, and writing) separately.

Integrative testing:
*An integrative test intends to check several language skills and language components together, or simultaneously. Hughes (1989:15) stipulates that integrative tests display the learners' knowledge of grammar, vocabulary, and spelling together, not as separate skills or items.

7.3. Norm-referenced and Criterion-referenced Testing:
They are not focused directly on the language items, but on the scores the students can get.

Norm-referenced tests (proficiency & placement tests):
*Norm-referenced tests (NRTs) are standardized tests designed to compare and rank test takers in relation to one another. This type of test reports whether a test taker performed better or worse than a hypothetical average student. It is designed to measure global language abilities, such as overall English language proficiency or academic listening ability, and each student's score is interpreted relative to the scores of all the other students who took the test.
*A norm-referenced test measures the knowledge of the learner and compares it with the knowledge of the other members of his/her group; the learner's score is compared with the scores of the other students.
*In NRTs, testers interpret each student's performance in relation to the performances of the other students in the norm group. In other words, NRTs examine the relationship of a given student's performance to that of all other students in percentile terms. Scores are expressed with no reference to the actual number of test questions answered correctly: teachers are mainly concerned with the student's percentile score, which tells them the proportion of students who scored above and below the student in question.
*The tests are used to measure general abilities, such as language proficiency in English. This type of test has subtests that are general in nature, for example measuring listening comprehension, reading comprehension, and writing.
*The purpose is to generate scores that spread the students out along a continuum of general abilities. Thus, any existing differences between individuals can be distinguished, since a student's performance is compared to others in the same group.
*The test is very long and contains a variety of different question content. The content is diverse, and students find it difficult to know exactly what will be tested, because the test is made up of a few subtests on general language skills such as reading and listening comprehension.
*Students know the general format of the questions but not the language points or content to be tested by those questions.

Criterion-referenced tests (achievement & diagnostic tests):
*Criterion-referenced tests (CRTs) are designed to measure students' performance against a fixed set of criteria or learning standards, that is, written descriptions of what students are expected to know and be able to do at a specific stage of their education. CRTs provide information on whether students have attained a predetermined level of performance called "mastery".
*The aim of the testing is not to compare the results of the students; it is connected with the learners' knowledge of the subject. As Hughes (1989:16) puts it, criterion-referenced tests check the actual language abilities of the students. They distinguish the weak and strong points of the students, and the students either manage to pass the test or fail it.
*The primary focus in interpreting scores is on how much of the material each student has learnt in absolute terms. That is, teachers are concerned with how much of the material the students know (the percentage): they care about the percentage of questions the students answered correctly on the material at hand, without reference to the students' positions. A high percentage score means that the test was very easy for students who knew the material being tested.
*CRTs are designed to provide precise information about each individual's performance on well-defined learning points. Subtests for a notional-functional language course might consist of a short interview in which ratings are made of students' ability to perform greetings, express opinions, and so on.
*The purpose is to assess the amount of skill learnt by each student. That is to say, the focus here is on a student's performance compared to the amount of material known by that student, not on the distribution of scores.
*A CRT consists of numerous short subtests, in which each objective in the course has its own subtest. To save time and effort, subtests may be collapsed together, which makes it difficult for an outsider to identify them.
*Students can predict both the question formats on the test and the language points to be tested. Teaching to such a test should help teachers and students stay on track; besides, the results should provide useful feedback on the effectiveness of the teaching and learning processes.
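The scoring contrast between the two families of tests can be made concrete: an NRT interprets a raw score as a percentile rank within the norm group, while a CRT interprets it as the percentage of the criterion mastered. A minimal sketch in Python, using hypothetical scores and an assumed 80% mastery cutoff (the helper names are illustrative, not from the source):

```python
# Sketch: norm-referenced vs criterion-referenced score interpretation.

def percentile_rank(score, group_scores):
    """NRT view: percentage of the norm group scoring below this student."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)

def mastery(correct, total, cutoff=80):
    """CRT view: percentage correct judged against a fixed mastery cutoff."""
    pct = 100 * correct / total
    return pct, pct >= cutoff

group = [34, 41, 45, 52, 55, 58, 61, 67, 72, 80]  # whole-class raw scores

# NRT: the raw score 61 is read relative to the group
print(percentile_rank(61, group))  # percentage of classmates scoring below

# CRT: mastery depends only on the criterion, never on classmates
pct, passed = mastery(correct=49, total=60)
print(pct, passed)
```

Note that the NRT interpretation of the same raw score would shift if the norm group changed, while the CRT interpretation stays fixed against its criterion.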
7.4. Objective and Subjective Testing:

Objective testing:
*An objective test is one that cannot be interpreted differently, because it is scored with numerical values.
*When testing the students objectively, the teacher usually checks just the knowledge of the topic.

Subjective testing:
*A subjective test is one that can possibly be interpreted differently.
*A subjective test involves the personal judgement of the examiner.
*Testing subjectively can involve the teacher's ideas and judgements. This can happen during a speaking test, where the student may make either a positive or a negative impression on the teacher. Moreover, the teacher's impression and his/her knowledge of the student's true abilities can seriously influence the assessment process. For example, a student has failed the test; however, the teacher knows the true abilities of that student and will therefore assess the student's work differently, taking all the factors into account.

7.5. Communicative Testing:
*It involves the knowledge of grammar and how it can be applied in written and oral language; the knowledge of when to speak and what to say in an appropriate situation; and knowledge of verbal and non-verbal communication. All these types of knowledge should be successfully used in a situation.
*Without a context, a communicative language test would not function. The context should be as close to real life as possible; it is required in order to help the students feel themselves in a natural environment.
*The student has to possess some communicative skills, that is, how to behave in a certain situation, how to apply body language, etc.
*Communicative language testing involves the learner's ability to operate with the language he/she knows and apply it in the situation he/she is placed in. The learner should be capable of behaving in a real-life situation with confidence and be ready to supply the information required by that situation.
Therefore, we can speak of communicative language testing as testing the student's ability to behave as he or she would in everyday life. We evaluate their performance.
8. A SHORT HISTORY OF LANGUAGE TESTING:
Spolsky (1975) identifies three stages in the recent history of language testing: 1) the pre-scientific, 2) the psychometric-structuralist, and 3) the psycholinguistic-sociolinguistic.
8.1. The Pre-scientific Period:
Language testing has its roots in the pre-scientific stage, in which no special skill or expertise in testing is required. This stage is characterized by a lack of concern for statistical considerations or for such notions as objectivity and reliability (Heaton, 1988; Weir, 1990; Farhady et al., 1994). In its simplest form, this trend assumes that one can and must rely completely on the subjective judgment of an experienced teacher, who can identify, after a few minutes of conversation or after reading a student's essay, what mark to give him/her for the language ability concerned. The pre-scientific movement is characterized by translation tests developed exclusively by classroom teachers. One problem that arises with these types of tests is that they are relatively difficult to score objectively; thus, subjectivity becomes an important factor in the scoring of such tests (Brown, 1996). It is inferred from Hinofotis's article (1981) that the pre-scientific movement ended with the onset of the psychometric-structuralist movement, but clearly such movements have no end in language teaching and testing, because such teaching and testing practices indubitably go on in many parts of the world, depending on the needs that specific academic contexts demand.
8.2. The Psychometric-Structuralist Period:
With the onset of the psychometric-structuralist movement of language testing, language tests became increasingly scientific, reliable, and precise. In this era, the testers and psychologists responsible for the development of modern theories and techniques of educational measurement were trying to provide objective measures, using various statistical techniques to assure reliability and a certain kind of validity.
According to Carroll (1972), psychometric-structuralist tests typically set out to measure the discrete structural elements of language being taught in the audio-lingual and related teaching methods of the time. The standard tests, constructed according to the discrete-point approach, were easy to administer and score and were carefully constructed to be objective, reliable, and valid. Therefore, they were considered an improvement on the testing practices of the pre-scientific movement (Brown, 1996).
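The mechanical, key-based scoring that made discrete-point tests objective and reliable can be illustrated with a short sketch. The items, answer key, and responses below are hypothetical examples, not drawn from the text.

```python
# Hypothetical sketch: scoring discrete-point items against a fixed answer key.
# Because the key fully determines the mark, any scorer obtains the same
# result, which is what makes such a test objective and easy to score.

ANSWER_KEY = {1: "b", 2: "d", 3: "a", 4: "c"}  # hypothetical key

def score(responses):
    """Count the responses that match the fixed key (one point per item)."""
    return sum(1 for item, choice in responses.items()
               if ANSWER_KEY.get(item) == choice)

student = {1: "b", 2: "d", 3: "c", 4: "c"}  # hypothetical responses
print(score(student))  # 3 of the 4 items match the key
```

The point of the sketch is that no examiner judgement enters the score, in contrast to the subjective procedures of the pre-scientific period.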
In the psychometric-structuralist period, there was a remarkable congruence between the American structuralist view of language, psychological theories, and the practical needs of testers. On the theoretical side, both agreed that language learning was chiefly concerned with the systematic acquisition of a set of habits; on the practical side, testers wanted, and structuralists knew how to deliver, long lists of small items which could be sampled and tested objectively. Three objectives were achieved with discrete-point tests, which were the result of the coalescence of the two fields:
1) diagnosing learner strengths;
2) prescribing curricula aimed at particular skills;
3) developing scientific strategies to help learners overcome particular weaknesses.
The psychometric-structuralist movement was important because, for the first time, language test development followed scientific principles. In addition, Brown (1996) maintains that psychometric-structuralist tests could be easily handled by trained linguists and language testers. As a result, statistical analyses were used for the first time. Interestingly, psychometric-structuralist tests are still very much in evidence around the world, but they have been supplemented by what Carroll (1972) called integrative tests.
8.3. The Integrative-Sociolinguistic Period:
With the attention of linguists inclined toward generativism and that of psychologists toward cognition, language teachers adopted the cognitive-code learning approach for teaching a second and/or foreign language. Language professionals began to believe that language is more than the sum of the discrete elements tested during the psychometric-structuralist movement (Brown, 1996; Heaton, 1991; Oller, 1979). The criticism came largely from Oller (1979), who argued that competence is a unified set of interacting abilities that cannot be taken apart and tested adequately. The claim
was that communicative competence is so global that it requires the integration of all linguistic abilities. Such a global nature cannot be captured in additive tests of grammar, reading, vocabulary, and other discrete points of language. According to Oller (1983), if discrete items take language skill apart, integrative tests put it back together; whereas discrete items attempt to test knowledge of language a bit at a time, integrative tests attempt to assess a learner's capacity to use many bits all at the same time. This movement certainly has its roots in the argument that language is creative. Beginning with the work of sociolinguists like Hymes (1967), it was felt that the development of communicative competence depended on more than simple grammatical control of the language; communicative competence also hinges on knowledge of the language appropriate for different situations. Tests typical of this movement were the cloze test and dictation, both of which assess the students' ability to manipulate language within the context of extended text rather than in a collection of discrete-point questions. The possibility of testing language in context led to further arguments that linguistic and extralinguistic elements of language are interrelated, relevant to human experience, and operate in orchestration. Consequently, broader views of language, language use, language teaching, and language acquisition have broadened the scope of language testing, and this brought about a challenge that Canale (1984) articulated as the shift in emphasis from language form to language use. This shift of focus placed new demands on language teaching as well as language testing.
Evaluation within a communicative approach must necessarily address, for example, new content areas such as sociolinguistic appropriateness rules, new testing formats to permit and encourage creative, open-ended language use, new test administration procedures to emphasize interpersonal interaction in authentic situations, and new scoring procedures of a manual and judgmental nature (Canale 1984, p. 79, cited in Bachman, 1995).
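The cloze procedure mentioned above is mechanical enough to sketch in a few lines: every nth word of a passage is replaced by a blank, and the learner must use the surrounding context to restore it. The sample passage and the deletion rate below are illustrative assumptions, not a prescribed standard.

```python
# Minimal sketch of a fixed-ratio cloze test: every nth word is blanked out,
# so restoring it requires processing the extended context rather than
# answering an isolated discrete-point item. The passage and the deletion
# rate (n=5) are illustrative choices.

def make_cloze(text, n=5):
    """Blank every nth word; return the gapped passage and the answer key."""
    words = text.split()
    key = {}
    for i in range(n - 1, len(words), n):  # 0-based index of every nth word
        key[i] = words[i]
        words[i] = "_____"
    return " ".join(words), key

passage = ("Language testing became increasingly scientific during the "
           "psychometric structuralist period of the twentieth century")
gapped, answers = make_cloze(passage, n=5)
print(gapped)   # blanks replace the 5th and 10th words
print(answers)  # {4: 'scientific', 9: 'period'}
```

Scoring such a test still requires a decision (exact-word versus acceptable-word scoring), which is where the integrative view's interest in context and meaning re-enters.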
For both theory and practice, the challenge is thus to develop tests that reflect current views of language and language use, in that they are capable of measuring a wide range of abilities generally associated with 'communicative competence' or 'communicative language ability', and include tasks that themselves embody the essential features of communicative language use (Bachman, 1995, p. 296).