testing and evaluation

Submitted to: Mam Sobia Rasheed
Submitted by: Aqsa Suleman
Topic: Testing and evaluation of language

Dialogue between two students: Batool and Meerab

Batool: Hi.
Meerab: Hello.
Batool: What are you doing?
Meerab: I am preparing for my assessment.
Batool: When is your assessment?
Meerab: Tomorrow. I have to take a quiz.
Batool: Do you like taking tests?
Meerab: Why not?
Batool: I think it’s a waste of time.
Meerab: No, I think it’s a way of testing our knowledge.
Batool: How does a test check your knowledge?
Meerab: It directly checks the abilities of the candidate who is taking the test. These abilities are shown by how the student tackles obstacles, stays focused on the work, and arranges and organizes the paper. It shows his or her aesthetic sense as well.

Batool: Yes, your point is valid, and I understand it now.
Meerab: Yes, my dear, I think it’s very important.
Batool: Now I am satisfied with testing and evaluation.
Meerab: I am very glad that you got the point.
Batool: Thanks a lot.
Meerab: My pleasure.
Batool: Goodbye.
Meerab: Goodbye.

What’s your opinion about testing?
Do you like testing or not?

Types of assessment
• Formative assessment: used by teachers to check on the progress of their students, to see how far they have mastered what they should have learnt, and then to modify their future teaching plans in light of this information. Informal assessment is a part of formative assessment. It can take a number of forms: unplanned comments, verbal feedback to students, observing students perform a task or work in small groups, and so on.
• Summative assessment: used at the end of the term, semester, or year in order to measure what has been achieved both by groups and by individuals.
• Formal assessment is part of summative assessment, i.e. exercises or procedures which are systematic and give students and teachers an appraisal of students’ achievement.

Definition of evaluation
Evaluation is the process of making an overall judgment about one’s work or a whole school’s work. Evaluation is typically a broader concept than assessment, as it focuses on the overall, or summative, experience.
When we ASSESS our students, we are commonly interested in “how and how much our students have learnt”, but when we EVALUATE them we are concerned with “how the learning process is developing”.

Types of tests
• Proficiency tests
In laboratory science, proficiency testing is another name for inter-laboratory comparison: one or more artifacts are sent around between a number of participating laboratories, and the measuring results they obtain are compared. In language testing the term means something different.
Proficiency tests measure students’ achievement in relation to a specific task which they are later required to perform (e.g. follow a university course in the English medium; do a particular job). They refer forward to a particular application of the language acquired: future performance rather than past achievement. They rarely take into account the syllabus that students have followed. Their main features are:
o a definition of operational needs;
o practical situations;
o authentic strategies for coping;
o a common standard (e.g. a driving test) applied regardless of previous learning, whether the syllabus is known or unknown.

Progress tests: Most classroom tests take this form. They assess the progress students make in mastering material taught in the classroom and are often given to motivate students. They also enable students to assess the degree of success of teaching and learning and to identify areas of weakness and difficulty. Progress tests can also be diagnostic to some degree.

Diagnostic tests: These can include progress, achievement and proficiency tests, enabling teachers to identify specific weaknesses or difficulties so that an appropriate remedial program can be planned. Diagnostic tests are primarily designed to assess students’ knowledge and skills in particular areas before a course of study is begun. More generally, a diagnostic procedure is an examination that identifies an individual’s specific areas of weakness and strength (in medicine, in order to determine a condition, disease or illness).
o Reference back to class-work
o Motivation
o Remedial work

Aptitude tests: These measure students’ probable performance. They refer forward, but can be distinguished from proficiency tests. Aptitude tests assess proficiency in language for language use (e.g. will the student experience difficulty in identifying sounds or the grammatical structure of a new language?), while proficiency tests measure adequacy of control in L2 for studying other things through the medium of that language.

Achievement tests: An achievement test is an assessment of developed knowledge or skill. The most common type of achievement test is a standardized test, such as the SAT, required for college entry in the United States. Achievement tests are developed to measure skills and knowledge learned in a given grade level, usually through planned instruction, such as training or classroom instruction. Achievement tests are often contrasted with aptitude tests.

• Placement tests
A placement test is usually given to a student entering an educational institution to determine specific knowledge or proficiency in various subjects, for the purpose of assignment to appropriate courses or classes.

Communicative testing
A communicative test is one which requires the students to complete an
authentic task – in other words, a task which is a realistic reflection of a
learner's experiences in the outside world. As a result, the test may include a
reading task, a writing task, and a speaking task.

Discrete-point and integrative tests
A discrete-point test is a language test that is meant to test a particular language item, e.g. tenses. The basis of this type of test is that we can test components of the language (grammar, vocabulary, pronunciation, and spelling) and language skills (listening, reading, speaking, and writing) separately.
An integrative test is intended to check several language skills and language components together or simultaneously. Hughes (1989) stipulates that integrative tests display the learners’ knowledge of grammar, vocabulary, and spelling together, not as separate skills or items.

Objective vs. subjective tests

Objective: means making an unbiased, balanced observation based on facts which can be verified.
Subjective: means making assumptions and interpretations based on personal opinions without any verifiable facts.

Objective: observations or assessments can be used before arriving at any decisions.
Subjective: observations or information should not be used while taking any important decisions.

Objective: information can be found in scientific journals, research papers, textbooks, news reporting, encyclopedias etc.
Subjective: observations can be found in biographies, blogs, editorials of newspapers etc.

Objective: an observation or assessment is made after the necessary information is verified.
Subjective: an assessment is made without verifying the necessary information.

Objective: a statement is provable and can be easily measured.
Subjective: a statement is relative to the person concerned.

Objective: a method of stating or telling the truth in a systematic manner from all perspectives.
Subjective: information is derived from the opinion or interpretation of a character and may depend on ...

Principles of language assessment
There are five principles of language assessment: practicality, reliability, validity, authenticity, and washback.
PRACTICALITY
An effective test is practical. This means that it:
o is not excessively expensive.
   A test that is prohibitively expensive is impractical.
o stays within appropriate time constraints.
   A test of language proficiency that takes a student 10 hours to complete is impractical.
o is relatively easy to administer.

Practicality
This requirement means:
o The test is not too expensive.
   A test is impractical when it needs more money (and time) than was budgeted for it.
o The time allowed for the test is appropriate.
   A test that takes test takers six hours to complete is impractical.
o The test is easy to administer.
   A placement test is practical when 2 or 3 persons can proctor it in one class of a hundred test takers.
o The scoring/evaluation procedure is time-efficient and specific.
   The examiner must give a clear scoring procedure and have enough time to evaluate the test and announce the results.

Reliability
A reliable test is dependable, trustworthy, and consistent. For instance, if I give a test on which students achieve good scores, and tomorrow I come to that class with the same test, the test is reliable if the students’ scores are mostly the same as yesterday’s.
As an illustration, if my son uses a tape measure to measure my height and finds that I am 178 centimeters tall one time, I would expect to be about the same height if he measures me again 30 minutes later. I would also reasonably assume that the instrument he is using was designed to measure height and does not turn out to be measuring weight. Likewise, a reliable test gives results that can be depended on.
Nonetheless, the reliability of a test may be undermined by several issues that make it unreliable.

Student-related reliability
This issue relates to the student’s condition, whether caused by psychological or physical factors such as illness, mental readiness, anxiety and so forth. Motivation and clear instructions would help students to overcome these factors.
Rater reliability
Two categories relate to this issue. Inter-rater reliability concerns consistency between raters: it suffers when different raters give inconsistent scores to the same test, perhaps through lack of attention to the scoring criteria, lack of responsibility, or preconceived biases. Intra-rater reliability concerns consistency within a single rater: it suffers when the teacher’s scoring criteria are unclear or when the teacher’s judgment is swayed by a view of students as good or bad.
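One common way to put a number on inter-rater agreement is Cohen's kappa, which corrects the raw agreement rate for the agreement two raters would reach by chance. The slides do not name this statistic, so the sketch below is only an illustration; the function name and the pass/fail ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same scripts."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of scripts")
    n = len(rater_a)
    # Observed agreement: share of scripts given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six essays by two teachers.
teacher_1 = ["pass", "pass", "fail", "pass", "fail", "fail"]
teacher_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohen_kappa(teacher_1, teacher_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 suggests the raters apply the scoring criteria consistently; a value near 0 suggests their agreement is no better than chance.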

Test administration reliability
Unreliability may arise from the conditions in which the test is administered: environmental factors, classroom conditions and the test form. Examples are street noise and problems with chairs, desks or lighting.
Test reliability
Measurement errors can also arise from the test itself. Time constraints need particular attention. If the test takes a long time to do, students may become fatigued by the time they reach the later, difficult items, and as the time ticks down to the end they may respond quickly and incorrectly.
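The test-retest idea described under Reliability (the same class taking the same test on two consecutive days) can be checked with a correlation between the two sets of scores. The sketch below assumes Pearson's correlation as the measure, which the slides do not specify; the function name and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / sqrt(var_a * var_b)


# Hypothetical scores of five students on the same test, one day apart.
day_one = [72, 85, 60, 90, 78]
day_two = [70, 88, 62, 91, 75]
r = pearson_r(day_one, day_two)
print(f"test-retest correlation: {r:.2f}")
```

The closer the correlation is to 1, the more consistently the test ranks the same students across the two administrations.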

Validity
We have already discussed two principles that assessors cannot afford to ignore. The next principle is validity, which many researchers and assessors arguably treat as the most important of the principles. As Fulcher and Davidson (2007) put it, questions of validity affect our daily lives and how we interact with people and the world around us: we observe all kinds of behavior, hear what people say to us, and make inferences that lead to action or beliefs. In investigating validity, Brown identified several kinds of evidence that can be invoked to support the validity of a test. These are:
• Criterion-related evidence
Criterion-related evidence takes two forms. Concurrent validity means that students’ scores on the test are supported by other concurrent evidence of their language proficiency. Predictive validity is needed for some kinds of tests, such as placement tests and language aptitude tests, in order to assess a test taker’s likelihood of future success.
• Construct-related evidence
A construct is any hypothesis, theory or pattern that endeavors to explain observed phenomena in our universe of perceptions. A construct cannot be measured directly; inferential data are needed to verify that the test matches it.
• Content-related evidence
Content-related evidence concerns how adequately the test content represents what is to be measured. Identify the achievement that you want to measure and consider direct testing (students answer the teacher’s questions by speaking the target language) and indirect testing (students answer the teacher’s questions by writing in the target language).

Consequential validity
Consequential validity covers the consequences of a test. A teacher or assessor hopes that the test will have a positive effect on students, both on their preparation for the test and on their social lives.
Face validity
Face validity refers to the examinees’ subjective judgment of how a test looks and appears. If students perceive the instrument or design of the test as convenient and easy to follow, then the test has face validity.

Authenticity
The next principle is authenticity, which concerns the design and form of the test, including its features, the appropriateness of its language, and its implications. Under this principle, students should be able to recognize the language of the test as related to real use, not merely to perception. It may be present in the following ways:
• the language is natural;
• the items are contextualized;
• the topics are meaningful (relevant, interesting);
• the items are organized thematically (through a story line or episode).

Washback
The fifth major principle of language testing is washback, which generally refers to the influence of testing on teaching and learning. Washback occurs most in classroom assessment, when information ‘washes back’ to students and helps them to identify strengths and weaknesses. Achieving that washback is challenging for the teacher. Many teachers, through inattention or fatigue, just give a letter grade or score instead. The way to enhance washback is to comment generously and specifically on test performance: compliment the strengths, offer constructive criticism of the weaknesses, emphasize certain elements that might improve students’ test performance, and so forth.

Ways of testing
Direct Language Testing
A test is said to be direct when the test actually requires the candidate to
demonstrate ability in the skill being sampled. It is a performance test.
Indirect testing
An indirect test measures the ability or knowledge that underlies the skill we
are trying to sample in our test.

Norm-referenced and criterion-referenced testing
Norm-referenced (proficiency and placement tests)
Norm-referenced tests are standardized tests that are designed to compare and rank test takers in relation to one another. This type of test reports whether test takers performed better or worse than a hypothetical average student. It is designed to measure global language abilities, such as overall English language proficiency and academic listening ability, and each student’s score is interpreted relative to the scores of all other students who took the test.

Criterion-referenced (achievement and diagnostic tests)
Criterion-referenced tests (CRTs) are designed to measure students’ performance against a fixed set of criteria or learning standards, that is, written descriptions of what students are expected to know and be able to do at a specific stage of their education. CRTs provide information on whether students have attained a predetermined level of performance called “mastery”.
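The difference between the two interpretations can be seen by reading the same set of scores in both ways. This is a minimal sketch: the student scores, the cutoff of 70 and the function names are all hypothetical, chosen only to illustrate ranking against peers versus judging against a fixed standard.

```python
def criterion_referenced(score, cutoff):
    """Judge a score against a fixed standard, ignoring other students."""
    return "mastery" if score >= cutoff else "non-mastery"


def norm_referenced(score, all_scores):
    """Rank a score among all scores (1 = highest; ties share the best rank)."""
    return 1 + sum(other > score for other in all_scores)


# Hypothetical class results and a hypothetical mastery cutoff.
scores = {"Batool": 58, "Meerab": 74, "Student C": 66, "Student D": 81}
MASTERY_CUTOFF = 70

for name, score in scores.items():
    rank = norm_referenced(score, scores.values())
    status = criterion_referenced(score, MASTERY_CUTOFF)
    print(f"{name}: score {score}, rank {rank} of {len(scores)}, {status}")
```

Note that a student's rank changes if the other students' scores change, while the mastery decision depends only on that student's own score and the cutoff.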

Spolsky (2001) states that in the course of the first 2000 years that human abilities have
been assessed formally, tests and examinations have become progressively more
powerful. A century ago, a strong attack on examinations was launched by showing
their “inevitable uncertainty”, but a growing testing industry has managed a
stubborn defense. More recently, appreciation of the complexity of fundamental notions
like “language proficiency” and acceptance of the resulting impossibility of finding an
interpretable single measure is leading to a realization that assessment of language
knowledge is multipart and intricate, more likely to be served by profiles than by simple
scores.
According to Davis (1990), what makes language testing important within applied linguistics
is that unlike other subjects offered to students in education, a language has no obvious
content. Therefore, assessing language produces complications regarding what is to be
tested, i.e., the issue of validity, and how the testing is to be done i.e., the issue of reliability.

According to Shohamy (2001a), tests are frequently used as the measurement
instruments designed to elicit specific behavior, directly or indirectly. In fact,
as the instruments of educational policy, they are used broadly and can be
very powerful.
Spolsky (2001) states that if we accept the development of examinations on
classical Confucian doctrine during the Han Dynasty as the beginning of
formal testing, we have 2000 years of history from which to derive our
understanding of the process and on which to base our assessment of the
current state of the art.

He further mentions five major purposes for using tests:
• using tests as a competitive selection device
• using tests in order to provide information on the quality of the “product” to those who are paying for an education system
• using tests to process and certify that an individual has achieved a specific level of technical or professional skill
• using tests for prediction or prognosis of the probable results of training
• using tests as an integral part of all good teaching
Perhaps the most common use of language tests is to
pinpoint strengths and weaknesses in the learned abilities
of the students.

Another important use of language tests is deciding who should be allowed to participate in a particular program of instruction (Henning 1987). According to Farhady (1996), tests are applied to make decisions about people’s lives. Therefore, fair decisions will be impossible if tests do not provide accurate information. On the other hand, tests obtain specific samples of behavior, which distinguishes them from other types of measurement (Mousavi, 1999). Overall, any technique or procedure used to assess and measure a factor or ability is called a test.

Pre-scientific movement
Language testing has its roots in a pre-scientific stage in which no special skill or expertise in testing is required. In its simplest form, this trend assumes that one can and must rely completely on the subjective judgment of an experienced teacher, who can identify after a few minutes of conversation, or after reading a student’s essay, what mark to give him/her in order to specify the related language ability.
The pre-scientific movement is characterized by translation tests developed exclusively by classroom teachers. One problem that arises with these types of tests is that they are relatively difficult to score objectively; thus, subjectivity becomes an important factor in the scoring of such tests (Brown, 1996). It is inferred from Hinofotis’s article (1981) that the pre-scientific movement ended with the onset of the psychometric-structuralist movement, but clearly such movements have no end in language teaching and testing, because such teaching and testing practices are undoubtedly still going on in many parts of the world, depending on the needs which specific academic contexts demand.

Psychometric-structuralist movement
With the onset of the psychometric-structuralist movement in language testing, language tests became increasingly scientific, reliable, and precise. In this era, testers and psychologists, being responsible for the development of modern theories and techniques of educational measurement, tried to provide objective measures, using various statistical techniques to assure reliability and certain kinds of validity. According to Carroll (1972), psychometric-structuralist tests typically set out to measure the discrete structural elements of language being taught in audio-lingual and related teaching methods of the time. The standard tests, constructed according to the discrete-point approach, were easy to administer and score and were carefully constructed to be objective, reliable and valid. Therefore, they were considered an improvement on the testing practices of the pre-scientific movement (Brown, 1996).

In the psychometric-structuralist period, there was a remarkable congruence between the American structuralist view of language and the psychological theories and practical needs of testers. On the theoretical side, both agreed that language learning was chiefly concerned with the systematic acquisition of a set of habits; on the practical side, testers wanted, and structuralists knew how to deliver, long lists of small items which could be sampled and tested objectively.
As a result of the coalescence of the two fields, discrete-point tests served three objectives:
1) diagnosing learner strengths;
2) prescribing curricula aimed at particular skills;
3) developing scientific strategies to help learners overcome particular weaknesses.

The psychometric-structuralist movement was important because for the first time language test development followed scientific principles. In addition, Brown (1996) maintains that psychometric-structuralist tests could be easily handled by trained linguists and language testers. As a result, statistical analyses were used for the first time. Interestingly, psychometric-structuralist tests are still very much in evidence around the world, but they have been supplemented by what Carroll (1972) called integrative tests.

Integrative-sociolinguistic movement
With the attention of linguists turning toward generativism and that of psychologists toward cognition, language teachers adopted the cognitive-code learning approach for teaching a second and/or foreign language. Language professionals began to believe that language is more than the sum of the discrete elements tested during the psychometric-structuralist movement (Brown, 1996; Heaton 1991; Oller, 1979).
The criticism came largely from Oller (1979) who argued that competence is a
unified set of interacting abilities that cannot be tested apart and tested
adequately. The claim was that communicative competence is so global that it
requires the integration of all linguistic abilities. Such global nature cannot be
captured in additive tests of grammar, reading, vocabulary, and other discrete
points of language. According to Oller (1983), if discrete items take language
skill apart, integrative tests put it back together; whereas discrete items attempt to
test knowledge of language a bit at a time, integrative tests attempt to assess a
learner’s capacity to use many bits all at the same time.

Percentage vs. percentile

Percentage: a mathematical unit of measurement that displays the answer out of a total of 100.
Percentile: a value below which a given percentage of the values fall.

Percentage: denoted by %.
Percentile: denoted by xth, for example, 30th.

Percentage: does not have quartiles.
Percentile: has quartiles.

Percentage: can be written in the form of a ratio.
Percentile: cannot be written in the form of a ratio.

Percentage: can also be written in the form of a decimal.
Percentile: cannot be written in the form of a decimal.

Percentage: not based on the rank of numbers.
Percentile: based on the rank of numbers.

Percentage: based on one case.
Percentile: based on the comparison of one case with several cases.

Percentage: does not rely on a distribution of scores.
Percentile: describes position within a distribution of scores; it is often illustrated with the normal distribution but does not require one.

The key difference between percentage and percentile is that a percentage is a mathematical value presented out of 100, while a percentile is the percent of values below a specific value. A percentage is a means of comparing quantities. A percentile is used to display position or rank.
Which is better?
A percentile rank is a way of rank-ordering people compared to others in a sample. For example, a raw score of 34 out of 40 means that a student answered 85% of the questions correctly (percentage); if she also scored higher than 60% of everyone else who took the test, she is at the 60th percentile.
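The two calculations can be sketched side by side. The raw score of 34 and the figures of 85% and 60% come from the example in the text; the total of 40 items is inferred from 34 being 85%, and the list of class scores is hypothetical, made up so that six of ten scores fall below 34. Percentile rank is taken here as the percentage of scores strictly below the given score, which is one of several conventions in use.

```python
def percentage(raw_score, total_items):
    """Share of items answered correctly, out of 100."""
    return 100 * raw_score / total_items


def percentile_rank(score, all_scores):
    """Percentage of scores in the group that fall strictly below `score`."""
    below = sum(other < score for other in all_scores)
    return 100 * below / len(all_scores)


# Hypothetical raw scores of ten test takers on a 40-item test.
class_scores = [20, 25, 28, 30, 31, 33, 34, 36, 38, 39]

print(percentage(34, 40))                 # 85.0
print(percentile_rank(34, class_scores))  # 60.0
```

The percentage depends only on the student's own answers, while the percentile rank would change if the rest of the class scored differently.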

Example
Percentage: if a student scores 55 marks out of a total of 100 in a maths exam, then he or she has scored 55% in that exam.
Percentile: a percentile is defined as the percentage of values that fall below a specific value.
