Videos (Summary)

Testing summary (videos)
Diego Ulloa
Lorena Salazar

Pre-testing
J. D. Brown

What is pre-testing?
For this author, pre-testing is also called piloting.
• It is when you take a group of language items and try them out with a group
of students.
• It allows us to know which items will work best for our students. A good item
is one that discriminates between the top and the bottom students.

Why is it important to do pre-testing?
Because it helps us to know which items are useful and which are not.
Most teachers tend to think that all their items are good, but by pre-testing
they can check which ones really are and which ones are not.

How do you select the items?
Pre-testing helps you understand how well your items are working:
1) Develop more items than you are going to need (if you need 40 items for
the final test, it is better to develop about 60 items).
2) Then administer them to students similar to those who are going to take the
test (ideally not the same students who will take the final test).
3) Then write the items on a sheet of paper in columns (see the example in the
video).
4) Then calculate the item difficulty (also called item facility), which is
conventionally the percentage of students who answered a specific item
correctly.
5) Then calculate the item discrimination, which involves dividing the
students into three groups (according to their scores).
Doing this, you can calculate the item difficulty for each group.

Then you can choose the items you are going to use in the test. You do it
according to the item discrimination and the item difficulty.
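
The calculations in steps 4 and 5 can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: item difficulty is taken as the
proportion of students who answered the item correctly (often called item
facility; the proportion who got it wrong is one minus this value), item
discrimination is taken as the top group's item difficulty minus the bottom
group's, and the 0.3 to 0.7 difficulty range is a rule of thumb chosen for the
example, not something stated in the video. All function names and the pilot
data are hypothetical.

```python
from typing import List, Tuple

# One row per student, one column per item; 1 = correct, 0 = wrong.
Responses = List[List[int]]


def item_facility(rows: Responses, item: int) -> float:
    """Proportion of students in `rows` who answered `item` correctly."""
    return sum(row[item] for row in rows) / len(rows)


def split_into_thirds(rows: Responses) -> Tuple[Responses, Responses, Responses]:
    """Rank students by total score and split them into top, middle, bottom.

    Needs at least three students, otherwise the outer groups are empty.
    """
    ranked = sorted(rows, key=sum, reverse=True)
    third = len(ranked) // 3
    return (ranked[:third],
            ranked[third:len(ranked) - third],
            ranked[len(ranked) - third:])


def item_discrimination(rows: Responses, item: int) -> float:
    """Assumed index: item facility of the top group minus the bottom group."""
    top, _middle, bottom = split_into_thirds(rows)
    return item_facility(top, item) - item_facility(bottom, item)


def select_items(rows: Responses, n_items: int,
                 min_if: float = 0.3, max_if: float = 0.7) -> List[int]:
    """Keep items of moderate difficulty, then prefer the most discriminating.

    The 0.3-0.7 range is an illustrative assumption, not a fixed rule.
    """
    candidates = [i for i in range(len(rows[0]))
                  if min_if <= item_facility(rows, i) <= max_if]
    candidates.sort(key=lambda i: item_discrimination(rows, i), reverse=True)
    return candidates[:n_items]


# Hypothetical pilot data: six students, four items.
pilot = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

for i in range(len(pilot[0])):
    print(i, item_facility(pilot, i), item_discrimination(pilot, i))
print("selected:", select_items(pilot, 2))
```

In this made-up data, item 0 is answered correctly by everyone, so it does not
separate the top students from the bottom ones and is not selected.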


Vocabulary
John Read

At first, vocabulary was not considered an important part of assessment.
• Embedded vocabulary: when you look at vocabulary within the context of a
larger construct

Vocabulary is useful for:
- Placing students in language classes
- Measuring progress
- Diagnosis

Vocabulary knowledge is important because it supports fluent reading
comprehension.

Measuring vocabulary size:
1) Take a sample of words from a word frequency list
2) Then devise a test to find out whether those words are known or not
3) Then estimate the total vocabulary size of the native speaker or the
second language learner
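
The three steps above can be sketched as follows. This is a minimal
illustration, assuming a simple random sample and a straight proportional
estimate (proportion of sampled words known, multiplied by the size of the
list); real vocabulary size tests are often more elaborate. The function names
and the `knows_word` callback, which stands in for the learner's test result on
each word, are hypothetical.

```python
import random
from typing import Callable, List


def sample_words(frequency_list: List[str], sample_size: int,
                 seed: int = 0) -> List[str]:
    """Step 1: draw a random sample of words from the frequency list."""
    rng = random.Random(seed)
    return rng.sample(frequency_list, sample_size)


def estimate_vocabulary_size(frequency_list: List[str], sample: List[str],
                             knows_word: Callable[[str], bool]) -> int:
    """Steps 2 and 3: test the sampled words, then scale up to the whole list.

    `knows_word` is a placeholder for whatever test is used (yes/no,
    multiple choice, matching and so on).
    """
    known = sum(1 for word in sample if knows_word(word))
    return round(known * len(frequency_list) / len(sample))


# Hypothetical usage: a 1,000-word list and a learner who knows 6 of the
# 10 sampled words, which gives an estimate of 600 words.
words = [f"word{i}" for i in range(1000)]
tested = words[:10]
print(estimate_vocabulary_size(words, tested, lambda w: int(w[4:]) < 6))
```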

Vocabulary tasks:
1) Discrete vocabulary items
2) Yes / no
3) Multiple choice
4) Matching task
5) Gap filling

Depth of vocabulary knowledge
It goes beyond testing only whether students know a word’s meaning, to assess
other aspects of word knowledge.
Word associates format: the test taker is presented with the target word and then
a series of other words, some of which are related to it (students have to select
the words that are related to the target word).
Vocabulary knowledge scale: students rate how well they know the word’s
meaning. They are also asked to give a synonym or translation and to use the
word in a sentence.

Vocabulary also includes lexical units larger than a single word, such as idioms,
phrasal verbs, collocations, formulaic sequences and lexical phrases.


Reading
Caroline Clapham
- Simply taking a passage and writing questions on it tends to test only the
identification of facts such as addresses, dates and names. It is better to think
about the abilities that go into reading:

• items that test inferencing,
• a whole string of skills and
• strategies used in reading

- You should have a list that specifies the skills needed in reading, and
then test a sample from that list:

• skimming,
• scanning,
• looking for small pieces of information, and
• looking for the general meaning of a paragraph

How should you choose the reading passages?
- Considering the purpose of the students:

• Use the sorts of texts that people will read when they are using English
(contextualization). These texts do not need to be totally authentic, but they
should at least look authentic.
• Use several texts: it makes the reading test far more reliable.
o You can vary things like the genre and the topic, since some students may
be more familiar with one than with another.
o Short texts for intensive reading (when looking for specific facts)
o Long texts for skimming and scanning, and for extensive reading (when
looking for general meaning)

How should you test these passages?
It all depends on both the students and the circumstances. It is better to use
more than one method:

• Multiple Choice Questions (MCQ): very good for testing reading
comprehension but very difficult to write, so use this alternative only if you
are a trained and experienced MCQ writer. It is essential to pre-test them to
check that they are working well.


• Short Answer Questions: very revealing when the answer is limited to, say,
three words. They are also fairly easy to mark.
• Selection of Headings: a very good alternative if you are trying to test
understanding of the overall meaning of a passage, provided the headings are
presented as a matching task.
• Gapped Summary: for advanced readers. The test writer summarizes a text,
leaving gaps, and students fill in those gaps.
• Information Transfer: students read the text, understand it and then transfer
what they have read into a chart.
• True or False Questions: they can work well if treated carefully, because a
student has a 50% chance of getting each item right just by guessing.
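
To see why the 50% issue matters for True or False items, the short sketch below
works out how likely a student is to reach a given score by blind guessing
alone. It assumes every item is guessed independently with the same chance of
success (a simple binomial model); the function names, the 10-item test and the
pass mark of 6 are illustrative assumptions, not figures from the video.

```python
from math import comb


def expected_guess_score(n_items: int, p_guess: float = 0.5) -> float:
    """Average number of items a pure guesser gets right."""
    return n_items * p_guess


def chance_of_reaching(n_items: int, target: int, p_guess: float = 0.5) -> float:
    """Probability that a pure guesser scores `target` or more out of `n_items`."""
    return sum(
        comb(n_items, k) * p_guess ** k * (1 - p_guess) ** (n_items - k)
        for k in range(target, n_items + 1)
    )


# Hypothetical comparison: 10 True/False items versus 10 four-option
# multiple choice items, with 6 correct answers needed in both cases.
print(expected_guess_score(10))                 # True/False guesser's average
print(chance_of_reaching(10, 6))                # True/False, guess chance 0.5
print(chance_of_reaching(10, 6, p_guess=0.25))  # four options, guess chance 0.25
```

Under these assumptions the guesser reaches the target far more often on the
True or False test than on the four-option one, which is one reason such items
need careful treatment, for example by using more of them.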


Speaking
Glenn Fulcher
A speaking test is essentially composed of three parts:

1. Task: something that we give to the learners; it elicits language from them.
2. Rating Scale: it is used to grade the sample of language elicited by the task.
3. Rater: the person who grades that sample of language. It may be the
interlocutor or a different person.

Where do rating scales come from? Why are they so important?
They have been designed in a number of different ways to suit teachers’ own
purposes:

• Intuitive: The different levels of the rating scale are created on the basis of
the teachers’ or interlocutors’ experience. Fulcher calls this the “armchair
method” because it relies on one’s own intuition.
• Empirical:

I. By collecting samples of language from the students, showing them to
teachers and coming to an agreement on which ones are best.
II. By collecting samples of language and analyzing what the students
are saying, perhaps using discourse analysis or conversation
analysis, and then using those descriptions to write the descriptors
for the different bands of the rating scale.

What about the raters?
The rater may be the interlocutor or a separate person. Having a separate rater
allows that person to concentrate on the rating process, which helps reliability.
Reliability is divided into two types:

• Intra-rater reliability: whether an individual rater agrees with him or herself
when rating the same sample of language over a period of time.
• Inter-rater reliability: whether different raters agree with each other when
they are marking or grading the same sample of language.
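
One simple way to put a number on this agreement is sketched below. It is an
illustration only: the video does not name a statistic, so exact agreement and
Cohen's kappa (agreement corrected for chance) are techniques chosen here as
assumptions, and the band scores are invented. The same functions can be used
for inter-rater reliability (two raters, same samples) and intra-rater
reliability (one rater, same samples on two occasions).

```python
from collections import Counter
from typing import List


def exact_agreement(first: List[int], second: List[int]) -> float:
    """Proportion of samples that received the same band in both ratings."""
    if len(first) != len(second):
        raise ValueError("both ratings must cover the same samples")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohens_kappa(first: List[int], second: List[int]) -> float:
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(first)
    observed = exact_agreement(first, second)
    counts_first, counts_second = Counter(first), Counter(second)
    # Chance agreement: how often the two ratings would match if each band
    # were assigned at its observed rate, independently of the sample.
    expected = sum(counts_first[band] * counts_second[band]
                   for band in counts_first) / (n * n)
    if expected == 1:
        return 1.0  # both ratings use a single identical band throughout
    return (observed - expected) / (1 - expected)


# Hypothetical band scores for six speaking samples.
rater_a = [3, 4, 2, 5, 3, 4]
rater_b = [3, 4, 3, 5, 3, 3]
print(exact_agreement(rater_a, rater_b))  # inter-rater agreement
print(cohens_kappa(rater_a, rater_b))
```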

Is speaking the most difficult ability to test?
Testing speaking brings its own special problems:


• In terms of validity:
o Rating Scales: they have to be well written in terms of the definition
and specification of what we want to measure.
o Task: we need to make sure that the kind of language that we are
eliciting can actually be rated using the rating scale previously
created, so the rating scale and the task design have to go hand in
hand.
o Generalisability: concerned with how the score that the student is
given is understood and used, i.e., interpreting the score beyond the
test context itself and making predictions about what the student is
or is not capable of doing in the future or in a non-test context.

• In terms of practicality:
o Speaking tests are difficult to organize, and they need enough
trained raters.


Listening
Gary Buck
o Listening comprehension is a process of constructing meaning in which two
different sorts of knowledge play a major role: linguistic and non-linguistic
knowledge.
o Listening comprehension is quite difficult because people do not speak in
sentences as they do in writing: there are many pauses and hesitations, and
we often do not finish our “oral sentences”.
o Listening takes place in real time. This means that the processes must be
automatic and that the information needs to be held in memory.

We find two problems when testing listening comprehension:

1. We cannot directly examine listening comprehension itself.
2. There is interference. Different people may interpret the same listening
text in different ways.

There are two types of listening:
• Interactive listening (when there is a speaker and a listener who change roles)
• Non-interactive listening (when there is only a listener, e.g. listening to the
radio)

There are two purposes in listening interaction:
1. Conveying information (transactional use)
2. Establishing relationships (social interaction)

What types of task should we use for listening?

• Dictation: it is the easiest option (easy to make and to mark), but it tests a
very narrow range of language skills.
• Statement evaluation activities: we give people a statement and they decide
whether it is true or not (this also tests a very narrow range of skills).

For listening to longer discourse or at a deeper level, we use comprehension
questions and information transfer items.

How can we make listening tests communicative?


There are two ways to make a listening test communicative:

1. Comprehending explicit linguistic information.
2. Comprehending in terms of interpreting the meaning in a broader context.

Listening comprehension can be defined as understanding samples of realistic
language through a process that is automatic and takes place in real time. It
also means understanding any inferences and implications of the content or
context.


Item Writing
Charles Stansfield
What makes a good item writer?

• They have to be an excellent writer in the language in which they are
writing.
• They have to have teaching experience.

Are there any principles of good item writing?

• Those principles vary according to the test paper that one is developing.
• In general, good item writing starts with looking at:
o test specifications
o the audience itself
o the purpose for the test
o the test content
o item writing guidelines
o sample items

What do you look for when reviewing an item?
This process is an integrated one that typically involves four kinds of review:

• Content review: the item is compared with the specifications to see whether
the reviewer agrees with the content classification of the item.
• Key check: this tends to be ignored. The key is specified by the item writer
and submitted with the content classification; the reviewer considers
whether there are any other possible keys that should be added to the list
of acceptable responses.
• Bias review: there are two things we look for here.
o Content Bias: whether the nature of the item is going to favor a
particular group of examinees because of their familiarity with the
context or the content. The item has to be neutral.
o Sensitivity Bias: whether the wording of the item is likely to be
offensive or insensitive to any particular group of examinees.
• Editorial review: it may include rewriting and rewording items, but the issues
are mainly mechanics: punctuation, spelling, grammar, clarity and style of
writing.

How many times should an item writer’s work be reviewed?


• As many as needed; every item writer’s work needs multiple reviews.
• The item review process consists of two stages:
o First, the items go out to more than one reviewer and a set of
comments comes back from them.
o Second, once those items have been compared, they are sent to a
different set of reviewers and a set of comments comes back from
those new reviewers.


Testing Writing
Liz Hamp-Lyons
What is writing assessment?

• The most important characteristic is that in this form of assessment the test
taker actually writes, as opposed to completing multiple choice items or
some other non-productive form.
• We also expect to have:
o a focused topic
o specification of audience
o specification of purpose for writing
o the method by which the writing is going to be judged
o people or a computer system to make those judgments
o a scale for reporting the performance of the writers
o a means of validating the assessment instrument

How reliable are writing tests?
It depends on how carefully one works on ensuring reliability:

• Having more than one item, though this is difficult because it takes so long
to produce a text.
• Having more than one judge, and a third when there is a disagreement
between those two judges.
• Scoring different dimensions of the text (grammar, organization, content,
etc.)
• Having more than one writing occasion, though this does not happen very
often because it is so time-consuming.
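
The idea of two judges plus a third in case of disagreement can be sketched as
a small scoring rule. The details are assumptions made for illustration, not
taken from the video: here two scores count as a disagreement when they differ
by more than one band, agreeing scores are averaged, and the third judge's score
is averaged with whichever of the first two is closer to it. The function name
and the `max_gap` parameter are hypothetical.

```python
from typing import Optional


def resolve_score(first: float, second: float,
                  third: Optional[float] = None, max_gap: float = 1) -> float:
    """Combine two judges' band scores, calling on a third if they disagree.

    Assumed rule: scores within `max_gap` bands of each other are averaged;
    otherwise the third score is averaged with the closer of the first two.
    """
    if abs(first - second) <= max_gap:
        return (first + second) / 2
    if third is None:
        raise ValueError("the two judges disagree; a third rating is needed")
    closer = min((first, second), key=lambda score: abs(score - third))
    return (third + closer) / 2


# Hypothetical scripts scored on a band scale.
print(resolve_score(4, 5))           # judges agree closely enough
print(resolve_score(2, 5, third=4))  # third judge settles the disagreement
```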

What do we have to remember in order to end up with a writing test that is as
reliable as it can be and still keeps the validity that this kind of assessment has?

• In terms of validity: people need to think about educational goals,
curriculum and syllabus in order to assess what has been taught.
• In terms of reliability:
o The teacher has the opportunity to have more than one item.
o Teachers can work with other teachers to exchange pieces of writing
and discuss them.

If I am looking at a writing test from the point of view of a teacher or future test
taker, what should I look for in the test? What information should be provided
to me?


From the teacher’s viewpoint, the more information the assessment provides and
the more positive it is, the better, because teachers tend to pick out what
students cannot do rather than what they actually can do.
