2. Team-Talk
Reflect as a team….
1. What are our personal definitions of assessment?
2. What role does assessment play in the
learning/teaching process?
3. When we think of assessment…
• What do we feel very confident about, and feel we do effectively?
• What do we find frustrating, need more support with, or want to know more
about?
3. The ABCs of Assessment
“Assess: Etymology: Middle English, probably from Medieval Latin
assessus, past participle of assidēre, from Latin, to sit beside, assist
in the office of a judge” (from Merriam-Webster.com)
The Act of Assessment…
In Everyday Life, assessments are
judgments we make about people or things
based on data we gather through our
experiences and senses,
and compare to our personal standards and
criteria of what is correct, or successful.
4. In Education, assessments are
judgments we make about
students and their performance
based on data we gather through
instruments and/or observation,
and compare against given
standards
The ABCs of Assessment
The Act of Assessment…
5. The ABCs of Assessment
The Act of Assessment…
The Goals of the act of assessment are to:
•Provide accurate information to teachers, students
and parents about current levels of student success
•Provide data to inform instruction in a way that leads
to success for all students
•Provide feedback to students that will lead to personal
goal-setting and working towards success.
7. Evaluation:
This concept involves looking
at all the factors that influence
the learning process, ex:
syllabus objectives, course
design, materials,
methodology, teacher
performance and assessment.
8. Assessment:
It involves measuring the
performance of our students
and the progress that they are
making. It helps us
to diagnose the problems they
have and to provide them with
useful feedback.
9. COMPARISON BETWEEN ASSESSMENT AND
EVALUATION

Purpose
• Assessment: to improve future performance
• Evaluation: to judge the merit or worth of a performance
against a pre-defined standard

Setting Criteria
• Assessment: both the assessee and the assessor choose the criteria
• Evaluation: the evaluator determines the criteria

Control
• Assessment: the assessee, who can choose to make use of
assessment feedback
• Evaluation: the evaluator, who is able to make a judgment
which impacts the evaluatee

Depth of Analysis
• Assessment: thorough analysis, answering questions such as
why and how to improve future performance
• Evaluation: calibration against a standard

Response
• Assessment: positive outlook of implementing an action plan
• Evaluation: closure with failure or success
10. This assessment can be of
three kinds:
1) Informal assessment
2) Formal assessment (testing)
3) Self-assessment
11. Informal assessment:
It is the observation of everyday
performance. It is a way of collecting
information about our students’
performance in normal classroom
conditions. It is done without establishing
test conditions such as in the case of
formal assessment. We intuitively assess
them when speaking, writing, reading or
listening. We can see which students are
doing well and which students are having
difficulties. We are also aware of their
attitudes and effort.
12. Formal Assessment:
This is synonymous with “testing”, and there
are two possible interpretations of it.
1) It refers to what are often called
examinations.
These examinations are often external (KET,
PET, etc). They are administered to many
students under standardized conditions.
They assess a broad range of language.
They are marked objectively or under
standardized subjective marking schemes
and are likely to be administered at the end
of a course.
13. 2) Other authors include all types of
language tests under this term. These tests
include the kind of tests commonly
administered in class by the teacher, in
order to assess learning. These tests are
not as formal as the examinations of
external bodies and their scope of action is
limited to the context in hand. These tests
are often administered to one class, for
purposes internal to the class; they focus on
a narrow range of language; they are
assessed either objectively or subjectively;
they are done to assist teaching and are
often backward looking.
14. Self-Assessment:
This refers to when the students themselves
assess their own progress.
Dickinson (1997) says it is particularly
appropriate:
a) as a complement to self-instruction;
b) to build autonomous and self-directed
language learners;
c) to give learners an opportunity to
reflect on their learning in order to
improve it.
15. In formal assessment, we also have the
terms summative and formative introduced
by Scriven (1967:43)
a) Formative: this refers to forms of
assessment which aim to evaluate the
effectiveness of learning at a time during
the course (quizzes), in order to make
future learning more effective.
16. b) Summative: the administration of this
test may result in some judgement on the
learner, such as ‘pass’ or ‘fail’. It
usually covers a broad range of content.
17. Formal Assessment can also refer to test
types according to purpose.
The main types are listed below:
1) aptitude tests
2) placement tests
3) diagnostic tests
4) progress tests
5) achievement tests
6) proficiency tests
18. Aptitude Tests:
These are designed to predict who will be a
successful language learner and are based
on the factors which are thought to
determine an individual’s ability to acquire
a second or foreign language.
They are usually large scale tests taking a
long time to administer and with a number
of components, each testing a different
facet of language. They are also forward-
looking tests, concerned with future
language learning.
19. Placement tests:
These tests are used to make decisions
regarding the students’ placement into
appropriate groups. They tend to be quick
to administer and to mark. They are
usually administered at the start of a new
phase or language course. As a result,
students are often put into homogeneous
groups for language study according to
their present language ability.
20. Diagnostic Tests:
These tests are usually syllabus based and
they aim to determine the students’ areas
of strength and weaknesses in relation to
the contents to be covered in the course.
21. Progress Tests:
These tests are usually written and
administered by a class teacher, and look
back over recent work, perhaps the work
of the last lesson or week. They usually
therefore test a small range of language.
(pop quizzes)
22. Achievement Tests:
These tests come at the end of a relatively
long period of learning, and their content
derives from the syllabus that has been
taught over the period of time. They are
usually large scale tests, covering a wide
range of language and skills. These tests
can be used for a variety of purposes,
including promotion to a more advanced
course, certification, or as an entry
qualification to a job.
23. Proficiency Tests:
These tests are based on a theory of
language proficiency and the specific
language abilities that constitute language
proficiency. They are often related to
specific academic or professional situations
where English is needed. (PET, FCE, CAE,
IELTS, TOEFL, etc)
25. West (1990) gives a good summary of these
principles of testing. The principles can be
described in pairs:
1) Competence v/s Performance
2) Usage v/s Use
3) Direct v/s Indirect Assessment
4) Discrete Point v/s Integrative Assessment
5) Objective v/s Subjective Assessment
6) Receptive v/s Productive Skills
7) Backward and Forward-looking Assessment
8) Contextualized v/s Disembodied Language
9) Criterion Referenced and Norm-Referenced
Assess.
10) Reliability v/s Validity
26. The opposition between members of a
pair indicates some sort of tension that
exists in language testing in general;
generally, the more a test conforms
to one member of the pair, the less likely it is to
exhibit characteristics of the other member of
the pair. Thus, the more reliable a test
(e.g. multiple choice), the less valid it is likely
to be (it tests only discrete items). This
opposition corresponds with the
differences between second- and third-
generation testing.
27. Competence v/s Performance:
Chomsky drew this distinction between the
ideal knowledge all mature speakers hold
in their minds (competence) and the flawed
realization of it that comes out in language
use (performance).
Third generation testing is often called
“performance testing”
28. Usage v/s Use:
Widdowson distinguished between language use
and language usage.
For example, learners whose instruction has
consisted of grammatical rules, will be required to
produce sentences to illustrate the rules. These
sentences are for Widdowson, examples of
usage. Examples of usage can show the
learner’s current state of competence, but will not
necessarily indicate anything about the learner’s
possible performance. He argues that
performance teaching and testing require
examples of language use, not usage.
29. Direct v/s Indirect Assessment:
Testing that assesses competence without
eliciting performance is known as indirect
testing. Multiple choice testing fits this
description, since language is assessed
without any production of language use
from the learner. Conversely, direct tests
use examples of performance as an
indicator of communicative competence.
These tests use testing tasks of the same
type as language tasks in the real world.
30. Discrete Point v/s Integrative
Assessment:
Indirect assessment is usually carried out through
a battery of many items, each one of which only
tests one small part of the language. Each item is
known as a discrete-point item. The theory is that
if there are enough of them, they give a good
indication of the learner’s underlying competence.
Integrative assessment, by contrast, requires items which test
the ability to combine knowledge of different parts of the
language; these items are known as integrative
or global, e.g. answering a letter, filling in a form,
etc.
31. Objective v/s Subjective Assessment:
Objective assessment refers to test items that
can be marked clearly as right or wrong, as in a
multiple choice item. Subjective assessment
requires that an assessor makes a judgement
according to some criteria and experience. Most
integrative test elements require subjective
assessment. The difficulty in subjective
assessment arises in trying to achieve some
agreement over marks, both between different
markers and with the same marker at different
times.
32. Receptive v/s Productive Skills:
The receptive skills (reading and listening)
lend themselves to objective marking. The
productive skills (speaking and writing) are
generally resistant to objective marking. So
third generation testers are placing great
emphasis on achieving a high degree of
standardisation between assessors
through training in the application of band
descriptions or rubrics.
33. Backward- and Forward-looking Assessment:
Competence-based tests look backwards
at a usage-based syllabus to see to what
degree it has been assimilated by the
learner. Third generation tests are better
linked to the future use of language
(looking forward), and their assessments
of real language use also show mastery of
a performance based syllabus.
34. Contextualised v/s Disembodied Language:
Disembodied language has little or no context.
This is most evident in multiple choice items
based on language usage. The items bear little
relevance to each other and act as examples of
disembodied language with no purpose other than as
part of a test.
Integrative items need a full context in order to
function. The closer the items in an integrative
test are to simulating real world language tasks,
the fuller the context must be.
35. Criterion referenced and Norm Referenced
Assessment:
Norm-referenced tests compare students with an
average mark or a passing score, in order to
make some type of pass/fail judgement of them.
The problem with this type of testing is that it is
not clear what the norm refers to. Knowing that a
learner scores 4,0 in English and that this is a pass tells
us nothing about what he/she can actually do with
the language. The fact that a 3,9 student “fails”
while knowing as much as, or probably more than, the 4,0
student is not taken into account.
36. Criterion-referenced assessment
compares students not against each other,
but with success in performing a task. The
results of a criterion-referenced test can
be expressed by continuing the sentence
“he/she is able to…..” where the ability
may refer to some small or larger
integrative language task. Often these
tests lead to a profile of language ability,
where the learner is seen as capable of
completing certain tasks to the given
standards, but not others.
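The contrast in these two slides can be sketched in a few lines of Python. Everything below is invented for illustration: the names, the scores, the 4,0 cut-off, and the two task criteria.

```python
# Hypothetical sketch of norm- vs criterion-referenced judgements.
# Names, scores, the 4.0 cut-off and the tasks are invented examples.
scores = {"Ana": 4.0, "Beto": 3.9, "Carla": 5.5}

# Norm/cut-off judgement: a bare pass/fail label that says nothing
# about what the learner can actually do with the language.
norm_result = {name: ("pass" if s >= 4.0 else "fail")
               for name, s in scores.items()}

# Criterion-referenced judgement: "he/she is able to..." per task,
# yielding a profile of language ability instead of a single label.
criteria_met = {
    "Ana":   {"order food": True,  "write an email": False},
    "Beto":  {"order food": True,  "write an email": True},
    "Carla": {"order food": True,  "write an email": True},
}
profiles = {name: [task for task, ok in tasks.items() if ok]
            for name, tasks in criteria_met.items()}

print(norm_result["Beto"])   # "fail" by 0.1, despite meeting every criterion
print(profiles["Beto"])      # ['order food', 'write an email']
```

Beto fails the norm-referenced cut-off by a decimal, yet his criterion profile shows he can complete both tasks; that profile is exactly the information the slide says a bare norm-referenced mark hides.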
38. QUALITIES OF MEASUREMENT
DEVICES
• Validity
Does it measure what it is supposed to measure?
• Reliability
How representative is the measurement?
• Practicality
Is it easy to construct, administer, score and interpret?
• Backwash
What is the impact of the test on the teaching/learning
process?
39. VALIDITY
The term validity refers to whether or not a test
measures what it intends to measure.
On a test with high validity the items will be closely linked
to the test’s intended focus. For many certification and
licensure tests this means that the items will be highly
related to a specific job or occupation. If a test has poor
validity then it does not measure the job-related content
and competencies it ought to.
There are several ways to estimate the validity of a test,
including content validity, construct validity, criterion-
related validity (concurrent & predictive) and face validity.
40. VALIDITY
• “Content”: related to objectives and their sampling.
• “Construct”: referring to the theory underlying the
target.
• “Criterion”: related to concrete criteria in the real
world. It can be concurrent or predictive.
• “Concurrent”: correlating high with another measure
already validated.
• “Predictive”: Capable of anticipating some later
measure.
• “Face”: related to the test’s overall appearance.
41. RELIABILITY
Reliability is the extent to which an experiment,
test, or any measuring procedure shows the same
result on repeated trials. Without the agreement
of independent observers able to replicate
research procedures, or the ability to use research
tools and procedures that produce consistent
measurements, researchers would be unable to
satisfactorily draw conclusions, formulate
theories, or make claims about the generalizability
of their research. For researchers, the key types
of reliability are:
42. RELIABILITY
• “Equivalency”: related to the agreement between
two equivalent forms of a test
• “Stability”: related to consistency over time
• “Internal”: related to consistency within the instrument
• “Inter-rater”: related to agreement between different
examiners
• “Intra-rater”: related to the consistency of a single
examiner over time
44. 2. STABILITY RELIABILITY
Stability reliability (sometimes called test-retest
reliability) is the agreement of measuring
instruments over time.
45. 3. INTERNAL CONSISTENCY
Internal consistency is the extent to which tests or
procedures assess the same characteristic, skill or
quality.
46. 4. INTER-RATER RELIABILITY
Inter-rater reliability is the extent to which two or more
individuals (coders or raters) agree. Inter-rater reliability
assesses the consistency of how a measuring system is
implemented. For example, two or more teachers may use a
rating scale to rate students’ oral
responses in an interview (1 being most negative, 5 being
most positive). If one rater gives a "1" to a student
response, while another gives a "5," obviously
the inter-rater reliability would be inconsistent. Inter-
rater reliability is dependent upon the ability of two or
more individuals to be consistent. Training, education and
monitoring skills can enhance inter-rater reliability.
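The rating-scale example above can be quantified in a simple way before reaching for a formal statistic such as Cohen's kappa. The 1–5 ratings below are invented for illustration.

```python
# Hypothetical 1-5 oral-interview ratings from two teachers
# for the same ten students.
rater_a = [5, 4, 3, 4, 2, 5, 3, 1, 4, 2]
rater_b = [5, 4, 3, 3, 2, 5, 3, 1, 5, 2]

# Exact agreement: proportion of students both raters scored identically.
agree = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Adjacent agreement: ratings within one band of each other, a looser
# standard often accepted when band descriptors are used.
adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(agree)     # 0.8
print(adjacent)  # 1.0
```

Training raters against shared band descriptors, as the earlier slide on productive skills notes, is what pushes these agreement figures up.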
47. 5. INTRA-RATER RELIABILITY
Intra-rater reliability is a type of reliability
assessment in which the same assessment is completed
by the same rater on two or more occasions. These
different ratings are then compared, generally by
means of correlation. Since the same individual is
completing both assessments, there is a risk that the
rater's subsequent ratings are influenced by knowledge
of the earlier ratings.
48. SOURCES OF ERROR
• Examinee (is a human being)
• Examiner (is a human being)
• Examination (is designed by and for
human beings)
49. RELATIONSHIP BETWEEN VALIDITY &
RELIABILITY
Validity and reliability are closely
related.
A test cannot be considered valid
unless the measurements
resulting from it are reliable.
However, results from a test can be
reliable and not necessarily valid.
50. PRACTICALITY
It refers to the economy of time, effort and
money in testing. In other words, a test should
be…
• Easy to design
• Easy to administer
• Easy to mark
• Easy to interpret (the results)
51. BACKWASH EFFECT
Backwash effect (also known as washback) is
the influence of testing on teaching and
learning. It is also the potential impact that the
form and content of a test may have on
learners’ conception of what is being assessed
(language proficiency) and what it involves.
Therefore, test designers, administrators and raters
have a particular responsibility, considering that
the testing process may have a substantial
impact, either positive or negative.
52. LEVELS OF BACKWASH
It is believed that backwash is a subset of a test’s
impact on society, educational systems and
individuals. Thus, test impact operates at two levels:
• The micro level (the effect of the test on individual
students and teachers)
• The macro level (the impact of the test on society
and the educational system)
Bachman and Palmer (1996)
53. Desirable characteristics for tests:
Apart from validity and reliability, we have
three extra characteristics to pay attention
to:
1) UTILITY
2) DISCRIMINATION
3) PRACTICALITY
54. Utility: a test which provides a lot of
feedback to assist in the planning of the
rest of a course or future courses.
Discrimination: the ability of a test to
discriminate between stronger and weaker
students.
Practicality: the efficiency of the test in
physical terms. (Does it require a lot of
equipment? Does it take a lot of time to
set, administer or mark?)
55. Students’ Assignment:
Make an assessment of the test given by
your teacher (Appendix 3.1) by answering
the following questions.
1) Does it test performance or
competence?
2) Does it ask for language use or usage?
3) Is it direct or indirect testing?
4) Is it discrete point or integrative testing?
5) Is it objectively or subjectively marked?
6) What skills does it test?
7) Is it backward or forward looking?
56. 8) Is language contextualised or
disembodied?
9) Is it criterion-referenced or norm-
referenced?
10) Would it have low/high reliability?
11) Comment on its validity.
12) Comment on its utility, discrimination
and practicality.
58. Formative Assessment
Formative assessments are assessments
that are given during a sequence of
instruction.
The goals of formative assessments are to…
1. provide feedback to the learner so that s/he can set
personal goals and work towards success.
2. provide feedback to the teacher so that s/he can modify
instruction to meet the needs of the learner.
(remediation/extension)
59. Summative Assessment
Summative assessments are assessments
that are given at the end of a sequence of
instruction. They are primarily used to assign
grades, or levels of accomplishment.
The goals of summative assessments are to…
1. provide information about the student’s level of
success with the objectives/standards s/he was taught.
2. provide feedback to the teacher so that s/he knows how
effective his/her instruction was.
60. Criterion-Referenced
Criterion-referenced assessments are
assessments that are given to determine a
student’s success based on his/her
knowledge and mastery of standards and
objectives.
The goal of criterion-referenced assessments is to
provide information about a learner’s level of mastery
of objectives and standards in a given subject area.
61. Norm-Referenced
Norm-referenced assessments are
assessments that are given to determine a
student’s level of mastery of objectives and
standards as compared to others in
his/her norm group.
The goal of norm-referenced assessments is to
compare a student’s level of success against that of
other students in his/her age/grade level, or “norm”
group.
62. Screening Assessments
Screening assessments are assessments
that are given to assess the performance
level of all students in relation to the
Standards.
The goals of screening assessments are to…
1. provide information about general areas of student
weaknesses and strengths.
2. provide feedback to the teacher so that s/he can
modify instruction to meet the needs of the learner.
63. Diagnostic Assessments
Diagnostic assessments are assessments
that are given to identify the specific area(s)
of need for a struggling student identified by
the screening instrument.
The goals of diagnostic assessments are to…
1. provide information about specific areas of student
weaknesses and strengths.
2. provide feedback to the teacher so that s/he can
modify instruction to meet the needs of the learner.
64. Progress-Monitoring
Progress-monitoring assessments are
assessments that are administered throughout
the year to measure the progress of all
students towards achieving the Standards and
Benchmarks.
The goals of progress-monitoring assessments are to…
1. provide information about student progress towards
achieving mastery of Standards and Benchmarks.
2. provide feedback to the teacher so that s/he can
modify instruction to meet the needs of the learner.
65. The Purposes of Assessment
Three New Roles for Assessment in Today’s
Education
• To influence public perceptions of
educational effectiveness
• To help evaluate teachers
• To improve instructional intentions
from Popham’s Classroom Assessment: What Teachers Need to Know
66. Assessment “Rumors”
• Rumor: assessment is all about
test-giving
– Truth: test-giving is only one of many,
many ways to assess learning
• Rumor: assessment is always
formal
– Truth: insight about a student’s
learning can be found anytime you look for it
• Rumor: assessment is separate
from instruction
from Tomlinson’s “Learning to Love Assessment”
67. Assessment “Rumors”, cont’d.
• Rumor: assessment is “the end”
for a teacher
– Truth: studying assessment (what
worked, what didn’t) is the beginning of better
instruction
• Rumor: assessment is about
finding weaknesses
– Truth: assessment can also be used to
accentuate student positives
from Tomlinson’s “Learning to Love Assessment”
68. Formative Assessment
from Chappuis and Chappuis, “The Best Value in Formative Assessment”
Where am I going?
Discuss with students their learning targets,
written in student-friendly language
Show students strong and weak examples
of the type of performance or product they
are expected to create
Use Rubrics!
for rubric help: www.rubistar.org; www.rubrics.com
69. Formative Assessment
from Chappuis and Chappuis, “The Best Value in Formative Assessment”
Where am I now?
Administer a non-graded quiz partway
through the learning to help understand
who needs to work on what
Have students identify their own strengths
and weaknesses
Have students keep a list of the learning
goals and check off the ones they have
mastered
70. Formative Assessment
from Chappuis and Chappuis, “The Best Value in Formative Assessment”
How can I close the gap?
Give students feedback
Have students graph or describe their
progress on learning goals
Ask students to comment on their progress:
What changes have you noticed? What is
easy that used to be hard?
71. Formative Assessment
More Examples of Formative
Assessment in the Classroom
Use classroom discussion
Ask students reflective, thought-provoking questions
Think-pair-share
Have students write their understanding of a concept
before and after instruction
Ask students to summarize the main ideas they’ve
taken away from a lecture or assigned reading
Interview students individually or in small groups
Assign brief, in-class writing assignments
Use portfolios or collections of student work
from Boston’s “The Concept of Formative Assessment”
72. Formative Assessment
from Chappuis and Chappuis, “The Best Value in Formative Assessment”
What is Feedback?
Feedback offers descriptive information about the work
a student is doing.
Feedback avoids marks or comments that judge or
grade.
Effective feedback focuses on the intended learning,
identifying both strengths and areas for improvement.
Feedback models the type of thinking students should
engage in when they self-assess.
73. Formative Assessment
from Chappuis and Chappuis, “The Best Value in Formative Assessment”
Examples of Feedback
“you have interpreted the
bar graph correctly, but
you need to make sure the
marks on the x and y axes
are placed at equal
intervals.”
“the good stories we have
been reading have a
beginning, middle, and
end. I see that your story
has a beginning and a
middle. Can you draw and
write an ending?”
“what you have written is a hypothesis because it is a
proposed explanation. You can improve it by writing it as an
‘if…then…’ statement.”
74. Formative Assessment
from Chappuis and Chappuis, “The Best Value in Formative Assessment”
Advantages of Formative Assessment
The timeliness of results enables teachers
to adjust instruction quickly, while learning
is still in progress.
The students who are assessed are the
ones who benefit from the adjustments.
The students can use the results to adjust
and improve their own learning.
75. Formative Assessment
from Guskey’s “The Rest of the Story”
Using the Results of Formative Assessment
Using formative assessments and giving feedback to students is
very important, but equally important is what happens after the
assessments. How will teachers and students use the
results?
The answer: Teachers must plan and use corrective activities.
Effective corrective activities give students an alternative
pathway to learning success.
Corrective activities will present the concept differently, will
engage the student differently, and will provide the student with a
successful learning experience.
76. Formative Assessment
from Guskey’s “The Rest of the Story”
Types of Corrective Activities
Reteaching
Individual tutoring
Peer tutoring
Cooperative teams
Textbooks
Alternative textbooks
Alternative materials, study guide, workbooks
Academic games
Learning kits
Computer activities
Learning centers
77. Reflective Assessment
from Ellis’ Teaching, Learning, and Assessment Together: The Reflective
Classroom
What is reflective assessment?
Reflective assessments will ask students to do more
than just repeat information. Reflective
assessments seek truth, meaning, purpose, and
utility.
Reflective assessments will allow students to go
deeper into the meaning of their learning. This type
of assessment is created to help students figure out
what has meaning and why.
78. Reflective Assessment
Class 1: They excitedly entered the
museum running. After some time,
this same group came back to the
front, still at high speed. They put
their coats on and left the building. One
child exclaimed that she saw every
exhibit in the museum! Obviously,
they were in a hurry, but they did
manage to “cover it all.”
Class 2: They and their teacher are
in no hurry. They all gathered
around several exhibits of ancient
pottery. Each child had a sketch
pad and pencil. They made careful
drawings of what they viewed. This
group of students did not manage to
see every exhibit at the museum.
A Field Trip to the Museum…
Which class is the reflective classroom?
“All experiences teach us something, but only experiences of
quality teach us something worthwhile.”
from Ellis’ Teaching, Learning, and Assessment Together: The Reflective
Classroom
79. Reflective Assessment
from Ellis’ Teaching, Learning, and Assessment Together: The Reflective
Classroom
Reflective Assessment Strategies
Strategy / Activity
• I Learned Statements: statements of personal learning during
closure of a lesson
• Clear and Unclear Windows: students sort what is and is not clear at
the time
• The Week in Review: students assess the week’s activities
• Pyramid: rehearsal through gradually increasing
group size
• Talk About It: articulate learning out loud to oneself or
another
• Learning Illustrated: translate understanding into a visual
representation
80. Reflective Assessment
from Ellis’ Teaching, Learning, and Assessment Together: The Reflective
Classroom
Reflective Assessment Strategies, con’t
Strategy / Activity
• Questioning Author: students construct questions about
the content and skills
• Post It Up: students post their understanding of
the main point
• I Can Teach: extending understanding through
teaching
• Thank You: specifically acknowledge the influence
of another person
• Parents on Board: invite parents to help
• Get a Job: make a real-world connection to an
actual work experience
81. Think about…
1. What kind of decisions do you
make based on the results of
students’ test performance?
82. Think about…
2. Do you tell students, before they take
a test, how you will judge their test
performance? If not, why not? How do
you report students’ test results, and
how do you use those results?
83. Think about…
3. Have you ever had a student or a
group of students who just couldn’t do
well on tests? What is your explanation of
their difficulty? What did you or could you
do about it?
84. Testing
•Testing is one component in the evaluation
process which provides information to make
decisions.
•The test should be about:
1. What has been done?
2. How has it been done? – performance
3. What should I do with the results?