CLASSROOM ASSESSMENT
Assessment for 21st Century Learning
Types of Assessment

Formative
• Occurs during instruction
• Not graded
• Designed to provide information needed to adjust teaching and
learning while they are still occurring
• Assessment FOR Learning

Summative
• Occurs after instruction
• Graded
• Designed to provide information about the amount of learning that
has occurred at a particular point
• Assessment OF Learning
Formative
• On-going (usually daily)
• Multiple opportunities to reach the criteria
• Allows for practice and improvement
• Formative assessments point out areas of incomplete
learning to students and teachers
• Formative assessments give teachers time to adapt their
instruction
• Can be informal (observations, verbal interactions) or
formal (work samples, paper and pencil tasks)
Summative
• Formal (rubric or scoring sheet)
• Meaningful descriptive feedback is important
• Provides a means of determining what has been learned for
the purposes of reporting
• Does not provide an opportunity to correct or improve
performance
• Provides numbers for statistical review of achievement
• Must be reliable and valid
• Examples: State assessments, interim assessments, end-of-unit
assessments
Which of the following is an example of
a formative assessment?
1. End-of-chapter test
2. The ACT
3. The graduation exam
4. Ungraded daily assessments
Exit Slips are written responses to questions
the teacher poses at the end of a lesson or a
class to assess student understanding of key
concepts. This is an example of a summative
assessment.
1. True
2. False

Types of Summative Assessments
Norm-Referenced Tests
• Made to compare test takers to each other.
• Most appropriate when one wishes to make comparisons across large
numbers of students or important decisions regarding student
placement and advancement.

Criterion-Referenced Tests
• Intended to measure how well a person has learned a specific body of
knowledge and skills
• Most appropriate for quickly assessing what concepts and skills
students have learned from a segment of instruction.
Norm-Referenced Tests
• determine individual performance in comparison to others;
standardized, comparisons among people
• it is inappropriate to use NRTs to determine the
effectiveness of educational programs and to provide
diagnostic information for individual students
• items cover a broad range of content and often represent
a mismatch between what is taught locally and what is
taught in other states
Criterion-Referenced Tests
• determine individual performance in comparison to some
standard or criterion
• items based on standards given to students (i.e.,
objectives); most students should answer correctly
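
The two readings of a score can be contrasted in a short sketch. The names, the scores, and the 70-point cutoff below are all hypothetical, chosen only to illustrate the difference; real tests set their own standards and norm groups.

```python
# Hypothetical raw scores (percent correct) for five test takers.
scores = {"Ana": 92, "Ben": 85, "Cy": 78, "Dee": 71, "Eli": 64}

# Criterion-referenced reading: compare each score to a fixed standard.
# The 70-point cutoff is an assumed criterion, chosen for illustration.
CUTOFF = 70
meets_standard = {name: score >= CUTOFF for name, score in scores.items()}

# Norm-referenced reading: compare each score to the other test takers.
# Rank 1 is the highest score in this group.
ordered = sorted(scores, key=scores.get, reverse=True)
rank = {name: position for position, name in enumerate(ordered, start=1)}

for name in scores:
    print(name, scores[name], "meets standard:", meets_standard[name], "rank:", rank[name])
```

In the criterion-referenced reading, most of this group can meet the standard at once; in the norm-referenced reading, someone is always ranked last no matter how well the group performs.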
The SAT, a college entrance exam, compares individual student
performance to the performance of a sample of students. The
SAT is what type of test?

1. Norm-Referenced
2. Criterion-Referenced
The goal of the driving test is to see whether the test taker is
skilled enough to be granted a driver's license, not to see whether
one test taker is more skilled than another test taker. This type of
test is called what?

1. Norm-Referenced
2. Criterion-Referenced
Concerns with Summative Assessment

Reliability
• The consistency of the test as a measurement
• Same results, time and again

Validity
• How well a test measures what it says it measures
Reliability
• Another way to think of reliability is to imagine a kitchen
scale. If you weigh five pounds of potatoes in the morning,
and the scale is reliable, the same scale should register
five pounds for the potatoes an hour later.
• Likewise, instruments such as classroom tests and
national standardized exams should be reliable – it should
not make any difference whether a student takes the
assessment in the morning or afternoon; one day or the
next.
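
One common way to put a number on "same results, time and again" is to correlate scores from two administrations of the same test. The sketch below assumes that approach (the slides do not name a specific statistic), and the scores are hypothetical.

```python
from math import sqrt

def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    spread_a = sqrt(sum((a - mean_a) ** 2 for a in first))
    spread_b = sqrt(sum((b - mean_b) ** 2 for b in second))
    return cov / (spread_a * spread_b)

# Hypothetical scores for the same five students, one week apart.
week_1 = [55, 62, 70, 81, 90]
week_2 = [57, 60, 72, 80, 91]

r = pearson_r(week_1, week_2)
print(round(r, 3))  # values near 1.0 suggest consistent results
```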
A test designed to assess student learning in math class
is given to a group of students twice, with the second
administration coming a week after the first. The test
would be considered reliable if …
1. Students scored higher on the first
exam than the second.
2. Students scored the
same on both exams.
3. Students scored
higher on the second
exam than the first.
Validity
• Refers to the accuracy of an assessment -- whether or not
it measures what it is supposed to measure.
• Even if a test is reliable, it may not provide a valid
measure.
Validity
• Let’s imagine a bathroom scale that consistently tells you
that you weigh 130 pounds. The reliability (consistency) of
this scale is very good, but it is not accurate (valid)
because you actually weigh 145 pounds (perhaps you reset
the scale in a weak moment)!
• Since teachers, parents, and school districts make
decisions about students based on assessments (such as
grades, promotions, and graduation), the validity inferred
from the assessments is essential -- even more crucial
than the reliability.
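
The bathroom-scale example can be sketched with two numbers: the spread of repeated readings (consistency) and their distance from the true value (accuracy). The five repeated readings are hypothetical; the 130 and 145 pound figures come from the example above.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 145  # pounds, from the bathroom-scale example
# Hypothetical repeated readings from the miscalibrated scale.
readings = [130, 130, 130, 130, 130]

spread = pstdev(readings)            # 0 means perfectly consistent (reliable)
bias = mean(readings) - TRUE_WEIGHT  # far from 0 means inaccurate (not valid)

print("spread:", spread, "bias:", bias)
```

A spread of zero with a large bias is the "reliable but not valid" case: the scale agrees with itself every time and is wrong every time.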
A history teacher designs a unit assessment. The
questions are written with complicated wording and
phrasing. Which of the following is true?

1. This test is NOT valid, because
the test could be one of reading
comprehension rather than
history.
2. This test is NOT reliable, because
students will do poorly due to the
complicated wording.
3. This test is valid, because the
students should be able to
comprehend the question wording
regardless of its level.
4. This test is reliable, because all
students will do poorly, regardless
of their knowledge of history.
Evaluation of Summative Assessments
Mean
• Definition: arithmetic average of a set of numbers
• Applicability: The mean is used for normal distributions.
• Limitations: largely influenced by outliers

Median
• Definition: the middle number when the set is sorted in numerical order.
• Applicability: The median is generally used for skewed distributions.
• Limitations: better suited for skewed distributions
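
A small sketch of how one outlier moves the mean much more than the median. The quiz scores are hypothetical, and Python's standard `statistics` module is used for both calculations.

```python
from statistics import mean, median

# Hypothetical quiz scores; the extra value of 5 is a deliberate outlier.
scores = [70, 72, 75, 78, 80]
with_outlier = scores + [5]

# Without the outlier, mean and median agree.
print(mean(scores), median(scores))

# With the outlier, the mean drops sharply while the median barely moves.
print(mean(with_outlier), median(with_outlier))
```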
Data Distribution
• Normal Distribution
• mean = median = mode
• Symmetry about the center
• 50% of values less than the mean and 50% greater than the mean
• Bell curve
Data Distribution
• Negative Skew
• The long "tail" is on the
negative side of the
peak.
• skewed to the left

• Positive skew
• the long tail is on the
positive side of the
peak
• skewed to the right
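
The direction of skew can be checked numerically. The sketch below assumes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second), which is one of several conventions; the two score sets are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (sign gives the tail direction)."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((v - center) ** 2 for v in values) / n
    m3 = sum((v - center) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical score sets.
easy_test = [40, 70, 85, 88, 90, 92, 95]  # long tail toward low scores
hard_test = [5, 8, 10, 12, 15, 30, 60]    # long tail toward high scores

print(skewness(easy_test) < 0)  # negative skew, tail on the left
print(skewness(hard_test) > 0)  # positive skew, tail on the right
```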
Other Evaluations of Summative
Assessments
• Standard Deviation
• Square root of variance in scores (how the scores are arranged
around the average score)
• High SD means more space between scores, low SD means
clustered scores
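
As a rough sketch of the definition above, the function below computes the population standard deviation (dividing by the number of scores; a sample version would divide by one less). The two score sets are hypothetical and share the same average.

```python
from math import sqrt

def population_sd(scores):
    """Square root of the variance (mean squared distance from the average)."""
    average = sum(scores) / len(scores)
    variance = sum((s - average) ** 2 for s in scores) / len(scores)
    return sqrt(variance)

# Hypothetical score sets with the same average (75).
clustered = [73, 74, 75, 76, 77]
spread_out = [55, 65, 75, 85, 95]

# The clustered set yields the smaller SD, the spread-out set the larger.
print(population_sd(clustered), population_sd(spread_out))
```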
Which graph has a higher standard
deviation?
1. Red
2. Blue
Other Evaluations of Summative
Assessments
• Percentile
• The value below which a certain percent of values fall
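
A minimal sketch of a percentile rank, assuming the "percent of scores strictly below" convention; testing programs may use slightly different conventions (for example, "at or below"). The group of scores is hypothetical.

```python
def percent_below(all_scores, score):
    """Percent of scores in the group that fall below the given score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Hypothetical group of ten composite scores.
group = [14, 16, 18, 19, 20, 21, 23, 25, 28, 31]

print(percent_below(group, 25))  # 7 of the 10 scores are below 25
```

Note that the result describes standing relative to the group, not the share of questions answered correctly.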
A student scored in the 74th percentile on her ACT
exam. Which of the following is true?
1. She answered 74% of the test questions
correctly.
2. Her ACT score was
equal to or better than
74% of students
taking the ACT exam
3. She got a C on her
ACT exam.
What it all means…
• Is the assessment measuring what we want it to?
• What are the instructional decisions to be made based on
this assessment information?
• How will those changes be made?
The use of clickers in this presentation is a
form of what kind of assessment?
1. Summative

2. Formative


ESE444/544 - Types of Assessment


Editor's Notes

  • #17 http://fcit.usf.edu/assessment/basic/basicc.html